Performance comparison of many implementations of Bron–Kerbosch algorithms.
This project is maintained by ssomers
Performance comparison of many implementations to solve one particular computational problem, to compare the effects of algorithm complexity, programming languages, library choices, and parallelism.
The algorithms implemented are three variants of the Bron-Kerbosch algorithm to find all maximal cliques in a graph. Some algorithm variants (IK_*) are described in the 2008 paper by F. Cazals & C. Karande, “A note on the problem of reporting maximal cliques”, Theoretical Computer Science, 407 (1): 564–568, doi:10.1016/j.tcs.2008.05.010.
It originated as a fork of cornchz/Bron-Kerbosch. Compared to the original project, the code is:
All charts below show the time spent on the same particular Windows machine with a 6-core CPU, all on the same predetermined random graph, with error bars showing the minimum and maximum over 5 or 3 samples. Order of a graph = number of vertices; size of a graph = number of edges.
A random graph is easy to generate and objective, but not ideal for testing the performance of the algorithm itself: when you’re looking for maximal cliques to do something useful, the actual data likely comes in cliques, some of which are near-maximal and cause the heartaches described in the paper.
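For concreteness, such a random graph can be generated along these lines (a hedged Python sketch with assumed names; the repo’s actual generator may differ):

```python
import random

def random_graph(order, size, seed=42):
    """Build a random undirected graph of the given order (number of vertices)
    and size (number of edges), as a list mapping vertex -> set of neighbours."""
    assert size <= order * (order - 1) // 2, "more edges requested than possible"
    rng = random.Random(seed)  # fixed seed: predetermined, reproducible graph
    adjacencies = [set() for _ in range(order)]
    edges = set()
    while len(edges) < size:
        v = rng.randrange(order)
        w = rng.randrange(order)
        if v != w and (min(v, w), max(v, w)) not in edges:
            edges.add((min(v, w), max(v, w)))
            adjacencies[v].add(w)
            adjacencies[w].add(v)
    return adjacencies

g = random_graph(order=100, size=200)
assert sum(len(n) for n in g) == 2 * 200  # each edge counted from both endpoints
```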
Let’s first get one thing out of the way: what does some local optimization yield in the simplest, naive Bron-Kerbosch algorithm, in Python and Rust. Is this premature optimization or low hanging fruit?
Gather the clique from the call stack, instead of passing it around on the heap.
Basically showing off Rust’s ability to guarantee, at compile time, that this can be done safely. We gain as much as by switching to the best performing programming language.
Therefore, all the other implementations will contain similar tweaks.
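To make the naive variant concrete, here is a hedged Python sketch of it (names and graph representation are assumptions, not the repo’s exact code). The clique under construction is passed down the recursion, i.e. it lives on the call stack:

```python
def bron_kerbosch1(graph, clique, candidates, excluded, reporter):
    """Naive Bron-Kerbosch (no pivot): graph is a list mapping each vertex
    to its set of neighbours; reports every maximal clique found."""
    if not candidates and not excluded:
        reporter.append(clique)
    for v in list(candidates):
        neighbours = graph[v]
        bron_kerbosch1(graph, clique + [v],
                       candidates & neighbours,
                       excluded & neighbours, reporter)
        candidates.discard(v)
        excluded.add(v)

# Usage: a 4-vertex graph with one triangle {0, 1, 2} and a pendant edge 2-3.
graph = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
cliques = []
bron_kerbosch1(graph, [], set(range(4)), set(), cliques)
assert sorted(map(sorted, cliques)) == [[0, 1, 2], [2, 3]]
```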
As announced in the previous paragraph, we mostly implement locally optimized ½ versions of these. In particular, we write out the first iteration separately, because in that first iteration the set of candidate vertices starts off being huge, with every connected vertex in the graph, but that set doesn’t have to be represented at all because every reachable vertex is a candidate until excluded.
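A hedged Python sketch of that specialization (assumed names, not the repo’s exact code): the top level iterates over the vertices directly, never materializing the huge candidate set, and delegates to a plain naive recursion for the nested levels.

```python
def bron_kerbosch_nested(graph, clique, candidates, excluded, reporter):
    """Naive nested levels: graph maps each vertex to its set of neighbours."""
    if not candidates and not excluded:
        reporter.append(clique)
    for v in list(candidates):
        neighbours = graph[v]
        bron_kerbosch_nested(graph, clique + [v],
                             candidates & neighbours,
                             excluded & neighbours, reporter)
        candidates.discard(v)
        excluded.add(v)

def bron_kerbosch_half(graph, reporter):
    """First iteration written out separately: every connected vertex is a
    candidate until excluded, so no candidate set is built at this level."""
    excluded = set()
    for v, neighbours in enumerate(graph):
        if neighbours:  # isolated vertices cannot be part of a clique of size >= 2
            bron_kerbosch_nested(graph, [v],
                                 neighbours - excluded,
                                 neighbours & excluded, reporter)
            excluded.add(v)

# Smoke test: a triangle {0, 1, 2} plus a pendant edge 2-3.
graph = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
cliques = []
bron_kerbosch_half(graph, cliques)
assert sorted(map(sorted, cliques)) == [[0, 1, 2], [2, 3]]
```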
These are all single-threaded implementations (using only one CPU core).
Let’s implement Ver3-GP exploiting parallelism (using all CPU cores). How does Ver3 operate?
```mermaid
flowchart TD
a[degeneracy order]
b[first iteration]
c[nested iteration]
d[nested iteration]
e[nested iteration]
f[nested iteration]
g[nested iteration]
h[nested iteration]
a --> b
b --> c
b --> d
c --> e
c --> f
c --> g
c --> h
d --> e
d --> f
d --> g
d --> h
```
We already specialized the first iteration in Ver2, and Ver3 changes the order in the first iteration to the graph’s degeneracy order. So we definitely write the first iteration separately. Thus an obvious way to parallelize is to run 2 + N tasks in parallel:
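A hedged Python sketch of the 2 + N task split (one task produces the degeneracy order, one performs the written-out first iteration and fans out, N workers run the nested iterations). The thread-and-queue scheduling and all names here are assumptions; the repo uses each language’s native facilities.

```python
import threading, queue

def degeneracy_order(graph):
    """Yield connected vertices by minimum remaining degree (simple O(V^2) sketch)."""
    degrees = {v: len(n) for v, n in enumerate(graph) if n}
    while degrees:
        v = min(degrees, key=degrees.get)
        yield v
        del degrees[v]
        for w in graph[v]:
            if w in degrees:
                degrees[w] -= 1

def bron_kerbosch(graph, clique, candidates, excluded, reporter):
    """Naive nested levels, as in the single-threaded sketches."""
    if not candidates and not excluded:
        reporter.append(clique)
    for v in list(candidates):
        neighbours = graph[v]
        bron_kerbosch(graph, clique + [v],
                      candidates & neighbours, excluded & neighbours, reporter)
        candidates.discard(v)
        excluded.add(v)

def solve_parallel(graph, num_workers=3):
    starts, visits = queue.Queue(), queue.Queue()
    reporter = []            # list.append is atomic under CPython's GIL

    def order_task():        # task 1: produce the degeneracy order
        for v in degeneracy_order(graph):
            starts.put(v)
        starts.put(None)

    def first_iteration():   # task 2: the written-out first iteration
        excluded = set()
        while (v := starts.get()) is not None:
            visits.put((v, graph[v] - excluded, graph[v] & excluded))
            excluded.add(v)
        for _ in range(num_workers):
            visits.put(None)

    def worker():            # tasks 3 .. 2+N: nested iterations
        while (job := visits.get()) is not None:
            v, candidates, excluded = job
            bron_kerbosch(graph, [v], candidates, excluded, reporter)

    threads = [threading.Thread(target=t) for t in
               [order_task, first_iteration] + [worker] * num_workers]
    for t in threads: t.start()
    for t in threads: t.join()
    return reporter

graph = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
assert sorted(map(sorted, solve_parallel(graph))) == [[0, 1, 2], [2, 3]]
```

Note that under CPython’s GIL this brings no real speedup; it only illustrates the task decomposition that the compiled languages exploit.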
The way to implement parallelism varies per language:
All algorithms work heavily with sets. Some languages allow picking at compile time among various generic set implementations.
- `std::collections::BTreeSet`
- `std::collections::HashSet`, a wrapper around a version of hashbrown, in particular 0.11.0 in Rust 1.58.0
- `HashSet` from crate hashbrown 0.12
- `FnvHashSet` from crate fnv 1.0.7
- `std::vec::Vec` (obviously, this can only work well on small graphs)
In very sparse graphs, only BTreeSet allows Ver1 to scale up.
- `std::set`
- `std::unordered_set`
- `std::vector` (obviously, this can only work well on small graphs)
To obtain these results:
Perform:
```
cd python3
(once) python -m venv venv
venv\Scripts\activate.bat
(once or twice) pip install --upgrade mypy ruff pytest hypothesis matplotlib
ruff check . --exclude "venv*"
mypy .
pytest
python -O test_maximal_cliques.py
```
To obtain these results:
Perform:
```
cd rust
(once) cargo install cargo-edit
(sometimes) rustup update
(sometimes) cargo upgrade && cargo update
cargo clippy --workspace
cargo test --workspace
cargo run --release
```
To obtain these results:
Perform:
```
cd go
go vet ./...
go test ./...
go test ./Stats -fuzz=Stats1 -fuzztime=1s
go test ./Stats -fuzz=Stats2 -fuzztime=2s
go test ./Stats -fuzz=StatsN -fuzztime=5s
go test ./BronKerbosch -fuzz=DegeneracyOrder -fuzztime=20s
go run main.go
```
Optionally, on MSYS2:
```
PATH=$PATH:$PROGRAMFILES/go/bin
go test -race ./BronKerbosch
```
To obtain these results:
Perform:
To obtain these results:
Perform:
build it, something akin to:
```
call "%ProgramFiles%\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
mkdir build
cd build
cmake .. -A x64 -DCMAKE_CXX_STANDARD=20 -DBUILD_TESTING=ON
cmake --build . --config Release
cmake --build . --config Debug
ctest --progress --config Release
ctest --progress --config Debug
```
set the CMake variable CppcoroDir to the location of a cppcoro checkout (e.g. ..\cppcoro relative to Bron-Kerbosch).

To obtain these results:
Perform:
Python and Rust publish results to detail_* files automatically, the others need a push:
```
python python3\publish.py go 100 10k 1M
python python3\publish.py csharp 100 10k 1M
python python3\publish.py cpp 100 10k 1M
python python3\publish.py java 100 10k 1M
```
And finally, generate report images:
```
python python3\publish.py
```