Artifact Review Summary: Kauri: Scalable BFT Consensus with Pipelined Tree-Based Dissemination and Aggregation
Artifact Details
Badges Awarded
- Artifact Available
- Artifact Functional
Description of the Artifact
Authors’ Description
Kauri is a BFT communication abstraction that leverages dissemination/aggregation trees for load balancing and scalability, while avoiding the main limitations of previous tree-based solutions: poor throughput due to additional round latency, and the collapse of the tree to a star even in runs with few faults. At the same time, it avoids the bottleneck of star-based solutions.
Features
Kauri extends the publicly available implementation of HotStuff (https://github.com/hot-stuff/libhotstuff) with the following additions:
- Tree-based dissemination and aggregation, equally balancing the message propagation and processing load among the internal nodes in the tree.
- BLS signatures: through BLS signatures the bandwidth load of the system is reduced significantly, and signatures may be aggregated at each internal node.
- Extra pipelining: additional pipelining offsets the inherent latency cost of trees, allowing the system to perform significantly better even in high-latency settings.
Run Kauri
At the moment, only BLS signatures are supported. To run HotStuff with libsec signatures, use vanilla HotStuff at https://github.com/hot-stuff/libhotstuff.
Summary of Reviewers’ Descriptions
Kauri modifies HotStuff to avoid leader bandwidth bottlenecks using dissemination/aggregation trees. It uses a novel pipelining technique for scalability, and improves HotStuff’s throughput without significantly hurting latency.
The artifact includes a modified version of a HotStuff codebase featuring the Kauri alterations. It also includes experiment scripts which can setup a docker swarm, run experiments based on a simple configuration file, and report throughput and latency.
Overall setup was easy, but several key issues remain. Specifically, the artifact can only run HotStuff-BLS and Kauri experiments, varying a limited set of parameters and tracking only aggregate throughput and latency. This is not sufficient to reproduce all of the data in the paper.
Environment(s) Used for Testing
Reviewer A Used
CloudLab with 5 c6525-100g instances in the Utah cluster, running Ubuntu 20.04. Technical specifications include:
| attribute | value |
|---|---|
| dom0mem | 8192M |
| hw_cpu_bits | 64 |
| hw_cpu_cores | 24 |
| hw_cpu_hv | 1 |
| hw_cpu_sockets | 1 |
| hw_cpu_speed | 2800 |
| hw_cpu_threads | 2 |
| hw_mem_size | 131072 |
| processor | AMD EPYC 7402P |
Reviewer B Used
3 different clusters on Grid'5000, all running Ubuntu 20.04:
| cluster | gros | chiclet | dahu |
|---|---|---|---|
| nodes | 7 | 7 | 14 |
| cpu | Intel Xeon Gold 5220 | AMD EPYC 7301 | Intel Xeon Gold 6130 |
| architecture | Cascade Lake-SP | Zen | Skylake |
| frequency | 2.20 GHz | 2.20 GHz | 2.10 GHz |
| cpu / node | 1 | 2 | 2 |
| cores / cpu | 18 | 16 | 16 |
| RAM | 96 GiB | 128 GiB | 192 GiB |
| ethernet | 2 x 25 Gbps | 2 x 25 Gbps | 10 Gbps |
Step-By-Step Instructions to Exercise the Artifact
Reviewers followed the instructions in the artifact README.
Setup
To summarize the instructions in the artifact README, on each cluster, the reviewer:
- downloaded the latest version of the artifact with git onto each machine
- built a Kauri docker image on each machine
- selected one cluster machine as the control machine, initiated a docker swarm
- added all the docker images on all the cluster machines to the swarm, and
- created a network within the docker swarm
The reviewers had no problems following artifact instructions for these steps.
Reviewers were also able to adjust the number of Kauri replicas in `kauri.yaml` as described in the instructions.
Running Experiments
As explained in the instructions, the `experiments` file outlines the parameters to be used in each experiment.
Each line represents one experiment (they will be run sequentially), and each runs 5 times by default.
Reviewers were able to run each experiment only once by altering `runexperiment.sh`.
The file format is:

```
type, fanout, pipeline-depth, pipeline-lat, latency, bandwidth : number of internal nodes : total number of nodes : suggested number of physical machines
```

For example:

```
['bls','10','6','10','100','25','1000']:11:89:5
# HotStuff has fanout = N
['bls','100','0','10','100','25','1000']:11:89:5
```

Note that HotStuff experiments are simply those where the fanout equals the number of servers (no tree structure will be used).
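For clarity, here is a minimal Python sketch that parses one line of this format. It is not part of the artifact: the field names are our own, and the interpretation of the final bracketed value as the block size is an assumption (it matches the default of 1000 transactions per block mentioned below, but the README's format line does not name it).

```python
# Hypothetical helper (not part of the artifact): parse one experiments line.
import ast

def parse_experiment_line(line: str):
    """Parse a line like "['bls','10','6','10','100','25','1000']:11:89:5"."""
    line = line.split('#', 1)[0].strip()        # drop comment lines / trailing comments
    if not line:
        return None
    params_str, internals, total, machines = line.rsplit(':', 3)
    fields = ast.literal_eval(params_str)        # list of seven strings
    sig_type, fanout, pipe_depth, pipe_lat, latency, bandwidth, last = fields
    return {
        'type': sig_type,                        # 'bls'
        'fanout': int(fanout),
        'pipeline_depth': int(pipe_depth),
        'pipeline_lat': int(pipe_lat),
        'latency': int(latency),
        'bandwidth': int(bandwidth),
        'block_size': int(last),                 # assumption: matches the 1000 tx/block default
        'internal_nodes': int(internals),
        'total_nodes': int(total),
        'physical_machines': int(machines),
    }

print(parse_experiment_line("['bls','10','6','10','100','25','1000']:11:89:5"))
```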
The default experiments file now contains the parameters for replicating much of the data in Figures 6, 7, and 9. To replicate data from Figure 5, reviewer B used the same parameters as the third test of Figure 6, but with varying pipeline depth:
```
['bls','10','1','10','100','25','1000']:11:89:5
['bls','10','2','10','100','25','1000']:11:89:5
['bls','10','3','10','100','25','1000']:11:89:5
['bls','10','4','10','100','25','1000']:11:89:5
['bls','10','5','10','100','25','1000']:11:89:5
['bls','10','6','10','100','25','1000']:11:89:5
['bls','10','7','10','100','25','1000']:11:89:5
['bls','10','8','10','100','25','1000']:11:89:5
```
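As an aside, a sweep like the one above can be generated programmatically. This is a hypothetical convenience snippet, not part of the artifact, and the output filename is an assumption (adjust it to wherever the artifact expects the experiments file):

```python
# Hypothetical convenience script (not part of the artifact): regenerate the
# Figure 5 sweep above by varying only the pipeline depth.
def figure5_sweep(depths=range(1, 9)):
    template = "['bls','10','{d}','10','100','25','1000']:11:89:5"
    return [template.format(d=d) for d in depths]

# Assumption: the experiments file lives at ./experiments on the control node.
with open("experiments", "w") as f:
    f.write("\n".join(figure5_sweep()) + "\n")
```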
To run a batch of experiments, run `runexperiment.sh` on the control node. It’s probably best to run this in `tmux` or similar, as it runs for a long time and prints out valuable output. Specifically, it produces output similar to:
```
2021-08-17 14:14:43.546142 [hotstuff proto] x now state: <hotstuff hqc=affd30ca8f hqc.height=2700 b_lock=22365a13f8 b_exec=63c209503b vheight=27xx tails=1>
2021-08-17 14:14:43.546145 [hotstuff proto] Average: 200
```
Here `hqc.height=2700` gives the height of the last finalized block. Over the 5-minute experiment interval, that corresponds to 2700/300 = 9 blocks per second. With the default of 1000 transactions per block, that results in 2700/300 × 1000 = 9000 ops per second. The value next to “Average” represents the average block latency.
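The following is a minimal, hypothetical post-processing sketch (not part of the artifact) that applies this arithmetic to the log output; the 5-minute interval and 1000 transactions per block are taken from the description above.

```python
# Hypothetical post-processing sketch (not part of the artifact): extract
# hqc.height and the "Average" latency from the log and convert the final
# block height into blocks/s and ops/s as described above.
import re

EXPERIMENT_SECONDS = 5 * 60     # 5-minute experiment interval
TX_PER_BLOCK = 1000             # default transactions per block

def summarize(log_text: str):
    heights = [int(m) for m in re.findall(r'hqc\.height=(\d+)', log_text)]
    averages = [int(m) for m in re.findall(r'Average: (\d+)', log_text)]
    final_height = heights[-1]                      # last finalized block
    blocks_per_sec = final_height / EXPERIMENT_SECONDS
    ops_per_sec = blocks_per_sec * TX_PER_BLOCK
    return final_height, blocks_per_sec, ops_per_sec, averages[-1]

log = ('... [hotstuff proto] x now state: <hotstuff ... hqc.height=2700 ...>\n'
       '... [hotstuff proto] Average: 200')
print(summarize(log))   # (2700, 9.0, 9000.0, 200)
```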
How The Artifact Supports The Paper
Available
The artifact, which includes an implementation of Kauri, is available at https://github.com/Raycoms/Kauri-Public.
Functional
Reviewers were able to run instances of Kauri’s BFT consensus using the `runexperiment.sh` workflow outlined above.
Some Results Reproduced
The paper presents a lot of measurement results, both with Kauri and with control systems. Reviewers were able to reproduce several of these results. It’s easiest to consider the results by figure:
Figure 5
At low pipeline depths, at least, reviewer results are similar (but not identical) to the paper’s. Reviewer B found that throughput does increase with pipeline depth, reaching a maximum around depth 5 or 6, though with a less dramatic trend than in the paper. Some results are below:
250Kb blocks (1000 tx per block); throughput measured in blocks per 5-minute period, on the Dahu cluster:

| pipelining stretch | expected throughput (paper) | measured throughput (Dahu) | measured latency (Dahu) |
|---|---|---|---|
| 1 | 300 | 884 | 657 |
| 2 | 900 | 1378 | 636 |
| 3 | 1260 | 1768 | 667 |
| 4 | 1650 | 2166 | 679 |
| 5 | 2010 | 2603 | 677 |
| 6 | 2400 | 2705 | 758 |
| 7 | 2500 | 2683 | 867 |
| 8 | 2500 | 2703 | 976 |
Figure 6
The reviewers were able to replicate the Figure 6 experiments for 100 processes, and their results agree with the paper. One reviewer was able to replicate Figure 6 results for 150 and 200 nodes as well.
Figure 7
Reviewers were able to replicate the Kauri line from Figure 7 (the artifact was not set up for non-Kauri experiments).
Figure 8
The latencies reviewer B measured for Kauri in Figure 9’s Kauri (h=2) experiments reflect the general downward trend in Kauri’s latency shown in Figure 8, although not the specific values. In particular, the trend is much stronger than in the paper: Kauri’s latency is higher at low bandwidth, and lower at high bandwidth, than the figure shows. (The artifact was not set up for non-Kauri experiments.)
Figure 9
For low-throughput Kauri data points, reviewers were able to reproduce Figure 9 results (for Kauri and HotStuff-BLS). These measurements correspond to Figure 9 (and, loosely, Figure 8); all use RTT = 100 and block size = 1000, with throughput measured in blocks per 5-minute period. Specifically, results were:
Kauri (h=2, fanout=10):

| bandwidth | pipeline-depth | expected throughput / latency | Chiclet throughput / latency |
|---|---|---|---|
| 25Mb | 3 | 2700 / 590 | 2471 / 477 |
| 50Mb | 4 | 5700 / 490 | 4179 / 353 |
| 100Mb | 6 | 10800 / 400 | 6952 / 296 |
| 1000Mb | 8 | ???? / 350 | 9840 / 269 |
Kauri (h=3, fanout=5):

| bandwidth | pipeline-depth | expected throughput / latency | Chiclet throughput / latency |
|---|---|---|---|
| 25Mb | 4 | 5700 / 600 | 2889 / 505 |
| 50Mb | 6 | 10800 / 510 | 4719 / 437 |
| 100Mb | 8 | ???? / ??? | 6579 / 404 |
| 1000Mb | 8 | ???? / ??? | 6941 / 378 |
HotStuff-BLS:

| bandwidth | expected throughput / latency | Dahu throughput / latency | Chiclet throughput / latency |
|---|---|---|---|
| 25Mb | 270 / 1090 | 60 / 998 (must be a fluke) | 275 / 1074 and 273 / 1074 |
| 50Mb | 540 / 580 | 521 / 569 | 515 / 575 |
| 100Mb | 750 / 350 | 858 / 344 | 840 / 350 |
| 1000Mb | 1800 / 200 | 1760 / 168 | 1636 / 180 |
Additional Notes and Resources
Several adjustments were made to the artifact during (and after) the review period. Hopefully, these will make it easier for future users to reproduce more results.