Design and Implementation of the Conflux Consensus Layer
The Conflux consensus layer processes all blocks received from the synchronization layer, produces the total order of blocks based on the Conflux GHAST consensus algorithm, and invokes the underlying transaction execution engine to run transactions in the determined order. It provides the information necessary to assist the block generator in preparing the skeleton of new blocks. It also notifies the transaction pool about processed transactions so that the pool can make better transaction selection decisions.
This document provides a high-level overview for readers who want to understand the Rust implementation of the Conflux consensus layer (in the directory core/src/consensus). For more implementation details, see the inline comments in the code. For more information about the Conflux consensus algorithm, see the Conflux Protocol Specification and the Conflux paper (https://arxiv.org/abs/1805.03870).
Design Goals
The consensus layer has the following design goals.
- Consistently process new blocks in the background, following the consensus algorithm.

- Minimize the memory usage of each block in the consensus graph. Even with the checkpoint mechanism, the graph will contain 300K-500K blocks in the normal case and more than 1M blocks when facing liveness attacks. This may stress the memory.

- Process each block fast. Because full/archive nodes have to process every block from the original genesis when they catch up with the network from scratch, fast block processing is important to keep the catch-up period short.

- Be robust against potential attacks. Malicious attackers may generate bad blocks at arbitrary positions in the TreeGraph.
Structures and Components
ConsensusGraph
`ConsensusGraph` (core/src/consensus/mod.rs) is the main struct of the consensus layer. The synchronization layer constructs a `ConsensusGraph` with a `BlockDataManager`, which stores all block metadata on disk. `ConsensusGraph::on_new_block()` is the key function for sending new blocks to the `ConsensusGraph` struct for processing. The struct also provides a set of public functions to query the status of blocks/transactions. This should be the main interface with which other components interact.
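The following minimal, self-contained sketch illustrates this interaction pattern. It is not the actual API: the `Block` type, the in-memory map, and the method bodies are simplified stand-ins; only the role of `on_new_block()` and the read-only queries mirrors the description above.

```rust
// Minimal sketch: the synchronization layer hands every new block to the
// consensus layer via on_new_block(), and other components issue read-only
// queries. Types and method bodies are simplified stand-ins.
use std::collections::HashMap;

type H256 = [u8; 32];

struct Block {
    hash: H256,
    // ... header fields, referee list, transactions ...
}

struct ConsensusGraph {
    // The real struct delegates block metadata storage to BlockDataManager
    // (backed by disk); an in-memory map is used here purely for illustration.
    processed: HashMap<H256, Block>,
}

impl ConsensusGraph {
    fn new() -> Self {
        ConsensusGraph { processed: HashMap::new() }
    }

    // Counterpart of ConsensusGraph::on_new_block(): process one new block.
    fn on_new_block(&mut self, block: Block) {
        // ... run the GHAST logic, update the pivot chain, enqueue execution ...
        self.processed.insert(block.hash, block);
    }

    // Example of a read-only status query issued by other components.
    fn processed_block_count(&self) -> usize {
        self.processed.len()
    }
}

fn main() {
    let mut consensus = ConsensusGraph::new();
    consensus.on_new_block(Block { hash: [0u8; 32] });
    println!("processed blocks: {}", consensus.processed_block_count());
}
```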
ConsensusGraphInner
`ConsensusGraphInner` (core/src/consensus/consensus_inner/mod.rs) is the inner structure of `ConsensusGraph`. `ConsensusGraph::on_new_block()` acquires the write lock of the inner struct at the start of the function. The remaining functions are query functions that only acquire read locks.
The internal structure of `ConsensusGraphInner` is fairly complicated. Generally speaking, it maintains two kinds of information. The first kind is the state of the whole TreeGraph, i.e., the current pivot chain, timer chain, difficulty, etc. The second kind is the state of each block: every block corresponds to a `ConsensusGraphNode` struct that holds its information. When a block first enters `ConsensusGraphInner`, it is inserted into `ConsensusGraphInner::arena : Slab<ConsensusGraphNode>`. The index in the slab becomes the arena index of the block in `ConsensusGraphInner`. We use the arena index to represent a block internally instead of its `H256` hash because it is much cheaper. We will refer back to the fields of `ConsensusGraphInner` and `ConsensusGraphNode` when we discuss the algorithmic mechanisms and their implementations.
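To make the arena-index idea concrete, here is a simplified sketch. The real `arena` is a `Slab<ConsensusGraphNode>` and `ConsensusGraphNode` has many more fields; a plain `Vec` and a few illustrative fields are used here so the example compiles without extra crates.

```rust
// Simplified sketch of the arena-index idea: each block gets a
// ConsensusGraphNode entry in an arena and is referenced internally by a
// small integer index instead of its H256 hash.
use std::collections::HashMap;

type H256 = [u8; 32];

struct ConsensusGraphNode {
    hash: H256,
    parent: usize, // arena index of the parent block
    is_timer: bool,
    // ... weight, adaptive flag, blockset_in_own_view_of_epoch, etc.
}

struct ConsensusGraphInner {
    arena: Vec<ConsensusGraphNode>, // stand-in for Slab<ConsensusGraphNode>
    hash_to_arena_index: HashMap<H256, usize>,
    pivot_chain: Vec<usize>, // arena indices of the current pivot chain blocks
}

impl ConsensusGraphInner {
    fn insert(&mut self, hash: H256, parent: usize) -> usize {
        let index = self.arena.len();
        self.arena.push(ConsensusGraphNode { hash, parent, is_timer: false });
        self.hash_to_arena_index.insert(hash, index);
        index // the block is referred to by this cheap index from now on
    }
}

fn main() {
    let mut inner = ConsensusGraphInner {
        arena: Vec::new(),
        hash_to_arena_index: HashMap::new(),
        pivot_chain: Vec::new(),
    };
    let genesis = inner.insert([0u8; 32], 0);
    inner.pivot_chain.push(genesis);
    println!("arena size: {}", inner.arena.len());
}
```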
ConsensusNewBlockHandler
`ConsensusNewBlockHandler` (core/src/consensus/consensus_inner/consensus_new_block_handler.rs) contains a set of routines for processing a new block. In theory, this code could be part of `ConsensusGraphInner`, because it mostly manipulates the inner struct. However, these routines are all subroutines of `on_new_block()`, and consensus_inner/mod.rs is already very complicated. We therefore decided to put them into a separate file.
ConsensusExecutor
`ConsensusExecutor` (core/src/consensus/consensus_inner/consensus_executor.rs) is the interface struct for the standalone transaction execution thread. `ConsensusExecutor::enqueue_epoch()` allows other threads to asynchronously submit a task that executes the epoch of a given pivot chain block. Once the computation finishes, the resulting state root is stored in `BlockDataManager`. Other threads can call `ConsensusExecutor::wait_for_result()` to wait for the execution of an epoch if desired. In the current implementation, `ConsensusExecutor` also contains the routines for calculating block rewards, including `get_reward_execution_info()` and its subroutines.
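The sketch below shows the general enqueue/execute/wait pattern with plain channels. It is only an analogy for the structure described above: the real `ConsensusExecutor` has its own task and result types, error handling, and reward computation, none of which are reproduced here.

```rust
// Rough sketch: one side enqueues epoch execution tasks, a standalone worker
// thread executes them, and callers collect results when they need them.
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type EpochId = u64;
type StateRoot = u64; // placeholder for the real state root type

fn main() {
    let (task_tx, task_rx) = mpsc::channel::<EpochId>();
    let (result_tx, result_rx) = mpsc::channel::<(EpochId, StateRoot)>();

    // Standalone execution thread, playing the role of the executor's worker.
    let worker = thread::spawn(move || {
        for epoch in task_rx {
            // ... execute all transactions of the epoch in the determined order ...
            let state_root = epoch * 31; // dummy "execution" result
            result_tx.send((epoch, state_root)).unwrap();
        }
    });

    // Analogue of enqueue_epoch(): submit work without blocking.
    for epoch in 0..3 {
        task_tx.send(epoch).unwrap();
    }
    drop(task_tx); // no more tasks; lets the worker loop terminate

    // Analogue of wait_for_result(): collect results when actually needed.
    let mut state_roots: HashMap<EpochId, StateRoot> = HashMap::new();
    for (epoch, root) in result_rx {
        state_roots.insert(epoch, root);
    }
    worker.join().unwrap();
    println!("executed {} epochs", state_roots.len());
}
```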
ConfirmationMeter
`ConfirmationMeter` (core/src/consensus/consensus_inner/confirmation_meter.rs) conservatively calculates the confirmation risk of each pivot chain block. Its result is useful for the storage layer to determine when it is safe to discard old snapshots. It can also be used to serve RPC queries about block confirmation if we decide to provide such an RPC.
AnticoneCache and PastsetCache
`AnticoneCache` (core/src/consensus/anticone_cache.rs) and `PastsetCache` (core/src/consensus/pastset_cache.rs) are two structs that implement customized caches for data structures in `ConsensusGraphInner`. In the implementation of the inner struct, we need to calculate and store the anticone set and the past set of each block. However, it is not possible to store all of these sets in memory. We therefore implement cache-style data structures that only store the sets of recently inserted/accessed blocks. If an anticone/past set is not found in the cache, the current inner struct implementation recalculates the set.
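The cache-or-recompute pattern looks roughly like the following sketch. The struct name mirrors `AnticoneCache`, but the fields, the eviction policy, and the `get_or_compute()` helper are illustrative simplifications, not the real interface.

```rust
// Sketch of the cache-or-recompute pattern used for anticone/past sets: keep
// the sets of recently inserted/accessed blocks only, and recompute on a miss.
use std::collections::{HashMap, HashSet};

struct AnticoneCache {
    capacity: usize,
    sets: HashMap<usize, HashSet<usize>>, // arena index -> anticone set
}

impl AnticoneCache {
    fn get_or_compute<F>(&mut self, block: usize, recompute: F) -> HashSet<usize>
    where
        F: FnOnce(usize) -> HashSet<usize>,
    {
        if let Some(set) = self.sets.get(&block) {
            return set.clone(); // hit: a recently inserted/accessed block
        }
        // Miss: recalculate the set from the current inner struct state.
        let set = recompute(block);
        if self.sets.len() >= self.capacity {
            // Evict an arbitrary entry to bound memory usage (the real caches
            // use a recency-based policy instead).
            let victim = self.sets.keys().next().copied();
            if let Some(victim) = victim {
                self.sets.remove(&victim);
            }
        }
        self.sets.insert(block, set.clone());
        set
    }
}

fn main() {
    let mut cache = AnticoneCache { capacity: 2, sets: HashMap::new() };
    let anticone = cache.get_or_compute(7, |_| HashSet::from([3, 5]));
    assert_eq!(anticone.len(), 2);
}
```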
Important Algorithmic Mechanisms
There are several important algorithmic mechanisms in the Conflux consensus layer. Here we discuss them from the implementation perspective. See XXX for the algorithmic reasoning behind them.
Pivot Chain and Total Order
The basic idea of the Conflux consensus algorithm is to first make everyone agree on a pivot chain. It then expands the total order from the pivot chain to cover all blocks via a topological sort. As long as the pivot chain does not change/reorg, the total order of blocks stays the same, and so does the derived order of transactions.
Compared with Bitcoin/Ethereum, the consensus in Conflux has two key differences:

- Almost every block goes into the total order, not just the blocks on the agreed pivot chain.

- Transaction validity and block validity are independent. For example, a transaction is invalid if it was already included before or if it cannot be executed due to insufficient balance. Such invalid transactions become no-ops during execution. However, unlike in Bitcoin and Ethereum, blocks containing such transactions do not become invalid.
In `ConsensusGraphInner`, the arena indices of the current pivot chain blocks are stored in order in the `pivot_chain[]` vector. To maintain it, we calculate the lowest common ancestor (LCA) between the newly inserted block and the current best block following the GHAST rule. If the fork corresponding to the newly inserted block at the LCA ends up being heavier, we update `pivot_chain[]` from the fork point.
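A heavily simplified sketch of this update rule is shown below. The `Inner` struct, the naive `lca()` walk, and the height-indexed `pivot_chain` layout are illustrative assumptions; the real implementation follows the full GHAST rule, uses link-cut trees for LCA and weight queries, and handles tie-breaking.

```rust
// Heavily simplified pivot-chain update: find the fork point (LCA) of the new
// block and the current pivot tip, compare the subtree weights of the two
// competing branches, and rebuild pivot_chain[] from the fork point if the
// new block's branch is heavier.
struct Inner {
    parent: Vec<usize>,       // parent[i] = arena index of block i's parent
    height: Vec<u64>,         // the genesis has height 0
    subtree_weight: Vec<u64>, // maintained by a link-cut tree in the real code
    pivot_chain: Vec<usize>,  // pivot_chain[h] = pivot block at height h
}

impl Inner {
    // Lowest common ancestor by walking up; the real code asks the link-cut
    // tree and gets the answer in O(log n).
    fn lca(&self, mut a: usize, mut b: usize) -> usize {
        while self.height[a] > self.height[b] { a = self.parent[a]; }
        while self.height[b] > self.height[a] { b = self.parent[b]; }
        while a != b { a = self.parent[a]; b = self.parent[b]; }
        a
    }

    // The child of `ancestor` lying on the path towards `descendant`.
    fn child_towards(&self, ancestor: usize, mut descendant: usize) -> usize {
        while self.parent[descendant] != ancestor { descendant = self.parent[descendant]; }
        descendant
    }

    fn update_pivot_on_new_block(&mut self, new_block: usize) {
        let best = *self.pivot_chain.last().unwrap();
        let fork = self.lca(new_block, best);
        let switch = if fork == best {
            true // the new block simply extends the current pivot tip
        } else {
            // The new block forks off; switch only if its side of the fork is heavier.
            let new_side = self.child_towards(fork, new_block);
            let old_side = self.child_towards(fork, best);
            self.subtree_weight[new_side] > self.subtree_weight[old_side]
        };
        if switch {
            // Truncate the pivot chain back to the fork point and extend it
            // along the path down to the new block.
            self.pivot_chain.truncate(self.height[fork] as usize + 1);
            let mut path = Vec::new();
            let mut cur = new_block;
            while cur != fork {
                path.push(cur);
                cur = self.parent[cur];
            }
            path.reverse();
            self.pivot_chain.extend(path);
        }
    }
}

fn main() {
    // Tiny TreeGraph: genesis 0 <- 1 <- 2 (each weight 1) and a heavy fork 0 <- 3 (weight 3).
    let mut inner = Inner {
        parent: vec![0, 0, 1, 0],
        height: vec![0, 1, 2, 1],
        subtree_weight: vec![6, 2, 1, 3],
        pivot_chain: vec![0, 1, 2],
    };
    inner.update_pivot_on_new_block(3);
    assert_eq!(inner.pivot_chain, vec![0, 3]); // the pivot chain reorgs onto the heavier fork
}
```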
Timer Chain
Blocks whose PoW quality is `timer_chain_difficulty_ratio` times higher than the target difficulty are timer blocks. The `is_timer` field of such a block is set to true. The consensus algorithm then finds the longest chain of timer blocks (more accurately, the one with the greatest accumulated difficulty), similar to how the Bitcoin consensus algorithm finds the longest chain. The arena indices of the blocks on this longest timer chain are stored in `timer_chain[]`.
The rationale of the timer chain is to provide a coarse-grained measurement of time that cannot be influenced by a malicious attacker. Because timer blocks are rare and generated slowly (if `timer_chain_difficulty_ratio` is sufficiently high), a malicious attacker cannot prevent the growth of the timer chain unless it controls the majority of the computation power. Therefore, the number of timer chain blocks in the past set of a block is a good indication of the latest possible generation time of that block. We compute this value for each block and store it in the `timer_chain_height` field of the block.
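As a small illustration of the rule stated above, the snippet below marks a block as a timer block when its PoW quality clears the higher bar. The constant value and the field names are placeholders, not the actual protocol parameters.

```rust
// Illustration of the timer-block rule: a block is a timer block when its PoW
// quality clears a bar timer_chain_difficulty_ratio times the target difficulty.
const TIMER_CHAIN_DIFFICULTY_RATIO: u64 = 180; // assumed example value

struct BlockPow {
    pow_quality: u64,
    target_difficulty: u64,
    is_timer: bool,
}

fn mark_timer_block(block: &mut BlockPow) {
    block.is_timer =
        block.pow_quality >= block.target_difficulty * TIMER_CHAIN_DIFFICULTY_RATIO;
}

fn main() {
    let mut b = BlockPow { pow_quality: 200_000, target_difficulty: 1_000, is_timer: false };
    mark_timer_block(&mut b);
    assert!(b.is_timer); // 200_000 >= 1_000 * 180
}
```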
Weight Maintenance with Link-Cut Tree
To maintain the pivot chain efficiently, we need to query the total weight of a subtree. Conflux uses a Link-Cut Tree data structure to maintain the subtree weights in O(log n). The Link-Cut Tree can also calculate the LCA of any two nodes in the TreeGraph in O(log n). The `weight_tree` field in `ConsensusGraphInner` is the link-cut tree that stores the subtree weight of every node. Note that the implementation of the Link-Cut Tree is in the utils/link-cut-tree directory.
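The queries the consensus layer relies on are easier to see with a deliberately naive stand-in. The sketch below exposes the same kind of operations (per-block weight updates, subtree weight totals) with O(n) implementations; the real link-cut tree in utils/link-cut-tree answers them in O(log n) and also provides LCA queries, and its actual API differs from this toy one.

```rust
// Naive stand-in showing the semantics of the weight queries; both operations
// are O(n) here versus O(log n) in the real link-cut tree.
struct NaiveWeightTree {
    parent: Vec<usize>, // parent[i] = i's parent; the genesis is its own parent
    weight: Vec<i64>,
}

impl NaiveWeightTree {
    // Add `delta` to the weight of a single block.
    fn add_block_weight(&mut self, node: usize, delta: i64) {
        self.weight[node] += delta;
    }

    // Total weight of the subtree rooted at `root`.
    fn subtree_weight(&self, root: usize) -> i64 {
        let mut total = 0;
        for node in 0..self.parent.len() {
            let mut cur = node;
            loop {
                if cur == root {
                    total += self.weight[node];
                    break;
                }
                if self.parent[cur] == cur {
                    break; // reached the genesis without meeting `root`
                }
                cur = self.parent[cur];
            }
        }
        total
    }
}

fn main() {
    // Genesis 0 with children 1 and 2; block 2 has child 3.
    let mut tree = NaiveWeightTree { parent: vec![0, 0, 0, 2], weight: vec![1, 1, 1, 1] };
    tree.add_block_weight(3, 2); // e.g. a heavier block landing under block 2
    assert_eq!(tree.subtree_weight(2), 4); // blocks 2 and 3
    assert_eq!(tree.subtree_weight(0), 6); // the whole tree
}
```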
Adaptive Weight
If the TreeGraph is under a liveness attack, it may fail to converge on one block for a while. To handle this situation, the idea of the GHAST algorithm is to start generating adaptive blocks, i.e., blocks whose weights are redistributed significantly, so that there are many zero-weight blocks together with a rare set of very heavy blocks. Specifically, if the PoW quality of an adaptive block is at least `adaptive_heavy_block_ratio` times the target difficulty, the block has a weight of `adaptive_heavy_block_ratio`; otherwise, the block has a weight of zero. This temporarily slows down confirmation but ensures that consensus makes progress.
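The weight-redistribution rule can be summarized in a few lines. In the sketch below, the constant is an assumed example value rather than the real protocol parameter, and the unit weight for non-adaptive blocks is a simplification.

```rust
// Sketch of the weight rule for adaptive blocks: either a very heavy weight
// (when the PoW quality clears the higher bar) or zero.
const ADAPTIVE_HEAVY_BLOCK_RATIO: u64 = 1000; // assumed example value

fn block_weight(pow_quality: u64, target_difficulty: u64, adaptive: bool) -> u64 {
    if !adaptive {
        return 1; // non-adaptive block: normal weight (simplified to one here)
    }
    if pow_quality >= target_difficulty * ADAPTIVE_HEAVY_BLOCK_RATIO {
        ADAPTIVE_HEAVY_BLOCK_RATIO // a rare, very heavy block
    } else {
        0 // most adaptive blocks contribute no weight
    }
}

fn main() {
    assert_eq!(block_weight(1_500, 1_000, false), 1);
    assert_eq!(block_weight(1_500, 1_000, true), 0);
    assert_eq!(block_weight(2_000_000, 1_000, true), ADAPTIVE_HEAVY_BLOCK_RATIO);
}
```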
Because adaptive weight is a mechanism to defend against rare liveness attacks, it should not be turned on in the normal scenario. A new block is adaptive only if: 1) the subtree of one of its ancestor blocks is still not dominant compared to its sibling subtrees, and 2) a significantly long period of time has passed between the generation of that ancestor block and the new block (i.e., the difference in `timer_chain_height` is sufficiently large). `ConsensusGraphInner::adaptive_weight()` and its subroutines implement the algorithm that determines whether a block is adaptive or not. Note that the implementation uses another link-cut tree, `adaptive_tree`, as a helper. Please see the inline comments for the implementation details.
Partial Invalid
Note that the past set of a new block denotes all the blocks that the generator of the new block had observed at generation time. Therefore, from the past set of a new block, other full nodes can determine whether it chose the correct parent block and whether it should be adaptive or not.
The Conflux consensus algorithm defines blocks that choose incorrect parents or fill in an incorrect adaptive status as partial invalid blocks. For a partial invalid block, the `partial_invalid` field is set to true. The algorithm requires partial invalid blocks to be treated differently from normal blocks in three ways:
- Honest nodes will not reference partial invalid blocks, directly or indirectly, until a significant period of time has passed. This period is measured with `timer_chain_height`, and the difference has to be more than `timer_chain_beta`. Yes, this means that if an otherwise perfectly fine block references a partial invalid block, neither of the two blocks will be referenced for a while.

- Partial invalid blocks receive no block reward. They are extremely unlikely to get any reward anyway, because the first rule gives them a large anticone set.

- Partial invalid blocks are excluded from the timer chain consideration.
To implement the first rule, the `on_new_block()` routine in `ConsensusNewBlockHandler` is separated into two subroutines, `preactivate_block()` and `activate_block()`. `preactivate_block()` computes and determines whether a block is partial invalid or not, while `activate_block()` fully integrates a block into the consensus graph inner data structures. For every new block, the field `active_cnt` tracks how many inactive blocks it references. A block is inactive if it directly or indirectly references a partial invalid block. `activate_block()` is called on a block only when its `active_cnt` becomes zero. The field `activated` denotes whether a block is active or not. For partial invalid blocks, activation is delayed until the current timer chain height of the ledger is `timer_chain_beta` higher than that of the invalid block. Newly generated blocks will not reference any inactive blocks, i.e., inactive blocks are treated as if they were not in the TreeGraph.
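A toy version of this counter-based activation is sketched below. The field names echo `active_cnt` and `activated`, but the data layout, the recursive propagation, and the omission of the `timer_chain_beta` delay are all simplifications of the real bookkeeping.

```rust
// Toy counter-based activation: active_cnt counts how many inactive blocks a
// block references; once it hits zero the block activates and unblocks its
// referrers in turn. activate_block(1) in main() stands in for the moment the
// timer_chain_beta delay of a partial invalid block has elapsed.
use std::collections::HashMap;

struct Activation {
    active_cnt: HashMap<usize, usize>,     // block -> number of inactive blocks it references
    referrers: HashMap<usize, Vec<usize>>, // block -> blocks that reference it
    activated: HashMap<usize, bool>,
}

impl Activation {
    // Called when a block (e.g. a delayed partial invalid one) becomes active.
    fn activate_block(&mut self, block: usize) {
        self.activated.insert(block, true);
        // Every referrer now references one fewer inactive block; activate it
        // too once its own counter reaches zero.
        let referrers = self.referrers.get(&block).cloned().unwrap_or_default();
        for r in referrers {
            let cnt = self.active_cnt.entry(r).or_insert(0);
            *cnt = cnt.saturating_sub(1);
            let remaining = *cnt;
            if remaining == 0 && !self.activated.get(&r).copied().unwrap_or(false) {
                self.activate_block(r);
            }
        }
    }
}

fn main() {
    // Block 2 references block 1, which is initially inactive (e.g. partial invalid).
    let mut act = Activation {
        active_cnt: HashMap::from([(1, 0), (2, 1)]),
        referrers: HashMap::from([(1, vec![2])]),
        activated: HashMap::new(),
    };
    act.activate_block(1); // the delay has passed; block 1 becomes active
    assert!(act.activated.get(&2).copied().unwrap_or(false)); // block 2 follows
}
```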
Anticone, Past View, and Ledger View
In order to check the partial invalid status of each block, we need to operate under the past view of the block to determine its correct parent and its adaptivity. This is different from the current state of the TreeGraph, which we call the ledger view: in the past view, all blocks in the anticone and the future set of the block are excluded. Because we process blocks in topological order, the future set of a new block is empty. We therefore only need to eliminate the anticone blocks.
`compute_and_update_anticone()` in `ConsensusNewBlockHandler` computes the anticone set of a new block. Note that because the anticone set may be very large, we apply two implementation-level optimizations. First, we represent the anticone set as a set of barrier nodes in the TreeGraph, i.e., a set of subtrees where every block in the subtrees is in the anticone set. Second, we only maintain the anticone sets of recently accessed/inserted blocks. When checking whether a block is valid in its past view (e.g., in `adaptive_weight()` and in `check_correct_parent()`), we first cut all barrier subtrees from the link-cut weight trees accordingly to obtain the state of the past view. After the computation, we restore these anticone subtrees.
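The cut-and-restore idea looks roughly like the sketch below. The toy `WeightTree` is a stand-in for the real link-cut trees, and `with_past_view()` is a hypothetical helper that detaches the barrier subtrees, runs a computation, and reattaches them.

```rust
// Cut-and-restore pattern for past-view computations: detach every barrier
// subtree (the anticone), evaluate, then restore the original links.
struct WeightTree {
    parent: Vec<Option<usize>>, // None = currently detached from its parent
    weight: Vec<i64>,
}

impl WeightTree {
    fn subtree_weight(&self, root: usize) -> i64 {
        (0..self.parent.len())
            .filter(|&n| self.is_in_subtree(n, root))
            .map(|n| self.weight[n])
            .sum()
    }

    fn is_in_subtree(&self, mut node: usize, root: usize) -> bool {
        loop {
            if node == root {
                return true;
            }
            match self.parent[node] {
                Some(p) => node = p,
                None => return false,
            }
        }
    }
}

// Run `f` under the past view of a block by cutting out the barrier subtrees.
fn with_past_view<R>(
    tree: &mut WeightTree,
    barriers: &[usize],
    f: impl FnOnce(&WeightTree) -> R,
) -> R {
    let saved: Vec<(usize, Option<usize>)> =
        barriers.iter().map(|&b| (b, tree.parent[b])).collect();
    for &b in barriers {
        tree.parent[b] = None; // cut the barrier subtree out of the tree
    }
    let result = f(tree);
    for (b, p) in saved {
        tree.parent[b] = p; // restore the anticone subtrees afterwards
    }
    result
}

fn main() {
    // Genesis 0 with children 1 and 2; block 2 (and its subtree) is in the anticone.
    let mut tree = WeightTree { parent: vec![None, Some(0), Some(0)], weight: vec![1, 1, 1] };
    let past_view_weight = with_past_view(&mut tree, &[2], |t| t.subtree_weight(0));
    assert_eq!(past_view_weight, 2); // block 2 is excluded in the past view
    assert_eq!(tree.subtree_weight(0), 3); // and restored afterwards
}
```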
Check Correct Parent
To check whether a new block chooses a correct parent block, we first compute the set of blocks inside the epoch of the new block, assuming that the new block is on the pivot chain. We store this set in the field `blockset_in_own_view_of_epoch`. We then iterate over every candidate block in this set to make sure that the chosen parent block is better than the candidate. Specifically, we find the fork blocks of the candidate block and of the parent block under their LCA and make sure that the fork of the parent is heavier. This logic is implemented in `check_correct_parent()` in `ConsensusNewBlockHandler`.
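A simplified version of this check is sketched below, reusing a naive LCA walk in place of the link-cut tree queries. The function name mirrors `check_correct_parent()` and the `epoch_blockset` parameter stands in for `blockset_in_own_view_of_epoch`, but tie-breaking and the past-view weight adjustments described in the previous section are omitted.

```rust
// Simplified parent-correctness check: for every candidate in the epoch set,
// compare the subtree weight of the chosen parent's branch against the
// candidate's branch at their fork point (LCA).
struct Inner {
    parent: Vec<usize>,
    height: Vec<u64>,
    subtree_weight: Vec<u64>,
}

impl Inner {
    fn lca(&self, mut a: usize, mut b: usize) -> usize {
        while self.height[a] > self.height[b] { a = self.parent[a]; }
        while self.height[b] > self.height[a] { b = self.parent[b]; }
        while a != b { a = self.parent[a]; b = self.parent[b]; }
        a
    }

    fn child_towards(&self, ancestor: usize, mut descendant: usize) -> usize {
        while self.parent[descendant] != ancestor { descendant = self.parent[descendant]; }
        descendant
    }

    fn check_correct_parent(&self, chosen_parent: usize, epoch_blockset: &[usize]) -> bool {
        for &candidate in epoch_blockset {
            let fork = self.lca(candidate, chosen_parent);
            if fork == candidate || fork == chosen_parent {
                continue; // one is an ancestor of the other: no competing fork
            }
            let parent_side = self.child_towards(fork, chosen_parent);
            let candidate_side = self.child_towards(fork, candidate);
            if self.subtree_weight[candidate_side] > self.subtree_weight[parent_side] {
                return false; // some candidate's branch beats the chosen parent's branch
            }
        }
        true
    }
}

fn main() {
    // Genesis 0 with two branches: 0 <- 1 (subtree weight 3) and 0 <- 2 (subtree weight 1).
    let inner = Inner {
        parent: vec![0, 0, 0],
        height: vec![0, 1, 1],
        subtree_weight: vec![4, 3, 1],
    };
    assert!(inner.check_correct_parent(1, &[2]));  // choosing the heavier branch is correct
    assert!(!inner.check_correct_parent(2, &[1])); // choosing the lighter branch is not
}
```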
Note that `blockset_in_own_view_of_epoch` may also become too large to hold in memory consistently, especially if a malicious attacker tries to generate invalid blocks to blow up this set. The current implementation therefore periodically clears these sets and only keeps the sets for pivot chain blocks. Note that for pivot chain blocks, this set is also used during transaction execution.