Blockchain for Enterprise
上QQ阅读APP看书,第一时间看更新

What is a Merkle tree?

Before we get into what a Merkle root in blocks of blockchain is, let's understand the structure of blockchain. A block is made up of two parts; the first part is the block header and the second part is the set of transactions of that block. The block header contains information such as the previous block hash (it's actually a hash of the previous block's header), timestamp, Merkle root, and information related to achieving consensus. 

At the time of sync, while downloading a block a node downloads the block header and the block's transactions. Now, how would the receiving node know that these transactions are actually part of that block and are in the correct order? Every block is identified by a unique hash, but the block hash is not part of the block header and is uniquely calculated by every node after downloading the block; therefore we cannot use the idea of a block hash. Instead, we can rely on something like a transactions hash; a hash stored in the block header, which is calculated by combining all transactions and hashing it. This idea will work perfectly, and we can detect if any transaction is missing or extra transactions included, or if transactions are in the correct order. 

Well, a Merkle root is an alternative to the transactions hash approach but provides another major advantage: it allows the network to have light nodes. We can, of course, have blockchain implemented without Merkle root, but if there is a need for light nodes in the network, then Merkle roots are required to be used. A light node is one that only downloads block headers but no transactions but still it should be able to provide all APIs to the client. For example: a smartphone cannot have the full blockchain as it could be very large in size; therefore, we can install a light client in smartphones. 

Let's first understand what a binary Merkle tree is with respect to blockchain. A hash tree or Merkle tree is a tree in which every leaf node is a hash of a transaction, and every non-leaf node is a hash of the hashes of its child nodes. Hash trees allow efficient and secure verification of which transactions are part of the block. Every blocks forms it's own Merkle tree. A Merkle tree is called a binary Merkle tree when every parent has two children. Binary Merkle trees are what is used in blockchain. Here is an example of a binary Merkle tree:

In the preceding diagram, at first the individual hash of every transaction is calculated. Then, they are grouped into two. And then, a hash of the two hashes is calculated for each pair. This process will continue until we have a single hash called the Merkle root. In case there are odd numbers of transactions, the last transaction is duplicated to make the total number of transactions even. 

Now, at the time of downloading a complete block, the block header, and transactions of the block, a node can verify whether the set of transactions is correct or not by forming the binary Merkle tree and checking that the generated Merkle root is the same as the one included in the block header. Of course, this can be done without a Merkle tree, as discussed previously.

A light node can take advantage of the Merkle tree to serve requests to the client. For example, a light node can make a request to a full node asking if a particular transaction is committed in a block or not, and the full node replies with the block number and Merkle proof if the transaction is committed in a block. A light node cannot just believe the full node if the full node provides a block number, therefore the full node also provides Merkle proof. To understand what a Merkle proof is, let's take the preceding diagram and a scenario where the light node asks the full node if TxD is committed or not. Now, the full node returns the block number along with a sub-tree, which is HABCD, HAB, HCD, HC, and HD. This sub-tree is the Merkle proof. Now, the light client can take this Merkle proof and verify it. Verification will include looking at whether the Merkle proof is constructed correctly and whether the Merkle root of the Merkle proof is the same as the Merkle root present in the block header of the block that the full node claimed the transaction was in. You must be wondering, what if a full node claims that the transaction is not committed even after it's committed? In this case, the only way to tackle this issue is to request multiple full nodes, and it's unlikely all of them will lie. This functionality cannot be achieved without Merkle trees.

Ethereum blockchain is more complicated. Now, suppose an Ethereum light node wants to know the balance of an account, read data from a smart contract, find the gas estimation for a transaction, and so on, then with this simple transactions binary Merkle will not be able to provide this functionality. So, in Ethereum, every block header contains not just one Merkle tree, but three trees for three kinds of objects:

  • Transactions
  • Transaction receipts (essentially, pieces of data showing the effect of each transaction)
  • State

As we now have three trees, let's take an advanced query example that a light node would make to a full node. The query is pretend to run this transaction on this contract. What would the transaction receipt and new state be? This is handled by the state tree, but the way that it is computed is more complex. Here, we need to construct what can be a Merkle state transition proof. Essentially, it is a proof that makes the claim: if you run transaction T on the state with root S, the result will be a state with root S', with transaction receipt R. To compute the state transaction proof, the full node locally creates a fake block, sets the state to S, and pretends to be a light node while applying the transaction. That is, if the process of applying the transaction requires the light node to determine the balance of an account, the light node makes a balance query. If the light node needs to check a particular item in the storage of a particular contract, the light node makes a query for that, and so on. The full node responds to all of its own queries correctly, but keeps track of all the data that it sends back. The full node then sends the light node the combined data from all of these requests as a proof. The light client then undertakes the exact same procedure, but using the provided proof as its database instead of making requests to the full node and if its result is the same as what the full node claims, the light client accepts output to be the one full node claims to be.

For state, Ethereum uses the Merkle Patricia tree instead of the binary tree. For the state tree, the situation is more complex. The state in Ethereum essentially consists of a key-value map, where the keys are addresses and the values are account balance, nonce, code, and storage for each account (where the storage is itself a tree). To learn how the Merkle Patricia tree works, visit https://easythereentropy.wordpress.com/2014/06/04/understanding-the-ethereum-trie/.

In enterprise blockchains, there is no use of light clients as the nodes represent an enterprise, and enterprises have infrastructure to run full nodes.