Bitcoin's Mempool

2021-07-12

Almost everyone who has heard of Bitcoin has heard of the blockchain: Bitcoin's famed data structure that, when coupled with Proof-of-Work, gives rise to its properties as a digital cash system. Fewer people have heard of Bitcoin's mempool, an equally important data structure that supports Bitcoin's security model and ultimately contributes to its efficiency and resiliency. In this post, we will briefly cover the important functions of the mempool, as well as some improvements that will hopefully arrive in the next several years.

What is the Mempool?

Mempool is a portmanteau of "memory pool", and it is the data structure where pending Bitcoin transactions live until they are included in a block. Remember, Bitcoin blocks are limited in size, so it is quite common for pending transactions to miss the most recently created block. On top of that, because there is no central clearing house for transactions, network connectivity issues could mean that a transaction does not propagate quickly enough to a given miner, and a block can be created without a transaction that normally would have been included. The mempool, while often discussed in the abstract as if there were a single pool of Bitcoin transactions, is not globally determined like the blockchain is; it is unique to every node on the network. Each node can determine how large its mempool is, how many transactions it wishes to keep track of, and what transaction acceptance and forwarding policies it wishes to implement.

What does a Standard Mempool on a Standard Node Look Like?

By default, Bitcoin Core limits the mempool to 300 MB. This mempool is kept in memory and updated continuously as new transactions are forwarded by peers and as new blocks are found on the network. With default settings, it will only accept and forward transactions that pay a fee rate of at least 1 sat/vbyte, and it will prune the least fee-dense transactions from memory when the mempool grows larger than the allocated 300 MB.
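To make the two default rules concrete, here is a minimal sketch of a size-capped pool with a fee-rate floor. It is a toy model, not Bitcoin Core's implementation: the names Tx and ToyMempool are made up for this post, the cap is counted in vbytes rather than in actual memory usage, and relationships between transactions are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    vsize: int  # virtual size in vbytes
    fee: int    # fee in satoshis

    @property
    def fee_rate(self) -> float:
        return self.fee / self.vsize  # sats per vbyte


class ToyMempool:
    """Hypothetical pool: fee-rate floor on entry, lowest fee rate evicted first."""

    def __init__(self, max_vbytes, min_fee_rate=1.0):
        self.max_vbytes = max_vbytes
        self.min_fee_rate = min_fee_rate
        self.txs = {}  # txid -> Tx

    def total_vbytes(self):
        return sum(tx.vsize for tx in self.txs.values())

    def accept(self, tx):
        """Return True if tx is in the pool after this call."""
        if tx.fee_rate < self.min_fee_rate:
            return False  # below the relay floor, never stored
        self.txs[tx.txid] = tx
        # Over the cap: drop the least fee-dense transactions until we fit.
        while self.total_vbytes() > self.max_vbytes:
            worst = min(self.txs.values(), key=lambda t: t.fee_rate)
            del self.txs[worst.txid]
        return tx.txid in self.txs
```

With a cap of 500 vbytes, adding 200-vbyte transactions that pay 2, 5, and then 10 sat/vbyte leaves only the last two in the pool, and a transaction paying 0.5 sat/vbyte is refused outright.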

A good resource to look at when trying to understand the mempool and its importance is mempool.space. The maintainers of the project keep a well-connected node online that monitors the network and provides analytics along with a digestible presentation of the information. Although mempool.space shows the mempool as if it were a globally shared state, it is just another node on the network. Indeed, you can download the software yourself and run the same analytics on your own node if you want!

What does the Mempool Accomplish on the Network?

The mempool seems, at first, to be mundane in its operation. However, it performs several very important jobs on the Bitcoin network that improve both efficiency and resiliency.

Block Propagation

A Bitcoin block is mined approximately every ten minutes and weighs about one megabyte. Although one megabyte every ten minutes sounds like a small amount of data, in Bitcoin this one megabyte must be sent as fast as possible to every miner and node on the planet. If blocks propagate slowly, miners will be working on stale information and could create a competing block at the same height while the first is in transit. Attackers could exploit network inefficiencies to trick unsuspecting users into accepting money they will never actually receive. And perhaps worst of all, slow propagation creates a large centralizing pressure on miners to join well-connected pools of hash power that will not suffer from network delays. Moreover, it is advantageous to pre-validate transactions before a block arrives, so that no extra CPU time has to be spent validating the thousands of transactions in a block when it arrives. Block propagation is deeply important, and so every effort has been taken to allow Bitcoin to send blocks as fast as possible to every corner of the globe, even over unreliable or slow internet connections.

How does the mempool help? Well, the vast majority of a block is composed of transactions, most of which are propagated at random intervals between the arrival of new blocks in the chain. The key insight is that there is no need to send information a node already has! So, when a block is propagated from one node to another, the transaction data inside the block is not actually sent. Instead, a sketch is sent that includes the transaction ids of all the transactions in the block. If a node has already heard of all the transactions in a block, it uses the mempool to find those transactions and insert them into their proper places, recreating the block with only minimal network traffic at the time of block transmission. It can request more information if it is missing some of the transactions, but in practice this is relatively rare. By caching and pre-validating transactions before a block arrives, the mempool can effectively stretch block transmission and validation costs across the full ten minute interblock time, making small home nodes more practical and affordable, thus preserving Bitcoin's decentralized properties.
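The reconstruction step can be sketched in a few lines. This is a simplification under stated assumptions: the mempool is modelled as a plain dictionary from transaction id to raw transaction bytes, and reconstruct_block is a hypothetical helper name. The protocol actually deployed on the network (compact block relay, BIP 152) uses shortened ids and has more moving parts.

```python
def reconstruct_block(txids_in_block, mempool):
    """Rebuild a block's transaction list from ids using the local mempool.

    mempool: dict mapping txid -> raw transaction bytes (toy model).
    Returns (transactions, missing). Slots whose transaction we have never
    seen are left as None and listed in `missing`, so that only those need
    to be requested from the peer.
    """
    transactions = []
    missing = []
    for txid in txids_in_block:
        tx = mempool.get(txid)
        if tx is None:
            missing.append(txid)
        transactions.append(tx)
    return transactions, missing


# Toy usage: the node already knows two of the three transactions.
known = {"aa": b"tx-aa", "bb": b"tx-bb"}
txs, missing = reconstruct_block(["bb", "zz", "aa"], known)
# Only "zz" has to be fetched over the network.
```

The common case is that `missing` comes back empty, which is why the block itself costs so little bandwidth at the moment it is announced.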

Transaction Fee Estimation

Since there are often more transactions pending approval than fit into a Bitcoin block, there exists a market for transaction fees that prioritize inclusion in a block. Miners use the mempool to select the most fee-dense transactions (i.e., the ones that pay the most sats/vbyte), while users use the mempool for the opposite purpose: paying no more than they need to. By observing new transactions entering the mempool, users can get a good picture of how in demand the network's resources are, and can price their transactions accordingly. Some transactions only need to be confirmed within a day or maybe can wait a week (imagine purchasing a house), while some businesses will want transactions to clear in the next block. By seeing what volume of transactions is moving across the network and what fee rates people are willing to pay, individuals and businesses that want to send or move Bitcoin can more accurately price their transactions. In the future, Bitcoin will no longer be subsidizing miners with new coin issuance, and instead miners will rely entirely on transaction fees to conduct their business. The mempool not only helps users price transactions, but also gives miners a view of market conditions for their hashpower.
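Both sides of this market can be sketched against the same snapshot of pending transactions. The code below is a naive illustration, not how any particular miner or wallet works: select_for_block and rate_to_outbid are hypothetical names, transactions are plain (txid, vsize, fee) tuples, dependencies between transactions are ignored, and the estimate assumes no new transactions arrive before the target block.

```python
def fee_rate(tx):
    """tx is a (txid, vsize_in_vbytes, fee_in_sats) tuple."""
    _, vsize, fee = tx
    return fee / vsize


def select_for_block(pending, max_block_vbytes):
    """Miner side: greedily take the highest fee rates that still fit."""
    chosen, used = [], 0
    for tx in sorted(pending, key=fee_rate, reverse=True):
        txid, vsize, _ = tx
        if used + vsize <= max_block_vbytes:
            chosen.append(txid)
            used += vsize
    return chosen


def rate_to_outbid(pending, max_block_vbytes, target_blocks, my_vsize, floor=1.0):
    """User side: the fee rate to beat to land within `target_blocks`.

    Walk the snapshot from the highest fee rate down, reserving room for our
    own transaction. The first transaction that no longer fits is the one we
    have to pay more than. If everything fits, the relay floor is enough.
    """
    budget = max_block_vbytes * target_blocks - my_vsize
    used = 0
    for tx in sorted(pending, key=fee_rate, reverse=True):
        used += tx[1]
        if used > budget:
            return fee_rate(tx)
    return floor


# Toy snapshot with 500-vbyte "blocks": fee rates of 10, 5, 2 and 1.5 sat/vbyte.
pending = [("a", 250, 2500), ("b", 250, 1250), ("c", 250, 500), ("d", 250, 375)]
next_block = select_for_block(pending, 500)    # ["a", "b"]
urgent = rate_to_outbid(pending, 500, 1, 200)  # 5.0: must outbid "b"
patient = rate_to_outbid(pending, 500, 3, 200) # 1.0: the floor is enough
```

The gap between the urgent and patient figures is the whole point: a sender who can wait a few blocks pays a fraction of what a sender who needs the next block pays.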

Today it is not uncommon for the mempool to clear, especially on the weekends. It can often save you money to use the network when demand is low on these occasions. This condition is not expected to last forever, as Bitcoin adoption continues to trend upward and the remaining efficiencies are squeezed out of transaction use.

DDoS Protection

In any peer-to-peer network, there is always the risk of Distributed Denial of Service attacks, or DDoS attacks. When connecting to random peers on the internet, there is always a risk that your peer is malicious and may take advantage of the opportunity to waste your computer's resources, such as disk IO or network bandwidth. In a worst case scenario, it could even crash your computer or otherwise prevent it from performing any useful work while under attack. If the network is not designed correctly, an attacker could theoretically connect to many thousands or millions of peers and start strategically taking important peers offline in order to influence the overall behavior of the network. In a system like Bitcoin that controls hundreds of billions of dollars, this could be especially profitable and dangerous.

The mempool helps protect the network by gatekeeping the behavior of peers and protecting your computer. Because of Proof-of-Work, it is very difficult to create spam Bitcoin blocks, but it is trivial to create spam transactions. The mempool applies standard rules that prevent a malicious peer from flooding your computer with valueless transactions that would eat up CPU and bandwidth to transmit and validate. For example, transactions do not have to pay a fee to be included in a Bitcoin block, but they must pay a minimum fee rate of 1 sat/vbyte to be included in a default mempool. If you send an update to a transaction already in the mempool, the update must increase the fee by a certain amount. If the mempool is full, the minimum fee rate dynamically increases, again to prevent spam. The worst an attacker can do is use up the max size of the mempool, which is 300 megabytes by default. But because of the minimum fee rate, it would be costly to do so, as the network would eventually eat up the attacker's money as transactions get confirmed.
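The rule about updates can be written as a one-line check. The sketch below is one simplified reading of a replace-by-fee style policy: the update must pay everything the original paid, plus enough extra to cover its own relay cost at an incremental rate. The function name and the 1 sat/vbyte incremental rate are assumptions made for illustration, and real policy has further conditions that are left out here.

```python
INCREMENTAL_RATE = 1  # sats per vbyte; assumed value for this sketch


def replacement_allowed(old_fee, new_fee, new_vsize, incremental_rate=INCREMENTAL_RATE):
    """Return True if an updated transaction pays enough to replace the old one.

    old_fee, new_fee: absolute fees in satoshis.
    new_vsize: virtual size of the replacement in vbytes.
    """
    required = old_fee + incremental_rate * new_vsize
    return new_fee >= required


# A 200-vbyte update to a transaction that paid 1000 sats needs at least 1200.
too_cheap = replacement_allowed(old_fee=1000, new_fee=1100, new_vsize=200)
enough = replacement_allowed(old_fee=1000, new_fee=1200, new_vsize=200)
```

Each update therefore costs the sender real money, which is what keeps an attacker from rebroadcasting the same transaction over and over for free.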

Large Bitcoin nodes, such as those for businesses or miners, often will change the rules around how the mempool protects them in order to capture more value from the mempool, but for standard Bitcoin users the system has worked well to keep DDoS attacks expensive and the network running smoothly.

What Improvements are Coming to the Mempool?

There are a variety of projects that are working on improving the mempool. As noted earlier, it is not uncommon for the mempool to be relatively empty, as transaction volume on Bitcoin has not yet scaled to global adoption size. As Bitcoin adoption grows and the value settled on the network increases, improvements will have to be made to keep the network resilient and efficient. We can group the coming improvements into two large categories, but as the mempool's importance grows, more ideas will have to be tested and implemented. Luckily, since each mempool is independent, it is relatively easy to deploy improvements incrementally across the network (as opposed to improvements to the transaction format, which require 100% consensus).

Transaction Relay

On the transaction relay side, there are two relatively important proposals. The first is known as Erlay, which replaces the current method Bitcoin nodes use to send transactions to the network with a more efficient scheme that sends less data. This is already good (in the sense that less network bandwidth will be used), but it also allows nodes to be better connected to the wider network, which can aid privacy and thwart several theoretical targeted attacks. The second is a proposal called Dandelion that improves the privacy of transaction broadcasting across the network by creating a secret propagation path for a transaction to be sent along before blooming into the wider network. Erlay is closer to deployment than Dandelion, but I expect both to be in use on the network within the next five years. While neither directly impacts the mempool itself, the mempool relies on transaction propagation, and any improvement to transaction propagation makes the mempool better at its job. Thus, these can be thought of as indirect improvements.

Optimizations

The newest proposals address long term inefficiencies that exist in the current implementation of the mempool. When new blocks are sent across the network, the mempool must be updated to reflect the new state of the chain. Transactions that are now invalid must be thrown away, transactions that were included in the block must be moved out, and new transactions must be fetched from peers. As stated earlier, the mempool has often been empty throughout Bitcoin's history, and so this code has not seen as much love as will be needed in the future. Better testing and mocking frameworks must be implemented, along with better designs to speed up mempool updates, ideas to shrink the minimum size of the mempool, privacy improvements, and more. There are so many ideas, proposals, and lines of code to vet that it can be overwhelming. Overall, the mempool has been underutilized and will experience much change in the coming years.
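The first two parts of that update, moving out confirmed transactions and throwing away ones the block has made invalid, can be sketched as follows. This is a toy model under stated assumptions: each transaction is reduced to the set of outputs it spends, prune_for_block is a hypothetical name, and a real node must also remove anything that depends on a discarded transaction.

```python
def prune_for_block(mempool, block_txs):
    """Return the mempool entries that are still pending after a new block.

    mempool, block_txs: dicts mapping txid -> set of spent outpoints, where
    an outpoint is a (previous_txid, output_index) tuple.
    """
    spent_in_block = set()
    for inputs in block_txs.values():
        spent_in_block |= inputs

    still_pending = {}
    for txid, inputs in mempool.items():
        if txid in block_txs:
            continue  # confirmed: it now lives in the blockchain
        if inputs & spent_in_block:
            continue  # conflict: the block already spent one of its inputs
        still_pending[txid] = inputs
    return still_pending


# Toy usage: "a" confirms, "b" loses a double-spend race to "x", "c" stays.
before = {"a": {("p", 0)}, "b": {("q", 1)}, "c": {("r", 0)}}
block = {"a": {("p", 0)}, "x": {("q", 1)}}
after = prune_for_block(before, block)
```

Even this toy version touches every pending transaction on every block, which hints at why the update path deserves careful design once mempools are routinely full.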

Conclusion

The mempool serves deeply important functions in the Bitcoin network today, and will only become more important tomorrow. The mempool works well today, but with continual improvements, it will become a hub of real-time financial information for the monetary network powering the globe.