Aptos Review 3 — Aptos TPS and Parallel Execution


A New Approach for Better Throughput.

As one of the most fundamental key performance indicators (KPIs) for blockchain performance, throughput indicates how many transactions a blockchain can process in a certain amount of time. The standard unit of throughput is transactions per second (TPS).

Early blockchains are notorious for their low throughput. For instance, Bitcoin processes about 7 TPS and Ethereum about 20 TPS. Both fall far behind Web2 payment networks like VISA, which processes around 3,000 TPS on average and up to 24,000 TPS at peak hours.
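To see where a figure like Bitcoin's 7 TPS comes from, here is a rough back-of-the-envelope sketch. The block size, average transaction size, and block interval below are rounded assumptions for illustration, not numbers taken from this review.

```python
def estimate_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: float) -> float:
    """Transactions per block divided by seconds per block."""
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s

# Assumed, rounded Bitcoin-like parameters: ~1 MB blocks, ~250-byte
# transactions, one block every ~10 minutes.
tps = estimate_tps(block_size_bytes=1_000_000, avg_tx_bytes=250, block_interval_s=600)
print(f"{tps:.1f} TPS")  # prints "6.7 TPS"
```

With these assumptions, throughput can only rise if blocks get bigger, transactions get smaller, or blocks arrive faster, which is why the protocol design itself has to change.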

To improve blockchain throughput, various aspects of blockchain protocols need redesigns, especially the consensus algorithm.

Improvement in consensus algorithm

At the core of blockchain protocols is distributed ledger technology. Nodes participating in a protocol must communicate and decide what records to keep and in what order. Earlier blockchains like Bitcoin operate on Proof-of-Work (PoW). To generate a block, all participating nodes race to guess a magic number (nonce) that makes the block’s hash meet a target value. The node that finds such a number first wins the right to produce the block and earns token rewards. In PoW systems, only a tiny portion of the computational power is used for verifying, executing, and recording transactions; most of it is spent guessing nonces in this hashing puzzle.

Figure legend: green = verifying, executing, and recording transactions; red = guessing the magic number (nonce).
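The nonce-guessing game can be sketched in a few lines. This is a simplified illustration, not Bitcoin's actual block format: the `mine` and `verify` helpers, the string-based block data, and the 12-bit difficulty are all assumptions made for the example.

```python
import hashlib
from typing import Tuple

def block_hash(block_data: str, nonce: int) -> str:
    """Hash the block contents together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()

def mine(block_data: str, difficulty_bits: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)  # smaller target = harder puzzle
    nonce = 0
    while True:
        digest = block_hash(block_data, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1

def verify(block_data: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes one hash; finding it takes many."""
    return int(block_hash(block_data, nonce), 16) < (1 << (256 - difficulty_bits))

nonce, digest = mine("alice pays bob 1 coin", difficulty_bits=12)
print(nonce, digest)
```

Finding the nonce takes thousands of hashes on average even at this toy difficulty, while `verify` needs only one. All of that search effort is the "red" work that contributes nothing to processing transactions.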

Most current blockchains have switched to Proof-of-Stake (PoS). In PoS consensus algorithms, a participating node’s eligibility to record blocks depends on the number of tokens it stakes. The more native tokens it stakes to the protocol, the more likely it is to be selected as the validator in block generation. Nodes no longer need to guess the magic number and can therefore dedicate their computational power to verifying, executing, and recording transactions.

Figure legend: green = verifying, executing, and recording transactions; yellow = communication cost among nodes; gray = idle.
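Stake-weighted selection can be illustrated with a weighted random draw. This is only a minimal sketch: the node names and stake amounts are made up, and real PoS protocols use more elaborate (and verifiable) leader election than a local random number generator.

```python
import random

def pick_validator(stakes: dict, rng: random.Random) -> str:
    """Pick one node with probability proportional to its stake."""
    nodes = list(stakes)
    weights = [stakes[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

# Hypothetical stakes: node_a holds 60%, node_b 30%, node_c 10%.
stakes = {"node_a": 600, "node_b": 300, "node_c": 100}
rng = random.Random(42)  # fixed seed so the run is repeatable

counts = {node: 0 for node in stakes}
for _ in range(10_000):
    counts[pick_validator(stakes, rng)] += 1
print(counts)
```

Over many rounds, each node is chosen roughly in proportion to its stake, and no hashing puzzle is involved in the selection.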

However, despite the paradigm shift from PoW to PoS, blockchains face another challenge in transaction execution. All participating nodes have to execute and verify the transactions in a block in the recorded order to determine the global state of the system after these transactions alter it.

As consensus algorithms become much more efficient, a blockchain’s TPS is increasingly capped by how fast its nodes can execute transactions. Since Ethereum introduced smart contracts, transactions on most blockchains have become substantially more complex, involving more sequential computation and more data reads and writes. That’s why new-generation blockchains have started looking for alternative ways to maximize their throughput.
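A tiny example shows why the recorded order matters. The ledger below is a hypothetical toy model (a dictionary of balances with `deposit` and `withdraw` transactions), not the state model of any real chain.

```python
def apply_tx(state: dict, tx: tuple) -> dict:
    """Return the new state after one transaction; failed withdrawals are no-ops."""
    kind, account, amount = tx
    new_state = dict(state)
    if kind == "deposit":
        new_state[account] = new_state.get(account, 0) + amount
    elif kind == "withdraw" and new_state.get(account, 0) >= amount:
        new_state[account] -= amount
    return new_state

def execute_block(state: dict, txs: list) -> dict:
    """Every node replays the block one transaction at a time, in order."""
    for tx in txs:
        state = apply_tx(state, tx)
    return state

genesis = {"alice": 50}
block = [("deposit", "alice", 50), ("withdraw", "alice", 80)]

in_order = execute_block(genesis, block)         # {'alice': 20}
reordered = execute_block(genesis, block[::-1])  # {'alice': 100}
print(in_order, reordered)
```

The two orderings end in different balances, so nodes that disagreed on the order would disagree on the global state. Replaying everything sequentially is the simplest way to stay consistent, but it leaves most CPU cores idle.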

Separation of roles

One way to speed up transaction execution is to use a better CPU and more RAM. However, since blocks produced on a network need to be verified by multiple nodes, this raises the hardware requirements for every node and makes the system more centralized.

Flow, built by Dapper Labs, developed a novel blockchain design that assigns different roles to different nodes. Some tasks require deterministic, heavy-duty computation, and these are done by a smaller group of nodes with premium hardware. Other tasks are non-deterministic but require very little computation, and these are handled by a larger group of nodes with average hardware.

Distributing critical tasks to more powerful hardware can increase the overall performance of a blockchain by 3–10 times. With parallel computation, however, a blockchain can expect performance gains of 10–1000 times.


One way to achieve parallel computation is state sharding. Sharding breaks the state space into N parts, each operating independently while remaining open to cross-shard communications. Since each shard can run transactions alone, N shards roughly increase the system’s total throughput by N times.

Sharding has downsides, though. First, cross-shard communication can be slow and costly, because information may have to pass through a main chain (the beacon chain in Ethereum 2.0) before it reaches another shard chain for the subsequent transaction. Second, it adds another layer of complexity to coding: developers have to build their dApps with the shard chains’ design in mind.
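The sharding trade-off can be made concrete with a small routing sketch. The hash-based `shard_of` mapping, the shard count, and the account names are assumptions for illustration; real sharded chains assign state to shards in their own ways.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # "N" in the text; an arbitrary choice for this sketch

def shard_of(account: str) -> int:
    """Map an account to a shard using a stable hash of its name."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def route(transfers):
    """Split transfers into per-shard queues and a cross-shard list."""
    per_shard = defaultdict(list)
    cross_shard = []
    for sender, receiver in transfers:
        if shard_of(sender) == shard_of(receiver):
            per_shard[shard_of(sender)].append((sender, receiver))
        else:
            cross_shard.append((sender, receiver))
    return per_shard, cross_shard

transfers = [("alice", "bob"), ("carol", "dave"), ("erin", "frank"), ("alice", "alice")]
per_shard, cross_shard = route(transfers)
print(dict(per_shard), cross_shard)
```

Transfers in `per_shard` can be processed by each shard on its own, which is where the roughly N-fold speedup comes from. Everything that lands in `cross_shard` needs the slower coordination path described above, and the dApp developer has to be aware of which case applies.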

Parallel execution

Another way to achieve parallel computation is parallel execution. This is achieved by identifying independent transactions and executing them simultaneously. Two transactions are independent if neither writes to data that the other reads or writes. No matter which of them the blockchain executes first, the result will be the same, so they can safely run in parallel on different CPU cores or GPUs.
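The independence test can be written as a check on read and write sets. This is a minimal sketch with hypothetical names (`Tx`, `transfer`, `conflicts`); it treats two transactions as conflicting only when at least one of them writes data the other touches, so two transactions that merely read the same data still count as independent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    name: str
    reads: frozenset
    writes: frozenset

def transfer(name: str, sender: str, receiver: str) -> Tx:
    """A transfer reads and writes both balances."""
    touched = frozenset({sender, receiver})
    return Tx(name, reads=touched, writes=touched)

def conflicts(a: Tx, b: Tx) -> bool:
    """True if one transaction writes something the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)

def independent(a: Tx, b: Tx) -> bool:
    return not conflicts(a, b)

t1 = transfer("t1", "alice", "bob")
t2 = transfer("t2", "carol", "dave")
t3 = transfer("t3", "bob", "erin")
q1 = Tx("q1", reads=frozenset({"alice"}), writes=frozenset())
q2 = Tx("q2", reads=frozenset({"alice"}), writes=frozenset())

print(independent(t1, t2))  # True: disjoint accounts
print(independent(t1, t3))  # False: both touch bob
print(independent(q1, q2))  # True: two reads never conflict
```

Here `t1` and `t2` could run on separate cores, while `t1` and `t3` must keep their recorded order because both change bob's balance.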

The resource model makes parallel execution even more feasible. Instead of storing data as variables under the contract account, Solana, Flow, Aptos, and Sui store data in separate storage under users’ accounts or as objects. This reduces the chance of conflicts in data access and makes parallel execution much more effective.

Although these blockchains are all based on the resource model, they operate in slightly different ways. Solana and Sui are similar to each other in transaction execution, although data is stored as accounts on Solana, and as objects on Sui. dApps need to specify which accounts or objects they want to interact with in their function calls, so the VM can run an analysis and see which transactions are independent.
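When every transaction declares up front what it will touch, a scheduler can group transactions into batches that are safe to run in parallel. The greedy `schedule` function below is a hypothetical sketch of that idea, not the actual scheduler of Solana or Sui, and it simplifies each transaction to a name plus declared read and write sets.

```python
def conflicts(a, b) -> bool:
    """a and b are (name, reads, writes); conflict if a write overlaps any access."""
    _, a_reads, a_writes = a
    _, b_reads, b_writes = b
    return bool(a_writes & (b_reads | b_writes)) or bool(b_writes & a_reads)

def schedule(txs):
    """Place each transaction in the first batch after its last conflicting predecessor."""
    batches = []
    for tx in txs:
        last_conflict = -1
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                last_conflict = i
        target = last_conflict + 1
        if target == len(batches):
            batches.append([])
        batches[target].append(tx)
    return batches

def transfer(name, sender, receiver):
    touched = {sender, receiver}
    return (name, touched, touched)

block = [
    transfer("t1", "alice", "bob"),
    transfer("t2", "carol", "dave"),
    transfer("t3", "bob", "erin"),
    transfer("t4", "dave", "alice"),
]
batches = schedule(block)
print([[name for name, _, _ in batch] for batch in batches])
# prints [['t1', 't2'], ['t3', 't4']]
```

Transactions inside one batch share no written data and can run on separate cores, while conflicting transactions keep their original relative order across batches. The cost is that the declared access lists must be supplied with every transaction.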

However, Aptos takes a novel approach. It leverages a Software Transactional Memory (STM) variant called Block-STM. In Block-STM, the transactions in every block follow a preset order and are divided among processor threads during execution. To achieve good performance, the engine optimistically assumes there are no dependencies between transactions and executes them all, while keeping records of the memory locations each transaction reads and modifies. Each result is then validated: if a transaction is found to have read a memory location that a preceding transaction modified, it is aborted. The aborted transaction is then re-executed, and the process repeats until all transactions are executed and validated.
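The optimistic execute, validate, and re-execute cycle can be imitated in a single-threaded toy. This is only a sketch of the idea under heavy simplifications: the real Block-STM runs execution and validation concurrently across threads with its own scheduler, whereas `run_block` below executes everything once against the pre-block state and then makes one validation pass in block order. All names are hypothetical.

```python
def run_block(base_state, txs):
    """txs are functions tx(read, write). Returns (final_state, re_executions)."""
    n = len(txs)
    versions = {}                       # key -> {tx index: value written by that tx}
    read_sets = [{} for _ in range(n)]  # key -> index of the writer that was read (None = base)

    def latest_writer(key, i):
        writers = [j for j in versions.get(key, {}) if j < i]
        return max(writers) if writers else None

    def execute(i, see_earlier_writes):
        reads, writes = {}, {}
        def read(key):
            if key in writes:           # a transaction sees its own writes
                return writes[key]
            j = latest_writer(key, i) if see_earlier_writes else None
            reads[key] = j
            return versions[key][j] if j is not None else base_state.get(key, 0)
        def write(key, value):
            writes[key] = value
        txs[i](read, write)
        return reads, writes

    def publish(i, reads, writes):
        read_sets[i] = reads
        for key, value in writes.items():
            versions.setdefault(key, {})[i] = value

    # Step 1: optimistic execution, as if no transaction depended on another.
    for i, (reads, writes) in enumerate([execute(i, False) for i in range(n)]):
        publish(i, reads, writes)

    # Step 2: validate in block order; abort and re-execute stale transactions.
    re_executions = 0
    for i in range(n):
        if any(latest_writer(key, i) != j for key, j in read_sets[i].items()):
            for key in versions:        # discard the aborted transaction's writes
                versions[key].pop(i, None)
            publish(i, *execute(i, True))
            re_executions += 1

    final_state = dict(base_state)
    for key, by_tx in versions.items():
        if by_tx:
            final_state[key] = by_tx[max(by_tx)]
    return final_state, re_executions

def transfer(src, dst, amount):
    def tx(read, write):
        s, d = read(src), read(dst)
        if s >= amount:
            write(src, s - amount)
            write(dst, d + amount)
    return tx

state = {"alice": 100, "bob": 0, "carol": 50, "dave": 0}
block = [transfer("alice", "bob", 30), transfer("carol", "dave", 20), transfer("bob", "alice", 10)]
final_state, re_executions = run_block(state, block)
print(final_state, re_executions)
# prints {'alice': 80, 'bob': 20, 'carol': 30, 'dave': 20} 1
```

The first two transfers pass validation untouched. The third one read bob's balance before the first transfer's write was visible, so it is aborted and re-executed once, and the final state matches what sequential execution in the preset order would produce. Note that the transactions never declared which accounts they would access.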

Compared to Solana and Sui, Aptos’ approach provides a better developer experience because there’s no need to specify which data a transaction will access. According to Aptos’ reports, Block-STM, together with other mechanisms that improve communication among nodes, can help the network reach an impressive 160K TPS. That’s why it has become one of the most anticipated blockchains recently.


  1. Block-STM: How we execute over 160k TPS on Aptos blockchain
  2. The Case for Parallel Processing Chains
  3. Flow: Separating Consensus and Compute
  4. Exploring Cross-Shard Communication in Eth2.0