How to Build a High-Performance Blockchain

Source: Aptos Labs
Since the advent of computing technology, engineers and researchers have been continuously exploring how to push computing resources to the performance limit, aiming to maximize efficiency while minimizing the latency of computing tasks. The two pillars of high performance and low latency have always shaped the development of computer science, influencing a wide range of fields from CPUs, FPGAs, and database systems to more recent artificial intelligence infrastructure and blockchain systems. In the pursuit of high performance, pipeline technology has become an indispensable tool. Since the introduction of pipeline technology in the IBM System/360 in 1964 [1], it has been a core of high-performance system design, driving key discussions and innovations in the field.
Pipeline technology is not only applied to hardware but also widely used in the database field. For example, David DeWitt and Jim Gray described pipeline parallelism in "Parallel Database Systems: The Future of High Performance Database Systems" [2]. This approach breaks a complex database query into multiple stages and runs them simultaneously, improving efficiency and performance. Pipeline technology is equally vital in artificial intelligence, especially in widely used deep learning frameworks like TensorFlow, which uses data pipeline parallelism for preprocessing and loading, ensuring a smooth flow of data for training and inference and making AI workflows faster and more efficient [3].
Blockchain is no exception. Its core function is similar to a database, handling transactions and updating the state, but it adds the challenge of Byzantine fault-tolerant consensus. The key to improving blockchain throughput (transactions per second) and reducing latency (time to finality) lies in optimizing how the different stages (ordering, execution, commit, and transaction synchronization) interact under high load. This challenge is particularly crucial in high-throughput scenarios where traditional designs struggle to maintain low latency.
To explore these concepts, let's consider a familiar analogy: the automobile factory. Understanding how the assembly line has revolutionized manufacturing can help us grasp the evolution of the blockchain pipeline—and why next-generation designs like Zaptos [8] are pushing blockchain performance to new heights.
From Automobile Factory to Blockchain
Imagine you are the owner of an automobile factory with two main goals:
· Maximize throughput: Assemble as many cars as possible every day.
· Minimize latency: Reduce the build time of each car.
Now, consider three types of factories:
Simple Factory
In a simple factory, a group of versatile workers systematically assembles a car. One worker assembles the engine, the next worker installs the wheels, and so on—producing only one car at a time.
The issue? Some workers often wait idle, leading to an overall low production efficiency because no one is working on different parts of the same car simultaneously.
Ford Factory
Enter the Ford assembly line [4]! Here, each worker focuses on a single task. The car moves along a conveyor belt, and as each car passes through, a dedicated worker adds their part.
The result? Multiple cars are at different assembly stages simultaneously, and all workers are busy. Throughput increases significantly—but each car still needs to go through each worker sequentially, meaning the delay per car remains the same.
Magic Factory
Imagine a magic factory where all workers can work on a single car simultaneously! The car no longer needs to move from one station to the next, because every part of it is built at the same time.
The outcome? The car is assembled at a record speed, with every step happening in sync. This is the ideal scenario to address throughput and latency issues.
Alright, enough about car factories—what about blockchain? As it turns out, designing a high-performance blockchain is not so different from optimizing an assembly line.
Blockchain as a Car Factory
In blockchain, processing a block is akin to assembling a car. The analogy goes as follows:
· Worker = Validator Resource
· Car = One Block
· Assembly Task = Consensus, Execution, and Commit stages
Just as in a simple factory where only one car is processed at a time, if a blockchain were to handle only one block at a time, it would result in underutilization of resources. In contrast, modern blockchain designs aim to emulate the Ford assembly line—processing multiple blocks in different stages simultaneously. This is where pipeline technology shines.
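The three factories can be compared with a small back-of-the-envelope model. The sketch below is illustrative only: the station times and car count are made-up numbers, and the function names are hypothetical. It assumes every car needs the same stations and that the assembly line is paced by its slowest station.

```python
def simple_factory(stage_hours, n_cars):
    """One car at a time: the next car starts only when the previous one is done."""
    latency = sum(stage_hours)
    return {"latency": latency, "total": latency * n_cars}


def ford_factory(stage_hours, n_cars):
    """Assembly line: cars overlap, paced by the slowest station."""
    latency = sum(stage_hours)   # each car still visits every station in turn
    pace = max(stage_hours)      # a finished car rolls off every `pace` hours
    return {"latency": latency, "total": latency + (n_cars - 1) * pace}


def magic_factory(stage_hours, n_cars):
    """All stations work on the same car at once, so its stages fully overlap."""
    latency = max(stage_hours)
    return {"latency": latency, "total": latency * n_cars}


stations = [3, 3, 3]  # hypothetical hours per station
cars = 10
results = {
    f.__name__: f(stations, cars)
    for f in (simple_factory, ford_factory, magic_factory)
}
for name, r in results.items():
    print(f"{name}: latency={r['latency']}h, total for {cars} cars={r['total']}h")
```

Under these assumptions the assembly line cuts the total time for ten cars from 90 to 36 hours without changing the 9-hour latency per car, while the magic factory lowers the per-car latency itself to 3 hours.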
Evolution of Blockchain Pipelines
Traditional Architecture: Sequential Blockchain
Imagine a blockchain that processes blocks sequentially. Validators need to:
1. Receive block proposals.
2. Execute blocks to update the blockchain state.
3. Proceed with achieving consensus on that state.
4. Persist the state to the database.
5. Initiate the consensus for the next block.
Where is the problem?
· Execution and commit (persisting state to storage) are on the critical path of the consensus process.
· Each consensus instance needs to wait for the previous one to complete before starting.
This setup is akin to factories of the pre-Ford era: workers (resources) often idle as they focus on only one block (car) at a time. Unfortunately, many existing blockchains still fall into this category, leading to low throughput and high latency.
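To make the idle-resource problem concrete, here is a rough utilization estimate for a sequential design. It is a sketch under stated assumptions: the stage durations are hypothetical, and each stage is assumed to keep exactly one kind of resource busy while the others wait.

```python
def sequential_utilization(stage_ms):
    """Fraction of time each stage's resource is busy when blocks are processed one by one."""
    cycle = sum(stage_ms.values())  # the next block starts only after every stage finishes
    return {stage: ms / cycle for stage, ms in stage_ms.items()}


# Hypothetical per-block stage durations in milliseconds.
stage_ms = {"execution": 200, "consensus": 300, "commit": 100}

utilization = sequential_utilization(stage_ms)
blocks_per_second = 1000 / sum(stage_ms.values())

for stage, busy in utilization.items():
    print(f"{stage}: busy {busy:.0%} of the time")
print(f"throughput: {blocks_per_second:.2f} blocks/s")
```

With these numbers, the resource used by consensus is busy only half the time and the storage layer only about a sixth of the time, which is the waste a pipelined design tries to recover.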
Aptos: Parallelizing Performance
Diem, the predecessor of Aptos, introduced a pipeline architecture that decouples execution and commit from the consensus phase, with the consensus phase itself also adopting a pipeline design.
· Asynchronous Execution and Commit [5]: Validators first agree on a block, then execute the block based on the parent block's state. Once the resulting state is validated by a quorum of validators, it is persisted to storage.
· Pipelined Consensus (Jolteon [6]): New consensus instances can start before the previous one completes, akin to a moving assembly line.
This enhancement allows different blocks to be in different stages simultaneously, increasing throughput and significantly reducing block times to just 2 message delays. However, Jolteon's leader-based design may lead to bottlenecks as the leader can become overloaded during transaction dissemination.
Aptos further optimizes the pipeline through Quorum Store [7], a mechanism that decouples data dissemination from consensus. Instead of relying on a single leader to broadcast large blocks of data within the consensus protocol, Quorum Store separates data dissemination from metadata ordering, allowing validators to disseminate data asynchronously and concurrently. This design leverages the total bandwidth of all validators, effectively eliminating leader bottlenecks in consensus.
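A rough bandwidth calculation shows why moving the payload out of the leader's proposal matters. This is a simplified sketch, not the Quorum Store protocol itself: the validator count, payload size, and metadata size are hypothetical, the payload is assumed to be split evenly among validators, and acknowledgements and availability proofs are ignored.

```python
def leader_bytes_per_block(n_validators, payload_bytes, metadata_bytes, quorum_store):
    """Bytes the leader sends to propose one block to every other validator."""
    peers = n_validators - 1
    # With decoupled dissemination the proposal carries only references to data
    # that peers already hold; otherwise it carries the full payload.
    per_peer = metadata_bytes if quorum_store else payload_bytes
    return peers * per_peer


def validator_dissemination_bytes(n_validators, payload_bytes):
    """Bytes each validator sends when it broadcasts its own even share of the payload."""
    share = payload_bytes // n_validators
    return (n_validators - 1) * share


N = 100              # hypothetical validator count
PAYLOAD = 1_000_000  # hypothetical transaction data per block, in bytes
METADATA = 1_000     # hypothetical size of the ordered references, in bytes

leader_only = leader_bytes_per_block(N, PAYLOAD, METADATA, quorum_store=False)
leader_with_qs = leader_bytes_per_block(N, PAYLOAD, METADATA, quorum_store=True)
each_validator = validator_dissemination_bytes(N, PAYLOAD)

print(f"leader, payload in proposal:  {leader_only:,} bytes")
print(f"leader, metadata only:        {leader_with_qs:,} bytes")
print(f"each validator, own batches:  {each_validator:,} bytes")
```

In this toy setting the leader's proposal traffic drops from 99 MB to 99 KB per block, and the payload cost is spread as roughly 1 MB of outgoing traffic per validator.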

Visualization: How Quorum Store balances resource utilization in leader-based consensus protocols.
Thus far, the Aptos blockchain has built the "Ford Factory" of blockchains. Just as Ford's assembly line revolutionized car manufacturing—different cars in different stages simultaneously—Aptos processes different blocks in different stages concurrently. Each validator's resources are fully utilized, ensuring no part of the process remains idle. This clever arrangement has led to a high-throughput system, making Aptos a robust platform for efficiently and scalably handling blockchain transactions.

Illustration: Pipelined Processing of Sequential Blocks in the Aptos Blockchain. Validators can process different stages of consecutive blocks in a pipeline to maximize resource utilization and increase throughput.
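The pipelined schedule can also be written out slot by slot. The sketch below is a toy model that assumes every stage takes exactly one time slot; real stages have different and variable durations. The stage names mirror the validator pipeline described in this article, with "commit" meaning persisting state to storage.

```python
STAGES = ["consensus", "execution", "validation", "commit"]


def pipeline_schedule(n_blocks, stages=STAGES):
    """For each time slot, map every busy stage to the block it is working on."""
    schedule = []
    for t in range(n_blocks + len(stages) - 1):
        slot = {}
        for depth, stage in enumerate(stages):
            block = t - depth  # the block that entered the pipeline `depth` slots ago
            if 0 <= block < n_blocks:
                slot[stage] = block
        schedule.append(slot)
    return schedule


schedule = pipeline_schedule(3)
for t, slot in enumerate(schedule):
    print(f"slot {t}: {slot}")
```

From slot 2 onward, three different blocks occupy three different stages at once, which is the assembly-line behavior the illustration describes.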
While throughput is crucial, end-to-end latency (the time from transaction submission to final confirmation) is equally important. For applications such as payments, decentralized finance (DeFi), and gaming, every millisecond counts. Many users have experienced delays during high-traffic events because each transaction must pass sequentially through a series of stages: communication between client, full node, and validator; consensus; execution; state validation; commit; and full node synchronization. Under high load, stages like execution and full node synchronization introduce additional latency.

Illustration: Pipeline Architecture of the Aptos Blockchain. The diagram shows client Ci, full node Fi, and validator Vi. Each box represents a stage that a block of transactions must go through, from left to right. The pipeline consists of five stages: consensus (including dissemination and ordering), execution, validation, commit, and full node synchronization.
It's like a Ford factory: while the assembly line maximizes overall throughput, each car still needs to pass through each worker sequentially, resulting in longer completion times. To truly push blockchain performance to the limit, we need to build a "magic factory" where these stages run in parallel.
Zaptos: Towards Optimal Blockchain Latency
Zaptos [8] further reduces latency through three key optimizations without sacrificing throughput.
· Optimistic Execution: Reducing pipeline latency by starting execution immediately upon receiving a block proposal. Validators promptly add the block to the pipeline and speculatively execute after the parent block completes. Full nodes, upon receiving the proposal from the validator, also perform optimistic execution to validate the state proof.
· Optimistic Commit: Writing state to storage immediately after block execution, even before state validation. When validators eventually validate the state, only minimal updates are needed to complete the commit. If a block is ultimately not ordered by consensus, its optimistically committed state is rolled back for consistency.
· Fast Verification: Validators send validation messages concurrently with the final consensus round, starting verification of the executed block's state early instead of waiting for consensus to complete. In the common case, this optimization shortens the pipeline by one round.
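The write-early, roll-back-if-needed idea behind optimistic commit can be illustrated with a toy key-value store that keeps an undo log per block. This is only a sketch of the general technique: the article does not describe how Zaptos implements storage, and the class and method names below are hypothetical.

```python
_MISSING = object()  # marks keys that did not exist before a block wrote them


class OptimisticStore:
    """Toy state store that applies writes early and can undo them per block."""

    def __init__(self):
        self.state = {}      # stands in for persisted storage
        self.undo_logs = {}  # block_id -> previous values of the keys it overwrote

    def optimistic_commit(self, block_id, writes):
        # Remember the old values first, then apply the writes right away.
        self.undo_logs[block_id] = {k: self.state.get(k, _MISSING) for k in writes}
        self.state.update(writes)

    def finalize(self, block_id):
        # The block was ordered and its state validated: only bookkeeping is left.
        del self.undo_logs[block_id]

    def rollback(self, block_id):
        # The block was never ordered: restore the values it overwrote.
        # (With several pending blocks, roll back in reverse commit order.)
        for key, old in self.undo_logs.pop(block_id).items():
            if old is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = old


store = OptimisticStore()
store.optimistic_commit("block-1", {"alice": 10})
store.finalize("block-1")
store.optimistic_commit("block-2", {"alice": 7, "bob": 3})
store.rollback("block-2")
print(store.state)
```

Finalizing a block is cheap because the data is already written, while the rare rollback path pays the cost of restoring the previous values.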

Illustration: Parallel Pipeline Architecture of Zaptos. Stages other than consensus are effectively hidden within the consensus stage, reducing end-to-end latency.
Through these optimizations, Zaptos effectively hides the latency of other pipeline stages within the consensus stage. Thus, if a blockchain adopts an optimal latency consensus protocol, the overall blockchain latency can also reach an optimum!
Talk is Cheap, Show Me the Data
We evaluated Zaptos' end-to-end performance through geographically distributed experiments, with Aptos as the high-performance baseline. For more details, refer to the paper [8].
On Google Cloud, we simulated a globally decentralized network of 100 validators and 30 full nodes distributed across 10 regions, using commercial-grade machines similar to those in the Aptos deployment.
Throughput-Latency

Figure: Common-case performance of the Zaptos and Aptos blockchains.
The above figure compares the relationship between end-to-end latency and throughput of the two systems. Both exhibit a gradual latency increase as the load increases, with sharp spikes at maximum capacity, but Zaptos consistently demonstrates more stable latency before reaching peak throughput, reducing latency by 160 milliseconds under low load and over 500 milliseconds under high load.
Impressively, Zaptos achieves sub-second latency at 20k TPS in a production-grade, mainnet-like environment, a result that makes real-world applications requiring both speed and scalability feasible.
Latency Breakdown

Figure: Latency breakdown of the Aptos blockchain.

Figure: Latency breakdown of Zaptos.
The latency breakdown charts detail the duration of each stage for validators and full nodes in the pipeline. Key insights include:
· Up to 10k TPS: Zaptos' overall latency is nearly equivalent to its consensus latency, as the optimistic execution, state validation, and optimistic commit stages are effectively "hidden" within the consensus stage.
· Above 10k TPS: Due to increased optimistic execution and full node synchronization time, non-consensus stages become more significant. Nevertheless, Zaptos significantly reduces overall latency by overlapping most stages. For example, at 20k TPS, the baseline total latency is 1.32 seconds (consensus 0.68 seconds, other stages 0.64 seconds), while Zaptos is 0.78 seconds (consensus 0.67 seconds, other stages 0.11 seconds).
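The 20k TPS figures quoted above can be checked with a few lines of arithmetic. The numbers come directly from the breakdown in this article, converted to milliseconds; only the variable names are made up.

```python
# Latency at 20k TPS, in milliseconds, as reported in the breakdown above.
baseline = {"consensus_ms": 680, "other_ms": 640}
zaptos = {"consensus_ms": 670, "other_ms": 110}


def total_ms(breakdown):
    return breakdown["consensus_ms"] + breakdown["other_ms"]


saved_ms = total_ms(baseline) - total_ms(zaptos)
total_reduction_pct = 100 * saved_ms / total_ms(baseline)
other_reduction_pct = 100 * (baseline["other_ms"] - zaptos["other_ms"]) / baseline["other_ms"]

print(f"baseline total: {total_ms(baseline)} ms, Zaptos total: {total_ms(zaptos)} ms")
print(f"end-to-end latency saved: {saved_ms} ms ({total_reduction_pct:.1f}%)")
print(f"non-consensus latency reduced by {other_reduction_pct:.1f}%")
```

The totals match the reported 1.32 seconds and 0.78 seconds, a saving of 540 milliseconds (about 41%). Consensus latency is nearly unchanged, so almost all of the gain comes from the non-consensus stages, whose contribution shrinks by roughly 83%.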
Conclusion
The evolution of blockchain architecture parallels the transformation in manufacturing—from simple sequential workflows to highly parallelized assembly lines. Aptos's assembly line approach has significantly increased throughput, while Zaptos goes further, reducing latency to sub-second levels, all while maintaining high TPS. Just as modern computing architectures leverage parallelism to maximize efficiency, blockchain must continuously optimize its design to eliminate unnecessary delays. By comprehensively optimizing the blockchain pipeline to achieve minimal latency, Zaptos paves the way for real-world blockchain applications that require speed and scalability.
References
[1] Gene M. Amdahl, Gerrit A. Blaauw, and Frederick P. Brooks. 1964. "Architecture of the IBM System/360." IBM Journal of Research and Development. https://doi.org/10.1147/rd.82.0087
[2] David DeWitt and Jim Gray. 1992. "Parallel Database Systems: The Future of High Performance Database Systems." Communications of the ACM. https://doi.org/10.1145/129888.129894
[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin et al. 2016. "TensorFlow: A System for Large-Scale Machine Learning." In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). https://arxiv.org/abs/1605.08695
[4] The Moving Assembly Line and the Five-Dollar Workday. https://corporate.ford.com/articles/history/moving-assembly-line.html
[5] Zekun Li and Yu Xia. 2021. DIP-213 - Decoupled Execution. https://github.com/diem/dip/blob/7dc44ee57bb7efe76559f05dcc6851d97e2d3149/dips/dip-213.md
[6] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. 2022. "Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback." In International Conference on Financial Cryptography and Data Security (FC). https://arxiv.org/abs/2106.10362
[7] Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain. https://medium.com/aptoslabs/quorum-store-how-consensus-horizontally-scales-on-the-aptos-blockchain-988866f6d5b0
[8] Zhuolun Xiang, Zekun Li, Balaji Arun, Teng Zhang, and Alexander Spiegelman. 2025. "Zaptos: Towards Optimal Blockchain Latency." arXiv preprint arXiv:2501.10612. https://arxiv.org/abs/2501.10612
This article is from a submission and does not represent the views of BlockBeats.