Cross regional latency impact on Transactions with CockroachDB
Cockroachdb ensures data consistency across regions by distributing copies of data across regions and writes are propagated to a quorum of the copies before a commit is acknowledged to a client. In action this is referred to as consensus which is based on the Raft consensus protocol. Among the copies, one copy coordinates the Raft operations for read and writes and depending on the location of the consensus leader to the client, the time to achieve quorum before the acknowledgment to the client will incur additional latency. In this writeup, we will explore some possible read and write paths from a client to understand transaction latency. Details of quorum reads and writes can be found here.
Below is a representation of node to node round trip latency across 3 regions in east1, east4 and central1 in GCE.
Below is the latency of read and write single key transactions across regions depending on the client initiating to a local node in their region where the key range leader/leaseholder range is either in or out of the same region.
The transaction duration from a client in east1 to read a leader in central1 is 33 ms where the single read is 1ms of the total time and the remainder is the round trip network time. Reads are serviced by the lease holder without a consensus across replicas and node/region traversal is easy to understand. The transaction duration from a client in East4 to write data to a leader in east1 is 53 ms which includes the network round trip of the east4 to east1 leader and the write and network time from the leader in east1 to reach quorum writes across the replicas. The price to maintain consistency across regions is the network latency across regions to maintain consensus across replicas. The benefit of the latency delivers a recovery point objective of zero or a guaranteed consistent read after write even in the event of failure. Cockroach has add features like pipelining and parallel commits to reduce this network latency for transactions which access data over many ranges in parallel when possible.