Deep Dive into FinTech Use Case
MR settings with failure testing
Building on my previous article on trade processing, this blog explores the workload in detail using CockroachDB multi-region (MR) features, with some fault injection analysis. The goal is to explore how regional survivability can be achieved for a multi-region application whose reads and writes stay performant in both normal and failure states, without loss of data.
Let us look at the schema and workload below.
It is a pretty simple schema: root orders are added and executed, with a trail of order execution fulfillment stored in activities. Although this is a simplified representation of a real-world trade processing workload, it captures the core transactional behavior and is a good starting point for this illustration.
The above illustration represents a CockroachDB stretch cluster spanning the Google Cloud regions us-east1, us-east4 and us-central1. Active clients execute trades in us-east1 and us-east4, and they typically execute orders in the same region as the CockroachDB nodes they connect to. The leaseholders for order header, order execution and history rows are placed in each row's home region. Setting the regional by row affinity explicitly, that is, setting crdb_region for each table in the schema, ensures low-latency reads and writes and assigns each row its proper home region even when the home region's database nodes or applications are unavailable. When normal operations resume and the home regions are accessible again, clients will find their trades local to their region, with optimal read and write latencies.
Now let us look at the application
In each region, trades are streamed into applications, which send 3 or more transactions per order lifecycle: step 1 inserts a trade with 2 activity rows, and steps 2 and 3 are executions for the order, each with logging activity for the parent order. Each trade can have multiple executions to fulfill an order. Notice that the tables in the schema above include the crdb_region column, which sets the home region for the row and is also used to partition the table (we will come back to this). Any order originated in a region has a parent-child relationship through the schema, with all rows belonging to the same home region, which ensures low-latency reads and writes.
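To make the lifecycle concrete, here is a minimal Python sketch of the rows a single order produces across its transactions. The table names `orders`, `executions` and `activities`, the order id `E1-0001`, and the choice of one activity row per execution are assumptions for illustration only; the actual schema is the one shown in the figure above.

```python
# Hypothetical table names; the real schema may name these differently.
ORDERS, EXECUTIONS, ACTIVITIES = "orders", "executions", "activities"


def order_lifecycle(order_id, region, executions=2):
    """Return one list of (table, order_id, crdb_region) rows per transaction."""
    # Step 1: the order row plus two activity rows in a single transaction.
    txns = [[
        (ORDERS, order_id, region),
        (ACTIVITIES, order_id, region),
        (ACTIVITIES, order_id, region),
    ]]
    # Steps 2..n: each execution is written together with an activity row
    # that logs it against the parent order (assumed: one activity each).
    for _ in range(executions):
        txns.append([
            (EXECUTIONS, order_id, region),
            (ACTIVITIES, order_id, region),
        ])
    return txns


txns = order_lifecycle("E1-0001", "us-east1")
print(len(txns))                                   # number of transactions
print({row[2] for txn in txns for row in txn})     # regions touched
```

Every row carries the same region value, which is the property that keeps the whole parent-child tree of an order in a single home region.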
In this application, the order id is prefixed with a region identifier, which the database uses to set the home region for the row. Because the home region is known up front, the database does not need to check for the existence of the row in another region, which enables a fast write and fast later reads.
Notice the DDL of the tables where we have explicitly set home region values based on the prefix found in the order id.
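The prefix rule itself is simple enough to sketch in a few lines of Python. This is an illustration of the mapping only, not the DDL used in the article: it covers the two prefixes the article names (`E1` and `E2`), and it assumes the prefix is the first two characters of the order id. The function name `home_region` and the sample ids are hypothetical.

```python
# Prefixes named in the article; any other prefix is treated as unknown here.
PREFIX_TO_REGION = {
    "E1": "us-east1",
    "E2": "us-east4",
}


def home_region(order_id: str) -> str:
    """Derive a row's home region from the order id prefix."""
    prefix = order_id[:2]  # assumption: the region identifier is 2 characters
    try:
        return PREFIX_TO_REGION[prefix]
    except KeyError:
        raise ValueError(f"no home region known for prefix {prefix!r}") from None


print(home_region("E1-0001"))
print(home_region("E2-0002"))
```

Because the region is a pure function of the order id, every child row keyed by that order id resolves to the same home region as its parent.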
Let’s examine the database settings
Viewing the orders table, we can see that CockroachDB, based on the regional by row setting, has automatically partitioned it by the crdb_region column.
We can see that the order@primary index, which is the index-organized table (IOT), is partitioned by 3 values, ‘us-east1’, ‘us-east4’ and ‘us-central1’, and that all the other indexes are partitioned the same way. We can also see the range values in each partition and where they are located.
Any row with an ‘E1’ prefix in its order id has its home region in us-east1, and any row with an ‘E2’ prefix has its home region in us-east4. The following is the layout of the orders index on date.
Now let’s fail a region and observe the behavior based on the default regional by row settings
Failing region us-east1 with the above default zone configuration settings for a partition
versus
results in the following after the nodes in us-east1 are down
The first configuration has the slightly undesirable effect of placing leases in us-central1, where no clients are ever expected to run workloads.
The second configuration has the optimal effect of moving the leases that were in us-east1 to us-east4, where additional clients from the us-east1 location are expected to be routed to the application servers to process orders while still achieving low-latency reads and writes.
In either case, failing back has the desired result of remastering us-east1 orders to us-east1, along with any us-east1 orders processed in us-east4 during the failure.
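The difference between the two configurations can be reasoned about with a small Python model of ordered lease preferences. This is a simplified sketch, not CockroachDB's allocator: it assumes every region holds a replica, and it assumes the second configuration lists us-east4 as a second lease preference after us-east1, which is consistent with the behavior described above but is an inference, since the zone configurations are only shown in the figures. The function name `lease_candidates` is hypothetical.

```python
def lease_candidates(lease_preferences, live_regions):
    """Return the regions eligible to hold the lease, in this simplified model."""
    # Preferences are tried in order; the first one that is live wins.
    for region in lease_preferences:
        if region in live_regions:
            return [region]
    # No preference can be satisfied: any live region may receive the lease.
    return sorted(live_regions)


all_regions = {"us-east1", "us-east4", "us-central1"}
after_failure = all_regions - {"us-east1"}

# First (default) configuration: only the home region is preferred.
print(lease_candidates(["us-east1"], after_failure))
# Second configuration (assumed): us-east4 is the fallback preference.
print(lease_candidates(["us-east1", "us-east4"], after_failure))
# Fail back: once us-east1 is live again, both configurations agree.
print(lease_candidates(["us-east1", "us-east4"], all_regions))
```

In the model, the first configuration leaves us-central1 as a possible leaseholder once us-east1 is down, while the second pins the lease to us-east4, and both return the lease to us-east1 on fail back.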