Tenant Isolation with CockroachDB
How to isolate tenant data and SQL processing to specific nodes
A few of my customers require the ability to isolate a busy tenant from other tenants. I proposed using CockroachDB locality settings to achieve this goal while maintaining cross-regional data protection, ensuring an always-on service in the face of node, datacenter, or regional failures.
I will use multi-tenancy approaches both within a schema and with databases, then use the CockroachDB locality and zone configuration features to isolate data and workloads, and observe the impact of heavy workloads.
First, let’s look at the cluster. It is deployed across 3 regions on Google Cloud with 4 nodes in each region, and I have tagged each node with a locality setting. Although I have assigned one node per tenant identifier in each region, I could have added more nodes per region per tenant id.
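As a sketch of how this looks in practice (the tenant locality tier and the values A through D are my own naming convention, and the internal table queried below is just one of several ways to inspect localities), each node is started with a --locality flag and the assignments can then be checked from any SQL session:

```sql
-- Each node was started with a locality flag along these lines
-- (region names are illustrative):
--   cockroach start --locality=region=us-east1,tenant=A ...
-- Verify the locality tiers assigned to each node:
SELECT node_id, locality
FROM crdb_internal.gossip_nodes
ORDER BY node_id;
```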
The goal is to end up with tenant data placed on specific nodes for data isolation and have SQL clients connect to those nodes and access data without impacting other tenants on other nodes. For example, the tenants for TPC-C warehouses 0 to 33 will only use CPU cycles on nodes N1, N5, and N9.
Using the TPC-C schema, I will treat the warehouse identifier as the pivot point to distinguish tenants, applying a range partitioning strategy.
Notice that each entity has a %_w_id column, which makes it easy to partition every entity except for item, which will be treated differently. The following commands partition the tables and indexes based on ranges of warehouse identifiers.
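A minimal sketch for the customer table and its secondary index (the partition names, warehouse boundaries, and the customer_idx index name are illustrative; the real commands repeat this pattern for every %_w_id table and index):

```sql
-- Partition the customer table by ranges of warehouse id (c_w_id).
ALTER TABLE customer PARTITION BY RANGE (c_w_id) (
    PARTITION tenant_a VALUES FROM (0)  TO (33),
    PARTITION tenant_b VALUES FROM (33) TO (66),
    PARTITION tenant_c VALUES FROM (66) TO (MAXVALUE)
);

-- Partition the secondary index the same way so its ranges can follow
-- the same placement rules.
ALTER INDEX customer@customer_idx PARTITION BY RANGE (c_w_id) (
    PARTITION tenant_a VALUES FROM (0)  TO (33),
    PARTITION tenant_b VALUES FROM (33) TO (66),
    PARTITION tenant_c VALUES FROM (66) TO (MAXVALUE)
);
```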
In CockroachDB, each data structure is sorted by its key and then sharded and distributed. Using a partition column and zone configurations, we can further control how rows are stored across shards, or more precisely, ranges.
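A sketch of the zone configuration step, assuming the partition names from the previous commands and the tenant locality tier assigned at node startup; the same statement is repeated for tenant_b and tenant_c with their respective tags:

```sql
-- Pin the tenant_a partition (all of its replicas) to nodes tagged tenant=A.
ALTER PARTITION tenant_a OF TABLE customer
    CONFIGURE ZONE USING
        num_replicas = 3,
        constraints = '[+tenant=A]';
```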
Inspecting the customer DDL, the table and each of its indexes is an index-organized structure (IOT); the table is sorted by warehouse id, district id, and customer id. It also has a secondary index sorted by warehouse id, district id, customer last name, and first name. Typically these two structures will be split into ranges based on range size. By using partitioning and zone configs, the ranges will also be split by partition id.
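For reference, a trimmed-down sketch of the shape that SHOW CREATE TABLE customer reports (column and index names follow the standard TPC-C schema; the remaining columns are elided):

```sql
-- Trimmed-down customer DDL: primary key and secondary index only.
CREATE TABLE customer (
    c_w_id   INT    NOT NULL,   -- warehouse id (the partitioning pivot)
    c_d_id   INT    NOT NULL,   -- district id
    c_id     INT    NOT NULL,   -- customer id
    c_first  STRING NOT NULL,
    c_last   STRING NOT NULL,
    -- ... remaining TPC-C columns elided ...
    PRIMARY KEY (c_w_id, c_d_id, c_id),
    INDEX customer_idx (c_w_id, c_d_id, c_last, c_first)
);
```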
Now let’s inspect how the customer table is stored using the SHOW RANGES command. Notice that all rows with keys from /0 to /33 are stored on nodes with locality tenant=A, keys from /33 to /66 are stored on nodes with locality tenant=B, and so on. We have isolated 3 tenants in the TPC-C schema to just the tenant A, B, and C nodes across all 3 regions in my cluster. So, I have isolated tenant data to specific nodes.
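The inspection itself is a single statement; the columns to look at are the start and end keys, the replica nodes, and their localities (the exact output columns vary by CockroachDB version):

```sql
-- List the ranges of the customer table along with their key spans,
-- replica placement, and lease holders.
SHOW RANGES FROM TABLE customer;
```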
Next, let’s run workloads against these nodes, starting with the nodes designated for Tenant A. Using a client in the east1 region, I have launched the TPC-C workload for warehouses 0 to 33.
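As a quick sanity check, each client session can confirm which node it is connected to; for the tenant A client the result should include the tenant=A tier (a small sketch, assuming the locality convention above):

```sql
-- Report the locality of the node serving this session.
SHOW LOCALITY;
```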
Notice the utilization indicates that the nodes which hold all the ranges and replicas for tenant A, namely N1, N5 and N9, are the ones being used.
Now let’s look at the impact, from a client perspective, when we have multiple tenants running workloads. Set the data placement for all other tenants to the Tenant D nodes and set lease preferences.
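A sketch of that placement step, assuming the other tenants live in their own database (named bank here for the banking workload; the database name and the tenant=D tier are my own labels, and the same clause works at the partition level):

```sql
-- Place the banking tenant's data on nodes tagged tenant=D and prefer
-- keeping lease holders there as well, so its traffic stays on those nodes.
ALTER DATABASE bank
    CONFIGURE ZONE USING
        num_replicas = 3,
        constraints = '[+tenant=D]',
        lease_preferences = '[[+tenant=D]]';
```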
Let’s observe a banking workload running as Tenant D.
Starting the workload for Tenant A, we can see below that the banking application latency is unchanged while an aggressive workload is running on nodes N1, N5 and N9.
So, we are able to isolate tenant workloads by using the following approach:
- Segregation of data with table/index partitioning, schemas [not used in this writeup], or databases
- Node locality settings to identify nodes
- Zone configurations to map data to nodes
- JDBC connection strings from the application to specific nodes
Now we have a method for tenant isolation. Some other knobs which can be used to limit the impact of workloads are the following
Thanks