When and how to use query parallelism with CockroachDB.
Statement parallelism is a familiar concept with most monolithic databases used to speed up operations which act on a large set of data. For example, analytic queries and ETL/batch jobs transforming hundreds of thousands of rows. Today, CockroachDB does not support a framework where a user can control the degree of statement parallel execution but rather supports an auto parallelism method under the distributed process engine. This works well for statements working with smaller sets of data. In this story, I will review using an old method of user controlled parallelism across statements over the same data versus a single statement as an alternative method for handling bulk updates. CockroachDB has enhancements to improve read performance such as Distributed and Vectorized query execution. Also note, statement parallelism is different from transactional parallelism which is another performance enhancement.
Rules to follow when running parallel DML statements with a distributed database operating at a serializable isolation level.
- Use parallelism when the row count mutations exceed 250 to 500k rows on an object with multiple indexes. If the object has no secondary indexes, the threshold can be 1 MM rows or above.
- Parallel statements should not be used for small row transformations.
- Parallelism requires sufficient CPU, Memory and IO capabilities to perform well. Running 20 threads on a 4 VCPU cluster will have diminishing returns and impact normal intraday workloads.
- Run batches which require parallel processing during maintenance windows and windows of low system usage.
- Set based IAS [INSERT INTO SELECT], UPDATES, UPSERTS and DELETES should be short transactions with parallel statements operating across disjointed keys.
- Use implicit transactions when possible.
- With delete/archive operations, use arrays to load the keys which you plan to modify in a read only transaction which will help to reduce contention and build parallel DML with each thread operating on disjointed keys as a separate transaction.
- DML, which requires atomicity across a few objects, should use set based processing such as CTE with DML.
Let’s take a look at an example of updating a history table comparing a single statement of 4 MM rows versus 7 MM. The update looks like
And a UI view of the update plan
And it completes under 132 seconds.
Notice the execution history indicates the statement time for one execution without any retries. Retries are perhaps one of the most recurring issues with long running transactions followed by low parallelization.
Now let’s look at updating a larger data set, 7MM plus rows.
The statement duration is exceeding a few boundaries such as TransactionRetryWithProtoRefreshError for various reasons which basically is related to contention or the transaction has run too long. Looking at trace we see
TransactionRetryError: retry txn (RETRY_COMMIT_DEADLINE_EXCEEDED — txn timestamp pushed too much; deadline exceeded by 11m49.648774902s
In this case, it seems related to hitting the close timestamp setting in CockroachDB and even fixing this issue would still suffer from a lack of improved parallelization for this amount of data. Notice the number of retries below indicate 5 retries.
Following all the rules I specified above, running 4 parallel updates over disjointed keys, we were able to complete updating 7MM rows in 2 minutes 10 seconds which was roughly the same amount of time as the smaller 4MM row update.
In conclusion, if you want to achieve better throughput for batch processsing on CockroachDB beyond the default auto parallelism following the best practices above you can achieve higher throughput data transformation without having to deal with retries or long running slower single statement processing.