part 1: query performance
I often find myself explaining the negative effects of running a database on a system at high utilization and the usual response is surprising which has something to do with getting a return on investment at 100%. It is short sighted as the impact has many costly ramifications such as the following
- Higher Utilization in most cases may overwork the CPU(s) and increase the CPU response time or Latency which in turns impacts performance of the kernel and user applications such as database queries.
- High utilization across many nodes, will be less responsive if failure occurs as few resources will be available for the workload and after resources became available they will have to be used to recover data before returning to a state of servicing all user requests
- High utilization indicates poor sizing or capacity planning
- Creates issues when troubleshooting performance
In this article, I would like to dive into the first as it pertains to cockroachdb and query performance. Users submitting query requests to the database will be serviced by a cpu and higher number of requests will keep the cpu busy. From a time perspective and on simplified configuration of a single cpu, if one request takes 1ms to be serviced another parallel request will have to wait an additional 1ms for a turn to be serviced. Now let’s look at a real world example where I have removed situations where other waits occur such as latency due to cpu execution, cpu latency and network latency.
My Server has 2 cpu or 4 execution threads so will be able to handle more concurrent requests at a time — all other things such as IO, memory and network are sufficient and should not pose a bottleneck creating waits.
I am running a cockroach database on the server with PID = 3942
The system has no other workload running
Nothing on the run queue which is indicated under the procs header r and the system is pretty much 100% idle as no user requests such as SQL is submitted. Only cockroachdb background jobs running to maintain the database.
The following schema and query will be used to submit requests to the database.
Each inquiry requests all session data for a particular session id and the database very efficiently uses an index to find the row. The typical response of this query on an idle system is 1.4 ms from client to database and back and the time slice on the database is usually 200–300 microseconds. Running this workload with a concurrency of 1 results in an average execution of 1.4ms and a p95 of 1.8 and p99 of 1.9 with a pMax of 32.5.
209k executions over 300s where each inquiry ranges across 5000 possible sessions. Our CockroachDB UI displays the database time for this query is 224 microseconds on average.
Over the period of 300s we can see the sustained throughput 700 queries per second with a system utilization of of 25% with SQL latency of 224.7 µs.
And vmstat indicates the same utilization where the run queue is below the available core capacity and there are no blocked processes.
During the workload, the system performed well as it handled all inquiries without any unexpected waits. If we inspect schedstat we can determine cpu responsiveness during the workload and after it has completed at 12:28:17 which indicates little change in response time. The chart below indicates cpu% and cpu latency.
Since the system was 25% busy at 690 TPS, one could conclude that this system could support 2800 TPS. Well let us test the theory and scale from 1 user to 4 concurrent users. We see the TPS reaches around 2000 and the latency p99 observed before client is 442 microseconds [ µs ].
As the test scales up to 4 concurrent users, we an see the util% increasing with a run queue steadily increasing to > 4 which matches the number of cores.
And we can see higher latency in the cpu at various stages
But the result is only 2059 TPS with higher response times where the p99 borders 490 microseconds. I have confirmed plans have not changed overtime so we are consistently using the same path to data. Let’s now take a look at the running workload scaling to 16 concurrent users.
As the workload scales up with more concurrent users and connections we see that at a certain point the system starts to respond with more latency due to cpu utilization. Essentially the cores can handle around 12–16 concurrent requests at a time well. Beyond that additional requests will be queued waiting for their turn on the cpu and the SQL latency will raise. So, at 25% CPU utilization the average SQL response was 1.4 ms and the result of scaling cpu utilization to a peak of 80% service latency multiplied to 10–13x. Next time, I will discuss how this utilization impacts availability.