Troubleshooting Hot Ranges CockroachDB
Sometimes on a CockroachDB cluster, You may find a hot node where the node has a set of busy ranges or one very busy range creating a workload imbalance such as seen below.
On node 3, (n3) is servicing 12,360 requests to lease holders on the node consistently over the few last hours. We can see the node is CPU bound, recalling my article on CPU utilization this is not ideal for performance.
The CockroachDB Console publishes a hot ranges report which comes in handy to help pinpoint busy ranges. However, on a large cluster, this can be a little cumbersome.
What is more interesting are the top hot ranges which can be inspected by writing a script to parse the _status/raft page under the debug page of the CockroachDB console.
Using the script from the repo, we can find the following:
Hot Ranges
Hot Nodes
We can see from above, the busiest ranges on node 3 are in the following order
1, 8360
5, 3767
30, 7519
33, 3764
42, 7558
Now, let’s find out what table, ranges and keys are being hit the hardest using
WITH
cte
AS (
SELECT
*
FROM
(
VALUES
(1, 8360),
(5, 3767),
(30, 7519),
(33, 3764),
(42, 7558)
)
AS a (rank, rangeid)
),
ranges
AS (SELECT * FROM [SHOW RANGES FROM DATABASE DB])
SELECT
*
FROM
ranges, cte
WHERE
cte.rangeid = ranges.range_id
ORDER BY
cte.rank;
We can also look further to see if this is a primary key or secondary indexes by querying crdb_internal.ranges_no_leases
I had to update the last query to add range 8580 which became the new top range due to 8360 splitting from load. Range 8580 is index activities_activity_time_idx which is an index of activity time /Table/55/6/2022–03–21T01:41:35.000361Z/”.. which is basically adding data to one range instead of across all the indexes ranges. We can fix this by changing the index to a hash shard method.
I wrote this article to illustrate the process of how to find hot ranges by exploring Console endpoints and some SQL statements with handy commands such as show ranges and crdb_internal tables. Fairly soon CockroachDB will add a hot ranges visual page to the dbconsole!
adding a single command to get the above details easier !!
export mydb=defaultdb;export list=` python hotR.py ranges | head -25 | sed -e ‘s/://g’ | awk ‘{print “(“$1”,”$2")”}’| xargs| sed -e ‘s/ /,/g’` ; echo “WITH cte AS ( SELECT * FROM ( VALUES “ $list “ ) AS a (rank, rangeid)), ranges AS (SELECT * FROM [SHOW RANGES FROM DATABASE “$mydb”]) SELECT * FROM ranges, cte WHERE cte.rangeid = ranges.range_id ORDER BY cte.rank;”>with.sql;cockroach sql — insecure < with.sql
table_name | start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | rank | rangeid
— — — — — — — — — — — — -+ — — — — — — — — — — — -+ — — — — — — — — — — — -+ — — — — — + — — — — — — — -+ — — — — — — — + — — — — — — — — — — — — — — — — — — -+ — — — — — + — — — — — — — — — — — — — — — — — — — — -+ — — — + — — — — —
equipment | NULL | /”\x14j`”/”s\t0\x1e6" | 38 | 325.247547 | 1 | cloud=local,region=local,zone=local | {1} | {“cloud=local,region=local,zone=local”} | 7 | 38
equipment | /”6RM%a.f”/”” | /”[5#/.P;”/”e/8" | 39 | 370.841418 | 1 | cloud=local,region=local,zone=local | {1} | {“cloud=local,region=local,zone=local”} | 10 | 39
equipment| /”\x14j`”/”s\t0\x1e6" | /”6RM%a.f”/”” | 41 | 362.657327 | 1 | cloud=local,region=local,zone=local | {1} | {“cloud=local,region=local,zone=local”} | 19 | 41