Query Performance: Joins and Indexes
Using the Join Order Benchmark
Query performance is affected by a great many things. For complex queries, join ordering is one of the main contributors. Ideally, a plan should start with the join that eliminates the most data, minimizing the amount of data carried into the following joins. When a join that produces a lot of data is performed early in the execution, subsequent stages must process large intermediate results, increasing the time and resources the query needs. A cost-based optimizer evaluates candidate join orders by examining cardinality estimates, which come from table statistics, together with the available access paths, and produces the lowest-cost plan for optimal performance.
As a reminder, join ordering is the order, determined either by a cost-based optimizer or by rule-based heuristics, in which the tables of a multi-table SQL statement are joined together. In this blog I will take a short tour of CockroachDB's cost-based optimizations for joins and provide some examples of how well they work for complex joins. CockroachDB introduced its cost-based optimizer in 2018, which unlocked the full query surface area, and the optimizer has since matured to compute efficient query plans.
The optimizer initially transforms a SQL statement into a normalized form, removing redundancies and performing rule-based transformations that improve performance. This process accounts for the data distribution of the tables when determining the correct path to the data.
Next, the CBO proceeds through expansion and ranking phases that lead to an initial plan. During these phases the optimizer generates alternative paths to the data, including different indexes and join methods, and assigns each alternative a cost using column statistics. The lowest-cost alternative becomes the final query plan, which then passes through a physical planning phase and is finalized into a distributed query execution. The flow below represents the phases through which the optimizer progresses to a final execution plan.
The following illustrates the process in more detail; you can refer to the sql_query_planning RFC for more details.
Each of these phases can be explored using EXPLAIN commands; additional parameters such as VERBOSE, OPT, and DISTSQL can be used to highlight the phases above. Next up, we will explore the maturity of the optimizer using the Join Order Benchmark, which consists of 113 analytic queries containing 3–16 joins (8 on average) across a schema of 21 tables. After loading the data, the table below details the tables and their row counts.
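For example, each phase can be inspected with a different EXPLAIN variant (the query used here is just an illustration):

```sql
-- Expression tree produced by the cost-based optimizer
EXPLAIN (OPT) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- Logical plan with per-node details such as estimated row counts
EXPLAIN (VERBOSE) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- Physical distributed-execution plan (returns a link to a plan diagram)
EXPLAIN (DISTSQL) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;
```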
Executing the 113 queries serially, the total end-to-end time is 1487 seconds. The largest query scans 43.8 million rows, and the average is over 2 million rows scanned per execution.
The above graph shows the default execution times of the queries. Running some of the queries in parallel illustrates how well the cluster (3 nodes @ 4 vCPU, 16 GB RAM, and local SSD) handles the workload: roughly 0.3 QPS, and we can see the limiting factor is CPU overhead. It is easy to conclude that these queries may not be optimal.
The selected queries are 10a (7J), 13c (9J), 16d (8J), 19c (10J), 22a (11J), 26a (12J), 2a (5J), 33a (14J), 6a (5J), and 8d (7J).
Let's look at two of the larger queries, starting with 22a:
SELECT MIN(cn.name) AS movie_company,
MIN(mi_idx.info) AS rating,
MIN(t.title) AS western_violent_movie
FROM company_name AS cn,
company_type AS ct,
info_type AS it1,
info_type AS it2,
keyword AS k,
kind_type AS kt,
movie_companies AS mc,
movie_info AS mi,
movie_info_idx AS mi_idx,
movie_keyword AS mk,
title AS t
WHERE cn.country_code != '[us]'
AND it1.info = 'countries'
AND it2.info = 'rating'
AND k.keyword IN ('murder',
'murder-in-title',
'blood',
'violence')
AND kt.kind IN ('movie',
'episode')
AND mc.note NOT LIKE '%(USA)%'
AND mc.note LIKE '%(200%)%'
AND mi.info IN ('Germany',
'German',
'USA',
'American')
AND mi_idx.info < '7.0'
AND t.production_year > 2008
AND kt.id = t.kind_id
AND t.id = mi.movie_id
AND t.id = mk.movie_id
AND t.id = mi_idx.movie_id
AND t.id = mc.movie_id
AND mk.movie_id = mi.movie_id
AND mk.movie_id = mi_idx.movie_id
AND mk.movie_id = mc.movie_id
AND mi.movie_id = mi_idx.movie_id
AND mi.movie_id = mc.movie_id
AND mc.movie_id = mi_idx.movie_id
AND k.id = mk.keyword_id
AND it1.id = mi.info_type_id
AND it2.id = mi_idx.info_type_id
AND ct.id = mc.company_type_id
AND cn.id = mc.company_id;
with the following plan, which has an execution time of 20.386 seconds.
After examining this plan, adding a few indexes to replace full table scans with index scans that apply the predicates should reduce the amount of data scanned and possibly reorder the joins for this query. The next query is 26a:
SELECT MIN(chn.name) AS character_name,
MIN(mi_idx.info) AS rating,
MIN(n.name) AS playing_actor,
MIN(t.title) AS complete_hero_movie
FROM complete_cast AS cc,
comp_cast_type AS cct1,
comp_cast_type AS cct2,
char_name AS chn,
cast_info AS ci,
info_type AS it2,
keyword AS k,
kind_type AS kt,
movie_info_idx AS mi_idx,
movie_keyword AS mk,
name AS n,
title AS t
WHERE cct1.kind = 'cast'
AND cct2.kind LIKE '%complete%'
AND chn.name IS NOT NULL
AND (chn.name LIKE '%man%'
OR chn.name LIKE '%Man%')
AND it2.info = 'rating'
AND k.keyword IN ('superhero',
'marvel-comics',
'based-on-comic',
'tv-special',
'fight',
'violence',
'magnet',
'web',
'claw',
'laser')
AND kt.kind = 'movie'
AND mi_idx.info > '7.0'
AND t.production_year > 2000
AND kt.id = t.kind_id
AND t.id = mk.movie_id
AND t.id = ci.movie_id
AND t.id = cc.movie_id
AND t.id = mi_idx.movie_id
AND mk.movie_id = ci.movie_id
AND mk.movie_id = cc.movie_id
AND mk.movie_id = mi_idx.movie_id
AND ci.movie_id = cc.movie_id
AND ci.movie_id = mi_idx.movie_id
AND cc.movie_id = mi_idx.movie_id
AND chn.id = ci.person_role_id
AND n.id = ci.person_id
AND k.id = mk.keyword_id
AND cct1.id = cc.subject_id
AND cct2.id = cc.status_id
AND it2.id = mi_idx.info_type_id;
Notice that this query plan, too, can be optimized with some indexes.
After reviewing the entire schema and all 113 plans, we find that the following indexes should help reduce the data scanned and allow the optimizer to order the joins more efficiently.
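As a sketch of the kind of indexes involved (the exact column choices below are illustrative, not the full list applied in the benchmark), covering indexes on the filter and join columns let the engine apply predicates during the scan instead of after a full table scan:

```sql
-- Support the mi.info IN (...) filter and carry movie_id for the joins
CREATE INDEX ON movie_info (info_type_id, info) STORING (movie_id);

-- Support the mi_idx.info range filters and carry movie_id for the joins
CREATE INDEX ON movie_info_idx (info_type_id, info) STORING (movie_id);

-- Support the k.keyword IN (...) filters
CREATE INDEX ON keyword (keyword);
```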
The method I used to perform this index analysis was the following (roughly 10 minutes):
- Ran each query sequentially, once with EXPLAIN plans and again to capture execution times, which were recorded in the above graph.
- Collected the index recommendations that CockroachDB adds to EXPLAIN output, or any of the UI recommendations from the insights page or table page (found below).
- Sorted all index recommendations with sort -u and reviewed each one, applying my knowledge of the queries and schema.
- Added the final index recommendations from above.
- Reran step 1.
Although I did not use the DB Console recommendations introduced in v22.1 and v22.2, below are the insights pages, which can be used for isolated query-tuning paths. In general, index recommendations are gathered and displayed on the Schema Insights page.
The following is an example of an index recommendation.
After applying all the index changes, CockroachDB will collect fresh statistics for the tables whose schemas changed. After rerunning all the queries and iterating on a few more schema changes and query rewrites, we have the following results, which display the blue baseline without indexes compared to the latest changes in orange and finally gray.
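Statistics are refreshed automatically after a schema change, but they can also be inspected or triggered by hand (the table name here is one from the benchmark schema):

```sql
-- Inspect the statistics the optimizer will use for costing
SHOW STATISTICS FOR TABLE movie_info;

-- Force a manual refresh instead of waiting for automatic collection
CREATE STATISTICS movie_info_stats FROM movie_info;
```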
Proceeding to run the same concurrent workload, we can see the system is able to handle more queries per second, with better utilization of the cluster.
Of the 113 queries, I found only 2 that required rewriting to get proper join ordering. When rewriting queries, I suggest moving to ANSI join syntax, putting the join conditions in ON clauses and keeping only the remaining filter predicates in the WHERE clause. Query 16a was rewritten from
SELECT MIN(an.name) AS cool_actor_pseudonym,
MIN(t.title) AS series_named_after_char
FROM aka_name AS an,
cast_info AS ci,
company_name AS cn,
keyword AS k,
movie_companies AS mc,
movie_keyword AS mk,
name AS n,
title AS t
WHERE cn.country_code = '[us]'
AND k.keyword = 'character-name-in-title'
AND t.episode_nr >= 50
AND t.episode_nr < 100
AND an.person_id = n.id
AND n.id = ci.person_id
AND ci.movie_id = t.id
AND t.id = mk.movie_id
AND mk.keyword_id = k.id
AND t.id = mc.movie_id
AND mc.company_id = cn.id
AND an.person_id = ci.person_id
AND ci.movie_id = mc.movie_id
AND ci.movie_id = mk.movie_id
AND mc.movie_id = mk.movie_id
To
SELECT MIN(an.name) AS cool_actor_pseudonym,
MIN(t.title) AS series_named_after_char
FROM keyword k JOIN movie_keyword mk ON mk.keyword_id = k.id
JOIN title t ON t.id = mk.movie_id
JOIN movie_companies mc ON mc.movie_id = mk.movie_id
JOIN company_name cn ON mc.company_id = cn.id
JOIN cast_info ci ON ci.movie_id = mc.movie_id
INNER HASH JOIN name AS n ON n.id = ci.person_id
INNER HASH JOIN aka_name AS an ON an.person_id = n.id
WHERE k.keyword = 'character-name-in-title'
AND cn.country_code = '[us]'
AND t.episode_nr >= 50
AND t.episode_nr < 100
AND ci.movie_id = mk.movie_id
AND an.person_id = ci.person_id
AND ci.movie_id = mc.movie_id
AND ci.movie_id = t.id
AND t.id = mc.movie_id
This resulted in the correct join ordering, reducing the query from 15 seconds to 2 seconds. Let's wrap up: after the schema index optimizations and a few query rewrites to force proper join ordering, the total end-to-end time to process all 113 queries dropped from 1487 seconds to 570 seconds. Concurrency tests after optimization improved from 0.3 QPS to 15 QPS for a cluster of 3 nodes with a total of 6 cores. All of this was achieved simply by optimizing the schema to support the 113 queries, gathering statistics, rewriting a few queries with ANSI joins, and using a few well-placed hints to help the optimizer pick proper join orders.
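As a final illustration, CockroachDB lets you force a specific join algorithm by naming it in the join keyword, which is the hint style used in the rewrite above (the table names here are illustrative):

```sql
-- Force a hash join between a and b, then a lookup join into c
SELECT *
FROM a
INNER HASH JOIN b ON b.a_id = a.id
INNER LOOKUP JOIN c ON c.b_id = b.id;
```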