Query Performance: Joins and Indexes

Andrew Deally
8 min read · Nov 27, 2022

Using the Join Order Benchmark

Query performance is impacted by a great many things. For complex queries, join ordering is one of the main contributors. Ideally, a plan should start with the join that eliminates the most data, to minimize the amount of data carried into the following joins. When a join that produces a lot of data is performed early in the execution, subsequent stages must process large amounts of data for longer than necessary, increasing the time and resources the query needs. A cost-based optimizer evaluates candidate orderings by combining cardinality estimates (derived from statistics) with the available access paths, and produces the lowest-cost plan for optimal performance.

Just a reminder: join ordering is the order, determined either by a cost-based optimizer or by rule-based ordering, in which the tables of a multi-table SQL statement are joined together. In this blog I will take a short tour of CockroachDB's cost-based optimization of joins and provide some examples of how well it works for complex joins. CockroachDB introduced its cost-based optimizer in 2018, which unlocked all query surface areas, and since then the optimizer has matured to compute efficient query plans.

The optimizer first transforms a SQL statement into a normalized form, removing any redundancies in the statement and performing rule-based transformations to improve performance. This process takes the data distribution of all the tables into account when determining the correct path to the data.
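As a toy illustration (my own example, not taken from the benchmark), a normalization pass can fold tautologies and drop redundant predicates before any costing happens:

-- Redundant form: 1 = 1 is a tautology, and production_year > 1990
-- is implied by production_year > 2008.
SELECT t.title
FROM title AS t
WHERE t.production_year > 2008
  AND t.production_year > 1990
  AND 1 = 1;

-- Normalized equivalent that the later phases actually cost:
SELECT t.title
FROM title AS t
WHERE t.production_year > 2008;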

Next, the CBO proceeds through expansion and ranking phases that lead to an initial plan. During these phases the plan evolves into alternative paths to the data, including indexes and join methods, where each path is assigned a cost using column statistics. The plan with the lowest cost becomes the final query plan, which then passes through a physical planning phase and is finalized into a distributed query execution. The flow below represents the phases through which the optimizer progresses to a final execution plan.

The following illustrates the process in more detail; you can refer to the sql_query_planning RFC for the full design. Each one of these phases can be explored using EXPLAIN; additional parameters such as VERBOSE, OPT, and DISTSQL highlight the phases above.
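For example (a minimal sketch; the query is a stand-in, and any statement from the benchmark can be substituted):

-- Final logical plan chosen by the optimizer:
EXPLAIN SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- Same plan with per-operator column and ordering detail:
EXPLAIN (VERBOSE) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- The optimizer's internal expression tree from the costing phase:
EXPLAIN (OPT) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- The physical, distributed execution plan:
EXPLAIN (DISTSQL) SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

-- Execute the statement and report actual row counts and timings:
EXPLAIN ANALYZE SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;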

Next up, we will explore the maturity of the optimizer using the Join Order Benchmark, which consists of 113 analytic queries with 3 to 16 joins each (averaging 8) across a schema of 21 tables. After loading the data, the table below details each table and its row count.

Executing the 113 queries serially, the total end-to-end time is 1487 seconds. The most rows processed by any single query is 43.8 million scanned, with an average of over 2 million rows scanned per execution.

The above graph shows the default execution times of the queries. Running some queries in parallel illustrates the capability of the cluster (3 nodes @ 4 vCPUs, 16 GB RAM, and local SSD) to handle a workload of roughly 0.3 QPS, and we can see the limiting factor is CPU overhead. It is easy to conclude that these queries may not be optimal.

The selected queries are 10a (7J), 13c (9J), 16d (8J), 19c (10J), 22a (11J), 26a (12J), 2a (5J), 33a (14J), 6a (5J), and 8d (7J).

Let's look at two of the larger queries, starting with 22a:

SELECT MIN(cn.name) AS movie_company,
       MIN(mi_idx.info) AS rating,
       MIN(t.title) AS western_violent_movie
FROM company_name AS cn,
     company_type AS ct,
     info_type AS it1,
     info_type AS it2,
     keyword AS k,
     kind_type AS kt,
     movie_companies AS mc,
     movie_info AS mi,
     movie_info_idx AS mi_idx,
     movie_keyword AS mk,
     title AS t
WHERE cn.country_code != '[us]'
  AND it1.info = 'countries'
  AND it2.info = 'rating'
  AND k.keyword IN ('murder', 'murder-in-title', 'blood', 'violence')
  AND kt.kind IN ('movie', 'episode')
  AND mc.note NOT LIKE '%(USA)%'
  AND mc.note LIKE '%(200%)%'
  AND mi.info IN ('Germany', 'German', 'USA', 'American')
  AND mi_idx.info < '7.0'
  AND t.production_year > 2008
  AND kt.id = t.kind_id
  AND t.id = mi.movie_id
  AND t.id = mk.movie_id
  AND t.id = mi_idx.movie_id
  AND t.id = mc.movie_id
  AND mk.movie_id = mi.movie_id
  AND mk.movie_id = mi_idx.movie_id
  AND mk.movie_id = mc.movie_id
  AND mi.movie_id = mi_idx.movie_id
  AND mi.movie_id = mc.movie_id
  AND mc.movie_id = mi_idx.movie_id
  AND k.id = mk.keyword_id
  AND it1.id = mi.info_type_id
  AND it2.id = mi_idx.info_type_id
  AND ct.id = mc.company_type_id
  AND cn.id = mc.company_id;

This produces the following plan, which has an execution time of 20.386 seconds.

After examining this plan, adding a few indexes to remove the full table scans (and the filter phases applied after them) should reduce the amount of data scanned and possibly reorder the joins for this query. The next query, 26a:

SELECT MIN(chn.name) AS character_name,
       MIN(mi_idx.info) AS rating,
       MIN(n.name) AS playing_actor,
       MIN(t.title) AS complete_hero_movie
FROM complete_cast AS cc,
     comp_cast_type AS cct1,
     comp_cast_type AS cct2,
     char_name AS chn,
     cast_info AS ci,
     info_type AS it2,
     keyword AS k,
     kind_type AS kt,
     movie_info_idx AS mi_idx,
     movie_keyword AS mk,
     name AS n,
     title AS t
WHERE cct1.kind = 'cast'
  AND cct2.kind LIKE '%complete%'
  AND chn.name IS NOT NULL
  AND (chn.name LIKE '%man%' OR chn.name LIKE '%Man%')
  AND it2.info = 'rating'
  AND k.keyword IN ('superhero', 'marvel-comics', 'based-on-comic',
                    'tv-special', 'fight', 'violence', 'magnet',
                    'web', 'claw', 'laser')
  AND kt.kind = 'movie'
  AND mi_idx.info > '7.0'
  AND t.production_year > 2000
  AND kt.id = t.kind_id
  AND t.id = mk.movie_id
  AND t.id = ci.movie_id
  AND t.id = cc.movie_id
  AND t.id = mi_idx.movie_id
  AND mk.movie_id = ci.movie_id
  AND mk.movie_id = cc.movie_id
  AND mk.movie_id = mi_idx.movie_id
  AND ci.movie_id = cc.movie_id
  AND ci.movie_id = mi_idx.movie_id
  AND cc.movie_id = mi_idx.movie_id
  AND chn.id = ci.person_role_id
  AND n.id = ci.person_id
  AND k.id = mk.keyword_id
  AND cct1.id = cc.subject_id
  AND cct2.id = cc.status_id
  AND it2.id = mi_idx.info_type_id;

Notice that this query plan, too, can be optimized with some indexes.

After reviewing the entire schema and all 113 plans, the following indexes should help reduce data scans and encourage more efficient join orderings.
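As a sketch of the flavor of these indexes (column choices inferred from the predicates in queries like 22a and 26a, not necessarily the exact list applied here):

-- Filter columns from the WHERE clauses become index keys; join
-- columns are STORED so lookups avoid fetching the full row.
CREATE INDEX ON keyword (keyword);
CREATE INDEX ON title (production_year) STORING (kind_id, title);
CREATE INDEX ON movie_info (info) STORING (movie_id, info_type_id);
CREATE INDEX ON movie_info_idx (info) STORING (movie_id, info_type_id);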

The method I used to perform this index analysis was the following (roughly 10 minutes):

  1. Ran each query sequentially, once with explain plans and again recording execution times, which are shown in the graph above.
  2. Gathered the index recommendations that CockroachDB adds to EXPLAIN output, along with the UI recommendations from the insights page or table page (shown below).
  3. Deduplicated all index recommendations with sort -u and reviewed each one, applying my knowledge of the queries and schema.
  4. Applied the final index recommendations from above.
  5. Reran step 1.

Although I did not use the DB Console recommendations introduced in v22.1 and v22.2, below is the insights page, which can be used for isolated query tuning. In general, index recommendations are gathered and displayed on the Schema Insights page.

The following is an example of an index recommendation.
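In EXPLAIN output the recommendation appears at the end of the plan, roughly like this (illustrative output only, not captured verbatim from this cluster):

EXPLAIN SELECT MIN(t.title) FROM title AS t WHERE t.production_year > 2008;

  ...
  index recommendations: 1
  1. type: index creation
     SQL command: CREATE INDEX ON title (production_year) STORING (title);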

After applying all the index changes, CockroachDB automatically gathers fresh statistics for the tables whose schemas changed.
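The refreshed statistics can be verified directly (a minimal check; title is just one of the 21 tables in the schema):

-- Show the column statistics the optimizer will use for costing:
SHOW STATISTICS FOR TABLE title;

-- Or trigger a manual refresh instead of waiting for auto-stats:
CREATE STATISTICS manual_stats FROM title;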

After rerunning all the queries and iterating on a few more schema changes and query rewrites, we have the following results, which show the blue baseline without indexes compared with the later rounds of changes in orange and, finally, gray. Running the same concurrent workload again, the system is able to handle more queries per second with better utilization of the cluster.

Of the 113 queries, I found only two that required rewriting to get proper join ordering. When rewriting queries, I suggest moving to ANSI joins and keeping the filter predicates in the WHERE clause. Query 16a was rewritten from

SELECT MIN(an.name) AS cool_actor_pseudonym,
       MIN(t.title) AS series_named_after_char
FROM aka_name AS an,
     cast_info AS ci,
     company_name AS cn,
     keyword AS k,
     movie_companies AS mc,
     movie_keyword AS mk,
     name AS n,
     title AS t
WHERE cn.country_code = '[us]'
  AND k.keyword = 'character-name-in-title'
  AND t.episode_nr >= 50
  AND t.episode_nr < 100
  AND an.person_id = n.id
  AND n.id = ci.person_id
  AND ci.movie_id = t.id
  AND t.id = mk.movie_id
  AND mk.keyword_id = k.id
  AND t.id = mc.movie_id
  AND mc.company_id = cn.id
  AND an.person_id = ci.person_id
  AND ci.movie_id = mc.movie_id
  AND ci.movie_id = mk.movie_id
  AND mc.movie_id = mk.movie_id

to

SELECT MIN(an.name) AS cool_actor_pseudonym,
       MIN(t.title) AS series_named_after_char
FROM keyword AS k
JOIN movie_keyword AS mk ON mk.keyword_id = k.id
JOIN title AS t ON t.id = mk.movie_id
JOIN movie_companies AS mc ON mc.movie_id = mk.movie_id
JOIN company_name AS cn ON mc.company_id = cn.id
JOIN cast_info AS ci ON ci.movie_id = mc.movie_id
INNER HASH JOIN name AS n ON n.id = ci.person_id
INNER HASH JOIN aka_name AS an ON an.person_id = n.id
WHERE k.keyword = 'character-name-in-title'
  AND cn.country_code = '[us]'
  AND ci.movie_id = mk.movie_id
  AND an.person_id = ci.person_id
  AND ci.movie_id = mc.movie_id
  AND ci.movie_id = t.id
  AND t.id = mc.movie_id;
This rewrite resulted in correct join ordering, reducing the query from 15 seconds to 2 seconds. To wrap up: after the schema index optimizations and a few query rewrites to force proper join ordering, the total end-to-end time to process all 113 queries dropped from 1487 seconds to 570 seconds. Concurrency tests after optimization also improved from 0.3 QPS to 15 QPS on a cluster of 3 nodes with a total of 6 cores. All of this was achieved simply by optimizing the schema to support the 113 queries, gathering statistics, rewriting a few queries with ANSI joins, and using a few well-placed hints to help the optimizer pick proper join orders.
