Saturday, December 31, 2011

A 3-way join that touches only indexes

Can an execution of 3-way join use only indexes and not touch table rows at all? If we have MyISAM tables it's just impossible. Yet with InnoDB tables it would be possible if we could exploit so called extended keys – the regular secondary keys extended by the components of the primary key. The fact is the InnoDB engine works fine if you pass a key extended by primary key components, and, it uses the key to the full length without trimming it up to the base key fields. In the result we have a more narrow search and numerous obvious benefits from it.

Let's see how extended keys could be employed by execution for the following query built over a DBT-3/TPC-H database with one added index defined on p_retailprice.
 select o_orderkey
from part, lineitem, orders
where p_retailprice > 2095 and o_orderdate='1992-07-01'
and o_orderkey=l_orderkey and p_partkey=l_partkey;

(The query asks for orderkeys of the orders on 1992-07-01 that ordered parts with retail price greater than 2095.)

The query could be executed by the following execution plan:
  1. Scan the entries of the index i_p_retailprice where p_retailprice>2095 and read p_partkey values from the extended keys.
  2. For each value p_partkey make an index look-up into the table lineitem employing index i_l_partkey and fetch the values of l_orderkey from the extended index.
  3. For each fetched value of l_orderkey append it to the date '1992-07-01' and use the resulted key for a index look-up by index i_o_orderdate to fetch the values of o_orderkey from the found index entries.
You can see that all access methods of this plan do not touch table rows.

What prevents the current MariaDB/MySQL optimizer from choosing such an apparently efficient plan? A trivial shortcoming of the optimizer: it does not consider extended keys when looking for possible index accesses to join a table.

Quite surprisingly this defect attracted my attention when I investigated the efficiency of index condition push-down (that, btw, exploits extended keys to the full measure) at the latest MySQL UC. Since it did not seem too difficult to fix this problem I decided to do it as soon as I came back from the conference. Indeed, it took me less than a week to produce a working variant that made the join optimizer, the range optimizer and min/max optimizations to be aware of extended keys. The implementation was fast and robust, but rather cumbersome since it used iterator classes to look through parts of the extended keys. It required quite a few changes in the server code.

Then we, at MP, became extremely busy with the first MariaDB 5.3 beta release. So it was only this fall that I managed to find some time for an alternative implementation. The new implentation just expanded the key definitions with additional key parts when filling the TABLE_SHARE structures by the info read from frm files. It allowed to keep the changes in the optimizer code minimal.

You can see this implementation in this tree on Launchpad. The patch was applied to the latest MariaDB 5.3 build. Yet, with a minor modifications it could be easily applied to any of the MySQL/MariaDB/PerconaServer or even Drizzle releases. When experimenting with the tree from Launchpad bear in mind that the optimizer switch must have the flag 'extended_keys' set to 'on' to enable the feature.

Were other people in the MySQL community also annoyed with the deficiency of the MySQL optimizer fixed by the patch? Yes, yes. See for example Domas's blog . So I expect quite a lot of interest towards the published patch. The patch has all chances to appear pretty soon in the first beta release of MariaDB 5.5 that is MariaDB 5.3.3-rc merged with the latest release of MySQL 5.5.

To intrigue you more I copy the EXPLAIN output returned by the patch for the above query:
MariaDB [dbt3sf10]> explain
-> select o_orderkey
-> from part, lineitem, orders
-> where p_retailprice > 2095 and o_orderdate='1992-07-01'
-> and o_orderkey=l_orderkey and p_partkey=l_partkey\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: part
type: range
possible_keys: PRIMARY,i_p_retailprice
key: i_p_retailprice
key_len: 9
ref: NULL
rows: 100
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: lineitem
type: ref
possible_keys: PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_orderkey,i_l_orderkey_quantity
key: i_l_partkey
key_len: 5
ref: dbt3sf10.part.p_partkey
rows: 15
Extra: Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: orders
type: ref
possible_keys: PRIMARY,i_o_orderdate
key: i_o_orderdate
key_len: 8
ref: const,dbt3sf10.lineitem.l_orderkey
rows: 1
Extra: Using index
3 rows in set (0.00 sec)