Tuesday, August 4, 2009

In Defense of MRR

A few days before OSCon Henrik Ingo, our newly hired COO, forwarded me a post of Mark Callaghan with the following plaint on the state of certain latest optimizations that had been introduced into the MySQL Server, namely, Multi-Range Read (MRR), Index Condition Pushdown (ICP) and Batched Key Access (BKA).

> I have seen descriptions for each of these features that describes
> them in isolation. Is there one page where they are described
> together? And if there isn't can I convince Sergey and Igor to write a
> new blog post?

Further in the post, Mark points to the slew of bugs allegedly in the implementation of ICP for the InnoDb engine that forced managers at Sun/Oracle to ask Sergey Petrunia, the author of the code, to turn the feature off , though for this engine only. The patch was submitted and the next alpha release of the Azalea version will appear with this optimization disabled. At the same time Sergey was asked to disable the implementation of the MRR interface for InnoDB too.
Why? Because at Sun they were not sure who the culprit was. They had a choir of unhappy customers (reported bugs #36981, #34591, #35080, #37415, #34590, #37208, #40992, #42580, #43617), some of whom blamed ICP, some – MRR, and some - both. Taking into account a subdued grumble of some undisputed authorities who first planned to port the BKA code into their product and then put the project on hold due to a mere fact that the MRR code was said to be “utterly crappy”, what would you do if you were responsible for making decisions? Right. Condemn both suspects to minimize the consequences of a mistake. None of them is worth a single tear of our customers. What? Are they are not even of the same kind? This makes them even more guilty. This makes them an organized gang. You are still not sure they are desperate criminals? Open the bug reports, read them. Do they make you cry too? Are you convinced at last?

Anyway, the verdict runs as follows:

[25 Jun 4:00] Paul DuBois Noted in 5.4.4 changelog.
The Multi-Range Read access method does not work reliably for InnoDB
and has been disabled for InnoDB tables.

End of the story. Justice reigns. The villains deserve capital punishment, but we are civilized people and we believe that properly applied corrective measures have a good chance of improving any scoundrel. As for now a full isolation of the convicts would be not only in the interest of good folks, but in the interest of the poor creatures as well.
At last all of us can exhale with a relief and rejoice the life where there are no bugs just because everything that could attract them has been mortified.

I used to know MRR for InndDb when he was a just a child. There was nothing wrong in how he was raising up. He was brought up together with his siblings, first with MRR for MyISAM, MRR for NDB Cluster, later with MRR for Falcon. That's true that they have different biological mothers, but all these children were conceived by the same father. Moreover the implementations of the MRR interface for MyISAM and for InnoDB were based on the same code and were engine independent. If there had been any genetic defect in the code it would have manifested itself for MyISAM as well.
The implementation of the MRR interface for MyISAM/InnoDb employs other handler functions such as index_read and rnd_pos. Either there are some subtleties or side effects in the implementations of these functions for InnoDB, or the accusations are false.

So I decided to spend some of my time to investigate the case on my own. First I looked through the bug reports.
The reports for bugs #36981, #34591, #35080, #37415 say that with the settings:

set optimizer_use_mrr='disable';
set engine_condition_pushdown=off;

no crashes are observed for the reported cases.

The report for bug #34590 is certain that the setting

set engine_condition_pushdown=off;

is enough to prevent crashing.

The analysis made by Paul DuBois for bug #40992 leads to the conclusion that only settings for engine_condition_pushdown matter when trying to cause a crash. It's worth noting that the test case for this bug resembles pretty much the one reported by Shane Bester for bug #36981.
The reports for bugs #37208, #42580, #43617 complain about wrong results when engine_condition_pushdown is 'on'. Once more the report for #42580 says that the problem appears when both optimizer_use_mrr and engine_condition_pushdown are set to default values.

After I had read all these reports, it became clear for me that there were no direct evidences of MRR's bad behavior, at least from the cases that were submitted.
To check that the MRR code had nothing to do the reported problems, I built the server from the latest mysql-azalea tree, reverted the patch from bug #45029, and ran the test cases with engine_condition_pushdown set to 'off'. All the test cases passed through without crashes and with correct results. (Of course I could run only those test cases that I was able to extract from bug database easily. These were the test cases for bugs: #36981, #34590, #42580, #43617.)

I continued my investigation. First I debugged the test case for bug #42580 that returned a wrong result and looked quite simple. After several attempts to force my way through the InnoDB native code finally I came to the offending lines in innobase/row/row0sel.c:

*(prebuilt->fetch_cache[prebuilt->n_fetch_cached] + offs) ^=
(*(remainder_buf + offs) & templ->mysql_null_bit_mask);

It was Sergey's code and the intention was to mask in the null bit of an index field in the internal InnoDB row cache. It could be properly done with the code like this:

null_byte= prebuilt->fetch_cache[prebuilt->n_fetch_cached]+offs;
(*null_byte)&= ~templ->mysql_null_bit_mask;
(*null_byte)|= (*(remainder_buf + offs) &
templ->mysql_null_bit_mask);

I applied the fix and all the wrong results (bugs #37208, #42580, #43617) went away.
The crashes remained though. Several hours of additional debugging for the test case of bug #36981 brought me to a really bad memory overwrite in the build_template function from innobase/handler/ha_innodb.cc. By a pure chance the overwrite did not cause a problem for my test case. The crash was caused by usage of wrong template structures for reading row fields from mysql buffers. The crash happened in the code that had been added by Sergey in the row_search_for_mysql function to evaluate conditions pushed down to indexes. This problem can not be easily fixed since with ICP we may need two arrays of prebuilt template structures when executing the SELECT FOR UPDATE queries or multi-UPDATE/DELETE queries: one for the fields of the scanning index, another for the fields of the clustering index.
As a temporary solution I could suggest to block usage of ICP for such queries. It can be done with the following code:

if (file->active_index == file->pushed_idx_cond_keyno &&
file->active_index != MAX_KEY &&
index == prebuilt->index)
do_idx_cond_push= need_second_pass= TRUE;

This code prevents crashes for all reported test cases.

Only two days of investigations (I literally spent only a week-end for it, and, I would have spent much less time if I were familiar a little bit with the InnoDB code) convinced me that MRR for InnoDB is absolutely innocent, while ICP for InnoDB, though being guilty of serious misdeeds, should not undergo any severe punishment, as it's quite naturally to expect some faults from such a young feature and we don't have to employ any penitentiary institutions to correct these deviations.
A different attitude to this misdemeanor would make me doubt that we are really supportive for young talents: we turn them down should they manifest some erratic behavior .

I want to be clear here. Stating that MRR is innocent, I don't want to say that the MRR code is absolutely clean. It's a relatively new code, so most probably it still contains serious bugs. Sergey has recently pointed me to the problem of unlocked gaps for InnoDB. A similar problem should exist for Index Merge. If it's resolved there, why can't the same solution be applied for MMR? If it's not resolved there, should we disable Index Merge for InnoDB as well?

And what about you? How do you find MRR/ICP? Guilty, or NOT guilty? Should the case be appealed?

On a side note, I would like to add that MRR and ICP for InnoDB are really smart optimizations. MRR for InnoDB allows us to accumulate primary keys for multiple lookups in a buffer before fetching data in a sequential manner. The more keys you accumulate, the less disk sweeps you need to fetch the data. ICP allows us to reject a row as soon as the condition over the index fields is not satisfied.
MRR and ICP can interplay, or can be used independently. They both can be very helpful for BKA . Moreover, in fact, it does not make too much sense to use BKA for InnoDB/MyISAM without MRR.
The three features put together can give you a boost of performance for join queries that involve many rows. Yet, this will be the subject of a separate blog.

Thursday, July 2, 2009

Newly born...

Yesterday I still worked for Sun Microsystems as a Principal Engineer & MySQL Sr. Architect. Today was my first day of work at Monty Program, Inc, a subsidiary of a tiny company established by Monty Widenius in February. Yet I didn't even make the top ten. If people want not to miss the train they have to hurry up.

Am I happy? Oh, yeah...
No more waking up with the question constantly drilling my mind: “What am I doing here?” What are all of us, MySQL Server developers, doing without Monty? Waiting for the time when all our options are vested? I can't . That's too long for me. I'm already too old to wait any more.
Besides, we've already lost at least 3 years. We have to do what we planned to do in 2005. We have to raise the Server to the level where any RDBMS that claims to be called mature should be.

So who is newly born? Me? In a way, yes. This is my second reincarnation for the history of MySQL, after the first one that happened in December 2002 .

Also newly born is this blog where I'm planning to share with you, from time to time, my observations on the Server development at Monty Program and on interesting patches in the new server code that other people contribute. However, I don't want to limit myself only to this topic. What else am I going to share with you? You'll see soon enough...