Notes of an optimizer reviewer: 2012

Has it ever happened to you, when attending a party or any other convention, to enter the hall, see a gang of nice people drinking, smiling, laughing, doing small talks, and you saying “hi” to them and in response getting just and empty stare somewhere above your head? And you then – trying to approach a guy looking quite familiar to you – stretch your hand to shake his hand just to watch him demonstratively turn his back to you? And so there you nervously looking around, and stealthily checking your back (no, no tail), and throwing a glance at your toes (still no hooves), and you passing your palm across your sweating forehead (no hints of horns). And “am I plagued?” comes to your head as you start regretting utterly that your appeared here at all, and you start wishing to be disintegrated, annihilated, completely blown out... You don't have to. Calm down. You are invisible. You are invisible for them. This is an Oracle convention, and you are not from Oracle. You do not exist. It's that simple.

Why am I getting so emotional, so mad? What's actually happened? What has forced me to express myself in the language that is not my native? Have I read this this blog? What's about it? The guy says “ I had demonstrated how subquery materialization, introduced in MySQL 5.6.5” , even though the feature was introduced as long ago as in MySQL 6.0 alpha in 2007? Well, big deal: lame wording, nothing more. The guy says about his latest feature: “I implemented”? I've checked: he really implemented it. Quite new code. The code is raw, contains obvious bugs,the first query you submitted from a test suite returned a badly chosen plan? Of course the code is raw (it's only RC after all), of course it contains bugs (any code contains bugs), and, of course you are extremely unlucky. The code exploits the same idea underneath as an unnamed implementation? Well, it's quite disputable. Besides “the same idea” with what? With that in MariaDB code? But MariaDB does not exist! Look at the blogs of MySQL developers. All of them. Look at the presentations from MySQL Connect. No mention of MariaDB. Ergo: it does not exists.

Dear MySQL developers from Oracle, the database engineers from Sun and my former colleagues from MySQL that still stay with Oracle (not too many of them though): Intentionally or unintentionally, with bad will or good will, you effectively destroy the house of FOSS (that is already pretty shaky due to its flimsy architecture) when you defiantly do not notice any other MySQL development, do not give any credit to the developers from the community. When I and my colleagues develop a new feature and give it to the community the only things I and my colleagues ask for is the acknowledgment from the community that I and my colleagues are considered as the authors of the feature. And the fact how much money Oracle spends on the development of its own features cannot deprive us of our authorship. So please, respect our authorship as we respect the authorship of anybody else, including yours. Otherwise there won't be any development community for MySQL. There might be some user community around Oracle's MySQL . This perhaps may be a goal for Oracle, but I doubt that it can be an attractive perspective for Percona, SkySQL, Galera and many other people from MySQL community.

I would like also to remind you about the culture of annotations that is indisputably acknowledged in open publications. Especially I remind about it for the engineers from Norway among whom are a few PhDs and who for sure are familiarized with this culture. Of course, theoretically speaking in any of my articles I can use the result of any theorem without giving any reference to the first publication. Yet, the chances are extremely low that the article will be published in any solid edition. And the chances are high that my next articles won't be accepted under some ridiculous pretext.

What if Oracle implicitly or explicitly prohibits the mention of the achievements of others in the blogs of its employees? Well, it's a hard question and at the moment I don't have any answer. I only know that a malicious corporate order cannot be taken as an excuse for my broken integrity. That's why, anticipating a very high probability of such kind of moral collisions, I preferred joining Monty Program AB instead of joining Oracle.

At the end I should apologize for Sergey Petrunia who published this blog several days ago and then went on vacation. In his blog, Sergey claimed that no implementation of Cost-based choice between Materialization and IN->EXISTS strategies could be found in MySQL 5.6.7. It turns out he was mistaken. His mistake is quite understandable though. He could not see any public commits for this feature, any traces of the feature in the official change log for the release. Anyway, unintentionally he published facts that were not true and he should have delivered his apologies, had he existed. But he doesn't. Neither do I.

In his comment for Vadim Tkachenko's post Baron Schwartz writes:


"...I speculate that when MySQL 5.6 is GA, the official MySQL from
Oracle will have a clear advantage over GA versions of MariaDB in
several common types of workloads. When will MariaDB based on MySQL
5.6 be released? I suppose that when MariaDB based on 5.5 is 
finished, we will have a rule of thumb that might be useful to
estimate the lag. I’d expect (but I could be wrong) that it will
take somewhat longer to port to 5.6, because unlike the 5.5 
codebase where Oracle, Percona, and Monty Program made a lot of
changes in somewhat disparate parts of the server, in MySQL 5.6
there will be a lot of changes that will potentially conflict —
in MySQL 5.6 there are extensive changes to the query optimizer
and the replication codebase, which have been changed a lot in
MariaDB as well. These changes might be difficult to merge."

Baron is known as a prominent advocate of the Oracle MySQL products. In this role he probably believes that the less credit he gives to the competing product MariaDB, the better. As a member of the MariaDB team, I, on my part, can hardly be expected to consider it as a fair position.

My intention is not to convince Baron that MariaDB deserves a more gracious attitude from his side, but rather to provide other members of the MySQL community with a guidance how the information on the state of the MySQL optimizer development at MariaDB and at Oracle could be extracted from the product source code.

I conventionally understand “optimizer” here as the piece of the MySQL server code that is responsible for :
- transformations of the submitted query to one with better execution time
- search for the best execution plan of the transformed query.
Usually new, more efficient algorithms to perform relational operations employed by query execution plans are also considered as the area of responsibility of the optimizer. The user oriented functionality directly related to the optimization process (like support of EXPLAIN) also belong to the domain of the optimizer.

Ok, I'm a member of the MySQL optimizer team since 2003, I'm tightly involved in the MariaDB optimizer development, so it's quite probable that I know a lot of details concerning the optimizer code in MariDB. But what about the optimizer development at Oracle? Can I be considered knowledgeable here? I think so, if you take into account the following circumstances:

the main bulk of the optimizer features of MySQL 5.6 were back-ported from ill-fated MySQL 6.0
these features were either written by my colleagues/me from Monty Program AB or were partly designed by me or were reviewed by me
the MySQL 5.6 code is still open (lp:mysql-server/trunk) and I regularly review the optimizer related fixes from there and back-port them to the MariaDB code if I find them appropriate
the MySQL optimizer features that were developed by Oracle proper are not too complex yet and can be accessed by any expert

So what are the main optimizer features of MariaDB5.3/MySQL 5.6? They are:

subquery semi-join optimizations (developed for MySQL-6.0 by Sergey Petrunia in 2007-2008)
subquery materialization (developed for MySQL-6.0 by Timour Katchaounov in 2007)
disk sweep implementation of the multi-range read (MRR) interface (developed for MySQL 6.0 by Sergey Petrunia in 2007)
batched key access join algorithm (developed for MySQL 6.0 by me, Igor Babaev, in 2007-2008)
index condition pushdown (ICP) (developed for MySQL 6.0 by Sergey Petrunia in 2007)
optimizations for derived tables: merge of derived tables, late materialization of derived tables/views, indexes for derived tables/views (incomplete development for MySQL 6.0 by Evgeny Potemkin was done in 2008-2009, see WL #4777)

The first five of these features were back-ported into MariaDB 5.3 code line by me and Sergey Petrunia in December 2009 – February 2010. At Oracle they were back-ported into MySQL 5.6 code line by Evgeny Potemkin by June 2010. Initial back-ports code-wise were almost identical. Yet when back-porting into MariaDB some re-engineering was performed.

After the back-port of optimizer features from MySQL 6.0, the MariaDB team decided to add:

null-aware check of IN/NOT non-correlated subquery predicates (MWL #68)
smart choice between materialization of not-semijoin IN subqueries and transformations of the IN subquery predicates into EXISTS subqueries (MWL #89)
inside-out execution of non-semijoin subqueries (MWL #90)
subquery cache for correlated subqueries (MWL #66)

We also planned to implement MRR for primary keys in InnoDB. Without it InnoDB primary keys could not be used for BKA and this compromised the performance of BKA join greatly (see MWL #121).

And of course, from the very beginning we planned to back-port the code of WL #4777 and to complete the development of the optimizations for derived tables (MWL #106).

We did not experience any problems with the implementations of MWL #68 and MWL #66. MWL #106 promised to be a not so easy project, but by May 2010 it was successfully completed. MWL #90 required some re-engineering work. Yet it was finished in more or less expected time.

Quite unexpectedly MWL #89 and MWL #121 turned out to be the tasks at which we stumbled badly (especially the first one). So badly that I had enough time to finish the task I had never planned for MariaDB 5.3: an implementation of the block-based hash join algorithm (MWL #128). This task required some re-engineering work for the other block-based join algorithms though.

Anyway, by the end of July 2011 (too much for a development cycle, anyone of us agrees upon this!) all optimizers features were in and we released MariaDB 5.3.0 beta. 7 months more and now we are ready to release 5.3.5 GA.

What about the back-ported optimizer features in MySQL 5.6?

April 2011,MySQL 5.6.2:
- MRR is released (yet, with no MRR for innodb primary keys)
- ICP is released
October 2011, MySQL 5.6.3:
- optimizations for derived tables are released
- BKA join algorithm is released
December 2011, MySQL 5.6.4:
- no new optimizer features
???, MySQL 5.6.5
- ???

Will subquery optimization appear in MySQL 5.6 eventually? Looking at the development tree I can't surely say so. Yes, they are enabled in the optimizer team tree and everybody can install the tree and build MySQL 5.6 with subquery optimizations. But I can't tell you how close the team is to the completion. I could if I had looked at their outstanding bugs. But this information is not open for me.

Anyway, if you really want to compare the gains you get from optimizer features of MySQL 5.6 with those of MariaDB 5.3 I would recommend you to build the MySQL 5.6 tree with subquery optimizations.

Did we make such comparison? Yes, we did with the DBT-3 benchmark on the databases of scale factor 10 (~30GB), 30 (~80GB), 100 (270GB) for InnoDB and MyISAM.

Where does MariaDB win? On Q3, Q5, Q7, Q8, Q10. Why? Mainly because when employing BKA we can use MRR for primary keys in InnoDB and MRR with sorted keys for MyISAM/InnoDB. How much do we win? It depends the engine and the database scale factor (and the type the disk system, of course).
For example, when executing Q7 for the InnoDB database scale factor 10 we won 257s : 1886s , while when executing Q3 we won 288s : 691, when executing Q8 – 330s : 952s (a conventional HDD was used).

For the InnoDB scale factor 100 we had the following ratio
Q3. 980s : 2008s
Q7. 1393s : 8498s
Q8. 1118s : 3209s.

Do we lose sometimes? Never for MyISAM. Yet on Q9 and Q12 for InnoDB we did. We did not lose too much, but quite noticeably : 589s : 403s on Q9, 465s : 336s on Q12 if to execute on the database of scale factor 10.
The actual cause of this loss is to be investigated yet, but the interesting thing is that we do not lose here if we do not ask for MRR key sorting.

What about subquery optimizations for DBT-3? Here MySQL 5.6 and MariaDB 5.3 show approximately the same results. A more disappointing fact is that, with the exception of Q18 execution of which on MariaDB 5.2 takes years while on MariaDB 5.3 it takes a couple of minutes, other queries with subqueries from DBT-3 do not benefit too much from the new subquery optimizations.

Comparing optimizer features of MariaDB 5.3 and MySQL 5.6 I should mention also other optimizer improvements that do not overlap.
For MariaDB 5.3 these are:

For MySQL 5.6 these are:

(BTW, have you ever tried to execute Q7 or Q8 with optimizer trace turned on and to look through the results?).

The MariaDB 5.5 tree is green in Buildbot and I expect the first alpha release these days. MariaDB 5.5 will not include much of new development from MP AB. It will be mainly a merge of MySQL 5.5 with MariaDB 5.3. Yet some cute optimizer features will appear there.
I already blogged about one of them. Another feature will allow us to use subquery optimizations for some [NOT] EXISTS subqueries. There will be a couple of other very specific optimizer features developed for our valuable customers.

The optimizer code of MariaDB and that of MySQL diverge more and more. Does it mean that we won't be able to back-port interesting features from MySQL 5.6 into MariaDB? Not at all. We still understand the optimizer internals pretty well.

Do Oracle's MySQL optimizer guys understand it as well? Judging the evolution of their bug fixes I could say that definitely their understanding is getting better and better. I can't say that they've already reached our level, but it may happen in the future.
I almost forgot to give you their names in order you could easily detect their patches in the huge mysql trunk tree to follow their progress yourself and to be able to make judgments yourself rather the trust some shaky speculations. Here they are (in no particular order):

Roy Lyseng (from Sun, Norway)
Jorgen Loland (from Sun, Norway),
Ole John Aske (from Sun, Norway),
Jon Olav Hauglid (from Sun, Norway),
Olav Sandstaa (from Sun, Norway),
Oystein Grovlen (from Sun, Norway),
Tor Didriksen (from Sun, Norway),
Guilhem Bichot (from MySQL AB, original Maria team, France)
Evgeny Potemkin (from MySQL AB, original optimizer team, Russia)

(there is another Russian guy who is from the original optimizer team, who implemented EXPLAIN for DML statements. His name is Gleb Shchepa. Yet I'm not sure that he has not moved to another team).

And here we are, those who actively work on optimizer features in MariaDB:

Sergey Petrunia (with MySQL/MariaDB since November 2003, Russia)
Timour Katchaounov (with MySQL/MariaDB since April 2004, Bulgaria)
Sanja Byelkin (the author of the query cache, the author the original subquery code, one who implemented views in MySQL, Ukraine)
Igor Babaev (with MySQL/MariaDB since December 2002, USA/Russia)
and, of course, the last, but not the least,
Michael (Monty) Widenius (the main reviewer of the optimizer code now, Finland)

Notes of an optimizer reviewer

Tuesday, October 16, 2012

Once more about comparison of subquery optimization in MySQL 5.6 and MariaDb 5.5, once more about nothing...

Sunday, February 26, 2012

Comparing the optimizer features of MariaDB 5.3 and MySQL 5.6

Followers

Blog Archive

About Me