Evaluating TiDB against the MySQL Test Suite

Author: Morgan Tocker (SQL Engine Team at PingCAP)

There is an old saying that history has a tendency to repeat itself, and I have software proof of that. For those not yet familiar, TiDB is a distributed SQL database that is compatible with MySQL. It’s not a fork, and is not even written in the same language as MySQL (MySQL: C++, TiDB: Go).

As one would expect with a database product, TiDB has extensive testing:

Unit tests to evaluate the correctness of specific code areas.
Integration tests to test features at the SQL level, and test cases for fixed bugs.
Performance testing to identify any regressions.
Stress testing to identify how the system performs under load.
Chaos testing to observe how the system behaves when faults are injected.

However, the point about MySQL compatibility bears repeating. Even though they share no common ancestry, is there anything that we can learn from MySQL to improve the quality of TiDB?

Both systems were developed by humans. To paraphrase the above, can we use the history of discovered bugs in MySQL to improve the quality of TiDB?

Introducing the MySQL Test Suite

MySQL is (to my knowledge) unusual in that many of its tests are written in plain SQL. The main mysql-test-runner lives external to the MySQL server and relies on a series of .test and .result files to test the correctness of the database. Here would be an example:

# example.test
SELECT 1+1
# example.result
SELECT 1+1
1+1
2

There are over 1300 .test files. For many years, this was the main way that MySQL wrote both functional coverage and regression tests.

Trying the MySQL Test Suite against TiDB

If the test files are written in SQL, that means we can run them against TiDB, right? The short answer is yes! But the real answer is more nuanced. To explain why, let’s start with an example:

# union.test
SELECT 1 UNION SELECT 2;

In MySQL, the above query will always return 2 rows in the order of 1 and then 2.

In TiDB, the order is non-deterministic. This is because it executes both parts of the query in parallel and then combines the result. The mysql-test-runner does include a way to say that a query order does not matter, but this requires a manual inspection to know that it’s safe:

# union.test
–sorted_result
SELECT 1 UNION SELECT 2;

We internally refer to --sorted_result as a macro, and the test-runner features many such macros. For example, you can patch out a column with --replace_column N <newvalue>. This is useful when working with random or temporal values.

In fact, there are many such legal cases where TiDB can return different results from MySQL. And there are even more if you include features that are documented as not supported, such as stored procedures, triggers, and full-text search.

Given the fact that the tests usually test edge cases, this adds up to a lot of nuanced behaviors that need manual investigation.

How do you manually evaluate 1300 tests?

The value of the MySQL Test Suite became very clear to me while I was working on Dynamic Privileges (a feature of TiDB 5.1 and MySQL 8.0). As part of evaluating my work, I ran it against the grant_dynamic.test from MySQL, and low and behold – 3 very useful bugs!

But that was in early 2021, and I was not in a position to evaluate the trove of other tests. But by August 2021 the lightbulb moment struck:

The main skill requirement is SQL
SQL-92 is now 30 years young

With my wife having left the workforce to raise our kids, I was reminded that there are in fact many talented people out there who might not be on our traditional radar. We would just need to teach MySQL & TiDB. However, having previously worked as a MySQL trainer makes this decision a no-brainer to me.

We put out a call for returnships, with the goal of having three people onboard as a cohort so that we could more easily provide training and support.

The results

We went from a team of never having ever used TiDB to a critical bug filed within the first week. I can not understate exactly how quickly the team got up and running.
Within 4 months we hit the following:

Bug Severity	Count
Critical	7
Major	46
Moderate	90
Minor	11
Total	154

But bugs are only part of the benefit. We also improved our test coverage, which means new bugs are less likely to be introduced to TiDB.

There were also some unexpected side-effects of having introduced new colleagues to work on this task. Earlier I mentioned that there are some features documented as not supported. As we discovered cases in the tests that were less than clear in the docs, we made clarifications. For example:

TiDB does not always return the same `auto_increment` value as MySQL and we now explain why that is expected.
We also better explain a common cause for why a COMMIT statement can fail.
(Regrettably) We now have a disclaimer on a common source of compatibility issues that are difficult to fix completely.

Conclusion

Was the project a success? I think the numbers speak for themselves there.

I might be wrong, but I felt like onboarding everyone together was a key part of the secret to our success. With everyone starting from the same place, it set us up to work as a team and support each other.

On a personal note, I also want to say how enjoyable this project was for me. I have always enjoyed the training, but this was different because I had three students who showed up ready to learn, and worked hard on all the extra credit exercises so they could learn everything they could.
About a month back while we were diving into some very strange nuances I made a joke about how it would become useful next time there is a ‘strange MySQL behaviors’ trivia contest. What I didn’t say is that on the day that contest occurs, I’ve already got the dream team.

Book a Demo