corteza-tuning

Author: Huansheng Chen (Testing Engineer at PingCAP)
Editors: Calvin Weng, Tom Dewan

Imagine this: you work in sales operations. After a sales promotion, your boss urges you to provide the sales statistics. You enter your Customer Relationship Management (CRM) platform, set your query conditions, and run the query. The cursor blinks for 60 seconds without any results. The boss is watching. To ease the tension and embarrassment, you choose to refresh the query condition. Another minute goes by, and you finally get some results before people get mad.  

If you have a Software as a Service (SaaS) platform backed by MySQL, you have been in this situation yourself. It reveals a core issue with MySQL: as a business grows, and its data sets grow larger and query workloads become increasingly complex, scaling out MySQL databases by sharding becomes a performance bottleneck. This results in inefficient query processing and a poor user experience. The industry is looking for an answer. 

As an open source distributed SQL database with Hybrid Transactional and Analytical Processing (HTAP) capabilities, TiDB is a nice replacement for MySQL in the scenario above. In this article, we compare the performance of TiDB and MySQL on one of the most popular low-code SaaS platforms, Corteza CRM. We also demonstrate how TiDB leverages its HTAP capabilities to boost the SaaS platform without any code changes and speeds up development of new applications.

Results first: TiDB vs MySQL 

Before we get to our analysis, let’s go right to the before and after comparison. We used a 16 GB dataset with about 5 million leads. As you can see, with the same hardware and dataset, TiDB performs 5x better than MySQL, with an execution time of 6 seconds and 31 seconds respectively.

Corteza CRM meets TiDB

In our experiment, we used Corteza CRM Suite, a highly flexible, scalable CRM platform built on top of Corteza. The Corteza CRM suite data model is as follows:

  • The compose_record table stores the correspondence between record_id and module_id.
  • The compose_record_value table stores record_id and the correspondence between each column and its value.

Corteza CRM data model

As an open source distributed SQL database, TiDB integrates storage engines of both row and column formats and decides the best way to access data. TiKV is designed for Online Transactional Processing (OLTP), while TiFlash is column-oriented storage for Online Analytical Processing (OLAP) workloads.

TiDB architecture

Migrating from MySQL to TiDB 

The first step in replacing a database is migrating your data to the new database. Because TiDB is MySQL compatible, you don’t have to change code when you connect a Corteza CRM system to TiDB. We used TiDB’s data import tool, mydumper, to migrate the data from MySQL to TiDB. Corteza works seamlessly after the data is copied from MySQL to TiDB.

Why is MySQL slow?

Since detailed business data is stored in the compose_record_value table in key-value format in Corteza, a common application SQL pattern is multiple JOINs on the compose_record_value table. As shown in the sample SQL statements below, the analytical query contains intensive filtering, JOINs, and aggregations. Based on the data model, there are six JOINs from compose_record to compose_record_value in the application query; the compose_record_value table is accessed six times and is filtered by the “.name” column on every access. As the data sets in the compose_record and compose_record_value tables grow, query performance downgrades because the number of index lookups on compose_record_value becomes larger.

SELECT
…
FROM
  compose_record AS crd
  LEFT JOIN compose_record_value AS rv_BudgetedCost ON (
    rv_BudgetedCost.record_id = crd.id
    AND rv_BudgetedCost.name = 'BudgetedCost'
    AND rv_BudgetedCost.deleted_at IS NULL
  )
  LEFT JOIN compose_record_value AS rv_ActualCost ON (
    rv_ActualCost.record_id = crd.id
    AND rv_ActualCost.name = 'ActualCost'
    AND rv_ActualCost.deleted_at IS NULL
  )
  LEFT JOIN compose_record_value …
WHERE
…
GROUP BY
  dimension_0
ORDER BY
  dimension_0

Tuning with HTAP of TiDB

The traditional index strategy and nested loop JOIN does not scale for the execution plan pattern of Corteza CRM. This brings us to the HTAP capabilities of TiDB. Besides serving as the columnar storage for OLAP workloads, TiFlash supports the Massively Parallel Processing (MPP) mode to execute queries. MPP introduces cross-node data exchange (data shuffle process) into the computation. TiDB uses the optimizer’s cost estimate to automatically determine whether to select the MPP mode. In addition, for complex queries like this, we could leverage partition pruning optimization to eliminate the amount of data that needs to be accessed and potentially significantly improve query execution times.

We optimized the performance of Corteza CRM on TiDB as follows:

  • Enable HTAP to leverage the advantages of TiFlash MPP engine, such as parallel hash join and modern columnar data scan.
  • Range partition the compose_record table by the “created_at” column.
  • List partition the compose_record_value table by the “name” column.

By enabling HTAP features:

  • The optimizer was able to push down all execution processes to TiFflash, including the fast columnar data scan, hash join, and window function processing.
  • By optimizing the schema by the range and list partition, the scan method changed from TableFullScan to PartitionTableScan. This is much more efficient for the application to access the compose_record_value table.

After tuning with HTAP, the latency of the example query above reduced from 730.8 ms to 88.2 ms. The performance is 8x faster!.

 

SQL tuning results (lower is better)

Can we improve performance even more?

In our testing, we improved the performance of Corteza CRM from 30 seconds to 6 seconds with just a few tuning steps. As good as that is, we think we can do even better. If we can improve the business logic of our database schema, we can leverage the HTAP capabilities of TiDB. Such a design may include:

  • Creating indexes on columns with good filtering for small tables. This allows the TiDB optimizer to choose a better “hybrid” plan that leverages both index scan and columnar scan in one query at the same time.
  • Better partitioning of the schema so more queries can benefit from partition pruning. This will result in less data processing and shorter latency.

Besides the performance benefits, the HTAP capabilities of TiDB also ensure the data “freshness.” Applications can access newly-inserted data from both row and columnar storage without introducing other components. This greatly simplifies the architecture of the SaaS platform and reduces time-to-market and maintenance costs.

Conclusion

SaaS platforms backed by traditional MySQL databases are increasingly faced with performance bottlenecks brought on by ever larger datasets and more complex queries.   Compared with traditional database solutions, TiDB provides a solid one-stop solution to keep up with users’ fast-growing and complex business workloads. By leveraging the MySQL compatibility and real-time HTAP capabilities of TiDB, CRM platforms like Corteza can easily tackle the performance bottleneck, and you don’t have to change any code.  

If you are interested in TiDB, you’re welcome to join our Slack community and TiDB Internals to share your thoughts with us. You can also follow us on Twitter, LinkedIn, and GitHub for the latest information. 

Read more about HTAP and TiDB:

Huansheng Chen

About the Author

Huansheng Chen

More From Huansheng Chen

Subscribe to Stay Informed!

TiDB Cloud logo-white

TiDB Cloud

Get the massive scale and resiliency of TiDB databases in a fully managed cloud service

TiDB logo-white

TiDB

TiDB is effortlessly scalable, open, and trusted to meet the real-time needs of the digital enterprise