placement rules in sql

Author: PingCAP
Editors: Fendy Feng, Calvin Weng, Tom Dewan

TiDB is a distributed and elastically scalable database. Data scheduling management is one of its most basic capabilities. TiDB clusters can be deployed across multiple servers and data centers, and data can also be distributed across regions based on hotspots and data volume. As TiDB users’ business becomes more complex and globally distributed, they have higher demands for resource isolation, access latency, and cross-region data access. This requires more flexible data placement rules. 

To improve the user experience, TiDB 6.0 introduces Placement Rules in SQL, a new data placement framework based on the SQL interface. It allows TiDB users to flexibly schedule and manage the number of replicas, role types, and placement locations of any data. This enables TiDB to provide more flexible data management in multi-services shared clusters, cross-region deployments, and many other scenarios to meet diverse business needs. 

Next, let’s learn how this new feature can help you by introducing three use cases. 

Flexible cross-region data placement to reduce latency 

One TiDB user is a service provider with a global business. They previously adopted a centralized data architecture. After their business expanded internationally, the cross-region data access through the centralized architecture became slow, and the cost of cross-region traffic was high. So, to support local business, they built local data centers. 

There were two types of data: locally managed regional data and globally accessible configuration data. Regional data volume was large, and it was updated frequently, but it was rarely accessed from other regions. In contrast, the global configuration data volume was small, and it was rarely updated. However, it was globally unique and needed to be accessed across regions. Apparently, traditional standalone databases or databases deployed in a single region can’t meet these requirements. 

TiDB became their choice and fixed their headaches. As shown in the figure below, TiDB clusters are deployed in three separate data centers based in North China (DC1), East China (DC2), and South China (DC3). This allows their customers in different regions to access their local data in the nearest data center. 

TiDB clusters are deployed in three separate data centers

With TiDB 6.0’s Placement Rules in SQL, they can schedule all the replicas of regional data to a specific data center in a specific region. All the regional data can be stored and managed within the region. This helps reduce the latency of cross-region data replication and cut traffic costs. All they need to do is to label the storage nodes in different data centers, create corresponding placement policies, and execute SQL commands to specify where the data is stored. 

Below are examples of placement policies. 

CREATE PLACEMENT POLICY 'east_cn' CONSTRAINTS = '[+region=east_cn]';
CREATE PLACEMENT POLICY 'north_cn' CONSTRAINTS = '[+region=north_cn]';

ALTER TABLE orders PARTITION p_hangzhou PLACEMENT POLICY = 'east_cn';
ALTER TABLE orders PARTITION p_beijing PLACEMENT POLICY = 'north_cn';

Workload isolation

Another TiDB user is a large internet company with more than 2,000 business lines. All the data was managed by one or more MySQL clusters. Due to large data volumes, approximately 1,000 MySQL instances stored the data. Because of this, many routine operations such as data monitoring, diagnoses, version upgrades, and security protection put tremendous pressure on the operation and maintenance (O&M) team. What’s worse, as business grew, the O&M cost increased year by year. They wanted to reduce the database instances to cut O&M costs, but it was hard to ensure the data isolation, access security, flexible data scheduling, and low management costs.

Through TiDB 6.0’s Placement Rules in SQL, you can easily create flexible share-cluster rules. For example, they can create the following SQL commands to make Businesses A and B share the same storage resources to reduce storage and management costs, and make Businesses C and D use exclusive resources to ensure higher business isolation. 

CREATE PLACEMENT POLICY 'shared_nodes' CONSTRAINTS = "[+region=shared_nodes]";
CREATE PLACEMENT POLICY 'business_c' CONSTRAINTS = "[+region=business_c]";
CREATE PLACEMENT POLICY 'business_d' CONSTRAINTS = "[+region=business_d]";

ALTER DATABASE a POLICY=shared_nodes;
ALTER DATABASE b POLICY=shared_nodes;
ALTER DATABASE c POLICY=business_c;
ALTER DATABASE d POLICY=business_d;

Flexible resource sharing between Business A, B, C, and D

Since multiple business services share one TiDB cluster, the frequency of O&M tasks such as upgrading, patching, backup planning, and scaling decreases considerably. This helps relieve the management burden and boost efficiency.

With Placement Rules in SQL, they use a few TiDB clusters to manage a large number of MySQL instances. Data sources from different business services can be stored in different databases, and then scheduled to different physical nodes according to the placement policies you set. This physically isolates different business data and avoids the mutual interference caused by resource contention and hardware failures. Besides, they avoid unnecessary data access across business services by managing the user permissions, thus improving data quality and data security. 

In short, the new Placement Rules in SQL introduced by TiDB 6.0 require far fewer clusters, considerably reduce O&M costs and DBA’s workloads such as upgrades, monitoring, and alerting. They also balance the requirements of resource isolation and low cost.

Low latency reads with Stale Read 

Another TiDB user is also an internet company. They built both primary and secondary data centers in one city. Although they could access the data in the secondary data center, the read latency was high during peak hours due to network delay. This affected the user experience. 

With TiDB 6.0’s new Placement Rules, they can precisely assign the data replicas to a specific machine room. They can also leverage TiDB’s Stale Read features to read historical data during a specified period in scenarios where data freshness and strong consistency are not required. This avoids read latency caused by data replication across data centers and realizes high throughput. 

Below is an example of placement rules in SQL statements. 

CREATE PLACEMENT POLICY eastnwest PRIMARY_REGION="us-east-1" REGIONS="us-east-1,us-east-2,us-west-1" SCHEDULE="MAJORITY_IN_PRIMARY" FOLLOWERS=4;
CREATE TABLE orders (order_id BIGINT PRIMARY KEY, cust_id BIGINT, prod_id BIGINT) PLACEMENT POLICY=eastnwest;

Below is an example of the AS OF TIMESTAMP syntax used to support the Stale Read feature to read historical data in TiDB. 

SELECT * FROM orders WHERE order_id = 14325 AS OF TIMESTAMP '2022-03-01 16:45:26';

Summary

Placement Rules in SQL is an interesting feature in TiDB 6.0. With this new feature, now for the first time, users can freely schedule data at different levels of partitions, tables, or databases based on labels. This gives TiDB users more flexible data management to meet diverse business needs. 

If you are interested in TiDB 6.0 and its new features, feel free to give it a try and share your feedback. We look forward to exploring more possibilities with you. 

PingCAP

About the Author

PingCAP

More From PingCAP

Subscribe to Stay Informed!

TiDB Cloud logo-white

TiDB Cloud

Get the massive scale and resiliency of TiDB databases in a fully managed cloud service

TiDB logo-white

TiDB

TiDB is effortlessly scalable, open, and trusted to meet the real-time needs of the digital enterprise