How We Reduced Our Batch Processing Time by 58% with a Scale-Out MySQL Alternative

Daily Batch Reconciliation is a routine process in banking where a bank’s internal transaction records are cross-checked daily with external entities such as clearinghouses or other financial institutions. This is done to maintain the accuracy of the bank’s financial records. This article highlights how WeBank, by leveraging TiDB – an advanced, open-source, distributed SQL database – in its daily batch reconciliation operations, has successfully reduced its batch processing time by over 58%.
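To make the reconciliation idea concrete, here is a minimal sketch of the cross-check described above. It is an illustration only, not WeBank's implementation: the function name `reconcile`, the record layout (a mapping from transaction ID to amount), and the sample data are all assumptions made for this example.

```python
from decimal import Decimal


def reconcile(internal, external):
    """Compare two {transaction_id: amount} mappings and report discrepancies.

    Hypothetical helper: real reconciliation jobs compare many more fields
    (currency, counterparty, timestamps) and run over far larger data sets.
    """
    internal_ids, external_ids = set(internal), set(external)
    return {
        # Recorded by the bank but absent from the external statement.
        "missing_in_external": sorted(internal_ids - external_ids),
        # Reported externally but absent from the bank's own records.
        "missing_in_internal": sorted(external_ids - internal_ids),
        # Present on both sides, but the amounts disagree.
        "amount_mismatch": sorted(
            txn_id
            for txn_id in internal_ids & external_ids
            if internal[txn_id] != external[txn_id]
        ),
    }


# Made-up sample data for illustration.
internal_ledger = {
    "T1": Decimal("100.00"),
    "T2": Decimal("25.50"),
    "T3": Decimal("9.99"),
}
clearinghouse = {
    "T1": Decimal("100.00"),
    "T2": Decimal("25.00"),
    "T4": Decimal("42.00"),
}

report = reconcile(internal_ledger, clearinghouse)
print(report)
```

Amounts are held as `Decimal` rather than `float` so that equality checks on monetary values are exact.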

Introduction

WeBank is China’s first privately-owned online bank backed by Tencent. WeBank offers accessible and high-quality financial services to underbanked individuals as well as small- and medium-sized enterprises. So far, WeBank has served over 250 million individual customers, 20 million individual business customers, and 1.5 million corporate customers.

This case study explores how WeBank used PingCAP’s TiDB to clear its technical hurdles and accommodate business growth.

WeBank’s Data Center Node (DCN) Architecture

WeBank revolutionizes traditional banking models with its unique DCN (Data Center Node) architecture. This unified system, developed in combination with Tencent’s financial-grade distributed database, TDSQL, supports WeBank’s vast user base and billions of daily financial transactions. The DCN architecture is made up of the following components:

DCN architecture

DCN Units: WeBank manages customers as individual units, much like branches in a conventional bank, facilitating seamless expansion as the customer base grows. DCN is the minimal unit of deployment, including everything from the application layer and middleware to the underlying database.
Global Name Service (GNS): This component directs users to their respective DCN, ensuring efficient routing of user requests.
Reliable Message Bus (RMB): RMB enables inter-DCN communication, which is essential for transactions between users in different DCNs. Together with GNS, it handles routing and message exchange across the entire DCN architecture.
Admin Area (ADM): This global operational data management backend is designed for unique business scenarios that cannot be accommodated within a single DCN. This is where TiDB resides.
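The interplay between GNS and RMB can be sketched in a few lines. This is a toy, in-memory model written for illustration; the class names, method names, and message fields below are hypothetical and do not reflect WeBank's actual interfaces.

```python
class GlobalNameService:
    """Toy stand-in for GNS: maps each customer to their home DCN."""

    def __init__(self):
        self._customer_to_dcn = {}

    def register(self, customer_id, dcn_id):
        self._customer_to_dcn[customer_id] = dcn_id

    def lookup(self, customer_id):
        return self._customer_to_dcn[customer_id]


class ReliableMessageBus:
    """Toy stand-in for RMB: one in-memory queue per DCN (no durability)."""

    def __init__(self, dcn_ids):
        self._queues = {dcn_id: [] for dcn_id in dcn_ids}

    def send(self, target_dcn, message):
        self._queues[target_dcn].append(message)

    def receive_all(self, dcn_id):
        # Hand back the pending messages and reset the queue.
        messages, self._queues[dcn_id] = self._queues[dcn_id], []
        return messages


def transfer(gns, bus, payer, payee, amount):
    """Route a transfer: stay local if both parties share a DCN, else use the bus."""
    src, dst = gns.lookup(payer), gns.lookup(payee)
    if src == dst:
        return f"local transfer inside {src}"
    bus.send(dst, {"type": "credit", "customer": payee, "amount": amount, "from_dcn": src})
    return f"cross-DCN transfer {src} -> {dst}"


gns = GlobalNameService()
gns.register("alice", "DCN-1")
gns.register("carol", "DCN-1")
gns.register("bob", "DCN-2")
bus = ReliableMessageBus(["DCN-1", "DCN-2"])

local_result = transfer(gns, bus, "alice", "carol", 10)
remote_result = transfer(gns, bus, "alice", "bob", 20)
print(local_result)
print(remote_result)
```

In this model a transfer only touches the bus when payer and payee live in different DCNs, which mirrors why each DCN can otherwise operate as a self-contained unit.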

The problem: Database performance and capacity bottlenecks

Despite its innovative DCN architecture, WeBank grappled with several technical issues. As shown in the architecture diagram below, both Online Transaction Processing (OLTP) for customer service and batch processing for bank clearing and settlement ran on a single TDSQL database.

TDSQL-based DCN

Running multiple workloads on a single TDSQL database led to:

Performance bottlenecks: The single-node TDSQL was unable to efficiently manage the long-running batch processing tasks. The high load during batch processing, with CPU and IO usage reaching up to 70%, strained the system. This increased the latency between TDSQL primary and backup clusters and escalated the risk of operational issues for OLTP applications.
Scalability and capacity issues: As WeBank’s business expanded, the rapid increase in customer data exerted considerable pressure on the database, leading to capacity and performance bottlenecks. This necessitated horizontal scaling of DCNs – a complex and costly operation that also inflated development and operational costs.
Operational and maintenance challenges: Large data volumes in single tables significantly increased the database management workload. DDL operations such as adding a column to a large table took longer because the single-instance MySQL database didn’t support online DDL. Moreover, the more data a single table held, the more likely business SQL statements were to run into performance issues.
Technical limitations: TDSQL was not equipped to handle horizontal scaling. Furthermore, it was unable to manage the vast amount of global data that couldn’t be split into DCNs. This constrained the database’s ability to support all business scenarios.

Recognizing these limitations as a potential blocker for future growth, WeBank decided to update its DCN technology architecture.

The solution: An upgraded DCN architecture with TiDB

WeBank’s DCN technology architecture underwent a significant upgrade when it adopted TiDB, solving its previous technical challenges.

Upgraded DCN architecture with TiDB

WeBank leveraged TiDB’s real-time synchronization tool, Data Migration (DM), to aggregate transaction data from multiple DCNs into the TiDB cluster in real time.
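Conceptually, this aggregation merges the same logical table from many DCNs into one. The sketch below illustrates only the merge logic, under the assumption that transaction IDs may repeat across DCNs; it is not how DM works internally (DM replicates changes from the upstream databases), and the function name, row fields, and sample data are hypothetical.

```python
def aggregate_dcn_rows(dcn_tables):
    """Merge per-DCN transaction rows into one table keyed by (dcn_id, txn_id).

    Including the DCN identifier in the key keeps rows from different DCNs
    from colliding when they reuse the same local transaction ID.
    """
    merged = {}
    for dcn_id, rows in dcn_tables.items():
        for row in rows:
            merged[(dcn_id, row["txn_id"])] = {**row, "dcn_id": dcn_id}
    return merged


# Made-up sample data: both DCNs happen to use txn_id 1.
dcn_tables = {
    "DCN-1": [{"txn_id": 1, "amount": 100}, {"txn_id": 2, "amount": 50}],
    "DCN-2": [{"txn_id": 1, "amount": 70}],
}

merged = aggregate_dcn_rows(dcn_tables)
total_amount = sum(row["amount"] for row in merged.values())
print(len(merged), total_amount)
```

Once the rows sit in a single table, a batch job can scan all DCNs' transactions with one query instead of visiting each DCN in turn.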

The company also shifted the batch computing program to the TiDB cluster, leaving TDSQL to focus on OLTP for customer service workloads. This relieved TDSQL of storing massive data volumes and handling the high load from batch processing.

By exploiting TiDB’s horizontal scaling feature, WeBank stores the full history of customer service data in TiDB, while TDSQL keeps only the most recent six months. Queries for data older than six months are served by TiDB.
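The six-month split implies a simple routing rule on the application side. The following is a minimal sketch of such a rule, assuming "six months" is approximated as 183 days and that routing is decided per query date; the constant, function name, and backend labels are illustrative and not taken from WeBank's system.

```python
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days for this sketch.
HOT_RETENTION_DAYS = 183


def choose_backend(query_date, today):
    """Return which database should serve a query for data from query_date."""
    cutoff = today - timedelta(days=HOT_RETENTION_DAYS)
    # Recent data is still held in TDSQL; anything older lives only in TiDB.
    return "TDSQL" if query_date >= cutoff else "TiDB"


today = date(2024, 1, 31)
recent = choose_backend(date(2024, 1, 1), today)
historical = choose_backend(date(2023, 1, 1), today)
print(recent, historical)
```

Because TiDB holds the full history, a real router could also fall back to TiDB whenever a query spans both sides of the cutoff.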

WeBank chose TiDB as its core technology stack for several compelling reasons:

Extreme scalability and strong consistency: TiDB’s distributed SQL architecture enables horizontal scaling while ensuring strong data consistency. Its separation of compute from storage allows flexible resource utilization and cost control.
High availability: TiDB supports high availability with automatic fault recovery. PingCAP’s around-the-clock technical support provides professional and experienced customer service to resolve issues quickly.
MySQL compatibility: TiDB’s compatibility with the MySQL protocol minimized application migration costs. Data replication from MySQL to TiDB was seamless and ran in real time.
Open-source and community support: TiDB’s active open-source community and rapid feature improvements align with WeBank’s embrace of open source technologies.

The results: Significantly reduced TCO while improving operational capabilities

TiDB has revolutionized the way WeBank handles its fast-growing business. It has laid the groundwork for a future-proof, efficient, and cost-effective data management system. With TiDB, WeBank is shaping the future of distributed database standards for the financial services industry in China.

Technical benefits

Reduced batch processing time by 58%: WeBank significantly reduced batch processing time from 3 hours and 13 minutes on a 16-node TDSQL deployment to 50 minutes on a 12-node TiDB cluster.
Horizontal scalability: TiDB enables WeBank to achieve horizontal scale in both database computing and storage by adding physical nodes. This eliminates the technical complexities and high costs of expanding DCNs.
Separation of batch processing and OLTP systems: Running batch tasks on TiDB separates them from the OLTP workloads that remain on TDSQL, so neither workload competes with the other for resources.
Addressing performance and capacity bottlenecks: TiDB solves the performance and capacity bottlenecks of a single-instance TDSQL database, especially in scenarios where DCN cannot be split. This reduces the overall operational management cost of the system.

Business benefits

Significantly reduced total cost of ownership (TCO): TiDB replaces TDSQL and HBase, reducing redundant backend architecture and improving business efficiency.
Increased business focus: The introduction of TiDB allows WeBank to focus on its core operations, promoting fast development cycles and maximizing business value.
Improved operational capabilities: TiDB has improved the operational capabilities of WeBank’s open-source community and helped cultivate senior TiDB engineers. This has enabled WeBank to influence the TiDB open-source project for its business services.
Industry influence: Together with WeBank, TiDB is helping to develop distributed database selection standards for the financial services industry in China.

Industry

Financial Services