
Authors: Chunhui Liu and Chao Hong (Database Administrators at Shopee)
Shopee is the leading e-commerce platform in Southeast Asia and Taiwan. It is a shopping platform tailored for the region, providing customers with an easy, secure, and fast online shopping experience through strong payment and logistical support.
As our business boomed, our team faced severe challenges in scaling our back-end system to meet the demand. Our previous article introduced how we use TiDB, an open-source, MySQL-compatible, hybrid transactional and analytical processing (HTAP) database, to scale out our system so that we can deliver better service for our users without worrying about database capacity.
There are so many databases available in the market. How do you choose the right one? In this post, I’ll share our thoughts with you. I hope this post can help you when you’re comparing multiple databases and looking for the right fit for your application.
At Shopee, we mainly use three databases: MySQL, TiDB, and Redis.
Our business is growing fast. This year, we're creating ten times as many databases per week as we did one or two years ago. Every year, the number of logical databases in the production environment grows three to five times. In 2019, this number increased more than five times.
Before a database enters the production environment, database administrators (DBAs) and R&D teams give a rough assessment of its physical and logical designs.
According to our experience, if we make a wrong decision during the design phase, we’ll spend a lot of time and energy rectifying it later. Therefore, we must formulate a concise and efficient database selection strategy to ensure that we can make a correct decision most of the time.
At Shopee, we use this general database selection strategy:
When we’re choosing a database from a variety of products, we ask ourselves:
At Shopee, we have a list with dozens of checkpoints to help us assess a new database design, among which “To shard or not to shard” is an important one. For a long time, MySQL sharding was the only way for us to horizontally scale a database. After we introduced TiDB to Shopee, we got a new “no sharding” option.
According to our experience, in some scenarios, MySQL sharding has more cons than pros. We have to bear extra costs for daily development, operations, and maintenance. During the database selection phase, DBAs and R&D engineers need to identify the following scenarios and find specific ways to solve problems:
So far, I’ve described Shopee’s basic strategy for database selection and related critical metrics. In this section, I’ll share with you what we’ve learned.
At Shopee, we sometimes use MySQL as an in-memory database. After an application enters the production environment, the R&D team may first focus on the application logic, while the code of the database access layer might not be optimal.
Therefore, during the early phase of a project, slow queries and bursty high-frequency reads and writes are common issues. To address these issues, we try to ensure that the memory space is large enough to load all hot data into the MySQL buffer pool. This helps mitigate some application performance issues.
According to our statistics, 80% of Shopee's production databases have less than 50 GB of data. Because the memory size of our database servers is larger than 50 GB, we can get through the application trial-and-error period. When data starts to grow rapidly, we can ask the R&D team to optimize the database.
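To make this sizing check concrete, here is a minimal sketch (assuming a MySQL instance reachable with the pymysql package; the connection settings and the 50 GB figure are placeholders) that compares each schema's data-plus-index size from information_schema with the configured InnoDB buffer pool size:

```python
# Sketch: compare per-schema data size with the InnoDB buffer pool size.
# Assumptions: pymysql is installed; host, user, and password are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="dba", password="***", charset="utf8mb4")

with conn.cursor() as cur:
    # Total data + index size per schema, in GB.
    cur.execute(
        """
        SELECT table_schema,
               ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
        FROM information_schema.tables
        GROUP BY table_schema
        ORDER BY size_gb DESC
        """
    )
    schema_sizes = cur.fetchall()

    # Configured buffer pool size, in GB.
    cur.execute("SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024")
    buffer_pool_gb = float(cur.fetchone()[0])

for schema, size_gb in schema_sizes:
    status = "fits in buffer pool" if size_gb < buffer_pool_gb else "exceeds buffer pool"
    print(f"{schema}: {size_gb} GB ({status})")
```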
Of all our production databases, 2.5% have more than 1 TB of data. The average data size of these terabyte-scale databases is 2 TB. The largest has more than 4 TB of data. Our DBAs' top priority is to continually reduce the data volume of these terabyte-scale databases.
To tackle the soaring data size for a database, we can try MySQL sharding and TiDB. We can also archive old data and upgrade hardware resources.
Old data occupies a lot of disk space, but it's not read or written frequently, so it's probably not "hot data." If the application owner agrees, we usually archive old data to a separate MySQL instance, and the application migrates the corresponding reads and writes to that instance. The new instance stores old data by year or month in separate tables to prevent any single table from growing too large. For the new instance, we can also enable InnoDB transparent page compression to reduce disk usage.
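The following sketch illustrates this archiving pattern. The table name, columns, and connection settings are hypothetical, and it assumes MySQL 5.7+ on a file system that supports hole punching, so that InnoDB transparent page compression (COMPRESSION='zlib') is available:

```python
# Sketch: archive one month of old data into a dedicated archive instance.
# Table/column names (orders, created_at) and connections are hypothetical.
import pymysql

prod = pymysql.connect(host="prod-mysql", user="dba", password="***", db="shop")
archive = pymysql.connect(host="archive-mysql", user="dba", password="***", db="shop_archive")

month = "2018_01"  # archive data for January 2018 into its own table

with archive.cursor() as cur:
    # One table per month, with transparent page compression enabled.
    cur.execute(
        f"""
        CREATE TABLE IF NOT EXISTS orders_{month} (
            id BIGINT PRIMARY KEY,
            buyer_id BIGINT,
            created_at DATETIME,
            payload JSON
        ) ENGINE=InnoDB COMPRESSION='zlib'
        """
    )

# Copy old rows in small batches so the production instance isn't overloaded.
with prod.cursor() as cur:
    cur.execute(
        "SELECT id, buyer_id, created_at, payload FROM orders "
        "WHERE created_at >= '2018-01-01' AND created_at < '2018-02-01' LIMIT 1000"
    )
    rows = cur.fetchall()

with archive.cursor() as cur:
    cur.executemany(
        f"INSERT INTO orders_{month} (id, buyer_id, created_at, payload) "
        "VALUES (%s, %s, %s, %s)",
        rows,
    )
archive.commit()
```

In practice this would run in a loop, with old rows deleted from the production instance only after they have been verified in the archive.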
TiDB is a good choice for data archiving. Theoretically, a TiDB cluster can infinitely scale out, and users don’t need to worry about limited disk capacity. TiDB features elastic horizontal scalability in both the computing and storage layers. Therefore, we can gradually add servers according to the data growth and application read and write traffic. This helps ensure efficient hardware usage and prevents many resources from being idle in the early production stage of an archiving database.
You may wonder what happens when MySQL’s data size hits 1 TB, and disk space is scarce. Can we double the disk space and increase the memory capacity to win more time for engineers to shard a database?
Actually, when a database has terabyte-scale data, it might be difficult to implement database sharding for an application that has been running for a long time. If it's possible to archive old data to keep the data volume stable (but the database still stores terabytes of data), we can also upgrade hardware to improve database performance.
We have two ways to use Redis to handle high concurrent reads and writes:
The application front-end directly reads and writes to Redis. The application back-end smoothly and asynchronously persists data to MySQL or TiDB. In this case, MySQL and TiDB serve as the persistent layer for the Redis data. When we're designing a system, if we predict high concurrent reads and writes in the production environment, using Redis as a buffer layer works well.
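A minimal sketch of this buffer-layer pattern is shown below, assuming the redis and pymysql packages; the queue name, table, and connection settings are made up for illustration. The hot path only touches Redis, and a background worker asynchronously persists the buffered events to MySQL:

```python
# Sketch: Redis as a write buffer in front of MySQL.
# The front end pushes events onto a Redis list; a background worker drains
# the list and persists them to MySQL. All names here are hypothetical.
import json
import redis
import pymysql

r = redis.Redis(host="redis-buffer", port=6379)

def record_like(user_id, item_id):
    """Called on the hot path: only touches Redis."""
    r.rpush("like_events", json.dumps({"user_id": user_id, "item_id": item_id}))

def persist_worker(batch_size=100):
    """Runs asynchronously: drains the buffer and writes to MySQL in batches."""
    db = pymysql.connect(host="mysql-primary", user="app", password="***", db="social")
    while True:
        events = []
        for _ in range(batch_size):
            raw = r.lpop("like_events")
            if raw is None:
                break
            events.append(json.loads(raw))
        if not events:
            break
        with db.cursor() as cur:
            cur.executemany(
                "INSERT INTO likes (user_id, item_id) VALUES (%s, %s)",
                [(e["user_id"], e["item_id"]) for e in events],
            )
        db.commit()
```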
For some social-related applications at Shopee, their peak traffic during a big promotion campaign is dozens of or even hundreds of times higher than usual. These applications are typical performance-critical applications. If the R&D team doesn’t anticipate it and the application still directly reads and writes to the database, the database will crash when the traffic surges in the promotion campaign. In this scenario, Redis is an ideal solution to alleviate the back-end database’s peak traffic.
What if the whole Redis cluster goes down? We have two solutions:
We still use the application to read and write to the database. We can use Shopee's Data Event Center (DEC), a middleware that continuously parses the MySQL binlog, reorganizes the results, and then writes them to Redis. This way, intensive read-only traffic can be moved to Redis, which greatly reduces the load on the database.
When we rebuild data into Redis, we can customize data structures for specific query patterns. SQL is not suitable for implementing some queries. Sometimes, using Redis to run these queries is more efficient.
In addition, compared with having the application write data to the database and Redis simultaneously, rebuilding data in Redis by parsing the MySQL binlog is more beneficial. The application implementation is simpler because developers don't need to be aware of the data replication logic from the database to Redis.
But its shortcoming is write latency. Data is written to the MySQL primary and then sent to Redis. In this process, there may be dozens of milliseconds of latency. If we want to use Redis this way, we need to know whether the application accepts such latency.
Take real-time queries for new orders as an example. To avoid the impact of secondary replication delays, we had to route queries for some critical columns of Shopee order tables to MySQL primaries. During a big promotion campaign, the primaries might be overwhelmed by these high-frequency read-only queries. Therefore, we changed our approach: we first write new order data to MySQL and then transform binlog parsing results into Redis. This way, we effectively relieve the pressure on MySQL primaries.
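Shopee's DEC is internal middleware, so the sketch below is only an illustrative stand-in that uses the open-source python-mysql-replication and redis packages. It tails the binlog of a hypothetical orders table and rebuilds a per-buyer view in Redis that read-only queries can hit instead of the MySQL primary:

```python
# Sketch: rebuild order data in Redis by tailing the MySQL binlog.
# Stand-in for Shopee's internal DEC middleware; table and column names
# (shop.orders, order_id, buyer_id) are hypothetical.
import json
import redis
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent, UpdateRowsEvent

r = redis.Redis(host="redis-orders", port=6379)

stream = BinLogStreamReader(
    connection_settings={"host": "mysql-primary", "port": 3306,
                         "user": "repl", "passwd": "***"},
    server_id=1001,                      # unique replication client id
    only_schemas=["shop"],
    only_tables=["orders"],
    only_events=[WriteRowsEvent, UpdateRowsEvent],
    blocking=True,
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        values = row["values"] if isinstance(event, WriteRowsEvent) else row["after_values"]
        # Keep the latest order snapshot, keyed for the read pattern we need:
        # "recent orders of a buyer", served from a per-buyer sorted set.
        # Assumption: order ids increase over time, so they double as the score.
        order_key = f"order:{values['order_id']}"
        r.set(order_key, json.dumps(values, default=str))
        r.zadd(f"buyer_orders:{values['buyer_id']}", {order_key: values["order_id"]})
```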
When developers migrate data from MySQL to TiDB, DBAs often remind developers to adapt data structures and code to TiDB.
Here is an example. A system was previously implemented with MySQL sharding. This solution equally divided all data into 1,000 tables. After we migrated to TiDB, we stopped sharding and merged these tables into a single table.
After we finished the migration and the application resumed service, we found that one SQL query's performance jittered severely. Under highly concurrent traffic, this query even caused the entire TiDB cluster to hang.
We analyzed this query and found that:
Based on the analysis above, the R&D team introduced Redis. When they transformed binlog parsing results into Redis, they customized a data structure for the query and switched 90% of the read-only queries to Redis. As a result, the query became faster and more stable, and we were able to notably reduce the number of TiDB storage and computing nodes.
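The post doesn't spell out the problematic query itself, so the sketch below only illustrates the general idea of a customized Redis structure: a capped sorted set per shop (all names hypothetical) that answers a "latest N records" read in a single lookup instead of an ORDER BY ... LIMIT scan on the merged table:

```python
# Sketch: serve a hot "latest N records for a key" query from Redis instead of
# TiDB. The per-shop sorted set would be maintained by a binlog-to-Redis
# pipeline like the one sketched earlier; names here are hypothetical.
import redis

r = redis.Redis(host="redis-orders", port=6379)

def latest_orders(shop_id, limit=20):
    # Equivalent SQL on the merged table would be roughly:
    #   SELECT order_id FROM orders WHERE shop_id = %s
    #   ORDER BY create_time DESC LIMIT %s
    # but the Redis read is a single sorted-set range lookup.
    return [m.decode() for m in r.zrevrange(f"shop_orders:{shop_id}", 0, limit - 1)]

def record_order(shop_id, order_id, create_time, cap=1000):
    key = f"shop_orders:{shop_id}"
    r.zadd(key, {str(order_id): create_time})
    r.zremrangebyrank(key, 0, -(cap + 1))  # keep only the newest `cap` entries
```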
TiDB is highly compatible with the MySQL syntax. This eases the migration from MySQL to TiDB. But TiDB is a new database, and its implementation is totally different from MySQL’s. So we need to work out a customized solution according to TiDB’s characteristics and the specific application scenarios.
TiDB is an open-source NewSQL database that supports HTAP workloads. It’s MySQL compatible and features horizontal scalability, strong consistency, and high availability. See its architecture.
This section gives an overview of the TiDB cluster status at Shopee and the scenarios in which we use TiDB.
At Shopee, we use TiDB in three scenarios:
This post covered Shopee’s thoughts on how to select a relational database and our experience using MySQL, TiDB, and Redis.
In brief, if your data size is small and your application is in its early stages, MySQL is a good choice for you. You also don’t need to compromise your application design for sharding. That’s because as your business develops and data volume grows, you can smoothly migrate from MySQL to TiDB. Your application will be horizontally scalable while application development is still flexible. Meanwhile, you can utilize Redis to speed up queries and relieve the pressure on the database, so that you can focus more on throughput and strong consistency.
We chose TiDB because of its open-source, horizontally scalable, and MySQL-compatible features. In the past two years, we've witnessed TiDB's fast development and notable progress. TiDB is one of the most important database infrastructures at Shopee, and we're using it in more and more scenarios. In the future, TiDB will serve more traffic for Shopee.
We would like to thank PingCAP members and the TiDB community for building TiDB and providing helpful support for us.
Shopee is the leading e-commerce shopping platform in Southeast Asia and Taiwan. It was launched in 7 markets in 2015 to connect consumers, sellers, and businesses in the region.
Shopee offers an easy, secure, and engaging experience that is enjoyed by millions of people daily. It offers a wide range of categories, supported by integrated payments and logistics, as well as popular entertainment features tailored for each market. Shopee is also a key contributor to the region’s digital economy with a firm commitment to helping brands and entrepreneurs succeed in e-commerce.
Shopee is a part of Sea Limited (NYSE:SE), a leading global consumer internet company. In addition to Shopee, Sea’s other core businesses include its digital entertainment arm, Garena, and digital financial services arm, SeaMoney. Sea’s mission is to better the lives of consumers and small businesses with technology.