
Ninja Van is Southeast Asia's fastest-growing last-mile logistics company, now operating in six countries across the region. Our customers include Amazon, Shopee, Lazada, Lineman, GradExpress, Zilingo, Tokopedia, and Sendo.
We deliver upwards of 1.5 million parcels a day. As our data size grew rapidly, our databases came under great pressure, and we ran into significant issues with ProxySQL, sharding, and Galera. After we compared Vitess, CockroachDB (CRDB), and TiDB, an open-source, MySQL-compatible, distributed SQL database, we found that TiDB was the best fit. Now, we've achieved database scalability with TiDB on Kubernetes (K8s).
This post discusses our applications’ pain points, why we chose TiDB over Vitess and CockroachDB, how we’re using TiDB on K8s, and our future plans with TiDB.
We deliver millions of parcels each day. Nearly all of our applications are deployed on top of K8s, but our Online Transactional Processing (OLTP) databases still run on virtual machines (VMs). We have over 100 microservices, and they mainly rely on MySQL for OLTP storage. In general, our applications have very high read-to-write ratios.
For our databases, we ran MySQL with a Galera plugin, which formed a high availability (HA) multi-primary solution. This solution allowed all the nodes in the cluster to function as the primary database, so we could read from or write to any of them. As the following diagram shows, we kept writes on a single node and did read/write splitting across the nodes:
In this diagram, one node acts as a "virtual primary." Because replication is asynchronous, changes can reach the other nodes with a delay. If a task requires strict consistency, such as deductions from an account, its writes go to the virtual primary. If you want a consistent read, you should also read from the virtual primary; otherwise, you risk getting stale data from the other nodes.
We can use ProxySQL to achieve consistent reads, but this approach doesn't scale writes well. We also faced issues with sharding and Galera.
One of ProxySQL's use cases is sharding. However, sharding has drawbacks. Among them: it can't handle cross-shard `JOIN` operations, so you have to shift these so-called "cross-shard joins" to the application layer. For these reasons, we've refrained from doing application-level sharding.
Galera also had issues of its own. On top of that, our OLTP volume and write volume were growing rapidly, and we needed to add more schemas. We needed a new, easier-to-use database. But what kind?
With our major requirements in mind, we explored a few options, including Vitess, CockroachDB, and TiDB, and we found that TiDB fulfilled all of them. After we tested TiDB, we were satisfied with its performance in both OLTP and Online Analytical Processing (OLAP) scenarios.
For Vitess, we did a thorough investigation and exploration, and we found some gaps. For example, Vitess uses `READ COMMITTED` semantics by default when executing cross-shard queries. This differs from MySQL and TiDB, which default to `REPEATABLE READ`.

We also chose TiDB over CockroachDB.
Our test cluster specifications were:
| Instance | Google Cloud Platform machine type | Instance count | Storage |
|---|---|---|---|
| Placement Driver (PD) | n1-standard-4 | 3 | Network SSD |
| TiKV | n1-standard-16 | 3 | 5 × local SSD (1,875 GB) |
| TiDB | n1-standard-16 | 2 | – |
Our database size was ~6 TB.
For OLTP scenarios, we did the benchmarking with Sysbench and customized Lua scripts. We are quite satisfied with the performance of TiDB in general OLTP scenarios.
At Ninja Van, many operations, such as local warehouse, fleet, ops, and Business Intelligence work, rely on queries against real-time data. Currently, these queries run via Redash against MySQL secondaries. Here are some queries we compared between MySQL and TiDB; the SQL statements are omitted due to business sensitivity. As the table shows, TiDB improved the performance of some queries by up to 100 times.
| Query name | Avg. MySQL run time (sec.) | Avg. TiDB run time (sec.) | Performance boost | Comment |
|---|---|---|---|---|
| Complete address For AV Unverified Addresses | 207.92 | 33.4 | 6.22x | Multiple joins on indexed columns |
| Order Tags Partnerships SME Prio | 281.94 | 4.67 | 60.37x | Nested joins |
| RTS For AV | 193.17 | 14 | 13.79x | |
| Rider Route Audit – Individual | 207.48 | 16 | 12.96x | Multiple joins with complex WHERE conditions |
| Damage Ticket Creator | 223.85 | 2 | 111.92x | Multiple joins with GROUP BY |
TiDB is an open-source, MySQL-compatible, NewSQL database built by PingCAP and its open-source community.
It features horizontal scalability, strong consistency, and high availability. It’s a one-stop solution for both OLTP and OLAP workloads. You can learn more about TiDB’s architecture here.
TiDB Operator is an automatic operation system for TiDB clusters in K8s. It provides a full management lifecycle for TiDB, including deployment, upgrades, scaling, backup, failover, and configuration changes.
As shown in the diagram below, TiDB Operator has two main parts. After you deploy a cluster, TiDB Operator takes note of any changes you make and then makes whatever adjustments are needed so that the current state matches the desired state.
Let’s look at some examples.
Let's say we only have one TiKV node (TiKV is TiDB's storage engine), and we want to scale out to three. Simply change the `replicas` number:

```yaml
replicas: 3
```
The Pods screen (in the default namespace) immediately updates to show three TiKV pods running.
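For a fuller picture, here is roughly where that field sits in a TidbCluster manifest. This is a trimmed, hypothetical sketch, not our production spec; the cluster name and storage request are placeholders:

```yaml
# Trimmed, hypothetical TidbCluster manifest; only the TiKV replica count is changed here.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: example-cluster      # placeholder cluster name
spec:
  tikv:
    baseImage: pingcap/tikv
    replicas: 3              # scaled out from 1 to 3 TiKV nodes
    requests:
      storage: "100Gi"       # placeholder storage request per TiKV node
```

Re-applying a manifest like this (for example, with kubectl apply) is what triggers the reconciliation described above.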
To back up your cluster to Google Cloud Storage (GCS), you define your backup parameters in a YAML file and then apply it to K8s. TiDB Operator performs the backup.
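As a rough sketch of what that YAML might look like, here is a minimal, hypothetical Backup resource. The field names follow TiDB Operator's Backup CR, but the cluster, secret, and bucket names are placeholders, and the exact schema varies across Operator versions:

```yaml
# Hypothetical BR-based Backup CR that writes to a GCS bucket.
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: nightly-backup-gcs        # placeholder backup name
  namespace: tidb-cluster         # placeholder namespace
spec:
  br:
    cluster: example-cluster      # the TidbCluster to back up
    clusterNamespace: tidb-cluster
  gcs:
    projectId: my-gcp-project     # placeholder GCP project
    secretName: gcs-secret        # K8s Secret holding the GCS service account key
    bucket: my-tidb-backups       # placeholder bucket
    prefix: backups/2021-06-01    # placeholder path prefix inside the bucket
```

Once you apply a manifest like this, TiDB Operator launches the backup job and records its progress in the resource's status.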
These operations run quite smoothly. However, you need to be aware of some pitfalls.
There is a huge performance gap between local and regional solid-state drives (SSDs), so you might be very motivated to run your TiDB cluster on local SSDs. However, local SSDs come with trade-offs of their own, so keep them in mind before you try that.
Developers have worked hard to make TiDB as MySQL compatible as possible. However, there are still a few differences to be aware of. For example, MySQL's default collations end in `_general_ci` and are case insensitive, while TiDB's default collations are binary collations that end in `_bin` and are case sensitive. If you need the MySQL-style, case-insensitive collations, enable `new_collations_enabled_on_first_bootstrap` when the cluster is bootstrapped for the first time.
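As an illustration, assuming a TiDB Operator version that accepts the TiDB server configuration as raw TOML under `spec.tidb.config`, the setting could be passed in like this (note that it only takes effect at the cluster's first bootstrap):

```yaml
# Hypothetical excerpt of a TidbCluster spec enabling MySQL-compatible collations.
spec:
  tidb:
    config: |
      new_collations_enabled_on_first_bootstrap = true
```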
Out of memory (OOM) errors are inevitable. However, if you carefully set the following parameters, you'll have a three-part line of defense against them.
| Task | Parameter |
|---|---|
| Based on your machine specifications, set an appropriate memory quota for queries. | Specify `mem-quota-query`. |
| If a query exceeds the memory quota, specify whether to cancel it. | Set `oom-action` to "cancel". |
| If a single SQL statement exceeds the memory quota, use temporary storage. | Enable `use-tmp-storage`. |
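To show where these knobs live, here is a minimal sketch of the TiDB server configuration passed through the TidbCluster spec as raw TOML. The values are placeholders, and the exact key names (for example, the temporary-storage switch is spelled `oom-use-tmp-storage` in TiDB 4.x) may vary between versions:

```yaml
# Hypothetical memory-control settings for the TiDB server component.
spec:
  tidb:
    config: |
      mem-quota-query = 1073741824   # 1 GiB memory quota per query (placeholder value)
      oom-action = "cancel"          # cancel queries that exceed the quota
      oom-use-tmp-storage = true     # spill oversized operators to temporary disk storage
```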
Even with these preventative steps, you still might have an OOM error sometimes. For example, if you have a lot of queries and they each use a lot of memory, an OOM could happen. So observe your queries carefully.
For OLTP scenarios, we recently started migrating all our microservices to TiDB. As a logistics company, we face challenges similar to those of the e-commerce industry during peaks like the 11.11 shopping day. All of our microservices are backed by Kubernetes and auto-scaling. However, when the database is the bottleneck, we can't respond fast enough. TiDB's horizontal scalability and high availability help a lot in such scenarios.
In OLAP scenarios, some queries are slow not only because they are written suboptimally, but also because our database schemas are designed for OLTP, not OLAP; the normalization that serves OLTP well is not always helpful for analytics. To solve this, we are exploring a real-time data warehousing solution with Apache Flink and TiDB + TiFlash to build wide tables in real time.
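On the TiDB side, TiFlash can be added to an existing cluster through the same TidbCluster manifest. The following is only a minimal, hypothetical sketch with placeholder sizes; after the TiFlash pods are running, each table still needs a TiFlash replica configured on the SQL side before analytical queries can use it:

```yaml
# Hypothetical TiFlash section of a TidbCluster spec.
spec:
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2                    # placeholder replica count
    storageClaims:
      - resources:
          requests:
            storage: "500Gi"       # placeholder storage size per TiFlash node
```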
We all believe that TiDB is the perfect choice for our next-generation HTAP database solution. However, Rome was not built in a day; there are some milestones we would like to achieve during our transition to TiDB.
TiDB is highly MySQL compatible, and that makes our lives much easier. However, as a distributed system, TiDB has certain characteristics that are fundamentally different from MySQL. For example, in MySQL (with the InnoDB storage engine), an auto-increment primary key is widely used and considered a best practice. This is mainly due to the way MySQL InnoDB indexes and stores data: an auto-increment primary key avoids unnecessary data movement, page splits, and memory fragmentation.
In TiDB, data is stored in TiKV as Regions. An auto-increment primary key directs all write traffic to a single TiKV instance until the current Region fills up, so write traffic is imbalanced across the cluster.

Most of our knowledge about MySQL still applies to TiDB, but not all of it. We need a change of mindset to unlock the full potential of a distributed system like TiDB.
Currently, we rely on Maxwell to capture database changes and sync them to a data lake. Maxwell can capture the executed SQL statements, and we use them to customize some logic on the consumer side. However, TiCDC cannot capture SQL statements, because it receives row changes from TiKV and has no information from the SQL layer. We have no choice but to introduce a MySQL instance as a relay to forward the SQL statements. It's not the most elegant approach, and it adds extra complexity to the overall architecture, but it works. We will see how we can improve this in the future.
We hope to migrate from MySQL to TiDB on K8s with minimal downtime, and we've developed a plan for this as well. It requires two-way sync between the original MySQL instance and the TiDB cluster we want to migrate to. First, we migrate the data from MySQL to TiDB; then, as a plan B, we replicate changes from TiDB back to MySQL to keep the original instance in sync. If anything goes wrong, we can roll back to MySQL in the shortest possible time.
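For the forward half of that sync (MySQL into TiDB), a tool such as TiDB Data Migration (DM) is one common option; the following is only an illustrative sketch of a DM task file with placeholder names, not our actual configuration:

```yaml
# Hypothetical DM task: full dump of MySQL followed by incremental replication into TiDB.
name: "mysql-to-tidb"               # placeholder task name
task-mode: "all"                    # full migration + incremental sync

target-database:
  host: "tidb.example.svc"          # placeholder TiDB endpoint
  port: 4000
  user: "dm_user"                   # placeholder user
  password: ""                      # supply securely in practice

mysql-instances:
  - source-id: "mysql-replica-01"   # placeholder upstream MySQL source
    block-allow-list: "migrated-schemas"

block-allow-list:
  migrated-schemas:
    do-dbs: ["orders", "shipments"] # placeholder schema names
```

The reverse path, keeping MySQL in sync with TiDB so we can roll back, could be handled separately, for example with TiCDC's MySQL sink.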
If you’d like to learn more about our experience with TiDB, or if you have any questions, you can join the TiDB community on Slack.