
At HTAP Summit 2024, Takashi Honda, a Database Reliability Engineer at Mercari, shared insights into why distributed SQL has become indispensable for SaaS platforms. Drawing from his extensive experience optimizing large-scale infrastructures, Honda illustrated how traditional databases like MySQL and PostgreSQL often reach their limits in performance, scalability, and operational efficiency as businesses scale.

In this blog, we delve into Honda’s exploration of distributed SQL, the critical KPIs for scaling SaaS platforms, and the advantages this evolved database architecture offers for data-intensive workloads. By focusing on real-world scenarios and proven strategies, Honda’s session offers a blueprint for SaaS leaders who want to future-proof their platforms in an era of rapid data growth.

The Mercari SaaS Ecosystem: What You Need to Know

Mercari, a leading C2C marketplace in Japan, serves as a platform for individuals to buy and sell items they no longer need. With the largest customer and item base among Japanese marketplaces, Mercari’s backend relies heavily on a vast MySQL infrastructure spanning over 40 terabytes across more than 50 servers. However, managing this on-premise setup has presented significant scalability, elasticity, and development efficiency challenges.

Overcoming MySQL Operational Hurdles

Mercari’s MySQL servers use vertical sharding (each shard holds a different set of tables) rather than horizontal sharding. Because operations in a C2C marketplace frequently span several of those table groups, this layout created problems. Honda said these challenges manifested in three key areas:

  1. Development: The lack of connection between vertical sharding units and application logic complicated feature development. Developers had to manually manage table and shard mappings, while cross-shard table joins were prohibited.
  2. Scalability: The company’s on-premise setup hampered the team’s ability to scale efficiently, as launching new instances took approximately 1.5 days.
  3. Elasticity: Sudden traffic surges were difficult to handle due to the lengthy setup times for new instances.
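To make the development challenge concrete, here is a minimal Python sketch of the kind of table-to-shard bookkeeping an application has to carry under vertical sharding. The table names, shard names, and the `shard_for_query` helper are all hypothetical and are not taken from Mercari’s codebase; the sketch only illustrates why a join across tables on different shards has to be rejected.

```python
# Hypothetical table-to-shard map; names are illustrative, not Mercari's schema.
TABLE_TO_SHARD = {
    "users": "shard_a",
    "items": "shard_b",
    "orders": "shard_c",
    "payments": "shard_c",
}


def shard_for_query(tables):
    """Return the single shard holding every table, or raise if they span shards."""
    shards = {TABLE_TO_SHARD[t] for t in tables}
    if len(shards) != 1:
        # With vertical sharding, no single server can execute this join.
        raise ValueError(
            f"cross-shard join not allowed: {sorted(tables)} span {sorted(shards)}"
        )
    return shards.pop()


if __name__ == "__main__":
    print(shard_for_query(["orders", "payments"]))  # both live on the same shard
    try:
        shard_for_query(["users", "items"])
    except ValueError as err:
        print(err)
```

In a single distributed SQL cluster this mapping layer would no longer be needed in application code, which is the development benefit the talk points toward.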

Migration to managed MySQL solutions like Amazon RDS or Google Cloud SQL wasn’t possible because Mercari’s resource requirements were projected to exceed the limits of such services.

How Mercari Landed On a Viable Solution: Enter Distributed SQL

To address these challenges, the Mercari team, led by Honda, initiated a proof of concept (POC) for migrating to TiDB, an open source, distributed SQL database solution. For Mercari, TiDB addressed the following key pain points:

  • Rapid Scaling: A significant improvement over current launch times.
  • Horizontal Scalability: Enabling data to scale out seamlessly.
  • MySQL Compatibility: Minimizing the need for backend code changes.
  • Higher Capacity Limits: Accommodating future service growth.
  • Operational Benefits: Reducing security risks and the dependency on rare specialized skills associated with on-premise setups.

With these advantages in mind, Mercari decided to evaluate TiDB’s performance through a comprehensive POC. The remainder of this blog covers how the team tested TiDB, which settings it tuned, and what the results showed.

The Mercari Performance Testing Approach

Given the complexity of Mercari’s microservices architecture, simulating the company’s entire production traffic was impractical. Therefore, Honda said the team devised a testing methodology centered around representative queries, categorized into four types:

  1. High-frequency, low-latency queries: These quick, lightweight queries are critical for user interactions like browsing item details or searching products. Given their high frequency, even minor slowdowns could cascade into noticeable performance issues. Optimizing these ensures seamless day-to-day operations.
  2. Low-frequency, high-latency queries: Though rare, these resource-intensive queries can strain the system when executed. Including them in testing ensures the system can handle heavy loads without disruptions during peak events or complex tasks.
  3. Transactional queries, including select queries: These are core to Mercari’s workflows, handling critical actions like purchases, bids, or inventory updates. Optimizing them ensures smooth transaction pipelines, vital for maintaining user trust and business continuity.
  4. Queries from the most critical shard: Some database shards, managing higher traffic or sensitive data, are more impactful. By focusing on queries from these shards, the team ensured optimal performance where it mattered most.

To achieve realistic QPS (queries per second), the team combined custom test scenarios with SysBench’s built-in scenarios. Their goal was to reach a throughput of 1.5 million QPS, a projection based on near-future requirements.
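The four categories above can be pictured as a simple labeling pass over per-query statistics. The sketch below is only an illustration: the `QueryStat` fields, the `categorize` helper, and the QPS and latency thresholds are assumptions made for this example, not values or tooling described in the session.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    digest: str            # normalized query text
    shard: str             # shard the query runs against
    qps: float             # observed queries per second
    avg_latency_ms: float  # observed average latency
    in_transaction: bool   # True if issued inside a transaction


def categorize(q, critical_shard, high_qps=1000.0, slow_ms=100.0):
    """Label a query with every test category it falls into (thresholds are assumed)."""
    labels = []
    if q.qps >= high_qps and q.avg_latency_ms < slow_ms:
        labels.append("high-frequency, low-latency")
    if q.qps < high_qps and q.avg_latency_ms >= slow_ms:
        labels.append("low-frequency, high-latency")
    if q.in_transaction:
        labels.append("transactional")
    if q.shard == critical_shard:
        labels.append("critical shard")
    return labels


if __name__ == "__main__":
    lookup = QueryStat("SELECT ... FROM items WHERE id = ?", "shard_a", 5000.0, 2.0, False)
    report = QueryStat("SELECT ... GROUP BY ...", "shard_b", 0.5, 800.0, True)
    print(categorize(lookup, critical_shard="shard_b"))
    print(categorize(report, critical_shard="shard_b"))
```

A query can carry more than one label, which matches the way the categories overlap in practice (for example, a transactional query on the most critical shard).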

Mercari Database Reliability Engineer Takashi Honda delivering his talk at HTAP Summit 2024.

Mercari’s Key Tuning Strategies for Distributed SQL

Mercari fine-tuned TiDB to meet its workload demands, focusing on concurrency, throughput, and log flushing.

Optimizations in Concurrency and Throughput

  • gRPC Connection Count and Max Batch Wait Time: Adjustments to these parameters allowed Mercari to efficiently handle large numbers of TiDB and TiKV nodes, reducing request overhead while maintaining high throughput.
  • gRPC Concurrency: Increasing the number of worker threads improved the handling of read-heavy workloads, enhancing overall throughput despite a corresponding increase in CPU usage.

Log Flushing Optimization

  • Store IOProcess: Changing the number of threads that flush logs to disk was intended to improve latency for update queries. However, the adjustment had limited impact on Mercari’s predominantly read-heavy workload.
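As a rough orientation, the knobs described above appear to correspond to the following TiDB and TiKV configuration keys: `tikv-client.grpc-connection-count` and `tikv-client.max-batch-wait-time` on the TiDB side, and `server.grpc-concurrency` and `raftstore.store-io-pool-size` on the TiKV side. That mapping is an assumption made here, not something stated in the session, and every value in the Python sketch below is a placeholder because the source does not publish the settings Mercari chose. The `render_toml` helper is a hypothetical convenience that prints the fragments in TOML form.

```python
# Placeholder values only: the source does not say which values Mercari used.
# Check the TiDB and TiKV configuration references for units and defaults.
TIDB_SETTINGS = {
    "tikv-client": {
        "grpc-connection-count": 16,     # connections from TiDB to each TiKV node
        "max-batch-wait-time": 2000000,  # wait before sending a batch of requests
    },
}

TIKV_SETTINGS = {
    "server": {"grpc-concurrency": 8},       # gRPC worker threads
    "raftstore": {"store-io-pool-size": 2},  # threads writing Raft logs to disk
}


def render_toml(settings):
    """Render a nested dict of integer settings as a TOML-style fragment."""
    lines = []
    for section, options in settings.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            lines.append(f"{key} = {value}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_toml(TIDB_SETTINGS))
    print(render_toml(TIKV_SETTINGS))
```

Any real tuning should start from the defaults and change one setting at a time under load, since the talk notes that higher gRPC concurrency also raised CPU usage.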

Performance Results

Through these tuning efforts, Honda said Mercari achieved over 3 million QPS, double the initial 1.5 million QPS target, and peaked at 4.5 million QPS under optimal conditions. These results demonstrated the viability of TiDB as a scalable, high-performance solution for Mercari’s workload.
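The relationship between these figures and the original target is simple arithmetic, sketched below in Python. The three constants come from the numbers quoted in this blog; the `headroom` helper name is just for illustration.

```python
TARGET_QPS = 1_500_000     # POC goal
SUSTAINED_QPS = 3_000_000  # "over 3 million", so this is a lower bound
PEAK_QPS = 4_500_000       # best case under optimal conditions


def headroom(measured_qps, target_qps):
    """Return measured throughput as a multiple of the target."""
    return measured_qps / target_qps


if __name__ == "__main__":
    print(headroom(SUSTAINED_QPS, TARGET_QPS))  # 2.0
    print(headroom(PEAK_QPS, TARGET_QPS))       # 3.0
```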

How Mercari Addressed Latency Challenges

Latency is crucial for high-demand applications like Mercari’s marketplace. To evaluate TiDB’s impact, the team analyzed and tested its performance under various conditions.

Latency Observations

TiDB’s distributed architecture introduces higher latency than MySQL: typical read latencies in TiDB were 3–4 ms, versus under 1 ms for MySQL. Under high load, the gap widened significantly. Honda said the team estimated that overall latency would be about 50% higher than with MySQL.

Latency Injection Testing

To assess the real-world impact of increased latency, Honda said the team conducted latency injection tests in the production environment. By placing ProxySQL servers between MySQL servers and clients, they simulated the delay without disrupting customer operations.

Figure 1. A diagram depicting the Mercari team’s latency injection testing with ProxySQL servers between MySQL servers and clients.

Key advantages of ProxySQL include:

  • Compatibility with MySQL protocols, requiring no changes on the client side.
  • Easy rollback capabilities, enabling latency adjustments in real time.
  • Scalability, ensuring seamless integration into Mercari’s extensive infrastructure.
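The session summary does not say how the delay itself was configured. One plausible mechanism, assumed here, is the `delay` column (in milliseconds) of ProxySQL’s `mysql_query_rules` table, applied through the ProxySQL admin interface. The Python sketch below only builds the admin statements as strings; the helper names, the rule ID, and the catch-all `match_digest` pattern are hypothetical, and the sketch ignores how such a rule would interact with any existing routing rules.

```python
def delay_rule_statements(rule_id, delay_ms, match_digest="."):
    """Build ProxySQL admin statements that add delay_ms to matching queries."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    # Inputs are interpolated directly, so pass trusted values only.
    return [
        "INSERT INTO mysql_query_rules (rule_id, active, match_digest, delay) "
        f"VALUES ({int(rule_id)}, 1, '{match_digest}', {int(delay_ms)});",
        "LOAD MYSQL QUERY RULES TO RUNTIME;",
    ]


def rollback_statements(rule_id):
    """Build the statements that remove the delay rule again."""
    return [
        f"DELETE FROM mysql_query_rules WHERE rule_id = {int(rule_id)};",
        "LOAD MYSQL QUERY RULES TO RUNTIME;",
    ]


if __name__ == "__main__":
    for stmt in delay_rule_statements(rule_id=9000, delay_ms=3):
        print(stmt)
    for stmt in rollback_statements(rule_id=9000):
        print(stmt)
```

Because the rule is loaded and removed at runtime, the injected delay can be raised, lowered, or rolled back without touching clients or MySQL servers, which is the property the list above highlights.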

Honda noted that the results revealed no significant impact on customer experience, marketplace metrics, or query operations. This confirmed that TiDB’s latency profile was acceptable for Mercari.

Conclusion

Encouraged by the POC results, Mercari has embarked on a phased migration of its MySQL servers to TiDB clusters. Plans include consolidating these clusters to optimize operational costs and streamline management. This ambitious project underscores Mercari’s commitment to scaling its infrastructure in alignment with business growth.

Want to transform your SaaS platform with a scalable, high-performance data architecture? Dive deeper into the strategies and insights shared at HTAP Summit 2024 by watching Mercari’s full session. Don’t miss this opportunity to future-proof your SaaS platform and stay ahead of the competition. Happy learning!

