An important idea in the database world is that specialized databases will outperform general-purpose databases. Michael Stonebraker, an A. M. Turing Award Laureate and one of the most influential people in the database world, also discussed this in his paper, “One Size Fits All: An Idea Whose Time Has Come and Gone.”[1]
This is a reasonable judgment: it’s tough enough to build a database that supports either Online Transactional Processing (OLTP) or Online Analytical Processing (OLAP) workloads, let alone one that supports both at the same time. The dilemma is that many users today face growing demand for mixed OLTP and OLAP workloads. How do we resolve it?
At PingCAP, we offer a solution: TiDB, a Hybrid Transactional and Analytical Processing (HTAP) database that can handle mixed workloads. In this post, I will demystify HTAP and explain how an HTAP database can help users solve their problems.
HTAP ≠ OLTP + OLAP
As the section title indicates, HTAP is NOT a straight integration of OLTP and OLAP.
A good analogy is a motorhome. It’s sometimes called “a home on wheels,” but it really isn’t a combination of a car and a house. Instead, a motorhome is a unique experience: a special product to meet special needs. So is HTAP.
HTAP is designed for special scenarios, not solely OLTP, OLAP, or a combination of the two.
The rise of HTAP scenarios
In recent years, the demand for real-time data processing and analytics has grown rapidly. Traditional databases that specialize more in offline data processing are failing to meet users’ growing needs. There are two major reasons behind this.
First, the technology stacks for real-time data processing have been constantly developing and maturing. Take the big data ecosystem as an example. Real-time computation frameworks have evolved from Apache Storm with simple semantics, to Apache Storm with Trident on top of it, and then to Apache Flink with complex semantics and built-in state storage. Only now, after all these changes, have stream processing frameworks been widely adopted in many complicated real-time analytical scenarios. These frameworks are paired with downstream sinks with different characteristics, which in turn accelerates the innovation of real-time applications.
In addition, users keep trying new ideas to digitize their business operations in real time; technology stacks become easier to use; and the development of database technology also stimulates the prevalence of real-time applications.
Second, the digital transformation process is speeding up in many traditional industries, generating new demands. Processing tasks that were once impossible are now requirements for a well-run business.
Take China’s express delivery industry as an example. As its market size continues to expand, delivery orders have grown enormously. Real-time monitoring and analytics of those orders have become a must, and they help optimize operations such as delivery routing and penalty management. Traditional offline analytics can’t meet these demands, especially during large shopping festivals when peak transactions occur.
Today, more and more users are facing scenarios with mixed workloads, rather than pure OLTP or OLAP. We call these HTAP scenarios. Traditional OLAP solutions are too cumbersome to meet the new demands, and pure OLTP databases cannot meet them either. What users really want is a solution that sits between OLTP and OLAP databases.
HTAP—PingCAP’s solution
At PingCAP, our product strategy is that TiDB is an OLTP-oriented database supplemented with an OLAP capability. That is to say, our ambition is in the fields of OLTP and HTAP. Since we are discussing HTAP today, I will skip the OLTP part and focus on HTAP.
I explained earlier why HTAP is not a straight combination of OLTP and OLAP. I can explain a bit more from our own experience.
Previously, we faced scenarios with many data hub applications. Users intended to converge data from data silos on different business lines into the same real-time centralized data store, and then deliver data services and analytics on top of it. In another case, users planned to build a read replica of their OLTP database. This replica was used to support separate analytics and data serving workloads and to respond to unrestricted queries and analytical services.
The scenarios above require the database to:
- Have a distributed architecture similar to traditional data warehouses to support data aggregation at a similar scale.
- Ensure data consistency and real-time performance as transactional databases do, and also provide index-based data retrieval and large-scale analytics based on columnstore.
- Connect smoothly with offline data warehouses.
That is to say, the database has to focus on mixed OLTP and OLAP workloads. It can also be lighter than traditional data warehouses because it does not need to:
- Have complicated computing models inside for offline scenarios.
- Support petabytes of data storage; the amount of real-time data usually does not reach the limit of a data warehouse’s cold storage.
We have also met with scenarios where the major task was transactional processing but real-time analytics was occasionally required.
HTAP is what the scenarios above are all about. TiDB’s HTAP capability is designed for those requirements, and it is real-time, agile, and light-weight:
- Users can run transactions and analytical queries at the same time through a unified front end.
- Rowstore and columnstore are kept consistent in real time.
- The row and column resources are isolated, and the replication mechanism ensures load balancing and automatic fault recovery.
- Transactional services are processed in a stateless, stand-alone service node group, while analytical queries are processed in a vectorization-accelerated Massively Parallel Processing (MPP) mode.
The diagram below shows how TiDB handles OLTP and OLAP workloads simultaneously and independently.
TiDB’s architecture
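To make the rowstore and columnstore idea more concrete, here is a minimal Python sketch of one logical table kept in two layouts. It is a toy illustration only: the class `ToyHybridTable` and its methods are hypothetical names, the mirroring is done synchronously in a single process, and none of this reflects how TiDB actually replicates or stores data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHybridTable:
    """Toy table holding the same data in a row layout and a column layout.

    Hypothetical illustration; not TiDB's storage or replication design.
    """
    columns: tuple
    rows: dict = field(default_factory=dict)      # key -> full row tuple
    col_data: dict = field(default_factory=dict)  # column name -> {key: value}

    def __post_init__(self):
        for name in self.columns:
            self.col_data.setdefault(name, {})

    def upsert(self, key, row):
        """Write one row, then mirror each value into the column layout."""
        if len(row) != len(self.columns):
            raise ValueError("row does not match table columns")
        self.rows[key] = tuple(row)
        for name, value in zip(self.columns, row):
            self.col_data[name][key] = value

    def point_lookup(self, key):
        """OLTP-style read: fetch one whole row by key from the row layout."""
        return self.rows.get(key)

    def column_sum(self, name):
        """OLAP-style read: aggregate one column without reading the others."""
        return sum(self.col_data[name].values())


orders = ToyHybridTable(columns=("status", "weight_kg"))
orders.upsert(1, ("in_transit", 2))
orders.upsert(2, ("delivered", 5))
orders.upsert(1, ("delivered", 2))  # status update on an existing order

print(orders.point_lookup(1))          # short transactional-style query
print(orders.column_sum("weight_kg"))  # analytical-style aggregate
```

The point of the sketch is the access pattern: a short query reads one complete row, while an aggregate reads a single column across all rows, and both see the latest write.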
In addition, in scenarios with mixed workloads, it is hard to cleanly split complicated tasks and then use different types of databases to cope with them. With TiDB, no such split is needed.
TiDB can serve as a data hub on which users run high-concurrency short queries with complex indexing, just as they did with traditional databases. They can also use TiDB’s columnstore and MPP technology on the same logical data to accelerate large-scale, real-time analytics, with performance that is never inferior to traditional specialized OLAP databases. What’s more, TiDB’s cost-based optimizer (CBO) can automatically route different types of queries to different storage or computing engines.
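As a rough illustration of what cost-based routing means, the following Python sketch picks an engine by comparing two estimated costs. Everything in it is an assumption made for illustration: the `QueryShape` fields, the cost constants, and the cost formulas are hypothetical and far simpler than a real optimizer, which derives its estimates from table statistics. It is not TiDB's optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical summary of a query, as a toy optimizer might see it."""
    table_rows: int     # total rows in the table
    matched_rows: int   # rows expected to pass the filter
    columns_read: int   # columns the query touches
    total_columns: int  # columns in the table
    has_index: bool     # whether a usable index exists for the filter


# Made-up cost constants, chosen only to make the example readable.
INDEX_LOOKUP_COST = 4.0  # one random row fetch through an index
ROW_VALUE_COST = 1.0     # one value read during a full row scan
COLUMN_VALUE_COST = 0.1  # one value read during a column scan


def rowstore_cost(q: QueryShape) -> float:
    """Index lookups if possible, otherwise a scan of every full row."""
    if q.has_index:
        return q.matched_rows * INDEX_LOOKUP_COST
    return q.table_rows * q.total_columns * ROW_VALUE_COST


def columnstore_cost(q: QueryShape) -> float:
    """Scan only the columns the query needs, across all rows."""
    return q.table_rows * q.columns_read * COLUMN_VALUE_COST


def choose_engine(q: QueryShape) -> str:
    """Route the query to whichever engine has the lower estimated cost."""
    if rowstore_cost(q) <= columnstore_cost(q):
        return "rowstore"
    return "columnstore"


# A point lookup on one order versus a report over most of the table.
point_query = QueryShape(1_000_000, 1, 10, 10, True)
report_query = QueryShape(1_000_000, 800_000, 2, 10, True)

print(choose_engine(point_query))
print(choose_engine(report_query))
```

Under these assumed constants, the selective lookup is cheapest through the index on the row layout, while the wide aggregate is cheapest as a scan of two columns. The user writes the same kind of query in both cases and does not choose the engine by hand.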
User cases
Let’s see how TiDB’s HTAP capability helps our customers solve their problems and achieve business success.
ZTO Express
ZTO Express is a leading express delivery company in China and one of the largest in the world. They use TiDB as their full-link logistics management platform database.
On their management platform, the status of a delivery order updates constantly. In many cases, status updates arrive more frequently than new orders. The management platform needs to monitor order status in real time, respond to order queries from mobile applications, and guarantee that analytical reporting stays up to date in real time.
To meet such requirements, they wanted a database to:
- Support OLTP workloads.
- Support high-concurrency short queries from mobile applications.
- Be able to scale out easily, especially during large shopping festivals when peak transactions occur.
- Support real-time analytics without compromising service performance; the express delivery business is a constant race against time.
This is a typical HTAP scenario with mixed OLTP and OLAP workloads. If they chose traditional database solutions, they would have to introduce a very complicated architecture. Instead, TiDB’s HTAP capability can meet all the requirements above. The image below shows how TiDB handles ZTO Express’ mixed requirements.
How TiDB handles ZTO Express’ mixed requirements
Moreover, TiDB also helped ZTO Express:
- Improve their IT efficiency by 300%.
- Reduce the response time of real-time reporting from 5 minutes to within 1 minute.
- Increase their real-time delivery tracking window from 30 days to 45 days, at a lower cost than their previous Exadata solution.
A leading internet company in China
This customer is a leading internet company in China. They deploy TiDB in their advertising system to support comprehensive ad queries and monitoring services.
Advertising data is written to TiDB in real time. During peak hours, TiDB has to respond to hundreds of thousands of queries per second (QPS) with mixed requests, including lookups of advertising details and records, index filtering, and analytical queries that rely on the columnstore and MPP architecture.
For such mixed types of queries on the same piece of data, TiDB is a strong fit. With TiDB’s HTAP capability, there is no need to keep data consistent across separate systems through data synchronization, because the data stays consistent in real time. In addition, with a combination of traditional databases, it is difficult to manually route diverse queries between different storage engines. TiDB, as an HTAP solution with a CBO, can automatically dispatch different queries to different storage or computing engines.
The image below shows how TiDB handles this customer’s mixed workloads.
How TiDB handles this customer’s mixed workloads
Furthermore, TiDB helped this customer:
- Save at least 40% on server costs thanks to TiDB’s simpler architecture.
- Maintain stable performance at a record 250,000 QPS on their peak annual shopping day.
- Support a real-time reporting service with both detailed data retrieval and multidimensional analytics.
You can learn about some of our other customer success stories here.
Summary
HTAP is not strictly a combination of OLTP and OLAP, but a hybrid and unique area. TiDB’s HTAP capability is designed specifically for scenarios with mixed workloads, and it has three major characteristics: real-time, agile, and light-weight.
Every time we add a new feature to TiDB, we make sure it doesn’t undermine the three characteristics. We don’t want to let TiDB fall into unnecessary competition with traditional database products and lose its advantage.
If you want to know more about HTAP, you can join our Slack channel and talk to TiDB experts. You can also contact us and request a demo.
References: