zero-downtime upgrades

Upgrading a database without downtime or loss of operational continuity remains difficult. Because of inherent design limitations, traditional databases often require significant downtime during upgrades, which can cause operational chaos for businesses that rely on real-time data access.

Enter TiDB, a cutting-edge distributed SQL database that offers a solution to overcome this challenge. Built on a robust, cloud-native, and loosely coupled architecture, TiDB introduces online rolling upgrades – a feature that enables zero-downtime upgrades with uninterrupted operations.

In this post, we will explore the unique and easy-to-use upgrading mechanisms of TiDB with a hands-on demonstration. 

The intricacies of zero-downtime upgrades with TiDB

Traditional databases often use “stop-and-wait” techniques, freezing all operations for the time-consuming upgrade process. In contrast, TiDB uses an online rolling upgrade strategy. This approach ensures zero-downtime upgrades by upgrading components in a specific sequence: 

  1. Placement Driver (PD) servers
  2. TiKV servers
  3. TiDB servers

Within each component, servers are upgraded one at a time while the remaining servers handle the incoming load, so the upgrade proceeds smoothly and without interrupting service.
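The ordering can be pictured as two nested loops: components in sequence, and the nodes within a component one at a time. The following is only a minimal sketch of that ordering, not TiUP's actual implementation. The node names and the `upgrade_node` stand-in are hypothetical, and the script merely prints each step.

```shell
#!/bin/sh
# Sketch of the rolling-upgrade ordering (illustrative only).

# Component order from the article: PD, then TiKV, then TiDB.
COMPONENTS="pd tikv tidb"

# Hypothetical topology: three PD, three TiKV, and two TiDB nodes.
nodes_for() {
  case "$1" in
    pd)   echo "pd-1 pd-2 pd-3" ;;
    tikv) echo "tikv-1 tikv-2 tikv-3" ;;
    tidb) echo "tidb-1 tidb-2" ;;
  esac
}

# Stand-in for the real per-node upgrade; it only logs the step.
upgrade_node() {
  echo "upgrade $1/$2"
}

rolling_upgrade() {
  for component in $COMPONENTS; do
    for node in $(nodes_for "$component"); do
      # One node at a time; its peers keep serving traffic meanwhile.
      upgrade_node "$component" "$node"
    done
  done
}

rolling_upgrade
```

Because only one node of a component is touched at any moment, the component as a whole stays available as long as it has more than one node.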

Here’s a closer look at how each key component contributes to the process:

Figure 1. Auto-upgrading architecture

| Component | Definition | Auto-Upgrading Mechanism |
| --- | --- | --- |
| Placement Driver (PD) Servers | PD servers act as the cluster manager, managing metadata, scheduling, and load balancing. | During the upgrade, each PD server is upgraded one at a time. If a PD is the current leader, leadership is transferred first, causing only a brief pause in active TSO requests without affecting ongoing transactions or client connections. |
| TiKV Servers | TiKV is the distributed transactional key-value storage layer, responsible for data storage and retrieval. | TiKV servers are upgraded one at a time. Before upgrading, the leader for each Region is transferred to another TiKV server, thereby ensuring that ongoing operations are not disrupted. |
| TiDB Servers (Facilitated by TiProxy) | TiDB is the stateless SQL server responsible for SQL query processing, maintaining sessions, and handling transactions. | TiProxy assists in the smooth upgrading of TiDB servers by sitting between the network load balancer and the SQL Layer. It migrates client sessions to other TiDB servers during the upgrade, thereby ensuring zero disruption to client applications. |

This upgrade mechanism ensures that at each stage of the upgrade, the client experiences zero downtime and continues to interact with the database as if nothing has changed. In TiDB’s world, upgrades are not an interruption but a seamless transition. To learn more about how the upgrade mechanism works, see Maintaining Database Connectivity in Serverless Infrastructure with TiProxy.

A hands-on demonstration of TiDB’s zero-downtime upgrades

To provide a tangible illustration of TiDB’s zero-downtime upgrade capability, let’s walk through a real-world demonstration using a self-hosted TiDB cluster. While fully-managed TiDB Cloud provides these capabilities out-of-the-box, a self-hosted environment allows for a more detailed exploration of the upgrade process. 

We conducted the demo on AWS and have provided a step-by-step guide with detailed scripts, programs, CloudFormation templates, and workflows so that you can reproduce it yourself. Feel free to refine it or adapt it to other cloud environments. In this section, we focus only on what we observed during the upgrade process.

Best practices and recommendations

While the primary focus is on demonstrating zero downtime during upgrades, TiDB’s architectural design also allows for proactive scaling. This is particularly useful if your workload relies heavily on parallel processing. We recommend the following practices for online upgrading:

  • Before the upgrade: Proactively scale out a TiDB server instance to ensure a smooth rolling upgrade. This scaling can also extend to TiKV server instances depending on your workload requirements.
  • After the upgrade: TiDB allows you to scale in, effectively saving on operational costs. You can manage the scaling either manually or through TiDB’s auto-scaling solutions.
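For a self-hosted cluster managed with TiUP, the two recommendations could be scripted roughly as follows. This is a sketch under stated assumptions: the cluster name `tidb-demo` comes from the demo, while the extra host `10.0.1.10`, the file name `scale-out-tidb.yaml`, and the `run` dry-run helper are hypothetical. By default the commands are only printed, so you can review them before running them.

```shell
#!/bin/sh
# Sketch: temporarily add one TiDB server around an upgrade.
CLUSTER="tidb-demo"                   # cluster name used in the demo
EXTRA_TIDB_HOST="10.0.1.10"           # hypothetical host for the extra TiDB server
SCALE_OUT_FILE="scale-out-tidb.yaml"  # hypothetical topology file name

# With DRY_RUN=1 (the default here), commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Write a minimal scale-out topology that adds a single TiDB server.
write_scale_out_topology() {
  printf 'tidb_servers:\n  - host: %s\n' "$EXTRA_TIDB_HOST" > "$SCALE_OUT_FILE"
}

scale_out_before_upgrade() {
  write_scale_out_topology
  run tiup cluster scale-out "$CLUSTER" "$SCALE_OUT_FILE"
}

scale_in_after_upgrade() {
  # 4000 is TiDB's default SQL port; adjust it if your topology overrides it.
  run tiup cluster scale-in "$CLUSTER" --node "$EXTRA_TIDB_HOST:4000"
}
```

Call `scale_out_before_upgrade` before starting the upgrade and `scale_in_after_upgrade` once it has finished. Set `DRY_RUN=0` to execute the commands instead of printing them.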

Pre-upgrade observations

We have set up three terminal windows for the demonstration:

Figure 2. Terminal setup

  • Top terminal: Runs the TiProxy service.
  • Middle terminal: Displays our sample application with four active database connections. These connections insert data into the database at a uniform frequency, routed through a network load balancer and the TiProxy service.
  • Bottom terminal: Queries the events inserted by the sample application and shows the number of insert requests processed by each TiDB server.

As you can see, two TiDB servers are each actively processing approximately 170 events. They receive an equal number of connections and process an equal number of requests. Note that we used two TiDB servers here for both high availability and a smooth upgrade.

The middle terminal and bottom terminal together represent the application workloads – one for writing and the other for reading.

Before the upgrade, we confirmed the TiProxy status in the AWS console:

Figure 3. TiProxy status in the AWS console

We also confirmed that the current TiDB version is v6.5.1 and that our target upgrade version is v6.5.2.

Figure 4. Initial cluster status
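If you prefer to check the version from a script rather than by eye, a small filter over the output of `tiup cluster display` is one option. The sketch below assumes that the output contains a `Cluster version:` header line, as in the TiUP releases we are aware of. The helper name `cluster_version` is our own and is not part of TiUP.

```shell
#!/bin/sh
# cluster_version: read `tiup cluster display` output on stdin and print
# the value of its "Cluster version:" line (assumed output format).
cluster_version() {
  awk -F': *' '/^Cluster version:/ { print $2; exit }'
}

# Usage (requires TiUP and an existing cluster):
#   tiup cluster display tidb-demo | cluster_version
```

Running the same pipeline again after the upgrade gives a quick way to confirm that the cluster reports the target version.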

The upgrade process

Initiating the upgrade to version 6.5.2 was straightforward. We simply executed the following command:

$ tiup cluster upgrade tidb-demo v6.5.2 --yes

Here’s how the different components were upgraded sequentially:

Placement Driver (PD) Servers: Upgraded one-by-one without causing any interruptions to the sample application.

Figure 5. PD upgrade process

TiKV Servers: Each TiKV node was upgraded sequentially. The leader role for each Region was transferred to another server before proceeding with the upgrade. Again, no disruptions were observed.

Figure 6. TiKV upgrade process

TiDB Servers: TiProxy played a pivotal role here. Before upgrading each TiDB server, TiProxy moved its active sessions to another TiDB server, ensuring uninterrupted service. For example, before upgrading TiDB server with IP 1.216, TiProxy migrated its hosted sessions to server 3.163.

Figure 7. TiDB upgrade process
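The session hand-off can be thought of as simple bookkeeping: every session that points at the server about to be upgraded is reassigned to another server first. The toy model below only illustrates that idea and says nothing about how TiProxy implements migration internally. The `session server` line format and the `migrate_sessions` helper are hypothetical, and the server labels are the abbreviated IPs from the example above.

```shell
#!/bin/sh
# migrate_sessions FROM TO: read "session server" pairs on stdin and
# reassign every session hosted on FROM to TO (toy model only).
migrate_sessions() {
  awk -v from="$1" -v to="$2" '
    # Append "" to force a string comparison of the server labels.
    ($2 "") == (from "") { $2 = to }
    { print $1, $2 }
  '
}

# Four sessions spread evenly across two TiDB servers, as in the demo.
printf 's1 1.216\ns2 3.163\ns3 1.216\ns4 3.163\n' | migrate_sessions 1.216 3.163
```

After the call, no session references server 1.216, so it can be restarted without dropping a client connection.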

Throughout the upgrade process, the four sessions in our sample program remained active, and the client did not notice the upgrade process. 
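One way to check this from the client side is to have the sample application log a timestamp for every successful insert and then look for the longest pause between consecutive inserts. The helper below is a sketch under that assumption. The log format (one epoch-seconds value per line, in ascending order) and the `max_gap` name are hypothetical and are not taken from the demo scripts.

```shell
#!/bin/sh
# max_gap: read ascending epoch-second timestamps (one per line) on stdin
# and print the largest interval between two consecutive lines.
max_gap() {
  awk '
    NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
           { prev = $1 }
    END    { print max + 0 }
  '
}

# Example: inserts once per second, with a single 3-second pause.
printf '100\n101\n102\n105\n106\n' | max_gap
```

If the inserts run at a uniform frequency, a maximum gap close to the normal insert interval suggests that the client never stalled during the upgrade.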

Conclusion

The upgrade completed seamlessly without requiring any client-side intervention, living up to TiDB’s promise of zero-downtime upgrades. Upgrade-related downtime is a common concern with traditional databases, and TiDB’s cloud-native architecture aims to alleviate it. Its online rolling upgrade capability is a direct result of that cloud-native, loosely coupled design, offering peace of mind and operational excellence to organizations of all sizes.

Moreover, for users who prefer a fully-managed cloud DBaaS, TiDB Dedicated and TiDB Serverless offer all these capabilities out-of-the-box. The DBaaS options eliminate operational concerns such as provisioning, upgrading, and scaling, making it easier for businesses to focus on their core competencies.

Interested in gaining a deeper understanding of TiDB’s key capabilities and fundamentals? Check out PingCAP University and enroll in one of our free training courses.

