Author: Xiaolei Dai, Database lead at Zhihu

Efficient and reliable data migration is crucial for large-scale online services. Zhihu, one of China’s largest knowledge-sharing platforms, recently undertook the complex task of online migration for dozens of TiDB databases, totaling petabytes (PB) of data. This blog post, based on the expertise of Zhihu’s database lead, Xiaolei Dai, highlights the strategies and solutions used to accomplish seamless data migration between data centers within the same city.

By examining three specific scenarios, this post explores the use of TiDB’s advanced features, such as Placement Rules and TiCDC (TiDB’s Change Data Capture component), to ensure high availability, minimize downtime, and maintain data integrity. Whether migrating within the same Kubernetes environment or synchronizing clusters via TiCDC, the detailed guidance presented here covers every aspect of the migration process—from resource allocation to post-migration monitoring.

Zhihu’s experience provides valuable insights for organizations looking to implement reliable migration strategies for large-scale TiDB databases.

The Complete Migration Process

With Zhihu’s experience managing large-scale data migrations, the process begins by ensuring robust infrastructure and resource readiness. From there, the focus shifts to the actual migration and switching plan, leveraging advanced synchronization techniques to ensure seamless transitions between data centers. 

Preparing for Petabyte-Scale TiDB Migration

Migrating petabytes of data across data centers is a complex process that requires thorough preparation to ensure minimal disruption to services and data integrity. The initial phase of any large-scale TiDB migration focuses on establishing a stable network and ensuring sufficient resources are in place. By laying this foundation, the migration process can proceed with minimal risks. 

Dedicated Line Requirements

  • To minimize latency, data centers should be located within 150 km of each other, typically in the same city or neighboring cities.
  • To ensure stable long-term operation, at least two dedicated optical fiber lines must connect the data centers, with a latency of around 3 milliseconds (ms).
  • Each dedicated line should offer a bandwidth of over 200 Gbps to accommodate high data transfer volumes without bottlenecks.

Resource Requirements

  • Physical Machines: The new data center must have high-density physical machines offering better configurations than the previous setup. Sufficient planning for hard disk PV (Persistent Volumes), CPU, and memory resources is essential to handle the migration load.
  • Kubernetes (K8s) Setup: The K8s environment should be fully configured and capable of binding physical machine resources. The chosen K8s version must be stable and compatible with TiDB’s deployment requirements.
  • TiDB Operator: Use a newer TiDB Operator version that has been thoroughly tested to ensure stability during the migration process.

TiDB Cluster Migration and Switching Plan

Migrating an online TiDB cluster requires a detailed, structured approach to ensure minimal disruption and data integrity. In Zhihu’s case, the groundwork for this process was laid during a previous multi-cloud active-active deployment. Given that Zhihu’s TiDB infrastructure is fully deployed on Kubernetes (K8s), this section outlines the migration plan with a focus on Kubernetes-based clusters.

The overall migration architecture is shown in the diagram above, with the following key points:

From a structural perspective, the upper Kubernetes (K8s) cluster labeled “online” represents the TiDB cluster deployment in the current old data center, which includes all its components. Below that is a newly established K8s cluster where another TiDB cluster is deployed.

The two red boxes in the middle represent the synchronization links. One is based on TiDB’s placement-rule replica placement, which, from the viewpoint of the system, is still a migration link for the same TiDB cluster. The other is a TiCDC-based synchronization link between two different upstream and downstream TiDB clusters (which can have different versions).

The Three Main Migration Methods Used 

The strategies Zhihu used for online TiDB database migration fall into three primary approaches: cross-cloud, cross-Kubernetes migrations using placement-rule replica placement; TiCDC-linked active-standby cluster migrations; and other specialized scenarios such as business double-writing and night-only writes.

Cross-Cloud, Cross-K8s TiDB Placement-Rule Replica Placement Migration (Used for 60% of TiDB Cluster Migrations)

This approach utilizes TiDB’s Placement Rules to control data replication and placement across different clusters in different data centers (within the same city). The core idea is to split data replication between the old and new data centers while maintaining transparency for business operations.

Migration Architecture

Placement Rules is a replica rule system introduced in PD (Placement Driver) in version 4.0. It guides PD to generate corresponding scheduling for different types of data. By combining different scheduling rules, users can precisely control various attributes of any segment of continuous data, such as the number of replicas, storage location, host type, participation in Raft voting, and eligibility to serve as a Raft leader. 

The Placement Rules feature is enabled by default in TiDB version 5.0 and above (in versions before 5.0, it can be enabled using the pd-ctl command: config set enable-placement-rules true). The default behavior after enabling placement rules is as follows:

# ./pd-ctl -i
» config placement-rules show
[
  {
    "group_id": "pd",
    "id": "default",
    "start_key": "",
    "end_key": "",
    "role": "voter",
    "is_witness": false,
    "count": 5
  }
]

To implement a same-city dual data center replica placement for this TiDB cluster, with 3 voter replicas in the old data center and 2 follower replicas in the new one, use label_constraints within each rule to restrict the replicas to their respective data centers. The configuration would be as follows:

[
    {
        "group_id": "pd",
        "id": "default",
        "start_key": "",
        "end_key": "",
        "role": "voter",
        "count": 3,
        "label_constraints": [
            {"key": "zone", "op": "in", "values": ["zone1"]}
        ],
        "location_labels": ["rack", "host"]
    },
    {
        "group_id": "pd",
        "id": "online2",
        "start_key": "",
        "end_key": "",
        "role": "follower",
        "count": 2,
        "label_constraints": [
            {"key": "zone", "op": "in", "values": ["zone2"]}
        ],
        "location_labels": ["rack", "host"]
    }
]

The above configuration completes the replica placement for the new data center. Essentially, this splits the previous 5 voter replicas, which were all in the old data center (id=default), into 3 voters in the old data center and 2 followers in the new, homogeneous cluster. After the data synchronization for the 2 follower replicas is complete, modify the placement-rule configuration to change the region role of id=online2 from follower to voter. This will distribute the region leader across the data centers. 
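
One way to apply such a rule change (a sketch, not an exact transcript; the file name is illustrative, and the subcommands may vary slightly across PD versions) is to export the current rules, edit them, and save them back with pd-ctl:

# ./pd-ctl -i
» config placement-rules load --out="rules.json"   ### export the current rules for editing
» config placement-rules save --in="rules.json"    ### apply the edited rules (e.g., online2 changed from follower to voter)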

Once monitoring shows the leaders are balanced, set the replica count for the old data center to 0 and for the new data center to 5, so the new data center is brought up to the full replica count. After some time, when the region count in the old data center drops to 0, the data migration is fully complete. The adjustment is as follows:

[
    {
        "group_id": "pd",
        "id": "default",
        "start_key": "",
        "end_key": "",
        "role": "voter",
        "count": 0,        
        "label_constraints": [
            {"key": "zone", "op": "in", "values": ["zone1"]}
        ],
        "location_labels": ["rack", "host"]
    },
    {
        "group_id": "pd",
        "id": "online2",
        "start_key": "",
        "end_key": "",
        "role": "voter",  
        "count": 5,       
        "label_constraints": [
            {"key": "zone", "op": "in", "values": ["zone2"]}
        ],
        "location_labels": ["rack", "host"]
    }
]

Steps for Multi-Cloud Homogeneous Cluster Migration

  1. Create a Same-Version TiDB Cluster in online2:
    • Copy the tc.yaml configuration from the old cluster and modify the necessary settings.
    • To allow cross-cloud and cross-Kubernetes communication between the new cluster’s PD (Placement Driver) and the old cluster’s PD, configure the clusterDomain.

Adjust the PD’s priority configuration to keep the PD leader in the old data center during migration to ensure optimal performance. Use the following command:

pd-ctl member leader_priority <member_name> <priority>
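
For the cross-K8s PD communication in step 1, a minimal tc.yaml sketch might look like the following (the cluster domains, names, and version are illustrative assumptions, the exact fields depend on the TiDB Operator version, and this is not the exact configuration Zhihu used):

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-test
  namespace: tidb-test
spec:
  version: v5.4.0
  clusterDomain: "online2.com"       # cluster domain of the new K8s cluster
  cluster:
    name: tidb-test
    namespace: tidb-test
    clusterDomain: "online.com"      # reference to the existing cluster in the old data center
  # pd/tikv/tidb component specs follow, copied from the old cluster's tc.yaml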
  2. Create TiKV Nodes with the Same Configuration and Quantity:
    • Set up TiKV nodes with the same configuration and quantity as the old cluster.
    • Choose the replica placement strategy (role settings: voter/follower/learner) based on the 3:2 placement-rule configuration discussed earlier.
    • Why the same quantity? Given the size of a single replica, the new data center initially needs to hold two full copies of the data; matching the node count keeps its TiKV nodes from being overwhelmed.
    • Why the same configuration? To ensure the TiKV nodes have enough CPU and memory to handle the load after migration.
  3. Optimize Region Creation Speed:

Use the store limit configuration to increase the region creation speed for the new online2 TiKV store. Run the following command:

kubectl exec tidb-test-0 -n tidb-test -- ./pd-ctl store | grep -B 2 'tidb-test.online2.com' | grep 'id":' | awk -F':' '{print $2}' | awk -F',' '{print "store limit "$1" 40 add-peer"}'
  • How to monitor?
    • Check the “Region Health” monitoring, paying attention to changes in miss-peer/extra-peer.
    • Alternatively, estimate the expected number of new regions by multiplying the leader region count by 2 (two new replicas per region), or track the store region count of the new TiKV stores.
  • Note: Unlike other PD parameters (like xxxx-schedule-limit), the store limit restricts the consumption speed of operators, while other limits restrict the generation speed.
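
In addition to Grafana, replica health can be spot-checked directly from pd-ctl (a quick sketch):

# ./pd-ctl -i
» region check miss-peer    ### regions still missing replicas
» region check extra-peer   ### regions still carrying extra replicas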
  4. Set Configurations to Accelerate Scheduling:
    • First, use config show to view the current scheduling default values.

Execute the following commands to adjust the concurrency of region/leader scheduling (monitor PD leader load):

config set leader-schedule-limit 16     ### Controls the concurrency of Transfer Leader scheduling

config set region-schedule-limit 2048   ### Controls the concurrency of adding/removing Peer scheduling

config set replica-schedule-limit 64    ### Controls the number of concurrent replica scheduling tasks
  5. Balance Region Leader to the New Data Center (Follower -> Voter):
    • Promote the TiKV in the online2 data center to leader status through placement rules, reducing the region count in the old cluster to zero.
    • Consider whether the business can tolerate the increased latency of cross-data-center read/write operations (most can during the migration cycle with controllable impact).
    • For latency-sensitive operations, schedule all leaders to be moved to online2 during off-peak hours, and switch the PD leader and business domain to the new data center.
  6. Multi-Cloud Active-Active State:
    • The system will persist in this state for some time as balancing region leaders and decommissioning the old data center’s regions may take several days.

When scaling down the old data center’s stores, some regions may be difficult to migrate and must be removed forcibly. Use the following command to generate remove-peer operations for every region on the TiKV store with store id=10 (written into a script here):

kubectl exec tidb-test-pd-0 -n tidb-test -- ./pd-ctl region store 10 | jq '.regions | .[] | "\(.id)"' | awk -F'"' '{print "kubectl exec tidb-test-pd-0 -n tidb-test -- ./pd-ctl operator add remove-peer ", $2, " 10"}' > /home/daixiaolei/remove_peer.sh
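
The generated script can then be reviewed and executed to issue the remove-peer operators, for example:

bash /home/daixiaolei/remove_peer.sh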
  7. Switch PD Leader to the New Data Center:
    • When switching the PD leader, be cautious of cluster fluctuations and aim for low-peak hours.
    • Note: Previously, switching the PD leader with a million regions caused the cluster to freeze for 1 hour, which was resolved using pd-recover.
    • Since you’ve already set leader_priority, you can switch the PD leader in two ways:
      • Automatically: Set the PD leader priority in online2 higher than the old data center. The PD cluster will automatically switch based on the adjusted weights.

      • Manually: Set the PD leader priority in online2 equal to the old data center and manually transfer the leader during off-peak hours using: member leader transfer <member_name>
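
A brief sketch of both options (the member name and priority value are illustrative):

# ./pd-ctl -i
» member leader_priority tidb-test-pd-0 5    ### automatic: give a PD member in online2 a higher priority
» member leader transfer tidb-test-pd-0      ### manual: transfer the leader during off-peak hours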

  8. Switch Business Read/Write to the New Data Center:
    • Because the replica-placement migration keeps everything within the same TiDB cluster spanning both data centers, you only need to point the business DNS from the old data center to the new one.
    • There is no need to terminate connections manually; they are released automatically once the old TiDB server containers are destroyed.
  9. Scale Down TiDB Servers in the Old Data Center:
    • Before scaling down, check cluster_processlist to see whether there are any active business connections (see the sketch below).
    • If there are active connections, locate the owning application through the pod’s IP and ask the business team to switch over promptly.
    • Monitor for any remaining traffic. Once you confirm there is no traffic, scale the TiDB servers in the old data center down to zero.
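
One way to check for remaining business connections (a sketch; the host, port, and filter are illustrative):

mysql -h tidb-test.online.com -P 4000 -u root -p -e "SELECT instance, user, host, db, command, time FROM information_schema.cluster_processlist WHERE user <> 'root';"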
  10. Decommission All PD Nodes in the Old Data Center:
    • Confirm that the new online2 data center has at least three PD nodes, then decommission all PD nodes in the old data center.
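
With TiDB Operator, one way to decommission the old data center’s PD nodes is to scale that cluster’s PD replicas down to zero in its TidbCluster spec (a sketch; the cluster name and namespace are illustrative, and this is run against the old data center’s Kubernetes cluster):

kubectl patch tc tidb-test -n tidb-test --type merge -p '{"spec":{"pd":{"replicas":0}}}'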
  11. Delete PVCs in the Old Data Center:
    • By default, the PVCs of decommissioned TiKV pods are not deleted, so manual deletion of the PVCs is necessary. This will automatically release the PVs.
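
A sketch of the cleanup, using TiDB Operator’s standard labels to select the old cluster’s TiKV PVCs (verify the selector against your own resources before deleting):

kubectl get pvc -n tidb-test -l app.kubernetes.io/instance=tidb-test,app.kubernetes.io/component=tikv
kubectl delete pvc -n tidb-test -l app.kubernetes.io/instance=tidb-test,app.kubernetes.io/component=tikv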
  12. Delete TC in the Old Data Center:
    • Deleting the TC in the old data center carries significant risk, as an error could have a catastrophic impact on the new online2 data center.
    • Before proceeding, use kubectl get pods -n tidb-test to ensure the old cluster is empty and double-check with the DBA.
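
Once the check above confirms the old cluster is empty, the deletion itself is a single command against the old data center’s Kubernetes cluster (names are illustrative):

kubectl get pods -n tidb-test              ### should show no remaining PD/TiKV/TiDB pods
kubectl delete tc tidb-test -n tidb-test   ### delete the old TidbCluster object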
  13. Recycle Physical Machine Resources (Delete Nodes, Power Off for Recycling): The data center migration is complete once the resources are reclaimed.

Advantages

  • Automatic TiKV Data Placement: All business reads and writes stay within a single TiDB cluster while its TiKV storage nodes are distributed across different Kubernetes environments, making the migration process seamless and entirely transparent to the business.
  • Minimal Impact on Performance: The configuration of three voter replicas in the old data center enables TiDB writes to be committed by a local majority, which helps maintain performance levels during migration. Additionally, TiDB reads can continue to access the leader in the old data center without disruption.

Disadvantages

  • Version Compatibility: This replica placement method cannot be used to migrate across TiDB versions; because the nodes in both data centers belong to the same cluster, they must all run the current cluster version.
  • Increased Latency: If a leader is established in the new data center, cross-data-center read/write operations may introduce increased latency. However, this is generally manageable for most business scenarios.

TiCDC-Linked Active-Standby Clusters (30% of TiDB Clusters Migrated Using This Method)

This method involves creating a new TiDB cluster in the new data center and using TiCDC (TiDB Change Data Capture) to sync data between the old and new clusters. The old data center continues processing until the new cluster is entirely up to date.

Advantages

  • Independent Clusters for Upstream and Downstream: The downstream new cluster can have a version higher than the current cluster, which allows the migration and upgrade to a new version for businesses that need to replace the old TiDB version or want to utilize new version features. However, it is crucial to be aware of SQL compatibility after upgrading TiDB.
  • Cluster Splitting: Allows splitting multiple core databases from a single TiDB cluster into multiple clusters, improving isolation and stability.
  • Rollback Capability: Migration via TiCDC can be rolled back at any time. When switching the business to read and write from the downstream cluster, a reverse synchronization task from the new cluster to the old cluster can be created. This will synchronize data from the new cluster back to the old one. The reverse synchronization task can be stopped after confirming that the migration was successful for one week, and then the old cluster can be deleted.

Disadvantages

  • Higher TiCDC Latency: Because the downstream cluster lags behind the upstream, read-sensitive businesses cannot route read traffic to it for validation ahead of the switch.
  • Synchronization Limitations: TiCDC throughput is limited (roughly 30,000 rows/s for a single worker). For a cluster with an online write throughput of 50,000 rows/s, the incremental synchronization link may not be able to keep up.
  • BR Backup/Restore: Backing up and restoring a large cluster with BR can take a long time, so gc_life_time must be extended to keep the incremental data needed by TiCDC from being garbage-collected. However, setting gc_life_time too long retains a large amount of MVCC data, which can hurt TiDB read performance.

TiCDC Migration Steps

  1. Create a New TiDB Cluster in online2: According to business requirements, the version can either be the same or upgraded to a higher version.
  2. Adjust gc_life_time of the Old Cluster: For example, set it to 3 days (72h, adjustable as needed).
  3. Use BR Tool: Back up the data from the old cluster’s databases to S3 using the BR tool. Ensure that the BR version matches the TiDB cluster version.
  4. Restore Backup: Use BR to restore the TiDB backup from S3 to the new cluster in online2.
br backup db \
    --pd "${PDIP}:2379" \
    --db test \
    --storage "s3://backup-data/db-test/2024-06-30/" \
    --ratelimit 128 \
    --log-file backuptable.log
br restore db \
    --pd "${PDIP}:2379" \
    --db "test" \
    --ratelimit 128 \
    --storage "s3://backup-data/db-test/2024-06-30/" \
    --log-file restore_db.log
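
For the gc_life_time adjustment mentioned in step 2, a sketch (the method depends on the TiDB version, and the host is illustrative):

### TiDB v5.0 and later:
mysql -h tidb-test.online.com -P 4000 -u root -p -e "SET GLOBAL tidb_gc_life_time = '72h';"
### Earlier versions:
mysql -h tidb-test.online.com -P 4000 -u root -p -e "UPDATE mysql.tidb SET VARIABLE_VALUE = '72h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';"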
  5. Create a TiCDC Task: Set up a CDC task from the old cluster to the new one (the old cluster needs to deploy the CDC component). Note that you will need to obtain the TSO (Timestamp Oracle) from the backup logs, as this TSO is required for creating the TiCDC synchronization task.
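
A sketch of creating such a changefeed (the PD address, sink URI, changefeed ID, and start-ts are illustrative; the start-ts should be the BackupTS recorded in the BR backup log):

cdc cli changefeed create \
    --pd="http://${PDIP}:2379" \
    --sink-uri="mysql://user:password@tidb-test.online2.com:4000/" \
    --changefeed-id="old-to-online2" \
    --start-ts=434716989786357761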
  6. Synchronization Verification: Monitor the CDC metrics in Grafana or use cdc cli changefeed query to confirm there is no replication delay. Verify data consistency by checking both the data volume and sample data.
  7. Gradual Traffic Shift: If data verification shows no issues, allow read-insensitive business operations to gradually route read traffic to the new cluster to validate the execution of business SQL.
  8. Switch Write Traffic: Once 100% of the read traffic has been shifted to the new cluster, you can proceed to switch the write traffic, or you can choose to switch read/write operations during off-peak hours (e.g., late at night) for latency-sensitive operations. There are two options for switching write traffic: a. The business side controls the stop-write process and then switches to the downstream cluster. b. The DBA revokes write permissions on the old cluster to enforce a stop-write. Revoking permissions does not affect established connections, so you must kill all active connections or restart the TiDB server.
  9. Post-Stop Synchronization: After stopping writes, monitor the synchronization link to confirm that no data remains to be synced. Then stop the TiCDC link from the old data center to the new one and establish a reverse synchronization link from the new TiDB cluster back to the old data center. This is a safety measure: if any issues arise with the new cluster, you can switch back to the old one without data loss.
  10. Complete Business Switch: Finalize the TiDB switch by directing all business operations to the new downstream cluster.
  11. Monitor and Decommission: After one week of monitoring, decommission the TiCDC reverse synchronization link and delete the old cluster.
  12. Rollback Plan: If any read/write issues arise within the week, you can switch back to the old cluster anytime.

Other Scenarios (Business Double-Writing, Night-Only Writing – 10%)

These migration methods are more specialized and less commonly used. Still, they can be highly effective for specific types of operations, such as log-based systems or clusters with limited write times. They provide alternative solutions for migrating data with minimal downtime or impact on business operations.

Empty Cluster Business Double-Writing

In this method, businesses that rely on log-based operations can create a new TiDB cluster in the new data center and perform double-writing for an extended period (typically 7 to 30 days). This means that the old and new clusters receive the same data during this time, ensuring that the new cluster is fully synchronized before the switch.

Steps:

  • Set up a new, empty TiDB cluster in the new data center.
  • Begin double-writing all incoming data to both the old and new clusters.
  • After 7-30 days, once the new cluster is fully synchronized and validated, switch all operations over to the new cluster.

Backup with BR/Dumpling and Switching Read/Write Operations

This method is ideal for TiDB clusters that only handle write operations at specific times, such as night-only write processes. During the day, when there are no write operations, the data can be backed up and migrated to the new cluster.

Steps:

  1. Use BR (Backup & Restore) or Dumpling to create a backup of the old cluster during the day, when there is no active data writing.
  2. Restore the data to the new TiDB cluster using tools like BR or Lightning.
  3. Switch read/write operations to the new cluster during the designated write window (e.g., nighttime).

Conclusion

Zhihu’s journey over the past three months has been nothing short of transformative. The team successfully migrated dozens of TiDB clusters, amounting to petabytes of data, to their new data center using the methodologies outlined in this blog.

As we look to the future, we encourage organizations seeking scalable and reliable database solutions to leverage TiDB. Its versatility and robust architecture can be a game-changer in navigating the complexities of data management. 

