
With the release of TiDB 8.5, TiDB BR (Backup & Restore) has made a significant change: Full-table checksum verification is now turned off by default during backups. This update boosts backup efficiency by cutting unnecessary overhead while keeping data integrity intact.

In this post, we’ll explain how TiDB has optimized backup verification, the expected performance benefits, and why full-table checksums might no longer be essential for most users.

A Smarter Approach to Backup Verification

Previously, TiDB BR’s checksum process involved two full-table scans:

  1. The first scan during backup to extract data and compute checksums.
  2. A second scan, via coprocessor requests, to validate the backup’s integrity.

While effective, this method added significant performance costs, especially with large datasets. Each backup essentially doubled the I/O workload, leading to longer backup times and higher resource use.
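To make that cost concrete, here is a minimal Python sketch of the old two-scan flow. It is not BR's implementation. The in-memory CountingStore, the function names, and the use of zlib.crc32 are all illustrative assumptions. The digest is modelled loosely on the XOR-combined checksum, KV count, and byte total that TiDB's ADMIN CHECKSUM TABLE reports, which uses CRC64 rather than CRC32.

```python
import zlib
from typing import Iterable, Iterator, List, Tuple

KV = Tuple[bytes, bytes]


class CountingStore:
    """Hypothetical in-memory table that counts every key-value pair it reads."""

    def __init__(self, pairs: Iterable[KV]) -> None:
        self._pairs: List[KV] = sorted(pairs)
        self.reads = 0

    def scan(self) -> Iterator[KV]:
        # Each yielded pair counts as one unit of read I/O.
        for kv in self._pairs:
            self.reads += 1
            yield kv


def kv_checksum(pairs: Iterable[KV]) -> Tuple[int, int, int]:
    """Order-independent digest: XOR of per-pair CRCs, plus KV count and byte total.

    zlib.crc32 is a stand-in; TiDB's own checksum uses CRC64.
    """
    digest = total_kvs = total_bytes = 0
    for key, value in pairs:
        digest ^= zlib.crc32(key + value)
        total_kvs += 1
        total_bytes += len(key) + len(value)
    return digest, total_kvs, total_bytes


def backup_with_full_checksum(store: CountingStore) -> bool:
    """Old-style flow (sketch): one scan to back up, a second scan to verify."""
    # Scan 1: read the data into the "backup" and checksum what was written.
    backup = list(store.scan())
    backup_sum = kv_checksum(backup)
    # Scan 2: re-read the whole table (analogous to BR's coprocessor checksum request).
    table_sum = kv_checksum(store.scan())
    return backup_sum == table_sum
```

For a table of 10 rows, backup_with_full_checksum returns True and leaves store.reads at 20: every pair is read twice, which is the doubled I/O described above.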

Real-world experience shows that such rigorous checks are often unnecessary. Instead of relying on a second full-table scan, TiDB BR now uses lighter-weight validation techniques that preserve reliability without the cost of re-reading every table.

Maintaining Backup Integrity

Turning off full-table checksums by default doesn’t compromise backup integrity. TiDB BR includes multiple safeguards:

  • Backup Range Integrity Validation: Ensures all backup ranges are complete, so no data is missed.
  • File-Level Checksum Verification: Every backup file has built-in checksum validation, which stays enabled by default. This detects corruption or file loss after the backup and before the restore.
  • Restore Checksum Validation: During data restoration, TiDB BR automatically verifies the consistency of SST files using backup metadata and performs file-level checksum validation for each SST file.

These mechanisms provide strong reliability without requiring a costly second scan.
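The first two safeguards can be sketched in a few lines of Python. This is a hedged illustration, not BR's code: the BackupFile record, its fields, and the helper names are hypothetical, and storing a per-file SHA-256 digest in the metadata is an assumption made for the example. Range validation checks that the backed-up key ranges cover the requested range with no gaps; file-level validation recomputes a file's digest and compares it with the value recorded at backup time.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BackupFile:
    """Hypothetical metadata entry for one backup file."""
    name: str
    start_key: bytes
    end_key: bytes   # exclusive
    sha256: str      # hex digest recorded at backup time (assumed field)


def find_range_gaps(files: List[BackupFile], start: bytes, end: bytes) -> List[Tuple[bytes, bytes]]:
    """Return the parts of [start, end) that no backup file covers."""
    gaps: List[Tuple[bytes, bytes]] = []
    cursor = start
    for f in sorted(files, key=lambda f: f.start_key):
        if cursor >= end:
            break
        if f.start_key > cursor:
            # Nothing covers [cursor, f.start_key): data would be missing.
            gaps.append((cursor, min(f.start_key, end)))
        cursor = max(cursor, f.end_key)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def verify_file(f: BackupFile, data: bytes) -> bool:
    """Detect corruption by comparing the file's current digest with the recorded one."""
    return hashlib.sha256(data).hexdigest() == f.sha256
```

Neither check re-reads the source tables; both work only from metadata and the backup files themselves, which is why they are cheap enough to stay on by default.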

Performance Benefits of the New Approach to Backup Verification

By avoiding redundant full-table scans, disabling the full-table checksum by default delivers several advantages:

  • Faster Backups and Restores: Eliminating the second full-table scan significantly shortens backups, especially for large datasets.
  • Lower Cluster Impact: Backup operations now consume fewer resources, minimizing disruption to normal database performance.
  • Simplified Workflow: In most cases, full-table checksum validation isn’t necessary, so backups run without the extra verification step.

Handling Advanced Scenarios

While the default settings work for most users, TiDB BR still allows full-table checksum verification when needed. This is particularly useful for:

  • Debugging or Release Testing: Developers who need to verify backup integrity during testing.
  • Handling Non-Transactional Writes: In rare cases where non-transactional writes (e.g., TiDB Lightning imports in physical mode, formerly called the local backend) occur before the backup timestamp, enabling checksum can help detect anomalies.
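
The following table shows which issues each setting can detect:

| Scenario | --checksum=false | --checksum=true |
| --- | --- | --- |
| Backup file corruption | Yes | Yes |
| Backup file loss | Yes | Yes |
| Backup metadata file corruption/loss | Yes | Yes |
| Logic bugs in implementation (e.g., missing backup ranges due to KV changes) | No | Yes |
| Non-transactional writes | No | Yes |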
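As a rough illustration of when to opt back in, the sketch below encodes the two advanced scenarios as a hypothetical policy function and builds a `br backup full` argument list with an explicit `--checksum` value. The `--pd`, `--storage`, and `--checksum` flags are real BR options; the function names, the PD address, and the storage URL are placeholders, and the policy itself is an assumption rather than official guidance.

```python
from typing import List


def should_enable_checksum(debugging: bool, non_txn_writes_before_backup_ts: bool) -> bool:
    """Hypothetical policy: opt in only for the advanced scenarios listed above."""
    # Debugging or release testing: pay for the extra scan to catch logic bugs.
    # Non-transactional writes (e.g., TiDB Lightning physical imports) before
    # the backup timestamp: a full-table checksum can surface anomalies.
    return debugging or non_txn_writes_before_backup_ts


def br_backup_full_cmd(pd_addr: str, storage: str, checksum: bool) -> List[str]:
    """Build a `br backup full` argument list with --checksum set explicitly."""
    return [
        "br", "backup", "full",
        "--pd", pd_addr,
        "--storage", storage,
        # An explicit value behaves the same on every BR version, whatever its default.
        f"--checksum={'true' if checksum else 'false'}",
    ]


if __name__ == "__main__":
    cmd = br_backup_full_cmd(
        "127.0.0.1:2379",          # placeholder PD address
        "local:///tmp/br-backup",  # placeholder storage URL
        should_enable_checksum(debugging=False, non_txn_writes_before_backup_ts=True),
    )
    print(" ".join(cmd))
```

On a host where br is installed, the list can be passed to subprocess.run(cmd, check=True). Building an argument list instead of a shell string avoids quoting problems with storage URLs.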

If checksum verification is required but performance impact needs to be minimized, users can leverage TiDB’s background task resource control to limit resource consumption. More details can be found here.
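For the resource-control route, the TiDB documentation on background tasks describes marking task types such as `br` as background work on the `default` resource group. The helper below only builds that statement as a string. The function name is hypothetical, so check the exact syntax and the supported task types against the documentation for your TiDB version before running the statement.

```python
from typing import Iterable


def background_tasks_sql(task_types: Iterable[str]) -> str:
    """Build an ALTER RESOURCE GROUP statement that treats the given tasks as background work."""
    # Normalize, deduplicate, and sort so the generated statement is stable.
    types = ",".join(sorted({t.strip().lower() for t in task_types if t.strip()}))
    if not types:
        raise ValueError("at least one task type is required")
    # Assumption based on the TiDB docs: background settings live on the `default` group.
    return f"ALTER RESOURCE GROUP `default` BACKGROUND=(TASK_TYPES='{types}');"
```

For example, background_tasks_sql(["br"]) produces ALTER RESOURCE GROUP `default` BACKGROUND=(TASK_TYPES='br');, which can then be run from any MySQL-compatible client.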

Conclusion

TiDB BR’s updated approach to backup verification reflects its commitment to performance optimization and usability. By prioritizing efficient validation techniques, BR lets users achieve faster backups while maintaining strong data integrity.

For advanced scenarios requiring full-table checksum verification, the option remains available, ensuring flexibility. This balance between performance and verification helps TiDB meet the practical needs of modern database operations.

Note that although the default value changes only in TiDB 8.5 and later, the verification mechanisms BR relies on have been in place from the beginning. With older versions of BR, you can also set --checksum=false for full backups to improve performance.

If you have any questions about TiDB BR’s updated approach, please feel free to connect with us on Twitter, LinkedIn, or through our Slack channel.

