PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download
PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download

Contact

中文
文档
v3.0 (stable) dev v2.1This doc does not exist in v2.1
  • Introduction
    • TiDB Introduction
    • Benchmarks
      • How to Test TiDB Using Sysbench
      • How to Run TPC-C Test on TiDB
      • Sysbench Performance Test - v3.0 vs. v2.1
      • TPC-C Performance Test - v3.0 vs. v2.1
      • Interaction Test on Online Workloads and `ADD INDEX` Operations
      • TiDB in Kubernetes Sysbench Test
      • DM 1.0-alpha Benchmark Report
      • DM 1.0-GA Benchmark Report
  • Concepts
    • Architecture
    • Key Features
      • Horizontal Scalability
      • MySQL Compatible Syntax
      • Replicate from and to MySQL
      • Distributed Transactions with Strong Consistency
      • Cloud Native Architecture
      • Minimize ETL with HTAP
      • Fault Tolerance & Recovery with Raft
      • Automatic Rebalancing
      • Deployment and Orchestration with Ansible, Kubernetes, Docker
      • JSON Support
      • Spark Integration
      • Read Historical Data Without Restoring from Backup
      • Fast Import and Restore of Data
      • Hybrid of Column and Row Storage
      • SQL Plan Management
      • Open Source
      • Online Schema Changes
  • How-to
    • Get Started
      • Start a Cluster
        • From Binary
        • From Homebrew
        • From DBdeployer
      • Explore SQL with TiDB
      • Import Example Database
      • Read Historical Data
      • TiDB Binlog Tutorial
      • TiDB Data Migration Tutorial
      • TiDB Lightning Tutorial
      • TiSpark Quick Start Guide
    • Deploy
      • Hardware Recommendations
      • From Binary Tarball
        • For Testing Environments
        • For Production Environments
      • Orchestrated Deployment
        • Ansible Deployment (Recommended)
        • Ansible Offline Deployment
        • Docker Deployment
      • Geographic Redundancy
        • Overview
        • Configure Location Awareness
      • Data Migration with Ansible
    • Configure
      • Time Zone
      • Memory Control
    • Secure
      • Transport Layer Security (TLS)
        • Enable TLS For MySQL Clients
        • Enable TLS Between TiDB Components
      • Generate Self-signed Certificates
    • Monitor
      • Overview
      • Monitor a TiDB Cluster
    • Migrate
      • Overview
      • Migrate from MySQL
        • Migrate the Full Data
        • Migrate the Incremental Data
      • Migrate from Aurora
      • Migrate from CSV
    • Maintain
      • Common Ansible Operations
      • Backup and Restore
        • Use `mydumper` and `loader`
        • Use BR
      • Identify Slow Queries
    • Scale
      • Scale using Ansible
      • Scale a TiDB Cluster
    • Upgrade
      • Upgrade to TiDB 3.1
    • Troubleshoot
      • Troubleshoot Cluster Setup
      • Troubleshoot TiDB Lightning
  • Reference
    • SQL
      • MySQL Compatibility
      • SQL Language Structure
        • Literal Values
        • Schema Object Names
        • Keywords and Reserved Words
        • User-Defined Variables
        • Expression Syntax
        • Comment Syntax
      • Data Types
        • Overview
        • Default Values
        • Numeric Types
          • `BIT`
          • `BOOL|BOOLEAN`
          • `TINYINT`
          • `SMALLINT`
          • `MEDIUMINT`
          • `INT|INTEGER`
          • `BIGINT`
          • `DECIMAL`
          • `FLOAT`
          • `DOUBLE`
        • Date and Time Types
          • `DATE`
          • `DATETIME`
          • `TIMESTAMP`
          • `TIME`
          • `YEAR`
        • String Types
          • `CHAR`
          • `VARCHAR`
          • `TEXT`
          • `LONGTEXT`
          • `BINARY`
          • `VARBINARY`
          • `TINYBLOB`
          • `BLOB`
          • `MEDIUMBLOB`
          • `LONGBLOB`
          • `ENUM`
          • `SET`
        • JSON Type
      • Functions and Operators
        • Function and Operator Reference
        • Type Conversion in Expression Evaluation
        • Operators
        • Control Flow Functions
        • String Functions
        • Numeric Functions and Operators
        • Date and Time Functions
        • Bit Functions and Operators
        • Cast Functions and Operators
        • Encryption and Compression Functions
        • Information Functions
        • JSON Functions
        • Aggregate (GROUP BY) Functions
        • Window Functions
        • Miscellaneous Functions
        • Precision Math
        • List of Expressions for Pushdown
      • SQL Statements
        • `ADD COLUMN`
        • `ADD INDEX`
        • `ADMIN`
        • `ALTER DATABASE`
        • `ALTER TABLE`
        • `ALTER USER`
        • `ANALYZE TABLE`
        • `BEGIN`
        • `COMMIT`
        • `CREATE DATABASE`
        • `CREATE INDEX`
        • `CREATE TABLE LIKE`
        • `CREATE TABLE`
        • `CREATE USER`
        • `CREATE VIEW`
        • `DEALLOCATE`
        • `DELETE`
        • `DESC`
        • `DESCRIBE`
        • `DO`
        • `DROP COLUMN`
        • `DROP DATABASE`
        • `DROP INDEX`
        • `DROP TABLE`
        • `DROP USER`
        • `DROP VIEW`
        • `EXECUTE`
        • `EXPLAIN ANALYZE`
        • `EXPLAIN`
        • `FLUSH PRIVILEGES`
        • `FLUSH STATUS`
        • `FLUSH TABLES`
        • `GRANT <privileges>`
        • `INSERT`
        • `KILL [TIDB]`
        • `LOAD DATA`
        • `MODIFY COLUMN`
        • `PREPARE`
        • `RECOVER TABLE`
        • `RENAME INDEX`
        • `RENAME TABLE`
        • `REPLACE`
        • `REVOKE <privileges>`
        • `ROLLBACK`
        • `SELECT`
        • `SET [NAMES|CHARACTER SET]`
        • `SET PASSWORD`
        • `SET TRANSACTION`
        • `SET [GLOBAL|SESSION] <variable>`
        • `SHOW CHARACTER SET`
        • `SHOW COLLATION`
        • `SHOW [FULL] COLUMNS FROM`
        • `SHOW CREATE TABLE`
        • `SHOW CREATE USER`
        • `SHOW DATABASES`
        • `SHOW ENGINES`
        • `SHOW ERRORS`
        • `SHOW [FULL] FIELDS FROM`
        • `SHOW GRANTS`
        • `SHOW INDEXES [FROM|IN]`
        • `SHOW INDEX [FROM|IN]`
        • `SHOW KEYS [FROM|IN]`
        • `SHOW PRIVILEGES`
        • `SHOW [FULL] PROCESSSLIST`
        • `SHOW SCHEMAS`
        • `SHOW [FULL] TABLES`
        • `SHOW TABLE REGIONS`
        • `SHOW TABLE STATUS`
        • `SHOW [GLOBAL|SESSION] VARIABLES`
        • `SHOW WARNINGS`
        • `SPLIT REGION`
        • `START TRANSACTION`
        • `TRACE`
        • `TRUNCATE`
        • `UPDATE`
        • `USE`
      • Constraints
      • Generated Columns
      • Partitioning
      • Character Set
      • SQL Mode
      • Views
    • Configuration
      • tidb-server
        • MySQL System Variables
        • TiDB Specific System Variables
        • Configuration Flags
        • Configuration File
      • pd-server
        • Configuration Flags
        • Configuration File
      • tikv-server
        • Configuration Flags
        • Configuration File
    • Security
      • Security Compatibility with MySQL
      • The TiDB Access Privilege System
      • TiDB User Account Management
      • Role-Based Access Control
    • Transactions
      • Overview
      • Transaction Model
      • Isolation Levels
      • Pessimistic Transactions
    • System Databases
      • `mysql`
      • `information_schema`
    • Errors Codes
    • Supported Client Drivers
    • Garbage Collection (GC)
      • GC Overview
      • GC Configuration
    • Performance
      • Overview
      • Understanding the Query Execution Plan
      • Introduction to Statistics
      • Optimizer Hints
      • Follower Read
      • Check the TiDB Cluster Status Using SQL Statements
      • Execution Plan Binding
      • Statement Summary Table
      • Tune TiKV
    • Key Monitoring Metrics
      • Overview
      • TiDB
      • PD
      • TiKV
    • Alert Rules
    • Best Practices
      • Highly Concurrent Write Best Practices
      • HAproxy Best Practices
      • PD Scheduling Best Practices
    • TiSpark
    • TiDB Binlog
      • Overview
      • Deploy
      • Maintain
      • Monitor
      • Upgrade
      • Reparo
      • Binlog Slave Client
      • FAQ
    • Tools
      • Mydumper
      • Syncer
      • Loader
      • TiDB Data Migration
        • Overview
          • DM Overview
          • Restrictions
          • DM-worker
          • DM Relay Log
        • Features
          • Table Routing
          • Black and White Lists
          • Binlog Event Filter
          • Replication Delay Monitoring
          • Sharding Support
            • Introduction
            • Restrictions
            • Handle Sharding DDL Locks Manually
        • Usage Scenarios
          • Simple Scenario
          • Shard Merge Scenario
          • Shard Merge Best Practices
        • Deploy
        • Configure
          • Overview
          • Task Configuration
        • Manage the DM Cluster
          • Cluster Operations
          • Cluster Upgrade
        • Manage Replication Tasks
          • Manage Tasks
          • Precheck Tasks
          • Query Task Status
          • Skip or Replace Abnormal SQL Statements
        • Monitor
        • Migrate from MySQL compatible database
          • Migrate from Aurora
        • Troubleshoot
          • DM Troubleshooting
          • Error Description
          • Error Handling
        • FAQ
      • TiDB Lightning
        • Overview
        • Deployment
        • Checkpoints
        • Table Filter
        • CSV Support
        • Monitor
        • Troubleshoot
        • FAQ
      • sync-diff-inspector
      • PD Control
      • PD Recover
      • TiKV Control
      • TiDB Control
      • Download
  • TiDB in Kubernetes
    • About TiDB Operator
    • Get Started
      • kind
      • GKE
      • Minikube
    • Deploy
      • Prerequisites
      • TiDB Operator
      • TiDB in General Kubernetes
      • TiDB in AWS EKS
      • TiDB in GCP GKE
      • TiDB in Alibaba Cloud ACK
      • Access TiDB in Kubernetes
    • Configure
      • Cluster Initialization
    • Monitor
    • Maintain
      • Destroy a TiDB cluster
      • Maintain a Hosting Kubernetes Node
      • Backup and Restore
      • Restore Data with TiDB Lightning
      • Collect Logs
      • Automatic Failover
      • TiDB Binlog
    • Scale
    • Upgrade
      • TiDB Cluster
      • TiDB Operator
    • Reference
      • Configuration
        • TiDB Cluster
        • Backup
        • PV
        • TiDB Drainer
      • Tools
        • tkctl
        • Tools in Kubernetes
    • Troubleshoot
    • FAQs
  • FAQs
    • TiDB FAQs
    • TiDB Lightning FAQs
    • Upgrade FAQs
  • Support
    • Support Resources
    • Report an Issue
  • Contribute
    • Contribute to TiDB
    • Improve the Docs
  • Adopters
  • Roadmap
  • Releases
    • v3.0
      • 3.0.7
      • 3.0.6
      • 3.0.5
      • 3.0.4
      • 3.0.3
      • 3.0.2
      • 3.0.1
      • 3.0 GA
      • 3.0.0-rc.3
      • 3.0.0-rc.2
      • 3.0.0-rc.1
      • 3.0.0-beta.1
      • 3.0.0-beta
    • v2.1
      • 2.1.18
      • 2.1.17
      • 2.1.16
      • 2.1.15
      • 2.1.14
      • 2.1.13
      • 2.1.12
      • 2.1.11
      • 2.1.10
      • 2.1.9
      • 2.1.8
      • 2.1.7
      • 2.1.6
      • 2.1.5
      • 2.1.4
      • 2.1.3
      • 2.1.2
      • 2.1.1
      • 2.1 GA
      • 2.1 RC5
      • 2.1 RC4
      • 2.1 RC3
      • 2.1 RC2
      • 2.1 RC1
      • 2.1 Beta
    • v2.0
      • 2.0.11
      • 2.0.10
      • 2.0.9
      • 2.0.8
      • 2.0.7
      • 2.0.6
      • 2.0.5
      • 2.0.4
      • 2.0.3
      • 2.0.2
      • 2.0.1
      • 2.0
      • 2.0 RC5
      • 2.0 RC4
      • 2.0 RC3
      • 2.0 RC1
      • 1.1 Beta
      • 1.1 Alpha
    • v1.0
      • 1.0.8
      • 1.0.7
      • 1.0.6
      • 1.0.5
      • 1.0.4
      • 1.0.3
      • 1.0.2
      • 1.0.1
      • 1.0
      • Pre-GA
      • RC4
      • RC3
      • RC2
      • RC1

GC Overview

TiDB uses MVCC to control transaction concurrency. When you update the data, the original data is not deleted immediately but is kept together with the new data, with a timestamp to distinguish the version. The goal of Garbage Collection (GC) is to clear the obsolete data.

GC process

Each TiDB cluster contains a TiDB instance that is selected as the GC leader, which controls the GC process.

GC runs periodically on TiDB. The default frequency is once every 10 minutes. For each GC, TiDB firstly calculates a timestamp called “safe point” (defaults to the current time minus 10 minutes). Then, TiDB clears the obsolete data under the premise that all the snapshots after the safe point retain the integrity of the data. Specifically, there are three steps involved in the GC process:

  1. Resolve Locks
  2. Delete Ranges
  3. Do GC

Resolve Locks

The TiDB transaction model is implemented based on Google’s Percolator. It’s mainly a two-phase commit protocol with some practical optimizations. When the first phase is finished, all the related keys are locked. Among these locks, one is the primary lock and the others are secondary locks which contain a pointer to the primary lock; in the second phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. The type of write record that replaces the primary lock indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. If the threads fail to replace the secondary locks, these locks are retained.

The Resolve Locks step rolls back or commits the locks before the safe point, depending on whether their primary key has been committed or not. If the primary key is also retained, the transaction times out and is rolled back. This step is required. Once GC has cleared the write record of the primary lock, you can never know whether this transaction is successful or not. Also, if the the transaction contains retained secondary keys, it’s important to know whether it should be rolled back or committed. As a result, data consistency cannot be guaranteed.

In the Resolve Lock step, the GC leader processes requests from all Regions. From TiDB 3.0, this process runs concurrently by default, with the default concurrency equal to the number of TiKV nodes in the cluster. For more details on how to configure, see GC Configuration.

Delete Ranges

A great amount of data with consecutive keys is removed during operations such as DROP TABLE/INDEX. Removing each key and performing GC later for them can result in low execution efficiency on storage reclaiming. In such scenarios, TiDB actually does not delete each key. Instead, it only records the range to be removed and the timestamp of the deletion. Then the Delete Ranges step performs a fast physical deletion on the ranges whose timestamp is before the safe point.

Do GC

The Do GC step clears the outdated versions for all keys. To guarantee that all timestamps after the safe point have consistent snapshots, this step deletes the data committed before the safe point, but retains the last write before the safe point as long as it is not a deletion.

In the previous GC mechanism for TiDB 2.1 and earlier versions, the GC leader sends GC requests to all Regions. From TiDB 3.0, the GC leader only uploads the safe point to PD for each TiKV node to obtain. When the TiKV node detects a change on the safe point, it performs GC on all leader Regions on the current node. In the meantime, the GC leader can trigger the next round of GC.

Note:

You can modify the tikv_gc_mode to use the previous GC mechanism. For more details, refer to GC Configuration.

"GC Overview" was last updated Nov 10 2019: v3.1/*: add the v3.1 docs directory (#1639) (d142ada)
Edit this page Request docs changes

What’s on this page

Product

  • TiDB
  • TiSpark
  • Roadmap

Docs

  • Quick Start
  • Best Practices
  • FAQ
  • TiDB Tools
  • Release Notes

Resources

  • Blog
  • Weekly
  • GitHub
  • TiDB Community

Company

  • About
  • Careers
  • News
  • Contact Us
  • Privacy Policy
  • Terms of Service

Connect

  • Twitter
  • LinkedIn
  • Reddit
  • Google Group
  • Stack Overflow

© 2019 PingCAP. All Rights Reserved.

中文