PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download
PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download

Contact

中文
文档
v3.0 (stable) dev v2.1This doc does not exist in v2.1
  • Introduction
    • TiDB Introduction
    • Benchmarks
      • How to Test TiDB Using Sysbench
      • How to Run TPC-C Test on TiDB
      • Sysbench Performance Test - v3.0 vs. v2.1
      • TPC-C Performance Test - v3.0 vs. v2.1
      • Interaction Test on Online Workloads and `ADD INDEX` Operations
      • TiDB in Kubernetes Sysbench Test
      • DM 1.0-alpha Benchmark Report
      • DM 1.0-GA Benchmark Report
  • Concepts
    • Architecture
    • Key Features
      • Horizontal Scalability
      • MySQL Compatible Syntax
      • Replicate from and to MySQL
      • Distributed Transactions with Strong Consistency
      • Cloud Native Architecture
      • Minimize ETL with HTAP
      • Fault Tolerance & Recovery with Raft
      • Automatic Rebalancing
      • Deployment and Orchestration with Ansible, Kubernetes, Docker
      • JSON Support
      • Spark Integration
      • Read Historical Data Without Restoring from Backup
      • Fast Import and Restore of Data
      • Hybrid of Column and Row Storage
      • SQL Plan Management
      • Open Source
      • Online Schema Changes
  • How-to
    • Get Started
      • Start a Cluster
        • From Binary
        • From Homebrew
        • From DBdeployer
      • Explore SQL with TiDB
      • Import Example Database
      • Read Historical Data
      • TiDB Binlog Tutorial
      • TiDB Data Migration Tutorial
      • TiDB Lightning Tutorial
      • TiSpark Quick Start Guide
    • Deploy
      • Hardware Recommendations
      • From Binary Tarball
        • For Testing Environments
        • For Production Environments
      • Orchestrated Deployment
        • Ansible Deployment (Recommended)
        • Ansible Offline Deployment
        • Docker Deployment
      • Geographic Redundancy
        • Overview
        • Configure Location Awareness
      • Data Migration with Ansible
    • Configure
      • Time Zone
      • Memory Control
    • Secure
      • Transport Layer Security (TLS)
        • Enable TLS For MySQL Clients
        • Enable TLS Between TiDB Components
      • Generate Self-signed Certificates
    • Monitor
      • Overview
      • Monitor a TiDB Cluster
    • Migrate
      • Overview
      • Migrate from MySQL
        • Migrate the Full Data
        • Migrate the Incremental Data
      • Migrate from Aurora
      • Migrate from CSV
    • Maintain
      • Common Ansible Operations
      • Backup and Restore
        • Use `mydumper` and `loader`
        • Use BR
      • Identify Slow Queries
    • Scale
      • Scale using Ansible
      • Scale a TiDB Cluster
    • Upgrade
      • Upgrade to TiDB 3.1
    • Troubleshoot
      • Troubleshoot Cluster Setup
      • Troubleshoot TiDB Lightning
  • Reference
    • SQL
      • MySQL Compatibility
      • SQL Language Structure
        • Literal Values
        • Schema Object Names
        • Keywords and Reserved Words
        • User-Defined Variables
        • Expression Syntax
        • Comment Syntax
      • Data Types
        • Overview
        • Default Values
        • Numeric Types
          • `BIT`
          • `BOOL|BOOLEAN`
          • `TINYINT`
          • `SMALLINT`
          • `MEDIUMINT`
          • `INT|INTEGER`
          • `BIGINT`
          • `DECIMAL`
          • `FLOAT`
          • `DOUBLE`
        • Date and Time Types
          • `DATE`
          • `DATETIME`
          • `TIMESTAMP`
          • `TIME`
          • `YEAR`
        • String Types
          • `CHAR`
          • `VARCHAR`
          • `TEXT`
          • `LONGTEXT`
          • `BINARY`
          • `VARBINARY`
          • `TINYBLOB`
          • `BLOB`
          • `MEDIUMBLOB`
          • `LONGBLOB`
          • `ENUM`
          • `SET`
        • JSON Type
      • Functions and Operators
        • Function and Operator Reference
        • Type Conversion in Expression Evaluation
        • Operators
        • Control Flow Functions
        • String Functions
        • Numeric Functions and Operators
        • Date and Time Functions
        • Bit Functions and Operators
        • Cast Functions and Operators
        • Encryption and Compression Functions
        • Information Functions
        • JSON Functions
        • Aggregate (GROUP BY) Functions
        • Window Functions
        • Miscellaneous Functions
        • Precision Math
        • List of Expressions for Pushdown
      • SQL Statements
        • `ADD COLUMN`
        • `ADD INDEX`
        • `ADMIN`
        • `ALTER DATABASE`
        • `ALTER TABLE`
        • `ALTER USER`
        • `ANALYZE TABLE`
        • `BEGIN`
        • `COMMIT`
        • `CREATE DATABASE`
        • `CREATE INDEX`
        • `CREATE TABLE LIKE`
        • `CREATE TABLE`
        • `CREATE USER`
        • `CREATE VIEW`
        • `DEALLOCATE`
        • `DELETE`
        • `DESC`
        • `DESCRIBE`
        • `DO`
        • `DROP COLUMN`
        • `DROP DATABASE`
        • `DROP INDEX`
        • `DROP TABLE`
        • `DROP USER`
        • `DROP VIEW`
        • `EXECUTE`
        • `EXPLAIN ANALYZE`
        • `EXPLAIN`
        • `FLUSH PRIVILEGES`
        • `FLUSH STATUS`
        • `FLUSH TABLES`
        • `GRANT <privileges>`
        • `INSERT`
        • `KILL [TIDB]`
        • `LOAD DATA`
        • `MODIFY COLUMN`
        • `PREPARE`
        • `RECOVER TABLE`
        • `RENAME INDEX`
        • `RENAME TABLE`
        • `REPLACE`
        • `REVOKE <privileges>`
        • `ROLLBACK`
        • `SELECT`
        • `SET [NAMES|CHARACTER SET]`
        • `SET PASSWORD`
        • `SET TRANSACTION`
        • `SET [GLOBAL|SESSION] <variable>`
        • `SHOW CHARACTER SET`
        • `SHOW COLLATION`
        • `SHOW [FULL] COLUMNS FROM`
        • `SHOW CREATE TABLE`
        • `SHOW CREATE USER`
        • `SHOW DATABASES`
        • `SHOW ENGINES`
        • `SHOW ERRORS`
        • `SHOW [FULL] FIELDS FROM`
        • `SHOW GRANTS`
        • `SHOW INDEXES [FROM|IN]`
        • `SHOW INDEX [FROM|IN]`
        • `SHOW KEYS [FROM|IN]`
        • `SHOW PRIVILEGES`
        • `SHOW [FULL] PROCESSSLIST`
        • `SHOW SCHEMAS`
        • `SHOW [FULL] TABLES`
        • `SHOW TABLE REGIONS`
        • `SHOW TABLE STATUS`
        • `SHOW [GLOBAL|SESSION] VARIABLES`
        • `SHOW WARNINGS`
        • `SPLIT REGION`
        • `START TRANSACTION`
        • `TRACE`
        • `TRUNCATE`
        • `UPDATE`
        • `USE`
      • Constraints
      • Generated Columns
      • Partitioning
      • Character Set
      • SQL Mode
      • Views
    • Configuration
      • tidb-server
        • MySQL System Variables
        • TiDB Specific System Variables
        • Configuration Flags
        • Configuration File
      • pd-server
        • Configuration Flags
        • Configuration File
      • tikv-server
        • Configuration Flags
        • Configuration File
    • Security
      • Security Compatibility with MySQL
      • The TiDB Access Privilege System
      • TiDB User Account Management
      • Role-Based Access Control
    • Transactions
      • Overview
      • Transaction Model
      • Isolation Levels
      • Pessimistic Transactions
    • System Databases
      • `mysql`
      • `information_schema`
    • Errors Codes
    • Supported Client Drivers
    • Garbage Collection (GC)
      • GC Overview
      • GC Configuration
    • Performance
      • Overview
      • Understanding the Query Execution Plan
      • Introduction to Statistics
      • Optimizer Hints
      • Follower Read
      • Check the TiDB Cluster Status Using SQL Statements
      • Execution Plan Binding
      • Statement Summary Table
      • Tune TiKV
    • Key Monitoring Metrics
      • Overview
      • TiDB
      • PD
      • TiKV
    • Alert Rules
    • Best Practices
      • Highly Concurrent Write Best Practices
      • HAproxy Best Practices
    • TiSpark
    • TiDB Binlog
      • Overview
      • Deploy
      • Maintain
      • Monitor
      • Upgrade
      • Reparo
      • Binlog Slave Client
      • FAQ
    • Tools
      • Mydumper
      • Syncer
      • Loader
      • TiDB Data Migration
        • Overview
          • DM Overview
          • Restrictions
          • DM-worker
          • DM Relay Log
        • Features
          • Table Routing
          • Black and White Lists
          • Binlog Event Filter
          • Replication Delay Monitoring
          • Sharding Support
            • Introduction
            • Restrictions
            • Handle Sharding DDL Locks Manually
        • Usage Scenarios
          • Simple Scenario
          • Shard Merge Scenario
          • Shard Merge Best Practices
        • Deploy
        • Configure
          • Overview
          • Task Configuration
        • Manage the DM Cluster
          • Cluster Operations
          • Cluster Upgrade
        • Manage Replication Tasks
          • Manage Tasks
          • Precheck Tasks
          • Query Task Status
          • Skip or Replace Abnormal SQL Statements
        • Monitor
        • Migrate from MySQL compatible database
          • Migrate from Aurora
        • Troubleshoot
          • DM Troubleshooting
          • Error Description
          • Error Handling
        • FAQ
      • TiDB Lightning
        • Overview
        • Deployment
        • Checkpoints
        • Table Filter
        • CSV Support
        • Monitor
        • Troubleshoot
        • FAQ
      • sync-diff-inspector
      • PD Control
      • PD Recover
      • TiKV Control
      • TiDB Control
      • Download
  • TiDB in Kubernetes
    • About TiDB Operator
    • Get Started
      • kind
      • GKE
      • Minikube
    • Deploy
      • Prerequisites
      • TiDB Operator
      • TiDB in General Kubernetes
      • TiDB in AWS EKS
      • TiDB in GCP GKE
      • TiDB in Alibaba Cloud ACK
      • Access TiDB in Kubernetes
    • Configure
      • Cluster Initialization
    • Monitor
    • Maintain
      • Destroy a TiDB cluster
      • Maintain a Hosting Kubernetes Node
      • Backup and Restore
      • Restore Data with TiDB Lightning
      • Collect Logs
      • Automatic Failover
      • TiDB Binlog
    • Scale
    • Upgrade
      • TiDB Cluster
      • TiDB Operator
    • Reference
      • Configuration
        • TiDB Cluster
        • Backup
        • PV
        • TiDB Drainer
      • Tools
        • tkctl
        • Tools in Kubernetes
    • Troubleshoot
    • FAQs
  • FAQs
    • TiDB FAQs
    • TiDB Lightning FAQs
    • Upgrade FAQs
  • Support
    • Support Resources
    • Report an Issue
  • Contribute
    • Contribute to TiDB
    • Improve the Docs
  • Adopters
  • Roadmap
  • Releases
    • v3.0
      • 3.0.7
      • 3.0.6
      • 3.0.5
      • 3.0.4
      • 3.0.3
      • 3.0.2
      • 3.0.1
      • 3.0 GA
      • 3.0.0-rc.3
      • 3.0.0-rc.2
      • 3.0.0-rc.1
      • 3.0.0-beta.1
      • 3.0.0-beta
    • v2.1
      • 2.1.18
      • 2.1.17
      • 2.1.16
      • 2.1.15
      • 2.1.14
      • 2.1.13
      • 2.1.12
      • 2.1.11
      • 2.1.10
      • 2.1.9
      • 2.1.8
      • 2.1.7
      • 2.1.6
      • 2.1.5
      • 2.1.4
      • 2.1.3
      • 2.1.2
      • 2.1.1
      • 2.1 GA
      • 2.1 RC5
      • 2.1 RC4
      • 2.1 RC3
      • 2.1 RC2
      • 2.1 RC1
      • 2.1 Beta
    • v2.0
      • 2.0.11
      • 2.0.10
      • 2.0.9
      • 2.0.8
      • 2.0.7
      • 2.0.6
      • 2.0.5
      • 2.0.4
      • 2.0.3
      • 2.0.2
      • 2.0.1
      • 2.0
      • 2.0 RC5
      • 2.0 RC4
      • 2.0 RC3
      • 2.0 RC1
      • 1.1 Beta
      • 1.1 Alpha
    • v1.0
      • 1.0.8
      • 1.0.7
      • 1.0.6
      • 1.0.5
      • 1.0.4
      • 1.0.3
      • 1.0.2
      • 1.0.1
      • 1.0
      • Pre-GA
      • RC4
      • RC3
      • RC2
      • RC1

Maintain Kubernetes Nodes that Hold the TiDB Cluster

TiDB is a highly available database that can run smoothly when some of the database nodes go offline. For this reason, you can safely shut down and maintain the Kubernetes nodes at the bottom layer without influencing TiDB’s service. Specifically, you need to adopt various maintenance strategies when handling nodes that hold PD, TiKV, and TiDB instances because of their different features.

This document introduces how to perform a temporary or long-term maintenance task for the Kubernetes nodes.

Prerequisites

  • kubectl
  • tkctl
  • jq

Note:

Before you perform a long-term node maintenance, you need to make sure that the remaining resources in the Kubernetes cluster are enough for running the TiDB cluster.

Maintain nodes that hold PD and TiDB instances

Migrating PD and TiDB instances from a node is relatively fast, so you can proactively evict the instances to other nodes and perform maintenance on the desired node:

  1. Check whether there is any TiKV instance on the node to be maintained:

    kubectl get pod --all-namespaces -o wide | grep <node-name>

    If any TiKV instance is found, see Maintain nodes that hold TiKV instances.

  2. Use the kubectl cordon command to prevent new Pods from being scheduled to the node to be maintained:

    kubectl cordon <node-name>
  3. Use the kubectl drain command to migrate the database instances on the maintenance node to other nodes:

    kubectl drain <node-name> --ignore-daemonsets --delete-local-data

    After running this command, TiDB instances on this node are automatically migrated to another available node, and the PD instance will trigger the auto-failover mechanism after five minutes and complete the nodes.

  4. At this time, if you want to make this Kubernetes node offline, you can delete it by running:

    kubectl delete node <node-name>

    If you want to recover a Kubernetes node, you need to first make sure that it is in a healthy state:

    watch kubectl get node <node-name>

    After the node goes into the Ready state, you can proceed with the following operations.

  5. Use kubectl uncordon to lift the scheduling restriction on the node:

    kubectl uncordon <node-name>
  6. See whether all Pods get back to normal and are running:

    watch kubectl get -n $namespace pod -o wide

    Or:

    watch tkctl get all

    When you confirm that all Pods are running normally, then you have successfully finished the maintenance task.

Maintain nodes that hold TiKV instances

Migrating TiKV instances is relatively slow and might lead to unwanted data migration load on the cluster. For this reason, you need to choose different strategies as needed prior to maintaining the nodes that hold TiKV instances:

  • If you want to maintain a node that is recoverable in a short term, you can recover the TiKV node from where it is after the maintenance without migrating it elsewhere.
  • If you want to maintain a node that is not recoverable in a short term, you need to make a plan for the TiKV migration task.

Maintain a recoverable node in a short term

For a short-term maintenance, you can increase the TiKV instance downtime that the cluster allows by adjusting the max-store-down-time configuration of the PD cluster. You can finish the maintenance and recover the Kubernetes node during this time, and then all TiKV instances on this node will be automatically recovered.

kubectl port-forward svc/<CLUSTER_NAME>-pd 2379:2379
pd-ctl -d config set max-store-down-time 10m

After configuring max-store-down-time to an appropriate value, the follow-up operations are the same as those in Maintain nodes that hold PD and TiDB instances.

Maintain an irrecoverable node in a short term

For the maintenance on an node that cannot be recovered in a short term (for example, a node has to go offline for a long time), you need to use pd-ctl to proactively tell the TiDB cluster to make the corresponding TiKV instances offline, and manually unbind the instances from the node.

  1. Use kubectl cordon to prevent new Pods from being scheduled to the node to be maintained:

    kubectl cordon <node-name>
  2. Check the TiKV instance on the maintenance node:

    tkctl get -A tikv | grep <node-name>
  3. Use pd-ctl to proactively put the TiKV instance offline:

    Note:

    Before you make the TiKV instances offline, you need to make sure that the number of remaining TiKV instances in the cluster is no smaller than the number of TiKV replicas in the PD configuration (configuration item: max-replicas). If this is not the case, you need to scale TiKV first.

    Check store-id of the TiKV instance:

    kubectl get tc <CLUSTER_NAME> -ojson | jq '.status.tikv.stores | .[] | select ( .podName == "<POD_NAME>" ) | .id'

    Make the TiKV instance offline:

    kubectl port-forward svc/<CLUSTER_NAME>-pd 2379:2379
    pd-ctl -d store delete <ID>
  4. Wait for the store to change its status from state_name to Tombstone:

    watch pd-ctl -d store <ID>
  5. Unbind the TiKV instance from the local drive of the node:

    Get the PesistentVolumeClaim used by the Pod:

    kubectl get -n <namespace> pod <pod_name> -ojson | jq '.spec.volumes | .[] | select (.name == "tikv") | .persistentVolumeClaim.claimName'

    Delete the PesistentVolumeClaim:

    kubectl delete -n <namespace> pvc <pvc_name>
  6. Delete the TiKV instance:

    kubectl delete -n <namespace> pod <pod_name>
  7. Check whether the TiKV instance is normally scheduled to another node:

    watch kubectl -n <namespace> get pod -o wide

    If there are more TiKV instances on the maintenance node, you need to follow the above steps until all instances are migrated to other nodes.

  8. After you make sure that there is no more TiKV instance on the node, you can evict other instances on the node:

    kubectl drain <node-name> --ignore-daemonsets --delete-local-data
  9. Confirm again that there is no more TiKV, TiDB and PD instances running on this node:

    kubectl get pod --all-namespaces | grep <node-name>
  10. (Optional) If this node is made offline for a long time, it is recommended to delete it from the Kubernetes cluster:

    kuebctl delete node <node-name>

Now, you have successfully finished the maintenance task for the node.

"Maintain Kubernetes Nodes that Hold the TiDB Cluster" was last updated Nov 10 2019: v3.1/*: add the v3.1 docs directory (#1639) (d142ada)
Edit this page Request docs changes

What’s on this page

Product

  • TiDB
  • TiSpark
  • Roadmap

Docs

  • Quick Start
  • Best Practices
  • FAQ
  • TiDB Tools
  • Release Notes

Resources

  • Blog
  • Weekly
  • GitHub
  • TiDB Community

Company

  • About
  • Careers
  • News
  • Contact Us
  • Privacy Policy
  • Terms of Service

Connect

  • Twitter
  • LinkedIn
  • Reddit
  • Google Group
  • Stack Overflow

© 2019 PingCAP. All Rights Reserved.

中文