TiDB Binlog User Guide

This document describes how to deploy the Kafka version of TiDB Binlog.

About TiDB Binlog

TiDB Binlog is a tool for enterprise users that collects the binlog data generated by TiDB and provides real-time backup and replication.

TiDB Binlog supports the following scenarios:

  • Data replication: to replicate TiDB cluster data to other databases
  • Real-time backup and recovery: to back up TiDB cluster data and recover it in case of a cluster outage

TiDB Binlog architecture

The TiDB Binlog architecture is as follows:

(Figure: TiDB Binlog architecture)

The TiDB Binlog cluster mainly consists of three components:

Pump

Pump is a daemon that runs in the background on each TiDB host. Its main function is to record the binlog data generated by TiDB in real time and write it to files on disk sequentially.

Drainer

Drainer collects binlog data from each Pump node, converts it into SQL statements compatible with the specified downstream database in the commit order of the TiDB transactions, and then replicates the data to the target database or writes it to files sequentially.

Kafka & ZooKeeper

The Kafka cluster stores the binlog data written by Pump and provides the binlog data to Drainer for reading.

Note:

In the local version of TiDB Binlog, the binlog is stored in files, while in the Kafka version, the binlog is stored using Kafka.

Install TiDB Binlog

The corresponding relationship between the tidb-ansible branch and the TiDB version is as follows:

TiDB Ansible branch   TiDB version   Note
release-2.0           2.0 version    The latest 2.0 stable version. You can use it in the production environment.

Download Binary for the CentOS 7.3+ platform

# Download the tool package.
wget http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.tar.gz
wget http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.sha256

# Check the file integrity. If the result is OK, the file is correct.
sha256sum -c tidb-binlog-kafka-linux-amd64.sha256

# Extract the package.
tar -xzf tidb-binlog-kafka-linux-amd64.tar.gz
cd tidb-binlog-kafka-linux-amd64
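
To verify that the extracted binaries run on the target host, you can print their version information with the -V option documented later in this guide (an optional sanity check, assuming the package layout shown above):

# Print the version information of Pump and Drainer.
./bin/pump -V
./bin/drainer -V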

Deploy TiDB Binlog

Note

  • You need to deploy a Pump for each TiDB server in the TiDB cluster. Currently, the TiDB server only supports outputting the binlog over a UNIX socket.

  • When you deploy Pump manually, start the services in the order of Pump -> TiDB, and stop them in the order of TiDB -> Pump.

    Set the TiDB startup parameter binlog-socket to the UNIX socket file path specified by the corresponding socket parameter of Pump. The final deployment architecture is as follows:

    (Figure: TiDB Pump deployment architecture)

  • Drainer does not support the rename DDL operation on tables in the ignored schemas (schemas in the filter list).

  • To start Drainer in an existing TiDB cluster, you usually need to make a full backup, record the savepoint, import the full backup into the target system, and then start Drainer to replicate from the savepoint (a command sketch follows this list).

    To guarantee the integrity of data, perform the following operations 10 minutes after Pump is started:

    • Use binlogctl of the tidb-tools project to generate the position for the initial start of Drainer.
    • Do a full backup. For example, back up TiDB using Mydumper.
    • Import the full backup to the target system.
    • By default, the Kafka version of Drainer stores the savepoint metadata in the checkpoint table of the downstream tidb_binlog database. If no valid data exists in the checkpoint table, configure initial-commit-ts to make Drainer start replicating from the specified position:

      bin/drainer --config=conf/drainer.toml --initial-commit-ts=${position}
  • If Drainer outputs pb, you need to set the following parameters in the configuration file:

    [syncer]
    db-type = "pb"
    disable-dispatch = true
    
    [syncer.to]
    dir = "/path/pb-dir"
  • If Drainer outputs to Kafka, you need to set the following parameters in the configuration file:

    [syncer]
    db-type = "kafka"
    
    # When db-type is "kafka", you can uncomment this section to configure the downstream Kafka; otherwise Drainer writes to the same Kafka addresses that it pulls the binlog from.
    [syncer.to]
    kafka-addrs = "127.0.0.1:9092"
    kafka-version = "0.8.2.0"

    The data output to Kafka is in the binlog format defined by Protobuf and sorted by ts. See the driver to access the data and replicate it to the downstream.

  • Deploy the Kafka and ZooKeeper clusters before deploying TiDB Binlog. Make sure that the Kafka version is 0.9 or later.
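
The following is a minimal sketch of the full-backup workflow described in the notes above, using Mydumper and Loader. The hosts, ports, database name, and output directory are placeholder example values, and ${position} stands for the position generated by binlogctl:

# 1. Use binlogctl (from the tidb-tools project) to generate the initial position
#    ${position} for Drainer, as described in the notes above.

# 2. Take a full backup of the upstream TiDB cluster with Mydumper
#    (host, port, user, database, and output directory are example values).
./bin/mydumper -h 192.168.0.10 -P 4000 -u root -t 16 -F 64 -B test -o ./full-backup

# 3. Import the full backup into the target database with Loader
#    (the downstream host and port are example values).
./bin/loader -h 192.168.0.20 -P 3306 -u root -t 16 -d ./full-backup

# 4. Start Drainer from the generated position.
bin/drainer --config=conf/drainer.toml --initial-commit-ts=${position}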

Recommended Kafka cluster configuration

Name        Number   Memory size   CPU   Hard disk
Kafka       3+       16G           8+    2+ 1TB
ZooKeeper   3+       8G            4+    2+ 300G

Recommended Kafka parameter configuration

  • auto.create.topics.enable = true: if no topic exists, Kafka automatically creates a topic on the broker.
  • broker.id: a required parameter that identifies each broker in the Kafka cluster. Keep the value unique across brokers. For example, broker.id = 1.
  • fs.file-max = 1000000: Kafka uses a large number of files and network sockets, so it is recommended to raise this OS parameter to 1000000 by editing /etc/sysctl.conf.
  • Set the following three parameters to 1 GB to avoid failures when writing to Kafka caused by a single oversized message when a transaction modifies a large amount of data (a sketch of applying these settings follows this list):
    • message.max.bytes=1073741824
    • replica.fetch.max.bytes=1073741824
    • fetch.message.max.bytes=1073741824
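
A minimal sketch of applying these settings, assuming a default Kafka installation whose broker parameters live in config/server.properties; the paths are example values:

# Raise the OS file handle limit on each Kafka broker host.
echo "fs.file-max = 1000000" >> /etc/sysctl.conf
sysctl -p

# Append the broker-side parameters to Kafka's server.properties (path is an example).
cat >> /path/to/kafka/config/server.properties <<'EOF'
auto.create.topics.enable=true
# broker.id must be unique for each broker
broker.id=1
message.max.bytes=1073741824
replica.fetch.max.bytes=1073741824
EOF

# fetch.message.max.bytes is a consumer-side setting; set it in the consumer
# configuration rather than in server.properties.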

Deploy Pump using TiDB Ansible

  • If you have not deployed the Kafka cluster, use Kafka Ansible to deploy it.
  • When you deploy the TiDB cluster using TiDB Ansible, edit the tidb-ansible/inventory.ini file, set enable_binlog = True, and configure the zookeeper_addrs variable as the ZooKeeper address of the Kafka cluster. In this way, Pump is deployed together with the TiDB cluster (the deployment commands are sketched after the configuration example below).

Configuration example:

# binlog trigger
enable_binlog = True
# ZooKeeper address of the Kafka cluster. Example:
# zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
# You can also append an optional chroot string to the URLs to specify the root directory for all Kafka znodes. Example:
# zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181/kafka/123"
zookeeper_addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
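
After editing inventory.ini, deployment follows the usual tidb-ansible workflow. The commands below are a sketch that assumes you run them from the tidb-ansible directory of an already prepared cluster setup:

# Run from the tidb-ansible directory after editing inventory.ini.
ansible-playbook deploy.yml    # deploy the cluster; Pump is included when enable_binlog = True
ansible-playbook start.yml     # start the cluster, including the Pump services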

Deploy Pump using Binary

A usage example:

Assume that we have three PDs, three ZooKeepers, and one TiDB. The information of each node is as follows:

TiDB="192.168.0.10"
PD1="192.168.0.16"
PD2="192.168.0.15"
PD3="192.168.0.14"
ZK1="192.168.0.13"
ZK2="192.168.0.12"
ZK3="192.168.0.11"

Deploy Drainer/Pump on the machine with the IP address “192.168.0.10”.

The IP address of the corresponding PD cluster is “192.168.0.16,192.168.0.15,192.168.0.14”.

The ZooKeeper IP address of the corresponding Kafka cluster is “192.168.0.13,192.168.0.12,192.168.0.11”.

This example describes how to use Pump/Drainer.

  1. Description of Pump command line options

    Usage of Pump:
    -L string
        log level: debug, info, warn, error, fatal (default "info")
    -V
        to print Pump version info
    -addr string
        the RPC address that Pump provides service (-addr="192.168.0.10:8250")
    -advertise-addr string
        the RPC address that Pump provides external service (-advertise-addr="192.168.0.10:8250")
    -config string
        the path of the Pump configuration file; if you specify a configuration file, Pump reads it first; if the same configuration item also exists in the command line arguments, Pump uses the command line value to override the one in the configuration file
    -data-dir string
        the path of storing Pump data
    -enable-tolerant
        when tolerant mode is enabled, Pump does not return an error if it fails to write the binlog (default true)
    -zookeeper-addrs string (-zookeeper-addrs="192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181")
        the ZooKeeper address; this option gets the Kafka address from ZooKeeper, and you need to keep it consistent with the configuration in Kafka
    -gc int
        the maximum number of days that the binlog is retained (default 7); 0 means the binlog is retained permanently
    -heartbeat-interval int
        the interval between heartbeats that Pump sends to PD (unit: second)
    -log-file string
        the path of the log file
    -log-rotate string
        the log file rotating frequency (hour/day)
    -metrics-addr string
        the Prometheus Pushgateway address; leaving it empty disables Prometheus push
    -metrics-interval int
        the frequency of reporting monitoring information (default 15, unit: second)
    -pd-urls string
        the node address of the PD cluster (-pd-urls="http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379")
    -socket string
        the listening address of the unix socket service (default "unix:///tmp/pump.sock")
  2. Pump configuration file

    # Pump configuration.
    # the RPC address that Pump provides service (default "192.168.0.10:8250")
    addr = "192.168.0.10:8250"
    
    # the RPC address that Pump provides external service (default "192.168.0.10:8250")
    advertise-addr = ""
    
    # an integer value that controls the expiry of the binlog data; it indicates how long (in days) the binlog data is stored
    # (the default value is 0, which means the binlog data is never removed)
    gc = 7
    
    # the path of storing Pump data
    data-dir = "data.pump"
    
    # the ZooKeeper address; you can set this option to get the Kafka address from ZooKeeper; if a namespace is configured in Kafka, keep the same configuration here
    zookeeper-addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
    # example of the ZooKeeper address that configures the namespace
    zookeeper-addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181/kafka/123"
    
    # the interval between heartbeats that Pump sends to PD (unit: second)
    heartbeat-interval = 3
    
    # the node address of the PD cluster
    pd-urls = "http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379"
    
    # the listening address of the unix socket service (default "unix:///tmp/pump.sock")
    socket = "unix:///tmp/pump.sock"
  3. Startup example

    ./bin/pump -config pump.toml
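
Equivalently, you can pass the options documented above directly on the command line instead of (or in addition to) the configuration file; command line values override those in the file. A minimal sketch using the example addresses from this document:

./bin/pump \
    -addr="192.168.0.10:8250" \
    -zookeeper-addrs="192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181" \
    -pd-urls="http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379" \
    -data-dir="data.pump" \
    -socket="unix:///tmp/pump.sock" \
    -log-file="pump.log"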

Deploy Drainer using Binary

  1. Description of Drainer command line arguments

    Usage of Drainer:
    -L string
        log level: debug, info, warn, error, fatal (default "info")
    -V
        to print Drainer version info
    -addr string
        the address that Drainer provides service (default "192.168.0.10:8249")
    -c int
        the number of concurrent threads used to replicate to the downstream; a larger value means better throughput (default 1)
    -config string
        the path of the Drainer configuration file; if you specify a configuration file, Drainer reads it first; if the same configuration item also exists in the command line arguments, Drainer uses the command line value to override the one in the configuration file
    -data-dir string
        the path of storing Drainer data (default "data.drainer")
    -zookeeper-addrs string (-zookeeper-addrs="192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181")
        the ZooKeeper address; you can set this option to get the Kafka address from ZooKeeper, and you need to keep it consistent with the configuration in Kafka
    -dest-db-type string
        the downstream service type of Drainer (default "mysql")
    -detect-interval int
        the interval of detecting Pump's status from PD (default 10, unit: second)
    -disable-dispatch
        whether to disable dispatching the SQL statements in a single binlog; if set to true, each binlog is restored into a single transaction and replicated in binlog order (if the downstream service type is "mysql", set the value to false)
    -ignore-schemas string
        the DB filtering list (default "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,test"); the rename DDL operation is not supported on tables in the ignored schemas
    -initial-commit-ts (default 0)
        if Drainer does not have the related breakpoint (savepoint) information, you can use this option to specify the position from which Drainer starts
    -log-file string
        the path of the log file
    -log-rotate string
        the log file rotating frequency (hour/day)
    -metrics-addr string
        the Prometheus Pushgateway address; leaving it empty disables Prometheus push
    -metrics-interval int
        the frequency of reporting monitoring information (default 15, unit: second)
    -pd-urls string
        the node address of the PD cluster (-pd-urls="http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379")
    -txn-batch int
        the number of SQL statements in a single transaction that is output to the downstream database (default 1)
  2. Drainer configuration file

    # Drainer configuration
    
    # the address that Drainer provides service ("192.168.0.10:8249")
    addr = "192.168.0.10:8249"
    
    # the interval of detecting Pump's status from PD (default 10, unit: second)
    detect-interval = 10
    
    # the path of storing Drainer data (default "data.drainer")
    data-dir = "data.drainer"
    
    # the ZooKeeper address; you can use this option to get the Kafka address from ZooKeeper; if the namespace is configured in Kafka, you need to keep the same configuration here
    zookeeper-addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181"
    # example of the ZooKeeper address that configures the namespace
    zookeeper-addrs = "192.168.0.11:2181,192.168.0.12:2181,192.168.0.13:2181/kafka/123"
    
    # the node address of the PD cluster
    pd-urls = "http://192.168.0.16:2379,http://192.168.0.15:2379,http://192.168.0.14:2379"
    
    # the path of the log file
    log-file = "drainer.log"
    
    # Syncer configuration.
    [syncer]
    
    # the DB filtering list (default "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,test")
    # the rename DDL operation is not supported on tables in the ignored schemas
    ignore-schemas = "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql"
    
    # the number of SQL statements in a single transaction that is output to the downstream database (default 1)
    txn-batch = 1
    
    # the number of concurrent threads used to replicate to the downstream; a larger value means better throughput (default 1)
    worker-count = 1
    
    # whether to disable dispatching the SQL statements in a single binlog;
    # if set to true, each binlog is restored into a single transaction and replicated in binlog order (if the downstream service type is "mysql", set the value to false)
    disable-dispatch = false
    
    # the downstream service type of Drainer (default "mysql")
    # valid values: "mysql", "pb", "kafka"
    db-type = "mysql"
    
    # replicate-do-db has a higher priority than replicate-do-table when they have the same db name.
    # Regular expressions are supported; a regular expression must start with '~'.
    
    # replicate-do-db = ["~^b.*","s1"]
    
    # [[syncer.replicate-do-table]]
    # db-name ="test"
    # tbl-name = "log"
    
    # [[syncer.replicate-do-table]]
    # db-name ="test"
    # tbl-name = "~^a.*"
    
    # server parameters of the downstream database when the db-type is set to "mysql"
    [syncer.to]
    host = "192.168.0.10"
    user = "root"
    password = ""
    port = 3306
    
    # the directory of the binlog file when the db-type is set to "pb"
    # [syncer.to]
    # dir = "data.drainer"
  3. Startup example

    ./bin/drainer -config drainer.toml
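
The command line options documented above can also be combined with the configuration file; as noted in the -config description, command line values override the corresponding items in the file. A minimal sketch (the -txn-batch and -c values are illustrative, not tuning recommendations):

# Downstream connection settings ([syncer.to]) still come from drainer.toml.
./bin/drainer \
    -config drainer.toml \
    -addr="192.168.0.10:8249" \
    -dest-db-type="mysql" \
    -txn-batch=20 \
    -c 16 \
    -log-file="drainer.log"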

Download PbReader (Linux)

PbReader parses the pb file generated by Drainer and translates it into SQL statements.

CentOS 7+

# Download PbReader package
wget http://download.pingcap.org/pb_reader-latest-linux-amd64.tar.gz
wget http://download.pingcap.org/pb_reader-latest-linux-amd64.sha256

# Check the file integrity. If the result is OK, the file is correct.
sha256sum -c pb_reader-latest-linux-amd64.sha256

# Extract the package.
tar -xzf pb_reader-latest-linux-amd64.tar.gz
cd pb_reader-latest-linux-amd64

PbReader usage example

./bin/pbReader -binlog-file=${path}/binlog-0000000000000000

Monitor TiDB Binlog

This section introduces how to monitor TiDB Binlog’s status and performance, and display the metrics using Prometheus and Grafana.

Configure Pump/Drainer

For the Pump service deployed using Ansible, the metrics are already set in the startup parameters.

When you start Drainer, set the --metrics-addr and --metrics-interval parameters: set --metrics-addr to the address of the Prometheus Pushgateway, and set --metrics-interval to the push frequency (default 15 seconds).
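
For example, a Drainer start command with metrics enabled might look like the following sketch, where the Pushgateway address is a placeholder for your own deployment:

# --metrics-addr points to the Prometheus Pushgateway; --metrics-interval is the push frequency in seconds.
./bin/drainer -config drainer.toml \
    --metrics-addr="192.168.0.30:9091" \
    --metrics-interval=15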

Configure Grafana

Create a Prometheus data source

  1. Log in to the Grafana Web interface.

    • The default address is: http://localhost:3000

    • The default account name: admin

    • The password for the default account: admin

  2. Click the Grafana logo to open the sidebar menu.

  3. Click “Data Sources” in the sidebar.

  4. Click “Add data source”.

  5. Specify the data source information:

    • Specify the name for the data source.
    • For Type, select Prometheus.
    • For Url, specify the Prometheus address.
    • Specify other fields as needed.
  6. Click “Add” to save the new data source.
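
If you prefer to script this step, Grafana also exposes an HTTP API for creating data sources. The following curl sketch assumes the default admin credentials and an example Prometheus address:

# Create a Prometheus data source through the Grafana HTTP API
# (credentials and the Prometheus URL are example values).
curl -X POST http://admin:admin@localhost:3000/api/datasources \
    -H "Content-Type: application/json" \
    -d '{"name": "tidb-binlog", "type": "prometheus", "url": "http://192.168.0.30:9090", "access": "proxy"}'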

Create a Grafana dashboard

  1. Click the Grafana logo to open the sidebar menu.

  2. On the sidebar menu, click “Dashboards” -> “Import” to open the “Import Dashboard” window.

  3. Click “Upload .json File” to upload a JSON file (Download TiDB Grafana Config).

  4. Click “Save & Open”. A Prometheus dashboard is created.

"TiDB Binlog user guide" was last updated Nov 15 2019: *: move binlog folder and refine tools toc (#1648) (67603b4)
Edit this page Request docs changes

What’s on this page

Product

  • TiDB
  • TiSpark
  • Roadmap

Docs

  • Quick Start
  • Best Practices
  • FAQ
  • TiDB Tools
  • Release Notes

Resources

  • Blog
  • Weekly
  • GitHub
  • TiDB Community

Company

  • About
  • Careers
  • News
  • Contact Us
  • Privacy Policy
  • Terms of Service

Connect

  • Twitter
  • LinkedIn
  • Reddit
  • Google Group
  • Stack Overflow

© 2019 PingCAP. All Rights Reserved.

中文