PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download
PingCAP
  • Cloud
  • TiDB Academy
  • Docs
  • Success Stories
  • Blog
  • Free Download

Contact

中文
文档
v3.0 (stable) dev v2.1
  • Introduction
    • TiDB Introduction
    • Benchmarks
      • How to Test TiDB Using Sysbench
      • Sysbench Performance Test - v2.1 vs. v2.0
      • TPC-H 50G Performance Test - v2.1 vs. v2.0
      • DM Benchmark Report
  • Concepts
    • Architecture
    • Key Features
      • Horizontal Scalability
      • MySQL Compatible Syntax
      • Replicate from and to MySQL
      • Distributed Transactions with Strong Consistency
      • Cloud Native Architecture
      • Minimize ETL with HTAP
      • Fault Tolerance & Recovery with Raft
      • Automatic Rebalancing
      • Deployment and Orchestration with Ansible, Kubernetes, Docker
      • JSON Support
      • Spark Integration
      • Read Historical Data Without Restoring from Backup
      • Fast Import and Restore of Data
      • Hybrid of Column and Row Storage
      • SQL Plan Management
      • Open Source
      • Online Schema Changes
  • How-to
    • Get Started
      • Start a Local Cluster
        • From Binary
      • Explore SQL with TiDB
      • Import Example Database
      • Read Historical Data
      • TiDB Binlog Tutorial
      • TiDB Data Migration Tutorial
      • TiDB Lightning Tutorial
      • TiSpark Quick Start Guide
    • Deploy
      • Hardware Recommendations
      • From Binary Tarball
        • For Testing Environments
        • For Production Environments
      • Orchestrated Deployment
        • Ansible Deployment (Recommended)
        • Ansible Offline Deployment
        • Docker Deployment
      • Geographic Redundancy
        • Overview
        • Configure Location Awareness
      • Data Migration with Ansible
    • Configure
      • Time Zone
      • Memory Control
    • Secure
      • Transport Layer Security (TLS)
        • Enable TLS For MySQL Clients
        • Enable TLS Between TiDB Components
      • Generate Self-signed Certificates
    • Monitor
      • Overview
      • Monitor a TiDB Cluster
    • Migrate
      • Overview
      • Migrate from MySQL
        • Migrate the Full Data
        • Migrate the Incremental Data
      • Migrate from Aurora
      • Migrate from CSV
    • Maintain
      • Common Ansible Operations
      • Backup and Restore
      • Identify Slow Queries
    • Scale
      • Scale using Ansible
      • Scale a TiDB Cluster
    • Upgrade
      • Upgrade to TiDB 2.1
    • Troubleshoot
      • Troubleshoot Cluster Setup
      • Troubleshoot TiDB Lightning
  • Reference
    • SQL
      • MySQL Compatibility
      • SQL Language Structure
        • Literal Values
        • Schema Object Names
        • Keywords and Reserved Words
        • User-Defined Variables
        • Expression Syntax
        • Comment Syntax
      • Data Types
        • Overview
        • Default Values
        • Numeric Types
          • `BIT`
          • `BOOL|BOOLEAN`
          • `TINYINT`
          • `SMALLINT`
          • `MEDIUMINT`
          • `INT|INTEGER`
          • `BIGINT`
          • `DECIMAL`
          • `FLOAT`
          • `DOUBLE`
        • Date and Time Types
          • `DATE`
          • `DATETIME`
          • `TIMESTAMP`
          • `TIME`
          • `YEAR`
        • String Types
          • `CHAR`
          • `VARCHAR`
          • `TEXT`
          • `LONGTEXT`
          • `BINARY`
          • `VARBINARY`
          • `TINYBLOB`
          • `BLOB`
          • `MEDIUMBLOB`
          • `LONGBLOB`
          • `ENUM`
          • `SET`
        • JSON Type
      • Functions and Operators
        • Function and Operator Reference
        • Type Conversion in Expression Evaluation
        • Operators
        • Control Flow Functions
        • String Functions
        • Numeric Functions and Operators
        • Date and Time Functions
        • Bit Functions and Operators
        • Cast Functions and Operators
        • Encryption and Compression Functions
        • Information Functions
        • JSON Functions
        • Aggregate (GROUP BY) Functions
        • Miscellaneous Functions
        • Precision Math
      • SQL Statements
        • `ADD COLUMN`
        • `ADD INDEX`
        • `ADMIN`
        • `ALTER DATABASE`
        • `ALTER TABLE`
        • `ALTER USER`
        • `ANALYZE TABLE`
        • `BEGIN`
        • `COMMIT`
        • `CREATE DATABASE`
        • `CREATE INDEX`
        • `CREATE TABLE LIKE`
        • `CREATE TABLE`
        • `CREATE USER`
        • `DEALLOCATE`
        • `DELETE`
        • `DESC`
        • `DESCRIBE`
        • `DO`
        • `DROP COLUMN`
        • `DROP DATABASE`
        • `DROP INDEX`
        • `DROP TABLE`
        • `DROP USER`
        • `EXECUTE`
        • `EXPLAIN ANALYZE`
        • `EXPLAIN`
        • `FLUSH PRIVILEGES`
        • `FLUSH STATUS`
        • `FLUSH TABLES`
        • `GRANT <privileges>`
        • `INSERT`
        • `KILL [TIDB]`
        • `LOAD DATA`
        • `MODIFY COLUMN`
        • `PREPARE`
        • `RENAME INDEX`
        • `RENAME TABLE`
        • `REPLACE`
        • `REVOKE <privileges>`
        • `ROLLBACK`
        • `SELECT`
        • `SET [NAMES|CHARACTER SET]`
        • `SET PASSWORD`
        • `SET TRANSACTION`
        • `SET [GLOBAL|SESSION] <variable>`
        • `SHOW CHARACTER SET`
        • `SHOW COLLATION`
        • `SHOW [FULL] COLUMNS FROM`
        • `SHOW CREATE TABLE`
        • `SHOW DATABASES`
        • `SHOW ENGINES`
        • `SHOW ERRORS`
        • `SHOW [FULL] FIELDS FROM`
        • `SHOW GRANTS`
        • `SHOW INDEXES [FROM|IN]`
        • `SHOW INDEX [FROM|IN]`
        • `SHOW KEYS [FROM|IN]`
        • `SHOW PRIVILEGES`
        • `SHOW [FULL] PROCESSSLIST`
        • `SHOW SCHEMAS`
        • `SHOW [FULL] TABLES`
        • `SHOW TABLE REGIONS`
        • `SHOW TABLE STATUS`
        • `SHOW [GLOBAL|SESSION] VARIABLES`
        • `SHOW WARNINGS`
        • `SPLIT REGION`
        • `START TRANSACTION`
        • `TRACE`
        • `TRUNCATE`
        • `UPDATE`
        • `USE`
      • Constraints
      • Generated Columns
      • Character Set
    • Configuration
      • tidb-server
        • MySQL System Variables
        • TiDB Specific System Variables
        • Configuration Flags
        • Configuration File
      • pd-server
        • Configuration Flags
      • tikv-server
        • Configuration Flags
    • Security
      • Security Compatibility with MySQL
      • The TiDB Access Privilege System
      • TiDB User Account Management
    • Transactions
      • Overview
      • Transaction Model
      • Isolation Levels
    • System Databases
      • `mysql`
      • `information_schema`
    • Errors Codes
    • Supported Client Drivers
    • Garbage Collection (GC)
    • Performance
      • Overview
      • Understanding the Query Execution Plan
      • Introduction to Statistics
      • Optimizer Hints
      • Tune TiKV
    • Key Monitoring Metrics
      • Overview
      • TiDB
      • PD
      • TiKV
    • Alert Rules
    • Best Practices
      • HAProxy Best Practices
      • PD Scheduling Best Practices
    • TiSpark
    • TiDB Binlog
      • Overview
      • Deploy
      • Maintain
      • Monitor
      • Upgrade
      • Reparo
      • Binlog Slave Client
      • FAQ
    • Tools
      • Mydumper
      • Syncer
      • Loader
      • TiDB Data Migration
        • Overview
          • DM Overview
          • Restrictions
          • DM-worker
          • DM Relay Log
        • Features
          • Table Routing
          • Black and White Lists
          • Binlog Event Filter
          • Replication Delay Monitoring
          • Sharding Support
            • Introduction
            • Restrictions
            • Handle Sharding DDL Locks Manually
        • Usage Scenarios
          • Simple Scenario
          • Shard Merge Scenario
          • Shard Merge Best Practices
        • Deploy
        • Configure
          • Overview
          • Task Configuration
        • Manage the DM Cluster
          • Cluster Operations
          • Cluster Upgrade
        • Manage Replication Tasks
          • Manage Tasks
          • Precheck Tasks
          • Query Task Status
          • Skip or Replace Abnormal SQL Statements
        • Monitor
        • Migrate from MySQL compatible database
          • Migrate from Aurora
        • Troubleshoot
          • DM Troubleshooting
          • Error Description
          • Error Handling
        • FAQ
      • TiDB Lightning
        • Overview
        • Deployment
        • Checkpoints
        • Table Filter
        • CSV Support
        • Monitor
        • Troubleshoot
        • FAQ
      • sync-diff-inspector
      • PD Control
      • PD Recover
      • TiKV Control
      • TiDB Control
      • Download
  • FAQs
    • TiDB FAQs
    • TiDB Lightning FAQs
    • Upgrade FAQs
  • Support
    • Support Resources
    • Report an Issue
  • Contribute
    • Contribute to TiDB
    • Improve the Docs
  • Adopters
  • Roadmap
  • Releases
    • v2.1
      • 2.1.18
      • 2.1.17
      • 2.1.16
      • 2.1.15
      • 2.1.14
      • 2.1.13
      • 2.1.12
      • 2.1.11
      • 2.1.10
      • 2.1.9
      • 2.1.8
      • 2.1.7
      • 2.1.6
      • 2.1.5
      • 2.1.4
      • 2.1.3
      • 2.1.2
      • 2.1.1
      • 2.1 GA
      • 2.1 RC5
      • 2.1 RC4
      • 2.1 RC3
      • 2.1 RC2
      • 2.1 RC1
      • 2.1 Beta
    • v2.0
      • 2.0.11
      • 2.0.10
      • 2.0.9
      • 2.0.8
      • 2.0.7
      • 2.0.6
      • 2.0.5
      • 2.0.4
      • 2.0.3
      • 2.0.2
      • 2.0.1
      • 2.0
      • 2.0 RC5
      • 2.0 RC4
      • 2.0 RC3
      • 2.0 RC1
      • 1.1 Beta
      • 1.1 Alpha
    • v1.0
      • 1.0.8
      • 1.0.7
      • 1.0.6
      • 1.0.5
      • 1.0.4
      • 1.0.3
      • 1.0.2
      • 1.0.1
      • 1.0
      • Pre-GA
      • RC4
      • RC3
      • RC2
      • RC1

Data Migration Shard Merge Scenario

This document shows how to use Data Migration (DM) in the shard merge scenario where the sharded schemas and sharded tables data of three upstream MySQL instances need to be replicated to a downstream TiDB cluster.

Upstream instances

Assume that the upstream schemas are as follows:

  • Instance 1

    Schema Tables
    user information, log_north, log_bak
    store_01 sale_01, sale_02
    store_02 sale_01, sale_02
  • Instance 2

    Schema Tables
    user information, log_east, log_bak
    store_01 sale_01, sale_02
    store_02 sale_01, sale_02
  • Instance 3

    Schema Tables
    user information, log_south, log_bak
    store_01 sale_01, sale_02
    store_02 sale_01, sale_02

Replication requirements

  1. Merge the user.information table of three upstream instances to the downstream user.information table in TiDB.
  2. Merge the user.log_{north|south|east} table of three upstream instances to the downstream user.log_{north|south|east} table in TiDB.
  3. Merge the store_{01|02}.sale_{01|02} table of three upstream instances to the downstream store.sale table in TiDB.
  4. Filter out all the deletion operations in the user.log_{north|south|east} table of three upstream instances.
  5. Filter out all the deletion operations in the user.information table of three upstream instances.
  6. Filter out all the deletion operations in the store_{01|02}.sale_{01|02} table of three upstream instances.
  7. Filter out the user.log_bak table of three upstream instances.
  8. Because the store_{01|02}.sale_{01|02} tables have auto-increment primary keys of the bigint type, the conflict occurs when these tables are merged into TiDB. The following text will show you solutions to resolve and avoid the conflict.

Downstream instances

Assume that the downstream schema after replication is as follows:

Schema Tables
user information, log_north, log_east, log_south
store sale

Replication solution

  • To satisfy the replication Requirements #1 and #2, configure the table routing rule as follows:

    routes:
      ...
      user-route-rule:
        schema-pattern: "user"
        target-schema: "user"
  • To satisfy the replication Requirement #3, configure the table routing rule as follows:

    routes:
      ...
      store-route-rule:
        schema-pattern: "store_*"
        target-schema: "store"
      sale-route-rule:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        target-schema: "store"
        target-table:  "sale"
  • To satisfy the replication Requirements #4 and #5, configure the binlog event filtering rule as follows:

    filters:
      ...
      user-filter-rule:
        schema-pattern: "user"
        events: ["truncate table", "drop table", "delete", "drop database"]
        action: Ignore

    Note:

    The replication Requirements #4, #5 and #7 indicate that all the deletion operations in the user schema are filtered out, so a schema level filtering rule is configured here. However, the deletion operations of future tables in the user schema will also be filtered out.

  • To satisfy the replication Requirement #6, configure the binlog event filter rule as follows:

    filters:
      ...
      sale-filter-rule:
        schema-pattern: "store_*"
        table-pattern: "sale_*"
        events: ["truncate table", "drop table", "delete"]
        action: Ignore
      store-filter-rule:
        schema-pattern: "store_*"
        events: ["drop database"]
        action: Ignore
  • To satisfy the replication Requirement #7, configure the black and white table lists as follows:

    black-white-list:
      log-bak-ignored:
        ignore-tales:
        - db-name: "user"
          tbl-name: "log_bak"
  • To satisfy the replication Requirement #8, first refer to handling conflicts of auto-increment primary key to solve conflicts. This guarantees that data is successfully replicated to the downstream when the primary key value of one sharded table is duplicate with that of another sharded table. Then, configure ignore-checking-items to skip checking the conflict of auto-increment primary key:

    ignore-checking-items: ["auto_increment_ID"]

Replication task configuration

The complete configuration of the replication task is shown as below. For more details, see Data Migration Task Configuration File.

name: "shard_merge"
task-mode: all
meta-schema: "dm_meta"
remove-meta: false
ignore-checking-items: ["auto_increment_ID"]

target-database:
  host: "192.168.0.1"
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  -
    source-id: "instance-1"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"

  -
    source-id: "instance-2"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"
  -
    source-id: "instance-3"
    route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
    black-white-list:  "log-bak-ignored"
    mydumper-config-name: "global"
    loader-config-name: "global"
    syncer-config-name: "global"

# Other common configs shared by all instances.

routes:
  user-route-rule:
    schema-pattern: "user"
    target-schema: "user"
  store-route-rule:
    schema-pattern: "store_*"
    target-schema: "store"
  sale-route-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    target-schema: "store"
    target-table:  "sale"

filters:
  user-filter-rule:
    schema-pattern: "user"
    events: ["truncate table", "drop table", "delete", "drop database"]
    action: Ignore
  sale-filter-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    events: ["truncate table", "drop table", "delete"]
    action: Ignore
  store-filter-rule:
    schema-pattern: "store_*"
    events: ["drop database"]
    action: Ignore

black-white-list:
  log-bak-ignored:
    ignore-tales:
    - db-name: "user"
      tbl-name: "log_bak"

mydumpers:
  global:
    threads: 4
    chunk-filesize: 64
    skip-tz-utc: true

loaders:
  global:
    pool-size: 16
    dir: "./dumped_data"

syncers:
  global:
    worker-count: 16
    batch: 100
    max-retry: 100
"Data Migration Shard Merge Scenario" was last updated Nov 27 2019: *: remove column mapping from dm docs (#1618) (81822cc)
Edit this page Request docs changes

What’s on this page

Product

  • TiDB
  • TiSpark
  • Roadmap

Docs

  • Quick Start
  • Best Practices
  • FAQ
  • TiDB Tools
  • Release Notes

Resources

  • Blog
  • Weekly
  • GitHub
  • TiDB Community

Company

  • About
  • Careers
  • News
  • Contact Us
  • Privacy Policy
  • Terms of Service

Connect

  • Twitter
  • LinkedIn
  • Reddit
  • Google Group
  • Stack Overflow

© 2019 PingCAP. All Rights Reserved.

中文