This document shows how to use Data Migration (DM) in the shard merge scenario, where the data of sharded schemas and sharded tables in three upstream MySQL instances needs to be replicated to a downstream TiDB cluster.
Assume that the upstream schemas are as follows:
Instance 1:

| Schema | Tables |
|:-------|:-------|
| user | information, log_north, log_bak |
| store_01 | sale_01, sale_02 |
| store_02 | sale_01, sale_02 |

Instance 2:

| Schema | Tables |
|:-------|:-------|
| user | information, log_east, log_bak |
| store_01 | sale_01, sale_02 |
| store_02 | sale_01, sale_02 |

Instance 3:

| Schema | Tables |
|:-------|:-------|
| user | information, log_south, log_bak |
| store_01 | sale_01, sale_02 |
| store_02 | sale_01, sale_02 |
The replication requirements are as follows:

1. Merge the `user`.`information` table of the three upstream instances into the downstream `user`.`information` table in TiDB.
2. Replicate the `user`.`log_{north|south|east}` tables of the three upstream instances to the downstream `user`.`log_{north|south|east}` tables in TiDB.
3. Merge the `store_{01|02}`.`sale_{01|02}` tables of the three upstream instances into the downstream `store`.`sale` table in TiDB.
4. Filter out all the deletion operations in the `user`.`log_{north|south|east}` tables of the three upstream instances.
5. Filter out all the deletion operations in the `user`.`information` table of the three upstream instances.
6. Filter out all the deletion operations in the `store_{01|02}`.`sale_{01|02}` tables of the three upstream instances.
7. Filter out the `user`.`log_bak` table of the three upstream instances.
8. Because the `store_{01|02}`.`sale_{01|02}` tables have auto-increment primary keys of the bigint type, conflicts occur when these tables are merged into TiDB. The following text shows you solutions to resolve and avoid the conflicts.

Assume that the downstream schema after replication is as follows:
| Schema | Tables |
|:-------|:-------|
| user | information, log_north, log_east, log_south |
| store | sale |
To satisfy the replication requirements #1 and #2, configure the table routing rule as follows:

```yaml
routes:
  ...
  user-route-rule:
    schema-pattern: "user"
    target-schema: "user"
```
To satisfy the replication requirement #3, configure the table routing rules as follows:

```yaml
routes:
  ...
  store-route-rule:
    schema-pattern: "store_*"
    target-schema: "store"
  sale-route-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    target-schema: "store"
    target-table: "sale"
```
To satisfy the replication requirements #4 and #5, configure the binlog event filter rule as follows:

```yaml
filters:
  ...
  user-filter-rule:
    schema-pattern: "user"
    events: ["truncate table", "drop table", "delete", "drop database"]
    action: Ignore
```
> **Note:**
>
> The replication requirements #4, #5 and #7 indicate that all the deletion operations in the `user` schema are filtered out, so a schema level filtering rule is configured here. However, the deletion operations of future tables in the `user` schema will also be filtered out.
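Because the rule is schema level, it also covers tables that do not exist yet. For example, if a new log table (here a hypothetical `log_west`) is created in the `user` schema later:

```sql
-- Replicated: "create table" is not in the filtered event list.
CREATE TABLE user.log_west (id BIGINT, msg VARCHAR(255));

-- Ignored by user-filter-rule ("delete" and "drop table" are filtered):
DELETE FROM user.log_west WHERE id = 1;
DROP TABLE user.log_west;
```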
To satisfy the replication requirement #6, configure the binlog event filter rules as follows:

```yaml
filters:
  ...
  sale-filter-rule:
    schema-pattern: "store_*"
    table-pattern: "sale_*"
    events: ["truncate table", "drop table", "delete"]
    action: Ignore
  store-filter-rule:
    schema-pattern: "store_*"
    events: ["drop database"]
    action: Ignore
```
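The table-level `sale-filter-rule` and the schema-level `store-filter-rule` split the work: the former filters deletions inside the sharded tables, while the latter filters dropping the sharded schemas themselves. For illustration (the column names are assumptions):

```sql
-- Ignored by sale-filter-rule:
TRUNCATE TABLE store_02.sale_02;
DELETE FROM store_01.sale_02 WHERE id = 1;
DROP TABLE store_01.sale_01;

-- Ignored by store-filter-rule:
DROP DATABASE store_02;

-- Still replicated normally:
INSERT INTO store_01.sale_01 (id, amount) VALUES (103, 12.50);
```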
To satisfy the replication requirement #7, configure the black and white lists as follows:

```yaml
black-white-list:
  log-bak-ignored:
    ignore-tables:
    - db-name: "user"
      tbl-name: "log_bak"
```
To satisfy the replication requirement #8, first refer to handling conflicts of auto-increment primary key to resolve the conflicts. This guarantees that data is replicated to the downstream successfully even when the primary key value of one sharded table duplicates that of another sharded table. Then, configure `ignore-checking-items` to skip the check for auto-increment primary key conflicts:

```yaml
ignore-checking-items: ["auto_increment_ID"]
```
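One common way to remove the conflict, sketched here with assumed column names, is to create the downstream merged table without the `PRIMARY KEY` and `AUTO_INCREMENT` attributes on the original key column, so that duplicate `id` values arriving from different shards no longer collide:

```sql
-- A minimal sketch of the downstream merged table, assuming the upstream
-- shards use (id BIGINT AUTO_INCREMENT PRIMARY KEY, amount DECIMAL(10, 2)).
CREATE TABLE store.sale (
    id BIGINT NOT NULL,      -- no longer PRIMARY KEY or AUTO_INCREMENT
    amount DECIMAL(10, 2),
    INDEX idx_id (id)        -- keep an ordinary index for lookups by id
);
```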
The complete configuration of the replication task is shown below. For more details, see Data Migration Task Configuration File.
name: "shard_merge"
task-mode: all
meta-schema: "dm_meta"
remove-meta: false
ignore-checking-items: ["auto_increment_ID"]
target-database:
host: "192.168.0.1"
port: 4000
user: "root"
password: ""
mysql-instances:
-
source-id: "instance-1"
route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
black-white-list: "log-bak-ignored"
mydumper-config-name: "global"
loader-config-name: "global"
syncer-config-name: "global"
-
source-id: "instance-2"
route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
black-white-list: "log-bak-ignored"
mydumper-config-name: "global"
loader-config-name: "global"
syncer-config-name: "global"
-
source-id: "instance-3"
route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
filter-rules: ["user-filter-rule", "store-filter-rule" , "sale-filter-rule"]
black-white-list: "log-bak-ignored"
mydumper-config-name: "global"
loader-config-name: "global"
syncer-config-name: "global"
# Other common configs shared by all instances.
routes:
user-route-rule:
schema-pattern: "user"
target-schema: "user"
store-route-rule:
schema-pattern: "store_*"
target-schema: "store"
sale-route-rule:
schema-pattern: "store_*"
table-pattern: "sale_*"
target-schema: "store"
target-table: "sale"
filters:
user-filter-rule:
schema-pattern: "user"
events: ["truncate table", "drop table", "delete", "drop database"]
action: Ignore
sale-filter-rule:
schema-pattern: "store_*"
table-pattern: "sale_*"
events: ["truncate table", "drop table", "delete"]
action: Ignore
store-filter-rule:
schema-pattern: "store_*"
events: ["drop database"]
action: Ignore
black-white-list:
log-bak-ignored:
ignore-tales:
- db-name: "user"
tbl-name: "log_bak"
mydumpers:
global:
threads: 4
chunk-filesize: 64
skip-tz-utc: true
loaders:
global:
pool-size: 16
dir: "./dumped_data"
syncers:
global:
worker-count: 16
batch: 100
max-retry: 100
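Save the configuration above as a task file (for example, a file named `task.yaml`) and start the replication task with dmctl's `start-task` command.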