This document introduces the sharding support feature provided by Data Migration (DM). This feature allows you to merge and replicate the data of tables with the same table schema in the upstream MySQL or MariaDB instances into the same table in the downstream TiDB. It supports not only replicating the upstream DML statements, but also coordinating the replication of table schema changes made by DDL statements across multiple upstream sharded tables.
To merge and replicate data from the sharded tables, you must set the `is-sharding: true` item in the task configuration file.
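A minimal sketch of the relevant fragment of a task configuration file is shown below; `is-sharding: true` is the item described above, while the task name and the other fields are illustrative:

```yaml
# Task configuration fragment (illustrative values except is-sharding):
name: shard-merge-task   # hypothetical task name
task-mode: all           # illustrative; full data plus incremental replication
is-sharding: true        # enables merging and replicating sharded tables
```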
DM has the following sharding DDL usage restrictions:
- All the sharded tables in a sharding group must execute the same DDL statements in the same order. For example, if you add `column A` to `table_1` before you add `column B`, then you cannot add `column B` to `table_2` before you add `column A`. Executing the DDL statements in a different order is not supported.
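For instance, the ordering rule above can be illustrated with hypothetical `ALTER TABLE` statements (table and column names are made up for this sketch):

```sql
-- Supported: both sharded tables execute the same DDL statements in the same order.
ALTER TABLE table_1 ADD COLUMN A INT;
ALTER TABLE table_1 ADD COLUMN B INT;
ALTER TABLE table_2 ADD COLUMN A INT;  -- same order as table_1
ALTER TABLE table_2 ADD COLUMN B INT;

-- Not supported: table_2 adds column B before column A.
-- ALTER TABLE table_2 ADD COLUMN B INT;
-- ALTER TABLE table_2 ADD COLUMN A INT;
```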
- A sharding group must wait until all of its members have received the DDL statements. For example, if one or more upstream sharded tables corresponding to `DM-worker-2` have not executed the DDL statements, then the other DM-workers that have executed the DDL statements pause their replication tasks and wait for `DM-worker-2` to receive the upstream DDL statements.
- DM automatically ignores the `DROP TABLE` statement of upstream sharded tables.
- The sharding group supports `RENAME TABLE`, but with the following limitations (Online DDL is supported in another solution):
    - A table can only be renamed to a new name that is not used by any other table.
    - A single `RENAME TABLE` statement can only involve a single `RENAME` operation.
- If you need to `CREATE` a new table in a sharding group where DDL statements are being executed, you have to make sure that the table schema is the same as the newly modified table schema. For example, if both the original `table_1` and `table_2` have two columns (a, b) initially and have three columns (a, b, c) after the sharding DDL operation, then the newly created table should also have three columns (a, b, c) after the replication.
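Continuing the (a, b) to (a, b, c) example, here is a sketch of the schema that a newly created sharded table must use (all table names and column types are illustrative):

```sql
-- Sharding DDL already executed on the existing sharded tables:
ALTER TABLE table_1 ADD COLUMN c INT;
ALTER TABLE table_2 ADD COLUMN c INT;

-- A table newly added to the sharding group must be created with the
-- modified (a, b, c) schema, not the original (a, b) schema:
CREATE TABLE table_3 (
    a INT,
    b INT,
    c INT
);
```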
Currently, DM uses the binlog in the `ROW` format to perform the replication task. The binlog does not contain the table schema information. When you use the `ROW` binlog to replicate data, if you have not replicated multiple upstream tables into the same downstream table, then only the DDL operations of a single upstream table can update the table schema of the downstream table. The `ROW` binlog can be considered self-descriptive: during the replication process, the DML statements can be constructed from the column values and the downstream table schema.
However, in the process of merging and replicating sharded tables, if DDL statements are executed on the upstream tables to modify the table schema, you need to perform extra operations to replicate the DDL statements, so as to avoid any inconsistency between the DML statements constructed from the column values and the actual downstream table schema.
Here is a simple example:
In the above example, the merging process is simplified: only two MySQL instances exist in the upstream, and each instance has only one table. When the replication begins, the table schema version of the two sharded tables is marked as `schema V1`, and the table schema version after the DDL statements are executed is marked as `schema V2`.
Now assume that in the replication process, the binlog data received from the two upstream sharded tables has the following time sequence:

1. When the replication begins, the Syncer unit receives the DML events of `schema V1` from the two sharded tables.
2. At `t1`, the sharding DDL events from instance 1 are received.
3. From `t2` on, the Syncer unit receives the DML events of `schema V2` from instance 1; but from instance 2, it still receives the DML events of `schema V1`.
4. At `t3`, the sharding DDL events from instance 2 are received.
5. From `t4` on, the Syncer unit receives the DML events of `schema V2` from instance 2 as well.
Assume that the DDL statements of the sharded tables are not specially processed during the replication. After the DDL statement of instance 1 is replicated to the downstream, the downstream table schema is changed to `schema V2`. But the Syncer unit in DM-worker is still receiving DML events of `schema V1` from instance 2 between `t2` and `t3`. Therefore, when these DML statements of `schema V1` are replicated to the downstream, the inconsistency between the DML statements and the table schema can cause errors, and the data cannot be replicated successfully.
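To make the inconsistency concrete, assume (purely for illustration) that the sharding DDL adds a column `c` to a downstream table named `merged_table`. A `schema V1` row event from instance 2 then carries one value per original column only, which no longer matches the downstream schema:

```sql
-- schema V1 is (a, b); the sharding DDL produces schema V2 as (a, b, c):
ALTER TABLE merged_table ADD COLUMN c INT;

-- A schema V1 DML event from instance 2 still carries only two column values:
INSERT INTO merged_table VALUES (1, 2);  -- fails: column count does not match
```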
This section shows how DM replicates DDL statements in the process of merging sharded tables based on the above example in the background section.
In this example, `DM-worker-1` replicates the data from MySQL instance 1, and `DM-worker-2` replicates the data from MySQL instance 2. `DM-master` coordinates the DDL replication among multiple DM-workers. Starting from `DM-worker-1` receiving the DDL statement, the DDL replication process is simplified as follows:
1. `DM-worker-1` receives the DDL statement from MySQL instance 1 at `t1`, pauses the data replication of the corresponding DDL and DML statements, and sends the DDL information to `DM-master`.
2. `DM-master` decides, based on the received DDL information, that the replication of this DDL statement needs to be coordinated; it creates a lock for this DDL statement, sends the DDL lock information back to `DM-worker-1`, and marks `DM-worker-1` as the owner of this lock at the same time.
3. `DM-worker-2` continues replicating the DML statements until it receives the DDL statement from MySQL instance 2 at `t3`; then it pauses the data replication of this DDL statement and sends the DDL information to `DM-master`.
4. `DM-master` decides, based on the received DDL information, that the lock of this DDL statement already exists, and sends the lock information directly to `DM-worker-2`.
5. `DM-master` decides that it has received this DDL statement from all upstream sharded tables to be merged, and requests the owner of the DDL lock (`DM-worker-1`) to replicate this DDL statement to the downstream.
6. `DM-worker-1` verifies the DDL statement execution request based on the DDL lock information received at Step #2, replicates this DDL statement to the downstream, and sends the result to `DM-master`. If this operation is successful, `DM-worker-1` continues replicating the subsequent DML statements (starting from the binlog at `t2`).
7. `DM-master` receives the response from the lock owner that the DDL statement has been executed successfully, and requests all other DM-workers that are waiting for the DDL lock (here, `DM-worker-2`) to ignore this DDL statement and continue replicating the subsequent DML statements (starting from the binlog at `t4`).
The characteristics of DM handling the sharding DDL replication among multiple DM-workers can be summarized as follows:

- Based on the replication task, a sharding group is built, and `DM-master` coordinates the DDL replication (the group members are the DM-workers that handle each subtask divided from the replication task).
- `DM-master` creates or updates the DDL lock based on the DDL information received from each DM-worker and the sharding group information.
In the above example, only one sharded table needs to be merged in the upstream MySQL instance corresponding to each DM-worker. But in actual scenarios, there might be multiple sharded tables in multiple sharded schemas to be merged in one MySQL instance. And when this happens, it becomes more complex to coordinate the sharding DDL replication.
Assume that there are two sharded tables, namely `table_1` and `table_2`, to be merged in one MySQL instance:
Because data comes from the same MySQL instance, all the data is obtained from the same binlog stream. In this case, the time sequence is as follows:
1. The Syncer unit receives the DML statements of `schema V1` from both sharded tables when the replication begins.
2. At `t1`, the Syncer unit in DM-worker receives the DDL statements of `table_1`.
3. From `t2` to `t3`, the received data includes the DML statements of `schema V2` that belong to `table_1` and the DML statements of `schema V1` that belong to `table_2`.
4. At `t3`, the Syncer unit in DM-worker receives the DDL statements of `table_2`.
5. From `t4` on, the Syncer unit in DM-worker receives the DML statements of `schema V2` from both tables.
If the DDL statements are not specially processed during the data replication, then when the DDL statement of `table_1` is replicated to the downstream and changes the downstream table schema, the DML statements of `schema V1` from `table_2` cannot be replicated successfully. Therefore, within a single DM-worker, a logical sharding group similar to the one within `DM-master` is created, except that the members of this group are the different sharded tables in the same upstream MySQL instance.
But when a DM-worker coordinates the replication of the sharding group within itself, the process is not exactly the same as the coordination performed by `DM-master`, for the following reasons:

- When the DM-worker receives the DDL statement of `table_1`, it cannot pause the replication, because it needs to continue parsing the binlog to get the subsequent DDL statement of `table_2`. This means it has to keep parsing between `t2` and `t3`.
- While it continues parsing between `t2` and `t3`, the DML statements of `table_1` cannot be replicated to the downstream until the sharding DDL statement is replicated and executed successfully.
In DM, the simplified replication process of sharding DDL statements within the DM-worker is as follows:

1. When receiving the DDL statement of `table_1` at `t1`, the DM-worker records the DDL information and the current position of the binlog.
2. The DM-worker continues parsing the binlog events between `t1` and `t3`. It ignores the DML statements with the `schema V2` schema that belong to `table_1`, and replicates the DML statements with the `schema V1` schema that belong to `table_2` to the downstream.
3. When receiving the DDL statement of `table_2` at `t3`, the DM-worker records the DDL information and the current position of the binlog.
4. After the sharding DDL statement is replicated and executed successfully, the DM-worker re-parses the binlog from the position saved at `t1`. This time, it replicates the DML statements with the `schema V2` schema that belong to `table_1` to the downstream, and ignores the DML statements with the `schema V1` schema that belong to `table_2`.
5. After reaching the position saved at `t3`, the DM-worker resumes the normal replication.
You can conclude from the above analysis that DM mainly uses two-level sharding groups for coordination and control when handling the replication of sharding DDL statements. Here is the simplified process:

1. Each DM-worker coordinates the DDL replication of the sharding group composed of multiple sharded tables within its upstream MySQL instance, and sends the DDL information to `DM-master` after receiving the DDL statements of all these sharded tables.
2. `DM-master` coordinates the DDL replication of the sharding group composed of the DM-workers based on the received DDL information.
3. After receiving the DDL information from all DM-workers, `DM-master` requests the DDL lock owner (a specific DM-worker) to execute the DDL statement.
4. The DDL lock owner executes the DDL statement and returns the result to `DM-master`; then the owner restarts the replication of the DML statements that were previously ignored during the internal coordination of DDL replication.
5. After `DM-master` confirms that the owner has successfully executed the DDL statement, it asks all other DM-workers to continue the replication.