Data Migration (DM) is an exciting new toolchain that is currently available as an Alpha release. What does it do? DM simplifies many of the provisioning steps involved in setting up TiDB as a replica of a MySQL or MariaDB system.
So let's say I have the following topology: a MySQL master with slaves (draws picture).
I can have DM first create a backup of this slave using mydumper and import it into TiDB. At the same time, DM records the MySQL system's binary log coordinates, and once the restore is complete it starts replicating changes from those coordinates to TiDB. So you can think of it as similar to setting up a new replication slave, with the provisioning included.
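The key idea in the previous paragraph is that the dump and the binlog position are captured together, so replication can resume exactly where the snapshot ends without losing or double-applying changes. The sketch below models that with an in-memory key/value "table"; `Upstream`, `BinlogEvent`, `provision_replica`, and the fixed-size offsets are all hypothetical illustrations, not DM or mydumper APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a binlog coordinate is (file name, byte offset).
# Tuples compare lexicographically, which orders events correctly here
# because every event in this toy example lives in the same binlog file.
Coordinate = Tuple[str, int]


@dataclass
class BinlogEvent:
    coord: Coordinate
    key: str
    value: Optional[str]  # None represents a DELETE of `key`


def apply_event(rows: Dict[str, str], ev: BinlogEvent) -> None:
    """Apply one row-level change to a key/value 'table'."""
    if ev.value is None:
        rows.pop(ev.key, None)
    else:
        rows[ev.key] = ev.value


@dataclass
class Upstream:
    """Toy stand-in for the MySQL slave being copied (hypothetical)."""
    rows: Dict[str, str] = field(default_factory=dict)
    binlog: List[BinlogEvent] = field(default_factory=list)
    offset: int = 0

    def write(self, key: str, value: Optional[str]) -> None:
        self.offset += 100  # arbitrary event size, for illustration only
        ev = BinlogEvent(("mysql-bin.000001", self.offset), key, value)
        self.binlog.append(ev)
        apply_event(self.rows, ev)

    def dump(self) -> Tuple[Dict[str, str], Coordinate]:
        """Consistent snapshot plus the binlog coordinate it corresponds to."""
        return dict(self.rows), ("mysql-bin.000001", self.offset)


def provision_replica(upstream: Upstream, snapshot: Dict[str, str],
                      coord: Coordinate) -> Dict[str, str]:
    target = dict(snapshot)        # load phase: import the dump
    for ev in upstream.binlog:     # sync phase: replay the binlog
        if ev.coord > coord:       # skip changes already contained in the dump
            apply_event(target, ev)
    return target
```

For example, taking a dump after two writes and then making three more writes still yields a replica equal to the upstream, because only events after the recorded coordinate are replayed.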
Let's look at the architecture of DM, as it works differently from MySQL replication.
So to start with, we have a DM-master, which handles the central coordination of the DM system. We will get into what that coordination involves in just a second.
And then we have one or more DM-workers. DM-workers execute tasks such as creating the initial mydumper backup, importing it into TiDB, or acting as a MySQL slave and then pushing the changes into TiDB.
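To make the split between the DM-master and the DM-workers concrete, here is a minimal sketch assuming that each worker is bound to one upstream MySQL instance and that the master breaks a migration task into one subtask per source. The names `plan_task`, `run_subtask`, `SubTask`, and the unit labels `dump`/`load`/`sync` are hypothetical and only mirror the three jobs described above; they are not DM's internal API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical unit names, mirroring the jobs described above:
# take the mydumper backup, import it into TiDB, then replicate the binlog.
UNITS: Tuple[str, ...] = ("dump", "load", "sync")


@dataclass
class SubTask:
    task: str      # name of the overall migration task
    source: str    # upstream MySQL instance this subtask reads from


def plan_task(task: str, sources: List[str],
              bindings: Dict[str, str]) -> Dict[str, List[SubTask]]:
    """Master side: one subtask per upstream source, routed to the
    worker assumed to be bound to that source."""
    plan: Dict[str, List[SubTask]] = {}
    for source in sources:
        worker = bindings[source]
        plan.setdefault(worker, []).append(SubTask(task, source))
    return plan


def run_subtask(sub: SubTask, log: List[str]) -> None:
    """Worker side: run the units strictly in order, since each one
    depends on the output of the previous one."""
    for unit in UNITS:
        log.append(f"{sub.source}:{unit}")
```

Keeping planning on the master and execution on the workers is what lets the same task fan out across several upstream instances, which becomes important for the sharded case that follows.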
Now I know you may be asking why it has this architecture. The answer is that it supports deploying multiple workers. So for example, let's consider the case where my MySQL system is sharded (draws picture).
In this scenario, we can have DM merge the sharded data into a single view of the world that is no longer sharded inside TiDB. Or, more accurately, the data is transparently sharded into Regions inside TiKV.
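Merging shards amounts to mapping many upstream (schema, table) names onto one downstream table. DM configures this through table routing rules in its task configuration; the Python below does not use that configuration format and is only a sketch of the mapping idea, with a hypothetical rule that sends every `shard_db_*.orders_*` table to `db.orders`.

```python
import fnmatch
from typing import Dict, List, Tuple

TableName = Tuple[str, str]  # (schema, table)

# Hypothetical routing rule: every table matching orders_* in any schema
# matching shard_db_* is merged into the single TiDB table db.orders.
ROUTES: List[Tuple[TableName, TableName]] = [
    (("shard_db_*", "orders_*"), ("db", "orders")),
]


def route(name: TableName, routes=ROUTES) -> TableName:
    """Return the downstream table for an upstream (schema, table) pair."""
    schema, table = name
    for (schema_pat, table_pat), target in routes:
        if fnmatch.fnmatchcase(schema, schema_pat) and fnmatch.fnmatchcase(table, table_pat):
            return target
    return name  # tables that match no rule keep their upstream name


def merge_shards(shards: Dict[TableName, List[dict]]) -> Dict[TableName, List[dict]]:
    """Collect rows from every upstream shard under its routed target table."""
    merged: Dict[TableName, List[dict]] = {}
    for name, rows in shards.items():
        merged.setdefault(route(name), []).extend(rows)
    return merged
```

A real merge also has to cope with primary-key collisions across shards (for example, auto-increment IDs that overlap), which this sketch ignores.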
Here is where the DM-master comes into play as well. Because the MySQL system is sharded, schema changes might be applied to it in a non-atomic fashion. For example, shard #1 might finish applying a DDL change before shard #2, with shard #3 finishing last.
DM supports a shard DDL lock to ensure these operations are performed in the correct order: the DDL is only applied to TiDB once every shard has reached it. The locking mechanism is centralized in the DM-master.
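The shard DDL lock can be pictured as a barrier held by the coordinator: each shard reports the DDL it has reached, and the DDL runs downstream exactly once, after the last shard reports it. The class below is a simplified, hypothetical model of that behaviour (`ShardDDLLock`, `report`, and `is_blocked` are illustrative names, not DM's implementation); pausing a shard that is waiting on a pending DDL is one reasonable policy, assumed here.

```python
from typing import Dict, Iterable, List, Set


class ShardDDLLock:
    """Hypothetical coordinator-side lock for one sharded table group."""

    def __init__(self, shards: Iterable[str]):
        self.shards: Set[str] = set(shards)
        self.pending: Dict[str, Set[str]] = {}  # DDL text -> shards that reported it
        self.executed: List[str] = []           # DDLs applied downstream, in order

    def report(self, shard: str, ddl: str) -> bool:
        """Record that `shard` reached `ddl`; return True if this report
        completed the set and the DDL was applied downstream."""
        if shard not in self.shards:
            raise ValueError(f"unknown shard {shard!r}")
        arrived = self.pending.setdefault(ddl, set())
        arrived.add(shard)
        if arrived == self.shards:
            del self.pending[ddl]
            self.executed.append(ddl)  # stands in for running the DDL on TiDB once
            return True
        return False

    def is_blocked(self, shard: str) -> bool:
        """Assumed policy: a shard that reported a still-pending DDL waits."""
        return any(shard in arrived for arrived in self.pending.values())
```

Using the order from the example above, shard #1 and shard #2 report first and wait; only shard #3's report releases the lock and applies the DDL to TiDB a single time.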
So to summarize the DM architecture, there are three components or activities: