Replicate Data Using Data Migration

This guide shows how to replicate data using the Data Migration (DM) tool.

Step 1: Deploy the DM cluster

It is recommended to deploy the DM cluster using DM-Ansible. For detailed deployment instructions, see Deploy Data Migration Using DM-Ansible.

Note:

  • For all database-related passwords in the DM configuration files, use the passwords encrypted by dmctl. If a database password is empty, it is unnecessary to encrypt it. See Encrypt the upstream MySQL user password using dmctl, and the sketch after this note.
  • The users of the upstream and downstream databases must have the corresponding read and write privileges.

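For reference, the following is a minimal sketch of encrypting a password with dmctl. It assumes the dmctl binary deployed by DM-Ansible and its --encrypt option; the plaintext `123456` is illustrative, and each encryption run produces a different ciphertext.

# Encrypt the plaintext password `123456`:
./dmctl --encrypt "123456"
# Example output (the ciphertext used in the tables below):
# VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
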
Step 2: Check the cluster information

After the DM cluster is deployed using DM-Ansible, the configuration information is as follows.

  • The configuration information of related components in the DM cluster:

    Component   IP             Service port
    ---------   ------------   ------------
    DM-worker   172.16.10.72   10081
    DM-worker   172.16.10.73   10081
    DM-master   172.16.10.71   11080
  • The information of upstream and downstream database instances:

    Database instance   IP             Port   Username   Encrypted password
    -----------------   ------------   ----   --------   --------------------------------
    Upstream MySQL-1    172.16.10.81   3306   root       VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
    Upstream MySQL-2    172.16.10.82   3306   root       VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU=
    Downstream TiDB     172.16.10.83   4000   root       (empty)
  • The configuration in the DM-master process configuration file {ansible deploy}/conf/dm-master.toml:

    # Master configuration.
    
    [[deploy]]
    mysql-instance = "172.16.10.81:3306"
    dm-worker = "172.16.10.72:10081"
    
    [[deploy]]
    mysql-instance = "172.16.10.82:3306"
    dm-worker = "172.16.10.73:10081"
    

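Before configuring the replication task, you can optionally verify that the users listed above can connect and hold the required privileges (see the Note in Step 1). The following is a minimal sketch using the standard mysql client, with the hosts and ports taken from the tables above:

# Check the privileges of the upstream MySQL-1 user (repeat for MySQL-2):
mysql -h 172.16.10.81 -P 3306 -u root -p -e "SHOW GRANTS;"
# Check that the downstream TiDB instance is reachable:
mysql -h 172.16.10.83 -P 4000 -u root -e "SELECT tidb_version();"
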
Step 3: Configure the data replication task

The following example assumes that you need to replicate all the data in the test_table table of the test_db database from both the upstream MySQL-1 and MySQL-2 instances to the test_table table in the test_db database of the downstream TiDB, in the full data plus incremental data mode.

You can refer to the task.yaml.example task configuration template in {ansible deploy}/conf, and copy and edit it to generate the following task.yaml task configuration file:

# The task name. You need to use a different name for each of the multiple tasks that
# run simultaneously.
name: "test"
# The full data plus incremental data (all) replication mode
task-mode: "all"
# Disables the heartbeat replication delay calculation
disable-heartbeat: true

# The downstream TiDB configuration information
target-database:
  host: "172.16.10.83"
  port: 4000
  user: "root"
  password: ""

# All the upstream MySQL instances that the current task needs to use
mysql-instances:
-
  config:                 # MySQL-1 configuration
    host: "172.16.10.81"
    port: 3306
    user: "root"
    # The password encrypted by dmctl. This is the ciphertext of the
    # plaintext `123456`; each encryption run generates a different ciphertext.
    password: "VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU="
  # The instance ID of MySQL-1, corresponding to the `mysql-instance` in "dm-master.toml"
  instance-id: "172.16.10.81:3306"
  # The configuration item name of the black and white lists of the
  # database/table to be replicated. It references the global black and white
  # lists configuration that is set in the global black-white-list map below.
  black-white-list: "global"
  # The configuration item name of mydumper, used to reference the global mydumper configuration
  mydumper-config-name: "global"

-
  config:                 # MySQL-2 configuration
    host: "172.16.10.82"
    port: 3306
    user: "root"
    password: "VjX8cEeTX+qcvZ3bPaO4h0C80pe/1aU="
  instance-id: "172.16.10.82:3306"
  black-white-list: "global"
  mydumper-config-name: "global"

# The global configuration of black and white lists. Each instance can reference it by the
# configuration item name.
black-white-list:
  global:
    do-tables:                        # The upstream tables to be replicated
    - db-name: "test_db"              # The database name of the table to be replicated
      tbl-name: "test_table"          # The name of the table to be replicated

# The mydumper global configuration. Each instance can reference it by the configuration item name.
mydumpers:
  global:
    mydumper-path: "./bin/mydumper"   # The file path of the mydumper binary
    extra-args: "-B test_db -T test_table"  # Only dumps the "test_table" table of the "test_db" database

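Before starting the task, you can optionally let DM pre-check the configuration and the database privileges. The following is a sketch that assumes the check-task command is available in your dmctl version; it is run inside dmctl, which Step 4 below shows how to start.

# Pre-check the `task.yaml` configuration file inside dmctl:
check-task task.yaml
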
Step 4: Start the data replication task

  1. Go to the dmctl directory /home/tidb/dm-ansible/resource/bin/.

  2. Run the following command to start dmctl in interactive mode.

    ./dmctl --master-addr 172.16.10.71:11080
    
  3. Run the following command inside dmctl to start the data replication task.

    # `task.yaml` is the configuration file that is edited above.
    start-task task.yaml
    
    • If the above command returns the following result, it indicates the task is successfully started.

      {
          "result": true,
          "msg": "",
          "workers": [
              {
                  "result": true,
                  "worker": "172.16.10.72:10081",
                  "msg": ""
              },
              {
                  "result": true,
                  "worker": "172.16.10.73:10081",
                  "msg": ""
              }
          ]
      }
      
    • If the above command returns other information, edit the configuration according to the returned message, and then run the start-task task.yaml command again to restart the task.

Step 5: Check the data replication task

If you need to check the task state or whether a certain data replication task is running in the DM cluster, run the following command in dmctl:

query-status
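
To check the state of one specific task only, you can also pass the task name (the `name` configuration item in `task.yaml`), as in the following sketch:

# Query the status of the `test` task only:
query-status test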

Step 6: Stop the data replication task

If you do not need to replicate data any more, run the following command in dmctl to stop the task:

# `test` is the task name that you set in the `name` configuration item of
# the `task.yaml` configuration file.
stop-task test
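
If you only need to suspend replication temporarily rather than stop it, most DM versions also provide the pause-task and resume-task commands in dmctl; a sketch using the same task name:

# Pause the `test` task, and resume it later:
pause-task test
resume-task test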

Step 7: Monitor the task and check logs

Assuming that Prometheus and Grafana are successfully deployed along with the DM cluster using DM-Ansible, and that the Grafana address is 172.16.10.71, you can open http://172.16.10.71:3000 in a browser and choose the DM dashboard to check the related monitoring metrics.

While the DM cluster is running, DM-master, DM-worker, and dmctl output the monitoring metrics information through logs. The log directory of each component is as follows:

  • DM-master log: It is specified by the --log-file DM-master process parameter. For DM-Ansible deployment, the log file is {ansible deploy}/log/dm-master.log on the DM-master node.
  • DM-worker log: It is specified by the --log-file DM-worker process parameter. For DM-Ansible deployment, the log file is {ansible deploy}/log/dm-worker.log on the DM-worker node.
  • dmctl log directory: It is the same as the binary directory of dmctl.
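
For example, to follow the DM-worker log on its node while a task is running (a sketch; `{ansible deploy}` is the deployment directory configured in DM-Ansible):

# Follow the DM-worker log on the DM-worker node:
tail -f {ansible deploy}/log/dm-worker.log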