TiDB Tool Insights: How DM Handles DML?

TiDB’s one-click horizontal scaling feature helps users eliminate the complexity brought by sharding and its associated query and maintenance operations. However, this complexity shifts to the data migration process during the transition from a sharded architecture to TiDB. The TiDB Data Migration (DM) tool allows users to merge and migrate sharded data.

This article introduces the core processing unit of DM, Sync. The content includes binlog reading, filtering, routing, transformation, optimization, and execution logic.

From the diagram, we can roughly understand the logical processing flow of binlog replication:

Read binlog events from MySQL/MariaDB or relay log.
Process and transform binlog events:
- Binlog Filter: Filter binlog events based on binlog expressions configured through filters.
- Routing: Transform database/table names based on routing rules configured through routes.
- Expression Filter: Filter binlog events based on SQL expressions configured through expression-filter.
Optimize DML execution:
- Compactor: Enabled through syncer. compact, this function merges multiple operations on the same record (with the same primary key) into one operation.
- Causality: Detect conflicts among different records (with different primary keys) and distribute them to different groups for concurrent processing.
- Merger: Merge multiple binlog events into a single DML, enabled through syncer.multiple-rows.
Execute DML to the downstream.
Periodically save binlog position/gtid to the checkpoint.

Optimization Logic

Compactor

DM captures changes in the records based on the upstream binlog and synchronizes these changes to the downstream. When multiple changes (insert/update/delete) are made to the same record in a short period in the upstream, DM can use the Compactor to compress these changes into a single change. This reduces the pressure on the downstream and improves throughput. For example:

If an insert is followed by an update on the same record, DM will only replicate the insert with the updated values.
If an update is followed by another update on the same record, DM will only replicate the final state of the record.
If a delete is followed by an insert on the same record, DM will treat it as a replace operation.
If an insert is followed by a delete on the same record, DM will ignore both operations as they cancel each other out.

Causality

The MySQL binlog sequential synchronization model requires binlog events to be synchronized individually in the order they are recorded. This sequential synchronization cannot meet the high QPS and low synchronization latency requirements, and not all operations involved in the binlog are conflicting.

DM uses a conflict detection mechanism to identify binlog events that must be executed in order. While ensuring the sequential execution of these binlog events, DM maximizes the concurrent execution of other binlog events to meet performance requirements.

Causality uses an algorithm similar to the union-find algorithm to classify each DML operation, grouping related DML operations.

Merger

In the MySQL binlog protocol, each binlog corresponds to a change operation on a single row of data. DM can merge multiple binlog entries into a single DML statement for execution downstream, reducing network interactions. For example:

Execution Logic

DML Generation

DM embeds a schema tracker to record the schema information of both upstream and downstream databases. When a DDL statement is received, DM updates the table structure in the internal schema tracker. When a DML statement is received, DM generates the corresponding DML based on the schema tracker. The detailed logic is as follows:

When starting a full + incremental task, Sync uses the table structure dumped during the upstream full sync as the initial upstream table structure.
Since MySQL binlog does not record table structure information when starting an incremental task, Sync uses the downstream corresponding table structure as the initial upstream table structure.
Because the upstream and downstream table structures may differ (e.g., the downstream has additional columns or different primary keys), DM records the primary and unique keys of the downstream corresponding tables to ensure data synchronization correctness.
When generating DML, DM uses the upstream table structure recorded in the schema tracker to generate the columns of the DML statement, uses the column values recorded in the binlog to create the column values of the DML statement, and uses the downstream primary/unique keys recorded in the schema tracker to generate the WHERE clause of the DML statement. If the table structure has no unique keys, DM will use all column values recorded in the binlog as the WHERE clause.

Worker Count

Causality can use conflict detection algorithms to divide binlog into multiple groups for concurrent downstream execution. DM controls the number of concurrent workers by setting worker-count. Increasing the concurrency can effectively improve data synchronization throughput when the downstream TiDB’s CPU usage is low. This is configured through syncer.worker-count.

Batch

DM accumulates multiple DML statements into a single transaction for execution downstream. When the DML Worker receives a DML, it is added to a buffer. When the buffer reaches a preset threshold, or no DML has been received for a long time, the buffered DMLs are executed downstream. This is configured through syncer.batch.

Operators and Task

Checkpoint

From the process flow diagram, we can see that DML execution and checkpoint updates are not atomic. By default, DM updates the checkpoint every 30 seconds. Additionally, since multiple DML worker processes exist, the checkpoint process calculates the latest binlog position of all DML workers’ synchronization progress and uses this as the current checkpoint. All binlog entries before this position are guaranteed to have been successfully executed downstream.

Transaction Consistency

As described above, DM performs data synchronization at the “row level.” An upstream transaction is split into multiple rows and distributed to different DML Workers for concurrent execution. When a DM synchronization task reports an error and pauses or a user manually pauses the task, the downstream might be inconsistent. Some DML statements from an upstream transaction might have been synchronized downstream, while others have not. To ensure the downstream is as consistent as possible when a task is paused, DM, starting from v5.3.0, will wait for all DML statements of an upstream transaction to be synchronized downstream before pausing the task. This waiting time is 10 seconds. If the upstream transaction is not fully synchronized downstream within 10 seconds, the downstream transaction may still be inconsistent.

Safemode

From the above execution logic, we can see that DML execution and checkpoint writing operations are not synchronized. Also, checkpoint writing and downstream data writing are not atomic. When DM exits abnormally, the checkpoint might only be recorded as a recovery point before the exit. Therefore, when the synchronization task restarts, DM might re-write some data, meaning DM provides “at-least-once processing” logic, where the same data might be processed more than once. To ensure data reentrancy, DM enters safe mode during abnormal restarts. The specific logic is as follows:

When a DM task is paused normally, all DML statements in memory are synchronized downstream, and the checkpoint is refreshed. The task will not enter safe mode upon normal restart because all data before the checkpoint has been synchronized downstream, and data after the checkpoint has not been synchronized so that no data will be reprocessed.
When a task is paused abnormally, DM will try to synchronize all DML statements in memory downstream. This might fail (due to downstream data conflicts, etc.). Then, DM records the latest binlog position pulled from the upstream, noted as safemode_exit_point, and refreshes this point along with the checkpoint downstream. After recovery, the following situations may occur: a. checkpoint == safemode_exit_point: This means all DML statements were synchronized downstream before DM paused, and the task does not enter safe mode upon restart. b. checkpoint < safemode_exit_point: Some DML statements failed to execute downstream during the pause, so the checkpoint is still an older point. The binlog entries between the checkpoint and safemode_exit_point will be executed in safe mode as they might have been synchronized once already. c. safemode_exit_point does not exist: The safemode_exit_point refresh failed or the DM process was forcibly terminated. In this case, DM cannot determine which data might be reprocessed. Therefore, after task recovery, safe mode will be enabled for two checkpoint intervals (default is one minute), after which safe mode will be disabled and normal synchronization will resume.

During safe mode, DM performs the following transformations to ensure data reentrancy:

Convert upstream INSERT statements to REPLACE statements.
Convert upstream UPDATE statements to DELETE + REPLACE statements.

Exactly-Once Processing

From the description above, we can see that DM’s logic of splitting transactions and synchronizing concurrently raises some issues, such as the downstream being in an inconsistent state, the synchronization order differing from the upstream, and data reentrancy issues (safe mode REPLACE statements may incur performance losses, and repeated processing is unacceptable if the downstream needs to capture data changes, e.g., CDC).

Therefore, we are considering implementing “exactly-once processing” logic. If you want to join us, please visit TiDB Community for a discussion.