TiDB Lightning TiDB-backend

TiDB Lightning supports two backends: Importer and TiDB. The backend determines how tidb-lightning delivers data into the target cluster.

The Importer-backend (default) requires tidb-lightning to first encode the SQL or CSV data into KV pairs, and relies on the external tikv-importer program to sort these KV pairs and ingest them directly into the TiKV nodes.

The TiDB-backend requires tidb-lightning to encode the data into SQL INSERT statements, which are then executed directly on the TiDB node.

Backend                        | Importer          | TiDB
Speed                          | Fast (~300 GB/hr) | Slow (~50 GB/hr)
Resource usage                 | High              | Low
ACID respected while importing | No                | Yes
Target tables                  | Must be empty     | Can be populated
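
To make the speed difference concrete, the following minimal sketch turns the approximate throughputs quoted above into a rough duration estimate. The function name and the 600 GB data size are hypothetical, and real throughput depends on the hardware and the schema, so treat the result as an order-of-magnitude guide only.

```python
# Approximate throughputs quoted in the comparison above, in GB per hour.
# These are rough figures, not guarantees.
THROUGHPUT_GB_PER_HOUR = {"importer": 300, "tidb": 50}


def estimate_import_hours(data_size_gb, backend):
    """Return a rough import duration in hours for the given backend."""
    if data_size_gb < 0:
        raise ValueError("data size must not be negative")
    return data_size_gb / THROUGHPUT_GB_PER_HOUR[backend]


if __name__ == "__main__":
    # Hypothetical 600 GB data source.
    for backend in ("importer", "tidb"):
        hours = estimate_import_hours(600, backend)
        print(f"{backend}: about {hours:.1f} hours for 600 GB")
```

Under these assumptions, a data source that the Importer-backend loads in about 2 hours would take about 12 hours with the TiDB-backend.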

Deployment for TiDB-backend

When using the TiDB-backend, you no longer need tikv-importer. Compared with the standard deployment procedure, the TiDB-backend deployment has the following two differences:

  • Steps involving tikv-importer can all be skipped.
  • The configuration must be changed to indicate the TiDB-backend is used.

Hardware requirements

The speed of TiDB Lightning using TiDB-backend is limited by the SQL processing speed of TiDB. Therefore, even a lower-end machine may max out the possible performance. The recommended hardware configuration is:

  • A CPU with 16 logical cores
  • An SSD large enough to store the entire data source, preferably with a high read speed
  • A 1 Gigabit network card

Ansible deployment

  1. The [importer_server] section in inventory.ini can be left blank.

        [importer_server]
        # keep empty
  2. The tikv_importer_port setting in group_vars/all.yml is ignored, and the file group_vars/importer_server.yml does not need to be changed. However, you must edit conf/tidb-lightning.yml and change the backend setting to tidb.

        backend: "tidb"   # <-- change this
  3. Bootstrap and deploy the cluster as usual.

  4. Mount the data source for TiDB Lightning as usual.

  5. Start tidb-lightning as usual.

Manual deployment

You do not need to download and configure tikv-importer.

Before running tidb-lightning, add the following line to the [tikv-importer] section of the configuration file:

backend = "tidb"

Alternatively, supply the --backend tidb argument when executing tidb-lightning.

Conflict resolution

The TiDB-backend supports importing into an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how the conflict is resolved with the on-duplicate setting in the task configuration:

backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"

Setting | Behavior on conflict                 | Equivalent SQL statement
replace | New entries replace old ones         | REPLACE INTO ...
ignore  | Keep old entries and ignore new ones | INSERT IGNORE INTO ...
error   | Abort import                         | INSERT INTO ...
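
The mapping from setting to statement can be illustrated with a short sketch. This is not the encoder that tidb-lightning uses; the helper names, the simplified value quoting, and the example table are assumptions made for illustration only.

```python
# Statement prefix for each on-duplicate setting, as listed above.
STATEMENT_PREFIX = {
    "replace": "REPLACE INTO",
    "ignore": "INSERT IGNORE INTO",
    "error": "INSERT INTO",
}


def sql_literal(value):
    """Render a Python value as a SQL literal (simplified, illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_statement(on_duplicate, table, rows):
    """Build one multi-row statement for the given on-duplicate setting."""
    try:
        prefix = STATEMENT_PREFIX[on_duplicate]
    except KeyError:
        raise ValueError(f"unknown on-duplicate setting: {on_duplicate!r}") from None
    values = ",".join(
        "(" + ",".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"{prefix} {table} VALUES {values}"


if __name__ == "__main__":
    # Hypothetical table and rows.
    for setting in ("replace", "ignore", "error"):
        print(build_statement(setting, "`test`.`t`", [(1, "a"), (2, None)]))
```

With "error", the first conflicting row makes the plain INSERT fail, which is why the import aborts; the other two settings let the statement succeed and differ only in which version of the row survives.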

Migrating from Loader to TiDB Lightning TiDB-backend

TiDB Lightning using the TiDB-backend can completely replace the functions of Loader. The following comparison shows how to translate Loader configurations into TiDB Lightning configurations.

Loader:

    # logging
    log-level = "info"
    log-file = "loader.log"

    # Prometheus
    status-addr = ":8272"

    # concurrency
    pool-size = 16

TiDB Lightning:

    [lightning]
    # logging
    level = "info"
    file = "tidb-lightning.log"

    # Prometheus
    pprof-port = 8289

    # concurrency (better left as default)
    #region-concurrency = 16

Loader:

    # checkpoint database
    checkpoint-schema = "tidb_loader"

TiDB Lightning:

    [checkpoint]
    # checkpoint storage
    enable = true
    schema = "tidb_lightning_checkpoint"
    # by default the checkpoint is stored in
    # a local file, which is more efficient.
    # but you could still choose to store the
    # checkpoints in the target database with
    # this setting:
    #driver = "mysql"

Loader: (no equivalent setting)

TiDB Lightning:

    [tikv-importer]
    # use the TiDB-backend
    backend = "tidb"

Loader:

    # data source directory
    dir = "/data/export/"

TiDB Lightning:

    [mydumper]
    # data source directory
    data-source-dir = "/data/export"

Loader:

    [db]
    # TiDB connection parameters
    host = ""
    port = 4000

    user = "root"
    password = ""

    #sql-mode = ""

TiDB Lightning:

    [tidb]
    # TiDB connection parameters
    host = ""
    port = 4000
    status-port = 10080  # <- this is required
    user = "root"
    password = ""

    #sql-mode = ""
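
The translation above can also be sketched as a small script. This is a hypothetical helper, not a tool shipped with TiDB Lightning. It assumes the Loader connection settings sit in a [db] section, that the TiDB Lightning settings belong to the [lightning], [checkpoint], [tikv-importer], [mydumper] and [tidb] sections, and that both configurations are already available as plain dictionaries.

```python
def loader_to_lightning(loader):
    """Translate a Loader config dict into a TiDB Lightning config dict.

    Hypothetical helper for illustration. pool-size is deliberately not
    carried over, because region-concurrency is better left as default.
    status-addr is not carried over either; set pprof-port by hand if needed.
    """
    db = loader.get("db", {})
    return {
        "lightning": {
            "level": loader.get("log-level", "info"),
            "file": "tidb-lightning.log",
        },
        "checkpoint": {
            "enable": True,
            "schema": "tidb_lightning_checkpoint",
        },
        # Select the TiDB-backend.
        "tikv-importer": {"backend": "tidb"},
        "mydumper": {
            # Same directory, without the trailing slash.
            "data-source-dir": loader["dir"].rstrip("/") or "/",
        },
        "tidb": {
            "host": db.get("host", ""),
            "port": db.get("port", 4000),
            # Required by TiDB Lightning; has no counterpart in Loader.
            "status-port": 10080,
            "user": db.get("user", "root"),
            "password": db.get("password", ""),
        },
    }


if __name__ == "__main__":
    # Hypothetical Loader configuration.
    loader_cfg = {
        "log-level": "info",
        "dir": "/data/export/",
        "db": {"host": "127.0.0.1", "port": 4000, "user": "root", "password": ""},
    }
    for section, settings in loader_to_lightning(loader_cfg).items():
        print(section, settings)
```

The sketch only covers the settings listed in the comparison; any other Loader option would need to be reviewed by hand.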