TiDB Lightning “TiDB” Back End

TiDB Lightning supports two back ends: “Importer” and “TiDB”. The back end determines how tidb-lightning delivers data into the target cluster.

The “Importer” back end (default) requires tidb-lightning to first encode the SQL or CSV data into KV pairs, and relies on the external tikv-importer program to sort these KV pairs and ingest them directly into the TiKV nodes.

The “TiDB” back end requires tidb-lightning to encode the data into SQL INSERT statements, and executes these statements directly on the TiDB node.

    Back end                          “Importer”           “TiDB”
    Speed                             Fast (~300 GB/hr)    Slow (~50 GB/hr)
    Resource usage                    High                 Low
    ACID respected while importing    No                   Yes
    Target tables                     Must be empty        Can be populated

Deployment for “TiDB” back end

When using the “TiDB” back end, you no longer need tikv-importer. Compared with the standard deployment procedure, the “TiDB” back end deployment has the following two differences:

  • Steps involving tikv-importer can all be skipped.
  • The configuration must be changed to indicate the “TiDB” back end is used.
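As a sketch, a minimal task configuration for the “TiDB” back end could look like the following. The section names ([tikv-importer], [mydumper], [tidb]) follow the standard TiDB Lightning configuration layout, and all addresses and paths are placeholders:

```toml
# tidb-lightning.toml — minimal sketch; addresses and paths are placeholders

[tikv-importer]
# deliver data as SQL INSERT statements instead of using tikv-importer
backend = "tidb"

[mydumper]
# directory containing the exported SQL/CSV data
data-source-dir = "/data/export"

[tidb]
# connection parameters of the target TiDB server
host = "127.0.0.1"
port = 4000
status-port = 10080
user = "root"
password = ""
```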

Ansible deployment

  1. The [importer_server] section in inventory.ini can be left blank.

        [importer_server]
        # keep empty
  2. The tikv_importer_port setting in group_vars/all.yml is ignored, and the file group_vars/importer_server.yml does not need to be changed. However, you must edit conf/tidb-lightning.yml and change the backend setting to tidb.

        backend: "tidb"   # <-- change this
  3. Bootstrap and deploy the cluster as usual.

  4. Mount the data source for TiDB Lightning as usual.

  5. Start tidb-lightning as usual.

Manual deployment

You do not need to download and configure tikv-importer.

Before running tidb-lightning, add the following line to the configuration file:

backend = "tidb"

Alternatively, supply the --backend tidb argument when executing tidb-lightning.

Conflict resolution

The “TiDB” back end supports importing to an already-populated table. However, the new data might cause a unique key conflict with the old data. You can control how the conflict is resolved by using the on-duplicate setting in the task configuration:

backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"
    Setting    Behavior on conflict                    Equivalent SQL statement
    replace    New entries replace old ones            REPLACE INTO ...
    ignore     Keep old entries and ignore new ones    INSERT IGNORE INTO ...
    error      Abort import                            INSERT INTO ...

Migrating from Loader to TiDB Lightning “TiDB” back end

TiDB Lightning using the “TiDB” back end can completely replace the functionality of Loader. The following list shows how to translate Loader configurations into TiDB Lightning configurations.

Loader:

    # logging
    log-level = "info"
    log-file = "loader.log"

    # Prometheus
    status-addr = ":8272"

    # concurrency
    pool-size = 16

TiDB Lightning:

    [lightning]
    # logging
    level = "info"
    file = "tidb-lightning.log"

    # Prometheus
    pprof-port = 8289

    # concurrency (better left as default)
    #region-concurrency = 16

Loader:

    # checkpoint database
    checkpoint-schema = "tidb_loader"

TiDB Lightning:

    [checkpoint]
    # checkpoint storage
    enable = true
    schema = "tidb_lightning_checkpoint"
    # By default the checkpoint is stored in
    # a local file, which is more efficient,
    # but you can still choose to store the
    # checkpoints in the target database with
    # this setting:
    #driver = "mysql"

    [tikv-importer]
    # use the "TiDB" back end
    backend = "tidb"

Loader:

    # data source directory
    dir = "/data/export/"

TiDB Lightning:

    [mydumper]
    # data source directory
    data-source-dir = "/data/export"

Loader:

    [db]
    # TiDB connection parameters
    host = ""
    port = 4000
    user = "root"
    password = ""

    #sql-mode = ""

TiDB Lightning:

    [tidb]
    # TiDB connection parameters
    host = ""
    port = 4000
    status-port = 10080  # <- this is required
    user = "root"
    password = ""

    #sql-mode = ""
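Assembling the TiDB Lightning settings above into a single file gives a complete translated configuration. This is a sketch: the section names follow the standard TiDB Lightning layout, and the connection values are the same placeholders used above:

```toml
# tidb-lightning.toml — the Loader configuration translated, as a sketch

[lightning]
level = "info"
file = "tidb-lightning.log"
pprof-port = 8289

[checkpoint]
enable = true
schema = "tidb_lightning_checkpoint"
# uncomment to keep checkpoints in the target database instead of a local file
#driver = "mysql"

[tikv-importer]
backend = "tidb"

[mydumper]
data-source-dir = "/data/export"

[tidb]
host = ""
port = 4000
status-port = 10080   # required
user = "root"
password = ""
```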