Interaction Test on Online Workloads and ADD INDEX
Operations
Test purpose
This document tests the interaction effects between online workloads and ADD INDEX
operations in the OLTP scenario.
Test version, time, and place
TiDB version: v3.0.1
Time: July, 2019
Place: Beijing
Test environment
This test runs in a Kubernetes cluster deployed with 3 TiDB instances, 3 TiKV instances and 3 PD instances.
Component |
GitHash |
TiDB |
9e4e8da3c58c65123db5f26409759fe1847529f8 |
TiKV |
4151dc8878985df191b47851d67ca21365396133 |
PD |
811ce0b9a1335d1b2a049fd97ef9e186f1c9efc1 |
Sysbench version:1.0.17
TiDB parameter configuration
TiDB, TiKV and PD all use the default TiDB Operator configuration.
Cluster topology
Machine IP |
Deployment instance |
172.31.8.8 |
Sysbench |
172.31.7.69, 172.31.5.152, 172.31.11.133 |
PD |
172.31.4.172, 172.31.1.155, 172.31.9.210 |
TiKV |
172.31.7.80, 172.31.5.163, 172.31.11.123 |
TiDB |
Online workloads simulation using Sysbench
Use Sysbench to import a table with 2,000,000 rows of data into the Kubernetes cluster.
Execute the following command to import data:
sysbench oltp_common \
--threads=16 \
--rand-type=uniform \
--db-driver=mysql \
--mysql-db=sbtest \
--mysql-host=$tidb_host \
--mysql-port=$tidb_port \
--mysql-user=root \
prepare --tables=1 --table-size=2000000
Execute the following command to run the test:
sysbench $testname \
--threads=$threads \
--time=300000 \
--report-interval=15 \
--rand-type=uniform \
--rand-seed=$RANDOM \
--db-driver=mysql \
--mysql-db=sbtest \
--mysql-host=$tidb_host \
--mysql-port=$tidb_port \
--mysql-user=root \
run --tables=1 --table-size=2000000
- Start the
oltp_read_write
test.
- Perform at the same time with step 1: use
alter table sbtest1 add index c_idx(c)
to add an index.
- Perform at the end of step 2: when the index is added successfully, stop the
oltp_read_write
test.
- Get the duration of
alter table ... add index
and the average TPS and QPS of Sysbench in this period.
- Gradually increase the value of two parameters
tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
, and then repeat step 1-4.
Test results
Test result of oltp_read_write
without ADD INDEX
operations
sysbench TPS |
sysbench QPS |
350.31 |
6806 |
tidb_ddl_reorg_batch_size = 32
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
402 |
338.4 |
6776 |
2 |
266 |
330.3 |
6001 |
4 |
174 |
288.5 |
5769 |
8 |
129 |
280.6 |
5612 |
16 |
90 |
263.5 |
5273 |
32 |
54 |
229.2 |
4583 |
48 |
57 |
230.1 |
4601 |

tidb_ddl_reorg_batch_size = 64
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
264 |
269.4 |
5388 |
2 |
163 |
266.2 |
5324 |
4 |
105 |
272.5 |
5430 |
8 |
78 |
262.5 |
5228 |
16 |
57 |
215.5 |
4308 |
32 |
42 |
185.2 |
3715 |
48 |
45 |
189.2 |
3794 |

tidb_ddl_reorg_batch_size = 128
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
171 |
289.1 |
5779 |
2 |
110 |
274.2 |
5485 |
4 |
79 |
250.6 |
5011 |
8 |
51 |
246.1 |
4922 |
16 |
39 |
171.1 |
3431 |
32 |
35 |
130.8 |
2629 |
48 |
35 |
120.5 |
2425 |

tidb_ddl_reorg_batch_size = 256
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
145 |
283.0 |
5659 |
2 |
96 |
282.2 |
5593 |
4 |
56 |
236.5 |
4735 |
8 |
45 |
194.2 |
3882 |
16 |
39 |
149.3 |
2893 |
32 |
36 |
113.5 |
2268 |
48 |
33 |
86.2 |
1715 |

tidb_ddl_reorg_batch_size = 512
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
135 |
257.8 |
5147 |
2 |
78 |
252.8 |
5053 |
4 |
49 |
222.7 |
4478 |
8 |
36 |
145.4 |
2904 |
16 |
33 |
109 |
2190 |
32 |
33 |
72.5 |
1503 |
48 |
33 |
54.2 |
1318 |

tidb_ddl_reorg_batch_size = 1024
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
111 |
244.3 |
4885 |
2 |
78 |
228.4 |
4573 |
4 |
54 |
168.8 |
3320 |
8 |
39 |
123.8 |
2475 |
16 |
36 |
59.6 |
1213 |
32 |
42 |
93.2 |
1835 |
48 |
51 |
115.7 |
2261 |

tidb_ddl_reorg_batch_size = 2048
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
918 |
243.3 |
4855 |
2 |
1160 |
209.9 |
4194 |
4 |
342 |
185.4 |
3707 |
8 |
1316 |
151.0 |
3027 |
16 |
795 |
30.5 |
679 |
32 |
1130 |
26.69 |
547 |
48 |
893 |
27.5 |
552 |

tidb_ddl_reorg_batch_size = 4096
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
3042 |
200.0 |
4001 |
2 |
3022 |
203.8 |
4076 |
4 |
858 |
195.5 |
3971 |
8 |
3015 |
177.1 |
3522 |
16 |
837 |
143.8 |
2875 |
32 |
942 |
114 |
2267 |
48 |
187 |
54.2 |
1416 |

Test conclusion
When you perform frequent write operations (this test involves UPDATE
, INSERT
and DELETE
operations) to the target column of the ADD INDEX
statement, the default ADD INDEX
configuration has a significant impact on the online workload of the system. It is mainly because of the write conflicts caused by the concurrent ADD INDEX
operation and column update. The performance of the system is as follows:
- As the value of
tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
parameters increase, the value of TiKV_prewrite_latch_wait_duration
increases significantly, slowing down the write speed.
- When the value of
tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
are very large, you can execute the admin show ddl
command to see multiple retry attempts of the DDL job, such as Write conflict, txnStartTS 410327455965380624 is stale [try again later], ErrCount:38, SnapshotVersion: 410327228136030220
. In this situation, the ADD INDEX
operation takes a very long time to complete.
- Start the
oltp_read_only
test.
- Perform at the same time with step 1: use
alter table sbtest1 add index c_idx(c)
to add an index.
- Perform at the end of step 2: when the index is added successfully, stop the
oltp_read_only
test.
- Get the duration of
alter table ... add index
and the average TPS and QPS of Sysbench in this period.
- Gradually increase the value of two parameters
tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
, and then repeat step 1-4.
Test results
Test result of oltp_read_only
without ADD INDEX
operations
sysbench TPS |
sysbench QPS |
550.9 |
8812.8 |
tidb_ddl_reorg_batch_size = 32
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
376 |
548.9 |
8780 |
2 |
212 |
541.5 |
8523 |
4 |
135 |
538.6 |
8549 |
8 |
114 |
536.7 |
8393 |
16 |
77 |
533.9 |
8292 |
32 |
46 |
533.4 |
8103 |
48 |
46 |
532.2 |
8074 |

tidb_ddl_reorg_batch_size = 1024
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
91 |
536.8 |
8316 |
2 |
52 |
533.9 |
8165 |
4 |
40 |
522.4 |
7947 |
8 |
36 |
510 |
7860 |
16 |
33 |
485.5 |
7704 |
32 |
31 |
467.5 |
7516 |
48 |
30 |
562.1 |
7442 |

tidb_ddl_reorg_batch_size = 4096
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
103 |
502.2 |
7823 |
2 |
63 |
486.5 |
7672 |
4 |
52 |
467.4 |
7516 |
8 |
39 |
452.5 |
7302 |
16 |
35 |
447.2 |
7206 |
32 |
30 |
441.9 |
7057 |
48 |
30 |
440.1 |
7004 |

Test conclusion
When you only perform query operations to the target column of the ADD INDEX
statement, the effect of ADD INDEX
operations on online workloads is not obvious.
Test plan 3: the target column of the ADD INDEX
statement is irrelevant to online workloads
- Start the
oltp_read_write
test.
- Perform at the same time with step 1: use
alter table test add index pad_idx(pad)
to add an index.
- Perform at the end of step 2: when the index is added successfully, stop the
oltp_read_only
test.
- Get the duration of
alter table ... add index
and the average TPS and QPS of Sysbench in this period.
- Gradually increase the value of two parameters
tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
, and then repeat step 1-4.
Test results
Test result of oltp_read_write
without ADD INDEX
operations
sysbench TPS |
sysbench QPS |
350.31 |
6806 |
tidb_ddl_reorg_batch_size = 32
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
372 |
350.4 |
6892 |
2 |
207 |
344.2 |
6700 |
4 |
140 |
343.1 |
6672 |
8 |
121 |
339.1 |
6579 |
16 |
76 |
340 |
6607 |
32 |
42 |
343.1 |
6695 |
48 |
42 |
333.4 |
6454 |

tidb_ddl_reorg_batch_size = 1024
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
94 |
352.4 |
6794 |
2 |
50 |
332 |
6493 |
4 |
45 |
330 |
6456 |
8 |
36 |
325.5 |
6324 |
16 |
32 |
312.5 |
6294 |
32 |
32 |
300.6 |
6017 |
48 |
31 |
279.5 |
5612 |

tidb_ddl_reorg_batch_size = 4096
tidb_ddl_reorg_worker_cnt |
add_index_durations(s) |
sysbench TPS |
sysbench QPS |
1 |
116 |
325.5 |
6324 |
2 |
65 |
312.5 |
6290 |
4 |
50 |
300.6 |
6017 |
8 |
37 |
279.5 |
5612 |
16 |
34 |
250.4 |
5365 |
32 |
32 |
220.2 |
4924 |
48 |
33 |
214.8 |
4544 |

Test conclusion
When the target column of the ADD INDEX
statement is irrelevant to online workloads, the effect of ADD INDEX
operations on the workload is not obvious.
Summary
- When you perform frequent write operations (including
INSERT
, DELETE
and UPDATE
operations) to the target column of the ADD INDEX
statement, the default ADD INDEX
configuration causes relatively frequent write conflicts, which has a great impact on online workloads. At the same time, the ADD INDEX
operation takes a long time to complete due to continuous retry attempts. In this test, you can modify the product of tidb_ddl_reorg_worker_cnt
and tidb_ddl_reorg_batch_size
to 1⁄32 of the default value. For example, you can set tidb_ddl_reorg_worker_cnt
to 4
and tidb_ddl_reorg_batch_size
to 256
for better performance.
- When only performing query operations to the target column of the
ADD INDEX
statement or the target column is not directly related to online workloads, you can use the default ADD INDEX
configuration.