TiDB on Arm-based Kubernetes Cluster Achieves Up to 25% Better Price-Performance Ratio than x86

2021-03-10 | Ron Xing | Product

Author: Ron Xing (Customer Support Engineer at PingCAP)

Transcreator: Caitin Chen; Editor: Tom Dewan

TiDB on EKS Arm vs. x86 benchmark

Benchmark purpose

The following tests compare the performance of TiDB, a MySQL-compatible NewSQL database, running on an Arm-based Amazon Elastic Kubernetes Service (EKS) cluster and on an x86-based EKS cluster. The tests use Online Transaction Processing (OLTP) workloads generated by two benchmarks: TPC-C and sysbench.

Benchmark environment

The test used two EKS clusters with the following instance types and topology.

Instance types

The following table summarizes all the EC2 instances we used in the EKS clusters as well as the benchmark node.

Instance size (CPU architecture) | vCPUs | Memory (GiB) | Instance storage (GB) | Network bandwidth (Gbps) | EBS bandwidth (Mbps)
--- | --- | --- | --- | --- | ---
c6g.large (Arm) | 2 | 4 | EBS-only | Up to 10 | Up to 4,750
c6g.2xlarge (Arm) | 8 | 16 | EBS-only | Up to 10 | Up to 4,750
r6g.2xlarge (Arm) | 8 | 64 | EBS-only | Up to 10 | Up to 4,750
c5.large (x86) | 2 | 4 | EBS-only | Up to 10 | Up to 4,750
c5.2xlarge (x86) | 8 | 16 | EBS-only | Up to 10 | Up to 4,750
r5.2xlarge (x86) | 8 | 64 | EBS-only | Up to 10 | Up to 4,750
c5.4xlarge (x86) | 16 | 32 | EBS-only | 10 | 4,750

Storage types

The following table summarizes the disks we used for the different components.

Service type | Storage | Size (GB) | IOPS | Throughput (MiB/s) | Instances
--- | --- | --- | --- | --- | ---
TiKV | EBS gp3 | 1,000 | 16,000 | 600 | 3
PD | EBS gp2 | 50 | 150 | 128 | 1
TPC-C/sysbench | EBS gp3 | 1,500 | 16,000 | 1,000 | 1

Topology

We used one c5.4xlarge EC2 instance as a benchmark node where TPC-C and sysbench are deployed.

Each EKS cluster consists of seven worker nodes and one admin (control plane) node. The seven worker nodes serve as dedicated TiDB (three), TiKV (three), and PD (one) nodes.

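Cluster/processor | Service type | EC2 type | Instances
--- | --- | --- | ---
Cluster 1: Graviton2 Arm | TiDB | c6g.2xlarge | 3
Cluster 1: Graviton2 Arm | TiKV | r6g.2xlarge | 3
Cluster 1: Graviton2 Arm | PD | c6g.large | 1
Cluster 1: Graviton2 Arm | Admin node | c6g.large | 1
Cluster 2: Intel Xeon Platinum 8000 series | TiDB | c5.2xlarge | 3
Cluster 2: Intel Xeon Platinum 8000 series | TiKV | r5.2xlarge | 3
Cluster 2: Intel Xeon Platinum 8000 series | PD | c5.large | 1
Cluster 2: Intel Xeon Platinum 8000 series | Admin node | c5.large | 1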

Software version

The following table lists the software versions of the TiDB cluster components and the benchmark tools.

Service type | Software version
--- | ---
TiDB | v4.0.10
TiKV | v4.0.10
PD | v4.0.10
TPC-C (embedded with TiUP) | v1.0.8
sysbench | v1.0.20

Cost

In the following cost examples:

  • Cost calculations are based on the on-demand rates for instances in the US West (Oregon) and Asia Pacific (Singapore) regions, in US dollars (USD) per month.
  • Monthly calculations are based on 730 hours of usage per month.

Storage

In the following table, the total cost per month includes a daily snapshot.

Volume type | Size (GB) | IOPS | Throughput (MiB/s) | Instances | Monthly cost, US (USD) | Monthly cost, APAC (USD)
--- | --- | --- | --- | --- | --- | ---
gp3 | 1,000 | 16,000 | 600 | 3 | 648.27 | 746.58
gp2 | 50 | 150 | 128 | 1 | 9.75 | 10.75

EC2

The following table lists the Graviton2 Arm processor based instances we used, with unit prices and monthly costs for both the US West (Oregon) and Asia Pacific (Singapore) regions.

Service type | EC2 type | Instances | US unit price (USD/hr) | US monthly cost | APAC unit price (USD/hr) | APAC monthly cost
--- | --- | --- | --- | --- | --- | ---
TiDB | c6g.2xlarge | 3 | 0.272 | 595.68 | 0.3136 | 686.78
TiKV | r6g.2xlarge | 3 | 0.4032 | 883.01 | 0.4864 | 1,065.21
PD | c6g.large | 1 | 0.068 | 49.64 | 0.0784 | 57.23
Control plane | c6g.large | 1 | 0.068 | 49.64 | 0.0784 | 57.23
Total | | | | 1,577.97 | | 1,866.46

The following table lists the configurations and costs of the instances based on Intel Xeon Platinum 8000 series processors:

Service type | EC2 type | Instances | US unit price (USD/hr) | US monthly cost | APAC unit price (USD/hr) | APAC monthly cost
--- | --- | --- | --- | --- | --- | ---
TiDB | c5.2xlarge | 3 | 0.34 | 744.60 | 0.392 | 858.48
TiKV | r5.2xlarge | 3 | 0.504 | 1,103.76 | 0.608 | 1,331.52
PD | c5.large | 1 | 0.085 | 62.05 | 0.098 | 71.54
Control plane | c5.large | 1 | 0.085 | 62.05 | 0.098 | 71.54
Total | | | | 1,972.46 | | 2,333.08
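The monthly EC2 figures in the two tables above follow from unit price × instance count × 730 hours. The following is a minimal Python sketch that reproduces both totals, assuming the on-demand unit prices listed above; the names such as monthly_ec2_cost are illustrative only.

```python
# Minimal sketch: reproduce the monthly EC2 totals from the unit prices in the tables.
# Assumes on-demand pricing and 730 hours per month, as stated in the Cost section.
HOURS_PER_MONTH = 730

# (service, instance count, US unit price, APAC unit price), prices in USD per hour
ARM_NODES = [
    ("TiDB", 3, 0.272, 0.3136),
    ("TiKV", 3, 0.4032, 0.4864),
    ("PD", 1, 0.068, 0.0784),
    ("Control plane", 1, 0.068, 0.0784),
]
X86_NODES = [
    ("TiDB", 3, 0.34, 0.392),
    ("TiKV", 3, 0.504, 0.608),
    ("PD", 1, 0.085, 0.098),
    ("Control plane", 1, 0.085, 0.098),
]


def monthly_ec2_cost(nodes, region):
    """Sum unit price x instance count x 730 hours for region "US" or "APAC"."""
    column = {"US": 0, "APAC": 1}[region]
    total = sum(prices[column] * count * HOURS_PER_MONTH for _service, count, *prices in nodes)
    return round(total, 2)


for label, nodes in (("Arm", ARM_NODES), ("x86", X86_NODES)):
    print(label, monthly_ec2_cost(nodes, "US"), monthly_ec2_cost(nodes, "APAC"))
```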

Total cost

The following tables summarize the total costs per month in the US West (Oregon) and Asia Pacific (Singapore) regions. All costs are in US dollars per month.

AWS US West (Oregon)

CPU type | EC2 cost | Storage cost | EKS cost | Total cost
--- | --- | --- | --- | ---
Arm | 1,577.97 | 740.86 | 73.00 | 2,391.83
x86 | 1,972.46 | 740.86 | 73.00 | 2,786.32

AWS Asia Pacific (Singapore)

CPU type | EC2 cost | Storage cost | EKS cost | Total cost
--- | --- | --- | --- | ---
Arm | 1,866.46 | 856.71 | 73.00 | 2,796.17
x86 | 2,333.08 | 856.71 | 73.00 | 3,262.79
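Each total is EC2 cost + storage cost + EKS cost. The 73.00 EKS line corresponds to the EKS control-plane charge of $0.10 per cluster-hour over 730 hours. A small sketch that recomputes the four totals from the figures above (the helper name is hypothetical):

```python
# Minimal sketch: total monthly cost = EC2 + storage + EKS, with figures copied from the tables.
# The EKS line assumes the EKS control-plane charge of $0.10 per cluster-hour over 730 hours.
EKS_USD_PER_HOUR = 0.10
HOURS_PER_MONTH = 730

# (EC2 cost, storage cost) in USD per month
MONTHLY = {
    ("US", "Arm"): (1577.97, 740.86),
    ("US", "x86"): (1972.46, 740.86),
    ("APAC", "Arm"): (1866.46, 856.71),
    ("APAC", "x86"): (2333.08, 856.71),
}


def total_monthly_cost(region, cpu):
    ec2, storage = MONTHLY[(region, cpu)]
    eks = EKS_USD_PER_HOUR * HOURS_PER_MONTH  # 73.00 USD per cluster per month
    return round(ec2 + storage + eks, 2)


for region, cpu in MONTHLY:
    print(region, cpu, total_monthly_cost(region, cpu))
```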

Preparation

To deploy a TiDB cluster on an x86-based EKS cluster, follow the steps in Deploy TiDB on AWS EKS.

To deploy a TiDB cluster on an Arm-based EKS cluster, follow the steps in TiDB Deployment on Graviton2-based EKS. The following temporary Arm images are used for benchmarking:

  • pingcap2021/tidb-operator:v1.1.11
  • pingcap2021/pd:v4.0.10
  • pingcap2021/tikv:v4.0.10
  • pingcap2021/tidb:v4.0.10
  • pingcap2021/tidb-monitor-initializer:v4.0.10

The above images are temporary and not meant for production environments. Stay tuned for the official Arm images.

TPC-C benchmark

As you review the following benchmark tests, keep in mind that these are preliminary results. They should not be considered official TPC-C results.

To facilitate benchmarking, TiUP has integrated the bench component, which provides two workloads for stress testing: TPC-C and TPC-H. The commands and flags are as follows:

tiup bench 
Starting component `bench`: /Users/joshua/.tiup/components/bench/v0.0.1/bench 
Benchmark database with different workloads

Usage:
  tiup bench [command]

Available Commands:
  help        Help about any command
  tpcc
  tpch

Flags:
      --count int           Total execution count, 0 means infinite
  -D, --db string           Database name (default "test")
  -d, --driver string       Database driver: mysql
      --dropdata            Cleanup data before prepare
  -h, --help                help for /Users/joshua/.tiup/components/bench/v0.0.1/bench
  -H, --host string         Database host (default "127.0.0.1")
      --ignore-error        Ignore error when running workload
      --interval duration   Output interval time (default 10s)
      --isolation int       Isolation Level 0: Default, 1: ReadUncommitted, 
                            2: ReadCommitted, 3: WriteCommitted, 4: RepeatableRead, 
                            5: Snapshot, 6: Serializable, 7: Linerizable
      --max-procs int       runtime.GOMAXPROCS
  -p, --password string     Database password
  -P, --port int            Database port (default 4000)
      --pprof string        Address of pprof endpoint
      --silence             Do not print error when running workload
      --summary             Print summary TPM only, or also print current TPM when running workload
  -T, --threads int         Thread concurrency (default 16)
      --time duration       Total execution time (default 2562047h47m16.854775807s)
  -U, --user string         Database user (default "root")

For TPC-C, the TiUP bench component supports the following commands and flags to run the test:

tiup bench tpcc
Available Commands:
  check       Check data consistency for the workload
  cleanup     Cleanup data for the workload
  prepare     Prepare data for the workload
  run         Run workload

Flags:
      --check-all        Run all consistency checks
  -h, --help             help for tpcc
      --output string    Output directory for generating csv file when preparing data
      --parts int        Number to partition warehouses (default 1)
      --tables string    Specified tables for generating file, separated by ','. Valid only if output is set. If this flag is not set, generate all tables by default.
      --warehouses int   Number of warehouses (default 10)
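When sweeping many warehouse and thread combinations, it can help to generate the command lines programmatically. The following is a hypothetical Python helper (not part of TiUP) that builds tiup bench tpcc argument lists using only the flags shown in the help output above; it prints commands and does not execute anything.

```python
# Hypothetical helper (not part of TiUP): build `tiup bench tpcc` command lines
# from the flags listed in the help output above. It only builds argv lists.
import shlex

ACTIONS = {"prepare", "run", "check", "cleanup"}


def tpcc_command(action, host, warehouses, threads=None, duration=None):
    """Return the argv for one `tiup bench tpcc` invocation."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    argv = ["tiup", "bench", "tpcc", "--warehouses", str(warehouses), "--host", host]
    if threads is not None:
        argv += ["--threads", str(threads)]
    if duration is not None:
        argv += ["--time", duration]  # a duration value such as "30m"
    argv.append(action)
    return argv


# Example: one 30-minute run per thread count against the 10,000-warehouse dataset
host = "xxxxxxxxxxx.elb.us-west-2.amazonaws.com"
for threads in (150, 300, 500, 800, 1000):
    print(shlex.join(tpcc_command("run", host, 10000, threads, "30m")))
```

To actually execute a generated command, you could pass the list to subprocess.run(argv, check=True); keeping command construction separate from execution makes the sweep easy to review before it touches the cluster.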

TPC-C workloads

This table summarizes the workloads we used, both in terms of the number of warehouses and the data sizes.

Workload | Warehouses | Data size
--- | --- | ---
Large1 | 5,000 | ~500 GB
Large2 | 10,000 | ~1 TB

TPC-C test procedures

  1. On the benchmark VM (c5.4xlarge), deploy the latest version of TiUP.

  2. Prepare the warehouse data. Because TiDB is deployed in EKS and its database service is exposed as a LoadBalancer-type service, set the host to the load balancer's DNS name:

    tiup bench tpcc --warehouses 10000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com prepare
  3. Run the TPC-C test with different numbers of threads (we used 150, 300, 500, 800, and 1,000). Each test runs for 30 minutes:

    tiup bench tpcc --warehouses 10000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com --threads 150 --time 30m run
  4. Note the tpmC result for each test case. The following is sample output:

    Finished
    [Summary] DELIVERY - Takes(s): 1796.7, Count: 88527, TPM: 2956.3, Sum(ms): 242093840, Avg(ms): 2734, 90th(ms): 4000, 99th(ms): 8000, 99.9th(ms): 8000
    [Summary] DELIVERY_ERR - Takes(s): 1796.7, Count: 133, TPM: 4.4, Sum(ms): 206560, Avg(ms): 1553, 90th(ms): 4000, 99th(ms): 4000, 99.9th(ms): 8000
    [Summary] NEW_ORDER - Takes(s): 1798.8, Count: 1002915, TPM: 33453.0, Sum(ms): 916326214, Avg(ms): 913, 90th(ms): 1500, 99th(ms): 2000, 99.9th(ms): 4000
    [Summary] NEW_ORDER_ERR - Takes(s): 1798.8, Count: 319, TPM: 10.6, Sum(ms): 118662, Avg(ms): 371, 90th(ms): 1000, 99th(ms): 1500, 99.9th(ms): 1500
    [Summary] ORDER_STATUS - Takes(s): 1798.9, Count: 89022, TPM: 2969.3, Sum(ms): 4346202, Avg(ms): 48, 90th(ms): 80, 99th(ms): 160, 99.9th(ms): 512
    [Summary] ORDER_STATUS_ERR - Takes(s): 1798.9, Count: 1, TPM: 0.0, Sum(ms): 19, Avg(ms): 19, 90th(ms): 20, 99th(ms): 20, 99.9th(ms): 20
    [Summary] PAYMENT - Takes(s): 1798.9, Count: 956516, TPM: 31903.7, Sum(ms): 628421123, Avg(ms): 656, 90th(ms): 1000, 99th(ms): 1500, 99.9th(ms): 2000
    [Summary] PAYMENT_ERR - Takes(s): 1798.9, Count: 201, TPM: 6.7, Sum(ms): 46899, Avg(ms): 233, 90th(ms): 512, 99th(ms): 1000, 99.9th(ms): 1000
    [Summary] STOCK_LEVEL - Takes(s): 1798.9, Count: 89370, TPM: 2980.8, Sum(ms): 6052088, Avg(ms): 67, 90th(ms): 112, 99th(ms): 256, 99.9th(ms): 512
    [Summary] STOCK_LEVEL_ERR - Takes(s): 1798.9, Count: 3, TPM: 0.1, Sum(ms): 342, Avg(ms): 114, 90th(ms): 192, 99th(ms): 192, 99.9th(ms): 192
    tpmC: 33453.0
  5. To proceed with the next workload, change the number of warehouses, for example:

    tiup bench tpcc --warehouses 5000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com --threads 150 --time 30m run
  6. Beginning at step 2, repeat this procedure for different numbers of warehouses.
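To collect results across many runs, the tpmC line at the end of each run can be extracted from the log. The following is a minimal sketch that assumes the output format shown in step 4; the function names are our own. In that sample, the final tpmC equals the TPM on the NEW_ORDER summary line, so the sketch reads both as a cross-check.

```python
# Minimal sketch: pull the final tpmC (and the NEW_ORDER summary TPM) out of a
# `tiup bench tpcc ... run` log, assuming the output format shown in step 4.
import re

TPMC_RE = re.compile(r"^\s*tpmC:\s*([0-9.]+)\s*$", re.MULTILINE)
NEW_ORDER_RE = re.compile(r"\[Summary\] NEW_ORDER - .*?TPM: ([0-9.]+)")


def parse_tpmc(output):
    """Return the tpmC value printed at the end of a run."""
    match = TPMC_RE.search(output)
    if match is None:
        raise ValueError("no tpmC line found in output")
    return float(match.group(1))


def parse_new_order_tpm(output):
    """Return the TPM of the NEW_ORDER summary line (not the NEW_ORDER_ERR line)."""
    match = NEW_ORDER_RE.search(output)
    if match is None:
        raise ValueError("no NEW_ORDER summary line found in output")
    return float(match.group(1))


sample = """Finished
[Summary] NEW_ORDER - Takes(s): 1798.8, Count: 1002915, TPM: 33453.0, Sum(ms): 916326214, Avg(ms): 913
[Summary] NEW_ORDER_ERR - Takes(s): 1798.8, Count: 319, TPM: 10.6, Sum(ms): 118662, Avg(ms): 371
tpmC: 33453.0
"""
print(parse_tpmc(sample), parse_new_order_tpm(sample))
```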

TPC-C benchmark results

The following table shows the tpmC results for the Large1 workload:

tpmC by threads | 150 | 300 | 500 | 800 | 1,000
--- | --- | --- | --- | --- | ---
x86 | 30603.8 | 33270.4 | 36813.8 | 36388.6 | 34956.3
Arm | 34114.3 | 36715.2 | 42493.1 | 41275.0 | 41426.9
(Arm-x86)/x86 | 11.47% | 10.35% | 15.43% | 13.43% | 18.51%
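The (Arm-x86)/x86 row is the relative tpmC gain of Arm over x86 at each thread count. A short sketch that recomputes it from the table values:

```python
# Recompute the (Arm-x86)/x86 row of the Large1 table from the tpmC values.
THREADS = [150, 300, 500, 800, 1000]
X86_TPMC = [30603.8, 33270.4, 36813.8, 36388.6, 34956.3]
ARM_TPMC = [34114.3, 36715.2, 42493.1, 41275.0, 41426.9]


def relative_gain(arm, x86):
    """Relative difference (Arm - x86) / x86, as a fraction."""
    return (arm - x86) / x86


for threads, arm, x86 in zip(THREADS, ARM_TPMC, X86_TPMC):
    print(f"{threads:>5} threads: {relative_gain(arm, x86):.2%}")
```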

TPC-C Arm vs. x86 on EKS for a large1 workload

The following table shows the tpmC results for the Large2 workload:

tpmC by threads | 150 | 300 | 500 | 800 | 1,000
--- | --- | --- | --- | --- | ---
x86 | 28624.3 | 33464.3 | 31478.6 | 33892.9 | 31562.9
Arm | 29858.5 | 35259.4 | 34088.0 | 35899.8 | 33453.0
(Arm-x86)/x86 | 4.31% | 5.36% | 8.29% | 5.92% | 5.99%

TPC-C Arm vs. x86 on EKS for a large2 workload

TPC-C price-performance ratio

In the following price-performance table:

  • The tpmC values are the average tpmC across the 300-, 500-, and 800-thread runs.
  • The total system costs for the US and APAC regions reflect the estimated five-year hardware cost.
  • Price-performance is the total system cost divided by tpmC, so a lower value means less cost for the same performance. The (Arm-x86)/x86 rows compare the Graviton2 Arm and x86 processors.
  • All costs are in US dollars.
Workload | CPU | tpmC | US total system cost | US price-performance | APAC total system cost | APAC price-performance
--- | --- | --- | --- | --- | --- | ---
Large1 | x86 | 35,490.93 | 162,208.80 | 4.57 | 189,804.60 | 5.35
Large1 | Arm | 40,161.10 | 138,539.40 | 3.45 | 161,807.40 | 4.03
Large1 | (Arm-x86)/x86 | | | -24.52% | | -24.66%
Large2 | x86 | 32,945.27 | 162,208.80 | 4.92 | 189,804.60 | 5.76
Large2 | Arm | 35,082.40 | 138,539.40 | 3.95 | 161,807.40 | 4.61
Large2 | (Arm-x86)/x86 | | | -19.79% | | -19.94%
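The price-performance figures can be recomputed from the earlier cost tables. One assumption in this sketch is our own inference rather than something the post states: the five-year total system cost is reproduced exactly as 60 months × (EC2 cost + TiKV gp3 and PD gp2 volume costs + EKS cost), so the sketch uses that decomposition. The tpmC inputs are the 300-, 500-, and 800-thread results.

```python
# Minimal sketch: recompute the TPC-C price-performance table.
# Our inference, not stated in the post: the five-year "total system cost" equals
# 60 months x (EC2 + TiKV gp3 and PD gp2 volume costs + EKS), which matches the table.
MONTHS = 60
EKS_MONTHLY = 73.00
EC2_MONTHLY = {("US", "x86"): 1972.46, ("US", "Arm"): 1577.97,
               ("APAC", "x86"): 2333.08, ("APAC", "Arm"): 1866.46}
VOLUMES_MONTHLY = {"US": 648.27 + 9.75, "APAC": 746.58 + 10.75}  # TiKV gp3 + PD gp2

# tpmC at 300, 500, and 800 threads, copied from the results tables
TPMC = {
    ("Large1", "x86"): [33270.4, 36813.8, 36388.6],
    ("Large1", "Arm"): [36715.2, 42493.1, 41275.0],
    ("Large2", "x86"): [33464.3, 31478.6, 33892.9],
    ("Large2", "Arm"): [35259.4, 34088.0, 35899.8],
}


def five_year_cost(region, cpu):
    return MONTHS * (EC2_MONTHLY[(region, cpu)] + VOLUMES_MONTHLY[region] + EKS_MONTHLY)


def price_performance(workload, cpu, region):
    """Five-year cost divided by average tpmC; lower is better."""
    runs = TPMC[(workload, cpu)]
    return five_year_cost(region, cpu) / (sum(runs) / len(runs))


def relative_difference(workload, region):
    """(Arm - x86) / x86 of the price-performance values, as a fraction."""
    x86 = price_performance(workload, "x86", region)
    arm = price_performance(workload, "Arm", region)
    return (arm - x86) / x86


for workload in ("Large1", "Large2"):
    for region in ("US", "APAC"):
        print(workload, region,
              f"x86={price_performance(workload, 'x86', region):.2f}",
              f"Arm={price_performance(workload, 'Arm', region):.2f}",
              f"diff={relative_difference(workload, region):.2%}")
```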

As the following graphic shows, when we compare the absolute tpmC performance under each workload, the Arm-based system outperforms the x86-based system by 5%-15%. After we factor in the hardware cost, the Arm price-performance ratio is up to 25% lower than x86.

TPC-C price-performance ratio

Sysbench

We used the oltp_read_write.lua script to test OLTP performance.

Sysbench workloads

  • Read (75%) and write (25%)
  • Tables: 16
  • Table size: 10 million rows per table
  • Data size: around 100 GB

Sysbench test procedures

  1. Deploy the latest version of sysbench on the benchmark VM (c5.4xlarge).

  2. Save the sysbench settings in a configuration file. The following is a sample configuration file:

    mysql-host={TIDB_HOST}
    mysql-port=4000
    mysql-user=root
    mysql-password=password
    mysql-db=sbtest
    time=600
    threads=8 # set to 8 while importing the data
    report-interval=10
    db-driver=mysql
  3. Before you import the data, adjust a TiDB setting by executing the following statement in a MySQL client:

    set global tidb_disable_txn_auto_retry = off;
  4. Use BR (Backup & Restore) to import the prepared data from Amazon S3.

  5. In the configuration file, change threads to 300.

  6. Run the sysbench test:

    sysbench --config-file=config oltp_read_write --tables=16 --table-size=10000000 run
  7. Note the test results. The following are sample test results:

    SQL statistics:
        queries performed:
            read:                            10457930
            write:                           2966386
            other:                           1515584
            total:                           14939900
        transactions:                        746995 (2489.02 per sec.)
        queries:                             14939900 (49780.45 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.1146s
        total number of events:              746995
    
    Latency (ms):
            min:                                   43.84
            avg:                                  120.50
            max:                                  381.92
            95th percentile:                      153.02
            sum:                             90013122.40
    
    Threads fairness:
        events (avg/stddev):           2489.9833/235.76
        execution time (avg/stddev):   300.0437/0.03
  8. Repeat this procedure from step 5, setting threads to 600 and then to 900.
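If you run the sysbench test repeatedly, a small parser saves copying numbers by hand. This sketch uses hypothetical helper names and assumes the "SQL statistics" format shown in step 7. Applied to that sample, it yields exactly 20 queries per transaction (14,939,900 / 746,995).

```python
# Minimal sketch: extract counters and rates from sysbench's "SQL statistics" section.
# Assumes the output layout shown in step 7; helper names are hypothetical.
import re


def _grab(pattern, text):
    """Return the first capture group of pattern as a float, or fail loudly."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return float(match.group(1))


def parse_sysbench(output):
    """Pull read/write/query/transaction counts plus TPS and QPS from a sysbench run."""
    return {
        "reads": _grab(r"read:\s+(\d+)", output),
        "writes": _grab(r"write:\s+(\d+)", output),
        "transactions": _grab(r"transactions:\s+(\d+)", output),
        "queries": _grab(r"queries:\s+(\d+)", output),  # skips "queries performed:"
        "tps": _grab(r"transactions:\s+\d+\s+\(([\d.]+) per sec\.\)", output),
        "qps": _grab(r"queries:\s+\d+\s+\(([\d.]+) per sec\.\)", output),
    }


# Excerpt of the sample output in step 7
sample = """
        read:                            10457930
        write:                           2966386
        other:                           1515584
        total:                           14939900
    transactions:                        746995 (2489.02 per sec.)
    queries:                             14939900 (49780.45 per sec.)
"""
stats = parse_sysbench(sample)
print(stats["tps"], stats["qps"], stats["queries"] / stats["transactions"])
```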

Sysbench benchmark results

Results for 300, 600, and 900 threads are listed below. Because QPS and TPS in sysbench are proportional (each oltp_read_write transaction issues the same number of queries, 20 in the sample output above), we compare only TPS.

Metric | 300 threads | 600 threads | 900 threads
--- | --- | --- | ---
x86 P95 latency (ms) | 155.80 | 282.25 | 427.07
Arm P95 latency (ms) | 147.61 | 267.41 | 383.33
x86 QPS | 48814.94 | 52342.03 | 52413.17
Arm QPS | 51892.58 | 55886.22 | 57465.90
x86 TPS | 2440.75 | 2617.10 | 2620.66
Arm TPS | 2594.63 | 2794.31 | 2873.30
TPS (Arm-x86)/x86 | 6.30% | 6.77% | 9.64%
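The table values confirm that QPS is about 20 times TPS in every cell, and the last row is the relative TPS gain (Arm-x86)/x86. A short sketch that checks both from the numbers above:

```python
# Check that QPS / TPS is about 20 in every cell and recompute the TPS gain row.
THREADS = [300, 600, 900]
QPS = {"x86": [48814.94, 52342.03, 52413.17], "Arm": [51892.58, 55886.22, 57465.90]}
TPS = {"x86": [2440.75, 2617.10, 2620.66], "Arm": [2594.63, 2794.31, 2873.30]}


def tps_gain(i):
    """(Arm - x86) / x86 TPS for the i-th thread count, as a fraction."""
    return (TPS["Arm"][i] - TPS["x86"][i]) / TPS["x86"][i]


for i, threads in enumerate(THREADS):
    queries_per_txn = {cpu: QPS[cpu][i] / TPS[cpu][i] for cpu in QPS}
    print(threads, {cpu: round(q, 2) for cpu, q in queries_per_txn.items()},
          f"TPS gain {tps_gain(i):.2%}")
```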

Sysbench Arm vs. x86 on EKS

Sysbench price-performance ratio

In the following price-performance table:

  • TPS values are the average TPS across the 300-, 600-, and 900-thread runs.
  • The total system cost reflects the estimated five-year hardware cost. All costs are in US dollars.
  • Price-performance is the total system cost divided by TPS. A lower value is better: it indicates less cost for the same performance.
CPU | TPS | US total system cost | US price-performance | APAC total system cost | APAC price-performance
--- | --- | --- | --- | --- | ---
x86 | 2,559.50 | 162,208.80 | 63.38 | 189,804.60 | 74.16
Arm | 2,754.08 | 138,539.40 | 50.30 | 161,807.40 | 58.75
(Arm-x86)/x86 | | | -20.63% | | -20.77%
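The sysbench price-performance values are the five-year total system cost divided by the average TPS over the 300-, 600-, and 900-thread runs. A minimal sketch with the figures copied from the tables:

```python
# Minimal sketch: sysbench price-performance = five-year cost / average TPS (lower is better).
FIVE_YEAR_COST = {("US", "x86"): 162208.80, ("US", "Arm"): 138539.40,
                  ("APAC", "x86"): 189804.60, ("APAC", "Arm"): 161807.40}
# TPS at 300, 600, and 900 threads, copied from the results table
TPS = {"x86": [2440.75, 2617.10, 2620.66], "Arm": [2594.63, 2794.31, 2873.30]}


def average_tps(cpu):
    return sum(TPS[cpu]) / len(TPS[cpu])


def price_performance(region, cpu):
    return FIVE_YEAR_COST[(region, cpu)] / average_tps(cpu)


for region in ("US", "APAC"):
    for cpu in ("x86", "Arm"):
        print(region, cpu, round(average_tps(cpu), 2), round(price_performance(region, cpu), 2))
```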

Conclusion

Both the TPC-C and sysbench results show that the Graviton2 processor performs better than the x86 processor, in some cases by up to 18%. After we factor in hardware cost, the Graviton2 processor also has a better price-performance ratio than x86: on average, roughly 20% lower. These results are based only on the workloads we tested (100 GB, 500 GB, and 1 TB). In the future, we will include more complex and larger workloads with better EBS storage support.
