
Author: Ron Xing (Customer Support Engineer at PingCAP)

Transcreator: Caitin Chen; Editor: Tom Dewan

Benchmark purpose

The following tests compare the performance of TiDB, a MySQL-compatible NewSQL database, running on an Arm-based Amazon Elastic Kubernetes Service (EKS) cluster and on an x86-based EKS cluster. The tests use an Online Transactional Processing (OLTP) workload, and the benchmarking tools are TPC-C and sysbench. For more information on the inspiration for this benchmark, refer to the AWS Startups Blog.

Benchmark environment

The test used two EKS clusters with the following instance types and topology.

Instance types

The following table summarizes all the EC2 instances we used in the EKS clusters as well as the benchmark node.

Instance size (CPU Architecture) vCPUs Memory (GiB) Instance storage (GB) Network bandwidth (Gbps) EBS bandwidth (Mbps)
c6g.large (Arm) 2 4 EBS-only Up to 10 Up to 4,750
c6g.2xlarge (Arm) 8 16 EBS-only Up to 10 Up to 4,750
r6g.2xlarge (Arm) 8 64 EBS-only Up to 10 Up to 4,750
c5.large (x86) 2 4 EBS-only Up to 10 Up to 4,750
c5.2xlarge (x86) 8 16 EBS-only Up to 10 Up to 4,750
r5.2xlarge (x86) 8 64 EBS-only Up to 10 Up to 4,750
c5.4xlarge (x86) 16 32 EBS-only 10 4,750

Storage types

The following table summarizes the disks we used for the different components.

Service type Storage Size (GB) IOPS Throughput (MiB/s) Instances
TiKV EBS gp3 1,000 16,000 600 3
PD EBS gp2 50 150 128 1
TPC-C/sysbench EBS gp3 1,500 16,000 1,000 1

Topology

We used one c5.4xlarge EC2 instance as a benchmark node where TPC-C and sysbench are deployed.

Each EKS cluster consists of seven worker nodes and one admin (control plane) node. The seven worker nodes serve as dedicated TiDB, TiKV, and PD nodes.

To better understand the role of each component and to see the architecture diagram, refer to the AWS Startups Blog.

Cluster/processor Service type EC2 type Instances
Cluster 1: Graviton2 (Arm)
  TiDB c6g.2xlarge 3
  TiKV r6g.2xlarge 3
  PD c6g.large 1
  Admin node c6g.large 1
Cluster 2: Intel Xeon Platinum 8000 series (x86)
  TiDB c5.2xlarge 3
  TiKV r5.2xlarge 3
  PD c5.large 1
  Admin node c5.large 1

Software version

The TiDB cluster software versions and the sysbench tool version are listed below.

Service type Software version
TiDB v4.0.10
TiKV v4.0.10
PD v4.0.10
TPC-C (embedded with TiUP) v1.0.8
sysbench v1.0.20
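
As a quick check before running anything, you can confirm the tool versions on the benchmark VM. This is only a small sketch; it assumes TiUP and sysbench are already installed:

# Confirm the benchmark tool versions on the benchmark VM.
tiup --version
sysbench --version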

Cost

In the following cost examples:

  • Cost calculations are based on the on-demand rate for instances in the US West (Oregon) and Asia Pacific (Singapore) regions in US dollars (USD) per month.
  • Monthly calculations are based on 730 hours of usage per month (a quick worked example follows this list).
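
For example, here is how the monthly EC2 figures in the tables below follow from the hourly rates. This is only a sanity check with bc; the rate and instance count are taken from the Arm TiDB row shown later:

# Monthly cost = hourly on-demand rate x 730 hours x number of instances.
# Example: three c6g.2xlarge TiDB nodes in the US region at 0.272 USD/hr.
echo "0.272 * 730 * 3" | bc   # 595.680, matching the 595.68 in the EC2 table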

Storage

In the following table, the total cost per month includes a daily snapshot.

Volume type Size (GB) IOPS Throughput (MiB/s) Instances Monthly cost (US) Monthly cost (APAC)
gp3 1,000 16,000 600 3 648.27 746.58
gp2 50 100 128 1 9.75 10.75

EC2

We used the following Graviton2 Arm-based instances. The unit prices are the on-demand rates in the US and APAC regions.

Service type EC2 type Instances Unit price, US (USD/hr) Monthly cost, US Unit price, APAC (USD/hr) Monthly cost, APAC
TiDB c6g.2xlarge 3 0.272 595.68 0.3136 686.78
TiKV r6g.2xlarge 3 0.4032 883.01 0.4864 1,065.21
PD c6g.large 1 0.068 49.64 0.0784 57.23
Control plane c6g.large 1 0.068 49.64 0.0784 57.23
Total monthly cost: 1,577.97 (US), 1,866.46 (APAC)

Here are the configurations and costs of our instances based on Intel Xeon Platinum 8000 series processors:

Service type EC2 type Instances Unit price, US (USD/hr) Monthly cost, US Unit price, APAC (USD/hr) Monthly cost, APAC
TiDB c5.2xlarge 3 0.34 744.60 0.392 858.48
TiKV r5.2xlarge 3 0.504 1,103.76 0.608 1,331.52
PD c5.large 1 0.085 62.05 0.098 71.54
Control plane c5.large 1 0.085 62.05 0.098 71.54
Total monthly cost: 1,972.46 (US), 2,333.08 (APAC)

Total cost

The following tables summarize the total costs per month in the US West (Oregon) and Asia Pacific (Singapore) regions. All costs are in US dollars per month.

AWS US West (Oregon)

CPU type EC2 cost Storage cost EKS cost Total cost
Arm 1,577.97 740.86 73.00 2,391.83
x86 1,972.46 740.86 73.00 2,786.32

AWS Asia Pacific (Singapore)

CPU type EC2 cost Storage cost EKS cost Total cost
Arm 1,866.46 856.71 73.00 2,796.17
x86 2,333.08 856.71 73.00 3,262.79
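
As a sanity check (not from the original post), each total cost is simply the sum of the EC2, storage, and EKS columns; for example, the Arm row in the US West (Oregon) table:

# Arm cluster, US West (Oregon): EC2 + storage + EKS cluster fee, in USD per month.
echo "1577.97 + 740.86 + 73.00" | bc   # 2391.83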

Preparation

To deploy a TiDB cluster on an x86-based EKS cluster, follow the steps in Deploy TiDB on AWS EKS.

To deploy a TiDB cluster on an Arm-based EKS cluster, follow the steps in TiDB Deployment on Graviton2-based EKS. The following temporary Arm images are used for benchmarking:

  • pingcap2021/tidb-operator:v1.1.11
  • pingcap2021/pd:v4.0.10
  • pingcap2021/tikv:v4.0.10
  • pingcap2021/tidb:v4.0.10
  • pingcap2021/tidb-monitor-initializer:v4.0.10

The above images are temporary and are not meant for production environments. Stay tuned for the official Arm images.
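
If you use these temporary images, a quick sanity check (not part of the original procedure) is to confirm that the worker nodes report the arm64 architecture and that the images are pullable. The commands assume kubectl access to the EKS cluster and a Docker-capable host:

# Show the CPU architecture label of each node; the worker nodes should report arm64.
kubectl get nodes -L kubernetes.io/arch

# Pull one of the temporary Arm images to verify it is reachable.
docker pull pingcap2021/tidb:v4.0.10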

TPC-C benchmark

As you review the following benchmark tests, keep in mind that these are preliminary results. They should not be considered official TPC-C results.

To facilitate benchmarking, TiUP has integrated the bench component, which provides two workloads for stress testing: TPC-C and TPC-H. The commands and flags are as follows:

tiup bench
Starting component `bench`: /Users/joshua/.tiup/components/bench/v0.0.1/bench
Benchmark database with different workloads

Usage:
  tiup bench [command]

Available Commands:
  help        Help about any command
  tpcc
  tpch

Flags:
      --count int           Total execution count, 0 means infinite
  -D, --db string           Database name (default "test")
  -d, --driver string       Database driver: mysql
      --dropdata            Cleanup data before prepare
  -h, --help                help for /Users/joshua/.tiup/components/bench/v0.0.1/bench
  -H, --host string         Database host (default "127.0.0.1")
      --ignore-error        Ignore error when running workload
      --interval duration   Output interval time (default 10s)
      --isolation int       Isolation Level 0: Default, 1: ReadUncommitted,
                            2: ReadCommitted, 3: WriteCommitted, 4: RepeatableRead,
                            5: Snapshot, 6: Serializable, 7: Linerizable
      --max-procs int       runtime.GOMAXPROCS
  -p, --password string     Database password
  -P, --port int            Database port (default 4000)
      --pprof string        Address of pprof endpoint
      --silence             Do not print error when running workload
      --summary             Print summary TPM only, or also print current TPM when running workload
  -T, --threads int         Thread concurrency (default 16)
      --time duration       Total execution time (default 2562047h47m16.854775807s)
  -U, --user string         Database user (default "root")

For TPC-C, the TiUP bench component supports the following commands and flags to run the test:

tiup bench tpcc
Available Commands:
  check       Check data consistency for the workload
  cleanup     Cleanup data for the workload
  prepare     Prepare data for the workload
  run         Run workload

Flags:
      --check-all        Run all consistency checks
  -h, --help             help for tpcc
      --output string    Output directory for generating csv file when preparing data
      --parts int        Number to partition warehouses (default 1)
      --tables string    Specified tables for generating file, separated by ','. Valid only if output is set. If this flag is not set, generate all tables by default.
      --warehouses int   Number of warehouses (default 10)
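
Before loading the full workloads below, it can be useful to run a small smoke test with the same commands. This is only a sketch; <tidb-endpoint> is a placeholder for your TiDB host, and the flags are the ones listed above:

# Smoke test: load one warehouse, run for one minute, then clean up the data.
tiup bench tpcc --host <tidb-endpoint> --port 4000 --warehouses 1 prepare
tiup bench tpcc --host <tidb-endpoint> --port 4000 --warehouses 1 --threads 4 --time 1m run
tiup bench tpcc --host <tidb-endpoint> --port 4000 --warehouses 1 cleanup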

TPC-C workloads

This table summarizes the workloads we used, both in terms of the number of warehouses and the data sizes.

Workload Warehouses Data size
Large1 5,000 ~500 GB
Large2 10,000 ~1 TB

TPC-C test procedures

  1. On the benchmark VM (c5.4xlarge), deploy the latest version of TiUP.
  2. Create warehouses. Specify the host as the load balancer's DNS name, because TiDB is deployed in EKS and the database service is exposed as a LoadBalancer-type Service:
    tiup bench tpcc --warehouses 10000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com prepare
    
  3. Run the TPC-C test with different numbers of threads (we used 150, 300, 500, 800, and 1,000). Each test runs for 30 minutes; a looping sketch is shown after this procedure.
    tiup bench tpcc --warehouses 10000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com --threads 150 --time 30m run
    
  4. Note the tpmC result for each test case. The following is sample output:
    Finished
    [Summary] DELIVERY - Takes(s): 1796.7, Count: 88527, TPM: 2956.3, Sum(ms): 242093840, Avg(ms): 2734, 90th(ms): 4000, 99th(ms): 8000, 99.9th(ms): 8000
    [Summary] DELIVERY_ERR - Takes(s): 1796.7, Count: 133, TPM: 4.4, Sum(ms): 206560, Avg(ms): 1553, 90th(ms): 4000, 99th(ms): 4000, 99.9th(ms): 8000
    [Summary] NEW_ORDER - Takes(s): 1798.8, Count: 1002915, TPM: 33453.0, Sum(ms): 916326214, Avg(ms): 913, 90th(ms): 1500, 99th(ms): 2000, 99.9th(ms): 4000
    [Summary] NEW_ORDER_ERR - Takes(s): 1798.8, Count: 319, TPM: 10.6, Sum(ms): 118662, Avg(ms): 371, 90th(ms): 1000, 99th(ms): 1500, 99.9th(ms): 1500
    [Summary] ORDER_STATUS - Takes(s): 1798.9, Count: 89022, TPM: 2969.3, Sum(ms): 4346202, Avg(ms): 48, 90th(ms): 80, 99th(ms): 160, 99.9th(ms): 512
    [Summary] ORDER_STATUS_ERR - Takes(s): 1798.9, Count: 1, TPM: 0.0, Sum(ms): 19, Avg(ms): 19, 90th(ms): 20, 99th(ms): 20, 99.9th(ms): 20
    [Summary] PAYMENT - Takes(s): 1798.9, Count: 956516, TPM: 31903.7, Sum(ms): 628421123, Avg(ms): 656, 90th(ms): 1000, 99th(ms): 1500, 99.9th(ms): 2000
    [Summary] PAYMENT_ERR - Takes(s): 1798.9, Count: 201, TPM: 6.7, Sum(ms): 46899, Avg(ms): 233, 90th(ms): 512, 99th(ms): 1000, 99.9th(ms): 1000
    [Summary] STOCK_LEVEL - Takes(s): 1798.9, Count: 89370, TPM: 2980.8, Sum(ms): 6052088, Avg(ms): 67, 90th(ms): 112, 99th(ms): 256, 99.9th(ms): 512
    [Summary] STOCK_LEVEL_ERR - Takes(s): 1798.9, Count: 3, TPM: 0.1, Sum(ms): 342, Avg(ms): 114, 90th(ms): 192, 99th(ms): 192, 99.9th(ms): 192
    tpmC: 33453.0
    
  5. To proceed with the next workload, change the number of warehouses.
    tiup bench tpcc --warehouses 5000 --host xxxxxxxxxxx.elb.us-west-2.amazonaws.com --threads 150 --time 30m run
    
  6. Beginning at step 2, repeat this procedure for different numbers of warehouses.
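
The looping sketch referenced in step 3 is shown below. It is not from the original post; it simply repeats the run command for each thread count and keeps the summaries. The host value is a placeholder for the load balancer's DNS name:

# Run the 10,000-warehouse workload at each concurrency level and save each summary.
HOST=xxxxxxxxxxx.elb.us-west-2.amazonaws.com   # placeholder: load balancer DNS name
for THREADS in 150 300 500 800 1000; do
  tiup bench tpcc --warehouses 10000 --host "$HOST" \
    --threads "$THREADS" --time 30m run | tee "tpcc_10000wh_${THREADS}t.log"
done
grep "tpmC" tpcc_10000wh_*.log   # collect the tpmC result of each run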

TPC-C benchmark results

The following table shows the results for a large1 workload:

tpmC 150 threads 300 threads 500 threads 800 threads 1,000 threads
x86 30603.8 33270.4 36813.8 36388.6 34956.3
Arm 34114.3 36715.2 42493.1 41275.0 41426.9
(Arm-x86)/x86 11.47% 10.35% 15.43% 13.43% 18.51%

TPC-C Arm vs. x86 on EKS for a large1 workload

The following table shows the results for a large2 workload:

tpmC 150 threads 300 threads 500 threads 800 threads 1,000 threads
x86 28624.3 33464.3 31478.6 33892.9 31562.9
Arm 29858.5 35259.4 34088.0 35899.8 33453.0
(Arm-x86)/x86 4.31% 5.36% 8.29% 5.92% 5.99%

TPC-C Arm vs. x86 on EKS for a large2 workload

TPC-C price-performance ratio

In the following price-performance table:

  • The tpmC values are the average of the tpmC results at 300, 500, and 800 threads.
  • The total system costs for the US and APAC regions reflect the estimated five-year hardware cost.
  • The price-performance ratios compare x86 and Graviton2 Arm processors. A lower value is better; it indicates less cost for the same performance.
  • All costs are in US dollars.

Workload tpmC Total system cost (US) Price-performance (US) Total system cost (APAC) Price-performance (APAC)
x86 Large1 35,490.93 162,208.80 4.57 189,804.60 5.35
Arm Large1 40,161.10 138,539.40 3.45 161,807.40 4.03
(Arm-x86)/x86 price-performance, Large1: -24.52% (US), -24.66% (APAC)
x86 Large2 32,945.27 162,208.80 4.92 189,804.60 5.76
Arm Large2 35,082.40 138,539.40 3.95 161,807.40 4.61
(Arm-x86)/x86 price-performance, Large2: -19.79% (US), -19.94% (APAC)
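
As a quick illustration (not from the original post) of how the ratio is derived, here is the x86 Large1 US figure: the tpmC is the average of the 300-, 500-, and 800-thread results, and price-performance is the five-year system cost divided by that tpmC:

echo "scale=2; (33270.4 + 36813.8 + 36388.6) / 3" | bc   # 35490.93 tpmC
echo "scale=2; 162208.80 / 35490.93" | bc                # 4.57 price-performance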

When we compare absolute tpmC performance under each workload, the Arm-based system outperforms the x86-based system by roughly 4% to 18%, depending on the workload and concurrency. After we factor in the hardware cost, as the following graphic shows, the Arm price-performance ratio is up to about 25% lower than that of x86.

TPC-C price-performance ratio

Sysbench

We used oltp_read_write.lua to test performance under an OLTP workload.

Sysbench workloads

  • Reads (75%) and writes (25%)
  • Tables: 16
  • Table size: 10 million rows per table
  • Data size: around 100 GB (see the note after this list on generating the dataset)
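
The note referenced above: if you do not have a BR backup of this dataset on hand, sysbench itself can generate it, although more slowly. This sketch reuses the configuration file from step 2 of the procedure below:

# Create 16 tables of 10,000,000 rows each (roughly 100 GB) in the sbtest database.
sysbench --config-file=config oltp_read_write --tables=16 --table-size=10000000 prepare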

Sysbench test procedures

  1. Deploy the latest version of sysbench on the benchmark VM (c5.4xlarge).
  2. Set the sysbench configurations and save them in the configuration file. The following is a sample configuration file:
    mysql-host={TIDB_HOST}
    mysql-port=4000
    mysql-user=root
    mysql-password=password
    mysql-db=sbtest
    time=600
    threads=8 # set to 8 while importing the data
    report-interval=10
    db-driver=mysql
    
  3. Before you import the data, adjust the following TiDB setting. Execute this command in the MySQL client:
    set global tidb_disable_txn_auto_retry = off;
    
  4. Use BR to import the prepared data from S3.
  5. Change the threads value in the configuration file to 300.
  6. Run the sysbench test:
    sysbench --config-file=config oltp_read_write --tables=16 --table-size=10000000 run
    
  7. Note the test results. The following are sample test results:
    SQL statistics:
        queries performed:
            read:                            10457930
            write:                           2966386
            other:                           1515584
            total:                           14939900
        transactions:                        746995 (2489.02 per sec.)
        queries:                             14939900 (49780.45 per sec.)
        ignored errors:                      0      (0.00 per sec.)
        reconnects:                          0      (0.00 per sec.)
    
    General statistics:
        total time:                          300.1146s
        total number of events:              746995
    
    Latency (ms):
            min:                                   43.84
            avg:                                  120.50
            max:                                  381.92
            95th percentile:                      153.02
            sum:                             90013122.40
    
    Threads fairness:
        events (avg/stddev):           2489.9833/235.76
        execution time (avg/stddev):   300.0437/0.03
    
  8. Beginning at step 5, repeat this procedure with the threads value set to 600 and then to 900 (see the looping sketch after this procedure).
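
The looping sketch referenced in step 8 is shown below. It is not from the original post; it edits the threads value in the configuration file and reruns the same workload at each concurrency level:

# Repeat the read-write workload at 300, 600, and 900 threads.
for THREADS in 300 600 900; do
  sed -i "s/^threads=.*/threads=${THREADS}/" config    # step 5: set the concurrency
  sysbench --config-file=config oltp_read_write --tables=16 --table-size=10000000 run \
    | tee "sysbench_rw_${THREADS}t.log"
done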

Sysbench benchmark results

Results for 300, 600, and 900 threads are listed below. Because QPS and TPS in sysbench are proportional, we compare only TPS in our test.

Metrics 300 threads 600 threads 900 threads
x86 P95 latency (ms) 155.80 282.25 427.07
Arm P95 latency (ms) 147.61 267.41 383.33
x86 QPS 48814.94 52342.03 52413.17
Arm QPS 51892.58 55886.22 57465.90
x86 TPS 2440.75 2617.10 2620.66
Arm TPS 2594.63 2794.31 2873.30
For TPS: (Arm-x86)/x86 6.30% 6.77% 9.64%

Sysbench Arm vs. x86 on EKS

Sysbench price-performance ratio

In the following price-performance table:

  • The TPS values are the average of the TPS results at 300, 600, and 900 threads.
  • The total system cost reflects the estimated five-year hardware cost. All costs are in US dollars.
  • The price-performance ratios compare x86 and Graviton2 Arm processors. A lower value is better; it indicates less cost for the same performance.

Processor TPS Total system cost (US) Price-performance (US) Total system cost (APAC) Price-performance (APAC)
x86 2,559.50 162,208.80 63.38 189,804.60 74.16
Arm 2,754.08 138,539.40 50.30 161,807.40 58.75
(Arm-x86)/x86 price-performance: -20.64% (US), -20.77% (APAC)

Conclusion

Benchmarking results from both TPC-C and sysbench show that the Graviton2 processor performs better than the comparable x86 processor; in some cases it outperforms x86 by up to 18%. After factoring in the hardware cost, the Graviton2 processor has a better price-performance ratio than x86: on average, about 20% lower. These results are based only on the workloads we tested (100 GB, 500 GB, and 1 TB). For the benchmark's limitations and our future plans, refer to the AWS Startups Blog.

