HTAP Summit 2024 session replays are now live!Access Session Replays
20221214-102726

TiDB Cloud gives developers the full power of TiDB, a distributed SQL database, in a fully-managed cloud instance. The TiDB Cloud Serverless Tier is a fully-managed, serverless HTAP database service. It enables developers to deploy their infrastructure at scale in the most cost-efficient way without managing server infrastructure. 

ProxySQL is a high-performance, open-source SQL proxy. It is fully compatible with MySQL and includes a number of features that make it an ideal choice for a wide range of applications.

This article will show you how to integrate TiDB Cloud with ProxySQL. You will also learn how to configure ProxySQL’s Query Caching feature and how it can benefit application developers.

Before you begin

Before you try the steps in this article, make sure you have installed the following components and have basic understanding of them:

Additionally, the code snippets shown throughout this blog post follow macOS standards. However, if you are using a different platform, you might want to consider a similar syntax.

Getting started

After you have downloaded and installed the prerequisites, you may proceed with the instructions as given below.

Create a TiDB Cloud Serverless Tier cluster

Create a free TiDB Serverless Tier cluster. Copy and save the cluster hostname, username, and port for later use. You can find this information on the Cluster page as shown below.

Prepare ProxySQL configuration files

  1. Launch a new terminal window. In our example, we call it TW1.
  2. Clone the integration example code repository of TiDB and ProxySQL and navigate to the caching-tidb-cloud-connect folder:
git clone https://github.com/pingcap-inc/tidb-proxysql-integration.git

cd tidb-proxysql-integration/example/caching-tidb-cloud-connect

3. Generate the ProxySQL configuration files:

python3 proxysql-config.py

When prompted, enter Endpoint for the Serverless Tier host,  User for the Serverless Tier username, and Root Password for the Serverless Tier password.

If the operation is successful, you’ll see the following:

[Begin] generating configuration files..
tidb-cloud-connect.cnf generated successfully.
proxysql-prepare.sql generated successfully.
proxysql-connect.py generated successfully.
[End] all files generated successfully and placed in the current folder.

There should be three new files in the current folder.  

Configure ProxySQL

  1. Pull the ProxySQL docker image and start a ProxySQL container:

docker compose up -d

2. Integrate TiDB Cloud Serverless Cluster with ProxySQL. The following command executes proxysql-prepare.sql inside the ProxySQL Admin interface:

docker compose exec proxysql sh -c "mysql -uadmin -padmin -h127.0.0.1 -P6032 < ./proxysql-prepare.sql"

The proxysql-prepare.sql script:

  •   Adds a user using the username and password of your cluster.
  •   Assigns the user to the monitoring account.
  •   Adds your TiDB Serverless Tier cluster to the list of hosts.
  •   Enables a secure connection between ProxySQL and TiDB Serverless.

For detailed information, review the proxysql-prepare.sql file or refer to the ProxySQL documentation.

 Your output should be similar to the following:

*************************** 1. row ***************************
hostgroup_id: 0
Hostname: xxxxx.xxxx.xxxxx.aws.tidbcloud.com
port: 4000
gtid_port: 0
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 1
max_latency_ms: 0

You have now established a connection between ProxySQL and TiDB Cloud.

Connect to your TiDB Cloud cluster using ProxySQL

Execute the following Python script to launch the MySQL client and establish a connection:

python3 proxysql-connect.py

If you see the mysql> prompt, congratulations! You are ready to execute SQL commands. Let’s check it out by executing the following SQL query. It fetches and sorts the total number of github events by year:

SELECT COUNT(*) AS event_count, event_year  FROM gharchive_dev.github_events GROUP BY event_year ORDER BY event_year DESC;

The following are the query results:

+----------+------------+
event_count | event_year |
+----------+------------+
406019      |  2021      |
36654       |  2020      |
46468       |  2019      |
46159       |  2018      |
49821       |  2017      |
47640       |  2016      |
41908       |  2015      |
11419       |  2014      |
+----------+------------+
8 rows in set (0.21 sec)

Notes:

  • Leave TW1 open because we will use it later.
  • If you are unable to connect to the cluster, review the newly-generated ProxySQL configuration file files in the previous step. Make sure that the TiDB Cloud cluster information you provided is available and correct. 

Query Caching using ProxySQL

Now that we have the integration in place, let’s configure Query Caching in ProxySQL. Generally speaking, Query caching is the process of caching the results obtained from running a SQL statement. This approach boosts application performance and increases efficiency due to faster response time. 

Log into the ProxySQL Admin interface

Launch a new terminal window. (We’ll call ours TW2). Execute the following command:

mysql -u radmin -pradmin -h 127.0.0.1 -P16032 --prompt 'ProxySQL Admin>'

Review and extract query digest information

  1. Execute the “digest stats” query: 

SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC\G;

When hostgroup is 0 it indicates that the query was executed against a TiDB cluster.

2. Copy the digest value 0xA153BB2585B37095 of the query that we executed in the previous Connect to your TiDB Cloud cluster using ProxySQL step. You will need this value while configuring the query rule in the next step. 

********************** 1. row ***************************************
count_star: 1
sum_time: 736148
hostgroup: 0
digest: 0xA153BB2585B37095
digest_text: SELECT COUNT(*) AS event_count,event_year FROM gharchive_dev.github_events GROUP BY event_year ORDER BY event_year DESC

Notes:

  • The table stats_mysql_query_digest_reset is shipped by default with ProxySQL. It stores the statistics of every query executed via ProxySQL.
  • The digest value will be different for different queries. For any query you want to cache, you will need the corresponding digest value. 

Configure a query rule

ProxySQL uses query rules as a configuration mechanism to identify and cache the inbound traffic as well as to specify the cache Time to Live (TTL). We will create a query rule record with a TTL value of 5,000 milliseconds. The ProxySQL hostgroup logically groups database clusters. In our case, we have only one TiDB Cloud cluster, so we’ll use a value of 0.

The following statement creates a single query rule record.

INSERT INTO mysql_query_rules(rule_id,active,digest,cache_ttl,destination_hostgroup,apply) VALUES (1,1,'0xA153BB2585B37095',5000,0,1);
load mysql query rules to runtime;
save mysql query rules to disk;
SELECT rule_id,active,digest,cache_ttl,destination_hostgroup,apply FROM mysql_query_rules\G;

Executing the above statement will create a single query rule record as shown below:  

********************** 1. row *************************************** 
rule_id: 1
active: 1
digest: 0xA153BB2585B37095
cache_ttl: 5000
destination_hostgroup: 0
apply: 1
1 row in set (0.00 sec)

Review caching

Now that you have the caching rules set up, it’s time to review the caching behavior.

  1. Execute the github events by year” query six times with a one-second interval in the first terminal window (TW1):

SELECT COUNT(*) AS event_count, event_year FROM gharchive_dev.github_events GROUP BY event_year ORDER BY event_year DESC;

2. Switch to the second terminal window (TW2), run the “digest stats query”:

SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset ORDER BY sum_time DESC\G;

3. Review and analyze the query digest statistics.

What you see below are two rows of statistical results.The first row indicates that the query was executed on TiDB Cluster (hostgroup : 0). It took nearly 0.21 seconds. Execution of the query happened only once.

The second row indicates that the query was executed locally (hostgroup:-1) and for the next five attempts (count_star:5) the time taken was 0 seconds. 

*************************************** 1. row ***************************************
count_star: 1
sum_time: 217789
hostgroup: 0
digest: 0xA153BB2585B37095
digest_text: SELECT COUNT(*) AS event_count,event_year FROM   gharchive_dev.github_events GROUP BY event_year ORDER BY event_year DESC
 
*************************************** 2. row ***************************************
count_star: 5
sum_time: 0
hostgroup: -1
digest: 0xA153BB2585B37095
digest_text: SELECT COUNT(*) AS event_count,event_year FROM  gharchive_dev.github_events GROUP BY event_year ORDER BY event_year DESC
2 rows in set (0.01 sec)

Summary

As you can see, query caching can improve the read latency significantly. In our example latency went from  0.21 seconds to 0 seconds. Many such latency improvements can significantly boost application performance, giving users a seamless experience.

To give you a visual context, we plotted several query execution times to showcase reduced latency with caching. T(ime)1, T7, and T13 are the only times when the query results were fetched from a remote TiDB Cluster; the remaining queries used locally-cached result sets.

Query caching vs No query caching

Although ProxySQL Caching is an awesome feature, it has limitations. We’ve listed some below. 

  • It is impossible to define query cache invalidation other than configuring Cache TTL.
  • Query Cache does not support Prepared Statements.

For a comprehensive set of limitations, see the ProxySQL documentation.

Conclusion

We hope you found this blog useful. If you want to learn more about TiDB Cloud and ProxySQL integration and get started, see our documentation. If you have any questions or feedback, feel free to contact us through Twitter, LinkedIn, or our Slack Channel.

Keep reading: 
Develop Apps Faster with the Integration of Vercel and TiDB Cloud
Announcing Partnership with HashiCorp to Automate Cloud Database Provisioning
Use TiDB Cloud as the Database for Your Java Application

Edited by: Tom Dewan, Calvin Weng


Book a Demo


Want to explore TiDB without installing any software? Go to TiDB Playground

Spin up a Serverless database with 25GiB free resources.

Start Right Away

Have questions? Let us know how we can help.

Contact Us

TiDB Cloud Dedicated

A fully-managed cloud DBaaS for predictable workloads

TiDB Cloud Serverless

A fully-managed cloud DBaaS for auto-scaling workloads