How to Use SQL EXPLAIN for Better Query Performance

Understanding SQL EXPLAIN

What is SQL EXPLAIN?

SQL EXPLAIN is a statement that shows the execution plan a database intends to use for a query, without actually executing it. When you prepend an SQL statement with the keyword EXPLAIN, the database returns a description of its execution strategy instead of the query results. Database administrators and developers use this to optimize and troubleshoot queries. In TiDB, a distributed SQL database, EXPLAIN is particularly useful for seeing which operators are expected to handle the most rows and how data flows between the TiDB server and the storage nodes.

How SQL EXPLAIN Works

SQL EXPLAIN works by breaking down the query into its operational components and displaying how the database intends to execute them. It considers various factors, such as table joins, indices, and filtering criteria, and presents this plan in a tabular format. This plan is composed of several key attributes:

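id: The operator name with a unique numeric suffix. Indentation and the ├─ and └─ markers show how operators are nested.
estRows: Estimated number of rows the operator is expected to produce, based on table statistics.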
task: Specifies where the operation will be executed (e.g., root, cop[tikv], cop[tiflash]).
access object: Displays the table, partition, or index that will be accessed.
operator info: Additional details about each operator.

For example, in TiDB, the output might show which tasks are performed on the TiDB server and which are distributed across the TiKV or TiFlash nodes, providing insights into the parallelism of query execution.

Benefits of Using SQL EXPLAIN

Query Optimization

The primary benefit of using SQL EXPLAIN is query optimization. By understanding how a query is executed, you can identify bottlenecks and inefficiencies. For example, seeing a full table scan instead of an index scan might indicate that an index is missing or not being used effectively. In such cases, adding the appropriate index can significantly improve query performance.
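
As a rough sketch of that check, the statements below list a table's indexes and then compare the optimizer's default plan with a plan where one index is suggested explicitly. The table t, column a, and index name a are taken from the TiDB example later in this article and are assumptions about your schema; USE INDEX is MySQL-compatible syntax that TiDB accepts.

```sql
-- Assumed schema: table t with a secondary index named a on column a.
-- List the indexes that currently exist on the table.
SHOW INDEXES FROM t;

-- Plan chosen by the optimizer on its own.
EXPLAIN SELECT * FROM t WHERE a = 1;

-- Plan when the optimizer is asked to consider only index a.
EXPLAIN SELECT * FROM t USE INDEX (a) WHERE a = 1;
```

If SHOW INDEXES lists nothing for the filtered column, the index is simply missing. If the index exists but only the hinted plan uses it, the optimizer's choice is worth a closer look before you decide to keep the hint.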

Performance Insights

SQL EXPLAIN shows how the database plans to execute a query, operator by operator. This allows you to:

Identify Slow Operations: Recognize which operators are expected to handle the most rows, then confirm actual time and resource use with EXPLAIN ANALYZE.
Optimize Resource Utilization: Ensure that the database uses indexes, joins, and aggregations efficiently.
Benchmarking and Monitoring: Compare execution plans over time to monitor the impact of changes in your database or application.

Using SQL EXPLAIN for Optimization

Steps to Use SQL EXPLAIN

1. Writing the EXPLAIN Statement

Writing an EXPLAIN statement involves simply prepending your SQL query with the keyword EXPLAIN. For example:

EXPLAIN SELECT * FROM t WHERE a = 1;

The statement does not execute the query but returns the execution plan, which provides valuable insights.
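
EXPLAIN is not limited to SELECT. The sketch below assumes the same table t with a column a. It applies EXPLAIN to a DELETE and also uses the FORMAT = "brief" option, which TiDB documents as simplifying the operator IDs in the output. Availability of the option may depend on your TiDB version.

```sql
-- Show the plan for a DELETE; plain EXPLAIN does not run it, so no rows are removed.
EXPLAIN DELETE FROM t WHERE a = 1;

-- Same SELECT as above, with operator IDs printed without their numeric suffix.
EXPLAIN FORMAT = "brief" SELECT * FROM t WHERE a = 1;
```

The brief format can make two plans easier to compare side by side, because the numeric suffixes often differ between otherwise identical plans.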

2. Interpreting the Output

Interpreting the output of an EXPLAIN statement requires understanding the key attributes. Let’s break down a basic example in TiDB:

EXPLAIN SELECT * FROM t WHERE a = 1;

+-------------------------------+---------+-----------+---------------------+---------------------------------------------+
| id                            | estRows | task      | access object       | operator info                               |
+-------------------------------+---------+-----------+---------------------+---------------------------------------------+
| IndexLookUp_10                | 10.00   | root      |                     |                                             |
| ├─IndexRangeScan_8(Build)     | 10.00   | cop[tikv] | table:t, index:a(a) | range:[1,1], keep order:false, stats:pseudo |
| └─TableRowIDScan_9(Probe)     | 10.00   | cop[tikv] | table:t             | keep order:false, stats:pseudo              |
+-------------------------------+---------+-----------+---------------------+---------------------------------------------+

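IndexLookUp_10: Indicates an Index Lookup operation. Its Build side (IndexRangeScan_8) reads the matching row IDs from index a, and its Probe side (TableRowIDScan_9) then fetches those rows from table t.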
estRows: Estimated number of rows (10 in this case).
task: Indicates operations executed by the TiDB server (root) and TiKV nodes (cop[tikv]).
access object: Table and index used.
operator info: Additional details (e.g., the index range scanned, whether order is preserved, and whether the estimate relies on pseudo statistics).
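
When a plan carries stats:pseudo, as this one does, the estRows values may be inaccurate. One possible follow-up, sketched here on the assumption that running ANALYZE TABLE is acceptable on your cluster (it can be expensive on large tables), is to collect statistics and look at the plan again:

```sql
-- Collect statistics for table t so the optimizer has real data to estimate from.
ANALYZE TABLE t;

-- Re-check the plan; with statistics available, stats:pseudo typically disappears
-- and estRows may change.
EXPLAIN SELECT * FROM t WHERE a = 1;
```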

Practical Examples

Example with SELECT Statement

Suppose you want to count the trips in a bike-sharing database that started on July 1, 2017:

EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';

+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                                                                                          |
+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_20                 | 1.00     | root      |               | funcs:count(Column#13)->Column#11                                                                                      |
| └─TableReader_21             | 1.00     | root      |               | data:StreamAgg_9                                                                                                       |
|   └─StreamAgg_9              | 1.00     | cop[tikv] |               | funcs:count(1)->Column#13                                                                                              |
|     └─Selection_19           | 250.00   | cop[tikv] |               | ge(bikeshare.trips.start_date, 2017-07-01 00:00:00.000000), le(bikeshare.trips.start_date, 2017-07-01 23:59:59.000000) |
|       └─TableFullScan_18     | 10000.00 | cop[tikv] | table:trips   | keep order:false, stats:pseudo                                                                                         |
+------------------------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------+

Selection_19: Filters trips by the specified date range.
StreamAgg_20: Runs on the TiDB server and combines the partial counts that StreamAgg_9 computes on TiKV into the final count.

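The TableFullScan_18 operator at the bottom of the plan shows that every row of trips is read before the date filter is applied. This suggests the query could be optimized with an index on start_date, so that only the matching range is read.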
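
A minimal sketch of that indexing step, assuming you are free to add a secondary index to trips and that start_date is the only column the filter needs:

```sql
-- Add a secondary index on the filtered column (the index name is left for the database to choose).
ALTER TABLE trips ADD INDEX (start_date);

-- Re-run EXPLAIN to see whether the optimizer now picks an index range scan
-- instead of TableFullScan.
EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
```

Whether the new plan is actually faster depends on how selective the date range is, so it is worth comparing both plans rather than assuming the index wins.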

Example with JOIN Operations

Consider a more complex scenario involving JOINs:

EXPLAIN SELECT COUNT(*) FROM t1 a JOIN t1 b ON a.id = b.id;

+----------------------------------------+---------+--------------+---------------+------------------------------------------------+
| id                                     | estRows | task         | access object | operator info                                  |
+----------------------------------------+---------+--------------+---------------+------------------------------------------------+
| StreamAgg_15                           | 1.00    | root         |               | funcs:count(1)->Column#7                       |
| └─TableReader_47                       | 9.00    | root         |               | data:ExchangeSender_46                         |
|   └─ExchangeSender_46                  | 9.00    | cop[tiflash] |               | ExchangeType: PassThrough                      |
|     └─HashJoin_43                      | 9.00    | cop[tiflash] |               | inner join, equal:[eq(test.t1.id, test.t1.id)] |
|       ├─ExchangeReceiver_20(Build)     | 6.00    | cop[tiflash] |               |                                                |
|       │ └─ExchangeSender_19            | 6.00    | cop[tiflash] |               | ExchangeType: Broadcast                        |
|       │   └─Selection_18               | 6.00    | cop[tiflash] |               | not(isnull(test.t1.id))                        |
|       │     └─TableFullScan_17         | 6.00    | cop[tiflash] | table:a       | keep order:false                               |
|       └─Selection_22(Probe)            | 6.00    | cop[tiflash] |               | not(isnull(test.t1.id))                        |
|         └─TableFullScan_21             | 6.00    | cop[tiflash] | table:b       | keep order:false                               |
+----------------------------------------+---------+--------------+---------------+------------------------------------------------+

Here:

HashJoin_43: Indicates that a hash join operation is used.
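ExchangeSender_46: Sends the join result from TiFlash to the TiDB server using the PassThrough exchange type. ExchangeSender_19, by contrast, uses the Broadcast type to send the Build side of the join to the TiFlash nodes taking part in it.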
Selection_18 and Selection_22: Filter out NULL values in the join condition.

By analyzing this output, you can determine whether the join strategy used is optimal and whether another algorithm, such as an index join or a merge join, might be more efficient.
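
To see what an alternative join strategy would look like, one option is TiDB's optimizer hints. The sketch below reuses the query and aliases from this example. MERGE_JOIN and INL_JOIN are documented TiDB hints, but they are requests rather than commands: if a hint cannot be honored (for instance, if no usable index exists on id), the optimizer may ignore it.

```sql
-- Ask the optimizer to use a merge join for the two aliases.
EXPLAIN SELECT /*+ MERGE_JOIN(a, b) */ COUNT(*) FROM t1 a JOIN t1 b ON a.id = b.id;

-- Ask the optimizer to use an index nested-loop join, with b as the inner (looked-up) table.
EXPLAIN SELECT /*+ INL_JOIN(b) */ COUNT(*) FROM t1 a JOIN t1 b ON a.id = b.id;

-- If the plan does not change, check whether the hint was rejected.
SHOW WARNINGS;
```

Comparing the estRows values and the operators in each plan gives a first indication of which strategy suits the data; actual timings still need to be measured.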

Advanced Tips and Best Practices

Common Pitfalls

Misinterpretation of Results

One common pitfall is misinterpreting the results of an EXPLAIN output. It’s crucial to understand that the presence of an index lookup or hash join does not guarantee optimal performance in every context. Always consider the specific data distribution and query patterns in your database.

Ignoring Execution Environment

Another common mistake is ignoring the execution environment. The performance of a query can vary based on the database’s hardware, network latency, and concurrent workloads. Therefore, it’s essential to consider the broader execution context when interpreting EXPLAIN results.

Best Practices

Regular Use of EXPLAIN

Regular use of EXPLAIN is a best practice for maintaining database performance. By incorporating EXPLAIN into your development workflow, you can proactively identify and address performance issues before they impact your application.

Combining with Other Tools

EXPLAIN should be used in conjunction with other diagnostic tools. For example, combining EXPLAIN with EXPLAIN ANALYZE in TiDB provides runtime information alongside the execution plan, offering a complete picture of query performance:

EXPLAIN ANALYZE SELECT * FROM t1;

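This command actually executes the query and reports runtime statistics for each operator, allowing you to compare the optimizer's estimates with what really happened. Because the statement runs for real, it takes as long as the query itself, and statements that modify data will make those modifications.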
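
As an illustration, the bike-sharing query from earlier could be examined the same way. The column names mentioned in the comments (actRows, execution info) are those TiDB uses in its EXPLAIN ANALYZE output; other databases label them differently.

```sql
-- Runs the query for real and returns the plan annotated with runtime statistics.
EXPLAIN ANALYZE SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';

-- In the result, compare estRows (the optimizer's estimate) with actRows
-- (rows actually produced) for each operator; the execution info column
-- reports per-operator timing details.
```

A large gap between estRows and actRows on the same operator usually points to missing or outdated statistics rather than to a problem with the query text.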

Conclusion

Using SQL EXPLAIN effectively can significantly enhance your ability to optimize and troubleshoot SQL queries. By understanding how SQL EXPLAIN works, interpreting its output correctly, and incorporating best practices, you can ensure that your database queries run efficiently and reliably. In the context of distributed SQL databases like TiDB, leveraging SQL EXPLAIN is crucial for maintaining optimal performance and managing resources across the cluster. Regularly analyzing and tuning your queries using EXPLAIN will lead to a more responsive and robust database environment.

Last updated June 13, 2024