HTAP Summit 2024 session replays are now live!Access Session Replays
query data from Tidb

Authors: Zhiyuan Liang, Qizhi Wang (Senior Software Engineers at PingCAP)
Editors: Fendy Feng, Tom Dewan

The database is a crucial component for building applications. It stores and analyzes the huge amounts of data coming in from outside tools or generated when people use the application. By retrieving data that fits certain criteria, queries can help you make the best use of that data. 

TiDB is an open source, distributed, and highly-scalable NewSQL database. It has an established record in supporting many data-intensive applications. In this tutorial, we’ll use TiDB as an example and show you how to query data from a relational SQL database with SQL or Java. 

Before you get started

Before getting started to query data, you need to build a TiDB cluster, import data into this cluster, and then connect to TiDB. 

  1. Build a TiDB cluster. You can either create a local TiDB cluster or a TiDB Cloud free cluster, which is a fully-managed Database-as-a-Service (DBaaS).
  2. Import data into TiDB. We’ll use the sample data from the Bookshop application, a popular virtual bookstore. If you’ve built a local TiDB cluster, use TiUP to import this data; if you’ve built a TiDB Cloud cluster, just click the Import button on the TiDB Cloud page and follow the instructions that followed to finish the data importing process. For a detailed guide, see Import Table Structures and Data from Bookshop to TiDB.
  3. Connect to TiDB. TiDB is fully MySQL compatible, so you can choose any SQL client drivers or object relational mapping frameworks to connect to TiDB. For a detailed guide on TiDB connection, see Connect to TiDB.

After you’ve fully prepared, you can start to make queries. In the following sections, we’ll elaborate on how to query data from TiDB via SQL or Java. 

Execute a simple query

The Bookshop application dataset includes an authors table, which stores basic information about their writers. To query author information from TiDB, use the SELECT ... FROM … SQL statement.

For example, by executing the following SQL statement in a MySQL client, you can get the id and name of all the authors stored in TiDB.

SELECT id, name FROM authors;

The results TiDB returns are: 

+------------+--------------------------+
| id         | name                     |
+------------+--------------------------+
|       6357 | Adelle Bosco             |
|     345397 | Chanelle Koepp           |
|     807584 | Clementina Ryan          |
|     839921 | Gage Huel                |
|     850070 | Ray Armstrong            |
|     850362 | Ford Waelchi             |
|     881210 | Jayme Gutkowski          |
|    1165261 | Allison Kuvalis          |
|    1282036 | Adela Funk               |
...
| 4294957408 | Lyla Nitzsche            |
+------------+--------------------------+
20000 rows in set (0.05 sec)

In Java, to get the same results:

  1. You should first define how to store the author’s basic information by declaring an Author class.
  2. Select the appropriate data type from the Java language according to the data types and value range TiDB supports to store the corresponding data. For example, you can use a variable of:
    • Type Int to store data of type int.
    • Type Long to store data of type bigint.
    • Type Short to store data of type tinyint.
    • Type String to store data of type varchar.
public class Author {
    private Long id;
    private String name;
    private Short gender;
    private Short birthYear;
    private Short deathYear;
    public Author() {}
     // Skip the getters and setters.
}

3.After connecting to TiDB using the JDBC driver, create a Statement object with conn.createStatus().

4.Call stmt.executeQuery("query_sql") to initiate a database query request to TiDB.

5.The query results will be stored in a ResultSet object. With theResultSet traversed, the returned results can be mapped to the Author object.

public class AuthorDAO {
    // Omit initialization of instance variables...
    public List<Author> getAuthors() throws SQLException {
        List<Author> authors = new ArrayList<>();
        try (Connection conn = ds.getConnection()) {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT id, name FROM authors");
            while (rs.next()) {
                Author author = new Author();
                author.setId( rs.getLong("id"));
                author.setName(rs.getString("name"));
                authors.add(author);
            }
        }
        return authors;
    }
}

Filter results

To filter query results, use the WHERE statement. 

For example, by executing the following SQL statement, you can filter the authors born in 1998.

SELECT * FROM authors WHERE birth_year = 1998;

In Java, you can use the same SQL statement to handle data query requests with dynamic parameters. To deal with such queries, we recommend that you use a prepared statement as shown below instead of a normal statement.

public List<Author> getAuthorsByBirthYear(Short birthYear) throws SQLException {
    List<Author> authors = new ArrayList<>();
    try (Connection conn = ds.getConnection()) {
        PreparedStatement stmt = conn.prepareStatement("""
        SELECT * FROM authors WHERE birth_year = ?;
        """);
        stmt.setShort(1, birthYear);
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
            Author author = new Author();
            author.setId( rs.getLong("id"));
            author.setName(rs.getString("name"));
            authors.add(author);
        }
    }
    return authors;
}

Sort results

To sort results from tables stored in TiDB, you can use the ORDER BY statement. 

For example, by executing the SQL statement below, you can sort all authors in the authors table in descending order (DESC) according to their birth year. 

SELECT id, name, birth_year
FROM authors
ORDER BY birth_year DESC;

You can also use the ORDER BY statement in Java:

public List<Author> getAuthorsSortByBirthYear() throws SQLException {
    List<Author> authors = new ArrayList<>();
    try (Connection conn = ds.getConnection()) {
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("""
            SELECT id, name, birth_year
            FROM authors
            ORDER BY birth_year DESC;
            """);

        while (rs.next()) {
            Author author = new Author();
            author.setId(rs.getLong("id"));
            author.setName(rs.getString("name"));
            authors.add(author);
        }
    }
    return authors;
}

In both cases, TiDB returns:

+-----------+------------------------+------------+
| id        | name                   | birth_year |
+-----------+------------------------+------------+
| 83420726  | Terrance Dach          | 2000       |
| 57938667  | Margarita Christiansen | 2000       |
| 77441404  | Otto Dibbert           | 2000       |
| 61338414  | Danial Cormier         | 2000       |
| 49680887  | Alivia Lemke           | 2000       |
| 45460101  | Itzel Cummings         | 2000       |
| 38009380  | Percy Hodkiewicz       | 2000       |
| 12943560  | Hulda Hackett          | 2000       |
| 1294029   | Stanford Herman        | 2000       |
| 111453184 | Jeffrey Brekke         | 2000       |
...
300000 rows in set (0.23 sec)

Limit the number of query results

To limit the number of query results, you can use the LIMIT statement. 

For example, to get the id and name of the 10 youngest authors and display them in descending order of their birth year, execute the following SQL statement:

SELECT id, name, birth_year
FROM authors
ORDER BY birth_year DESC
LIMIT 10;

You can also use the LIMIT statement in Java:

public List<Author> getAuthorsWithLimit(Integer limit) throws SQLException {
    List<Author> authors = new ArrayList<>();
    try (Connection conn = ds.getConnection()) {
        PreparedStatement stmt = conn.prepareStatement("""
            SELECT id, name, birth_year
            FROM authors
            ORDER BY birth_year DESC
            LIMIT ?;
            """);
        stmt.setInt(1, limit);
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
            Author author = new Author();
            author.setId(rs.getLong("id"));
            author.setName(rs.getString("name"));
            authors.add(author);
        }
    }
    return authors;
}

In both cases, TiDB returns: 

+-----------+------------------------+------------+
| id        | name                   | birth_year |
+-----------+------------------------+------------+
| 83420726  | Terrance Dach          | 2000       |
| 57938667  | Margarita Christiansen | 2000       |
| 77441404  | Otto Dibbert           | 2000       |
| 61338414  | Danial Cormier         | 2000       |
| 49680887  | Alivia Lemke           | 2000       |
| 45460101  | Itzel Cummings         | 2000       |
| 38009380  | Percy Hodkiewicz       | 2000       |
| 12943560  | Hulda Hackett          | 2000       |
| 1294029   | Stanford Herman        | 2000       |
| 111453184 | Jeffrey Brekke         | 2000       |
+-----------+------------------------+------------+
10 rows in set (0.11 sec)

Using the LIMIT statement cut the query time by more than 50%—from 0.23 s to 0.11 s. 

Aggregate queries

To better understand the overall data situation, you can combine the GROUP BY statement with an aggregate function like the COUNT function to aggregate query results.

For example, to know how many authors were born in each year and which year has the most authors, execute the following statement:

SELECT birth_year, COUNT (DISTINCT id) AS author_count
FROM authors
GROUP BY birth_year
ORDER BY author_count DESC;

You can also use the Java Group BY statement and COUNT function:

public class AuthorCount {
    private Short birthYear;
    private Integer authorCount;

    public AuthorCount() {}

     // Skip the getters and setters.
}

public List<AuthorCount> getAuthorCountsByBirthYear() throws SQLException {
    List<AuthorCount> authorCounts = new ArrayList<>();
    try (Connection conn = ds.getConnection()) {
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("""
            SELECT birth_year, COUNT(DISTINCT id) AS author_count
            FROM authors
            GROUP BY birth_year
            ORDER BY author_count DESC;
            """);

        while (rs.next()) {
            AuthorCount authorCount = new AuthorCount();
            authorCount.setBirthYear(rs.getShort("birth_year"));
            authorCount.setAuthorCount(rs.getInt("author_count"));
            authorCounts.add(authorCount);
        }
    }
    return authorCount;
}

In both cases, TiDB returns: 

+------------+--------------+
| birth_year | author_count |
+------------+--------------+
|       1932 |          317 |
|       1947 |          290 |
|       1939 |          282 |
|       1935 |          289 |
|       1968 |          291 |
|       1962 |          261 |
|       1961 |          283 |
|       1986 |          289 |
|       1994 |          280 |
...
|       1972 |          306 |
+------------+--------------+
71 rows in set (0.00 sec)

In addition to the COUNT function, TiDB also supports other aggregate functions. For more information, see Aggregate (GROUP BY) Functions.

Wrapping Up

In this tutorial, you’ve learned how to query data from TiDB with SQL statements or Java commands. I hope you find this guide helpful. If you have any questions, contact us through Twitter or our Slack channel. You can also join TiDB Internals to share your thoughts and feedback.

If you want to know more developer guides related to TiDB or TiDB Cloud, see TiDB Developer Guide or TiDB Cloud Developer Guide

Keep reading:
Building a Web Application with Spring Boot and TiDB
Using Retool and TiDB Cloud to Build a Real-Time Kanban in 30 Minutes
Analytics on TiDB Cloud with Databricks


Book a Demo


Want to explore TiDB without installing any software? Go to TiDB Playground

Experience modern data infrastructure firsthand.

Try TiDB Serverless

Have questions? Let us know how we can help.

Contact Us

TiDB Cloud Dedicated

A fully-managed cloud DBaaS for predictable workloads

TiDB Cloud Serverless

A fully-managed cloud DBaaS for auto-scaling workloads