TiDB, a MySQL-compatible database, has introduced a powerful feature for handling high-dimensional data: Vector Search Indexes. This post will explore how TiDB implements these indexes using the Hierarchical Navigable Small World (HNSW) method, and how they can be utilized for efficient nearest neighbor searches.

What are Vector Search Indexes?

Vector Search Indexes are designed to make approximate nearest neighbor (ANN) searches in a vector space efficient. This is particularly useful for applications that work with high-dimensional data, such as image recognition, recommendation systems, and natural language processing. With an index in place, TiDB can complete such queries in milliseconds, instead of computing the distance to every stored vector as a brute-force search does.
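For contrast, here is a minimal sketch of the brute-force approach that an ANN index is meant to avoid. It is plain Python, not TiDB code; the function names and the sample rows are made up for illustration. Every stored vector is scored against the query, so the cost grows linearly with the number of rows.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def brute_force_knn(rows, query, k):
    """Exact k-NN: score every (id, vector) row, sort, keep the k closest ids."""
    scored = [(cosine_distance(vec, query), row_id) for row_id, vec in rows]
    scored.sort()
    return [row_id for _, row_id in scored[:k]]

# Hypothetical sample data: (id, embedding) pairs with 3 dimensions.
rows = [
    (1, [1.0, 2.0, 3.0]),
    (2, [-1.0, -2.0, -3.0]),
    (3, [3.0, 2.0, 1.0]),
    (4, [2.0, 4.0, 6.1]),
]
nearest = brute_force_knn(rows, [1.0, 2.0, 3.0], k=2)
print(nearest)
```

An HNSW index trades a small amount of accuracy for speed by visiting only a fraction of the stored vectors, which is why its results are described as approximate.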

Creating an HNSW Vector Index in TiDB

TiDB supports the creation of HNSW Vector Indexes using the following SQL syntax:

CREATE TABLE vector_table_with_index (
    id INT PRIMARY KEY,
    doc TEXT,
    -- 3-dimensional vector column; the COMMENT requests an HNSW index with cosine distance
    embedding VECTOR(3) COMMENT "hnsw(distance=cosine)"
);

Note: The syntax for creating the HNSW Index may change in future releases. You must specify the distance metric (cosine or L2) when creating the vector index.

Limitations and Compatibility

Currently, TiDB supports only the L2 and cosine distance metrics for vector indexes, and an index can be defined only when the table is created. Adding or dropping a vector index with DDL statements after table creation is not available yet, but is planned for a future update.

Utilizing Vector Indexes

Vector Indexes can be used in SQL queries to perform k-nearest neighbor searches. Here’s an example:

SELECT *
FROM vector_table_with_index
ORDER BY Vec_Cosine_Distance(embedding, '[1, 2, 3]')
LIMIT 10;

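The query should use the same distance metric that was defined when the index was created (here, Vec_Cosine_Distance for an index declared with distance=cosine). A query that orders by a different metric does not get the benefit of the index.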
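The two metrics are not interchangeable because they can rank the same candidates differently. The following plain-Python sketch (illustrative only; the candidate names and values are made up) shows a query whose nearest neighbor under cosine distance differs from its nearest neighbor under L2 distance.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 2.0, 3.0]

# Hypothetical candidates: one points the same way as the query but is far
# away; the other is close by but points in a slightly different direction.
candidates = {
    "same_direction_far": [10.0, 20.0, 30.0],
    "different_direction_near": [1.0, 2.0, 2.0],
}

by_cosine = min(candidates, key=lambda n: cosine_distance(candidates[n], query))
by_l2 = min(candidates, key=lambda n: l2_distance(candidates[n], query))

print("nearest by cosine:", by_cosine)
print("nearest by L2:", by_l2)
```

Cosine distance ignores vector length and compares direction only, while L2 distance is sensitive to both, so an index built for one metric encodes a neighborhood structure that does not match the other.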

Integration with ORMs

TiDB provides support for various Python ORMs, enabling easier integration into applications.
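Without relying on any particular ORM, the same k-nearest neighbor query can also be issued from Python through a plain DB-API 2.0 cursor. The sketch below is an assumption-laden illustration rather than an official client: the helper names are hypothetical, the "%s" placeholder style is the one used by MySQL drivers such as PyMySQL, and the vector is sent in the same bracketed text form as the '[1, 2, 3]' literal shown earlier.

```python
import json

def to_vector_literal(values):
    """Serialize numbers to bracketed text, e.g. [1, 2, 3] -> '[1.0, 2.0, 3.0]'."""
    return json.dumps([float(v) for v in values])

def nearest_neighbors(cursor, query_vector, k=10):
    """Run the article's k-NN query through a DB-API cursor (hypothetical helper).

    Assumes the table and column names from the CREATE TABLE example and a
    driver that accepts "%s" placeholders.
    """
    sql = (
        "SELECT id, doc "
        "FROM vector_table_with_index "
        "ORDER BY Vec_Cosine_Distance(embedding, %s) "
        "LIMIT %s"
    )
    cursor.execute(sql, (to_vector_literal(query_vector), int(k)))
    return cursor.fetchall()
```

In an application, cursor would come from a connection opened with a MySQL-compatible driver. Passing the vector as a parameter instead of formatting it into the SQL string keeps the query safe from injection.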

Performance Analysis

To analyze the performance and ensure the Vector Index is being used, you can use the EXPLAIN or EXPLAIN ANALYZE statements in TiDB:

EXPLAIN SELECT * FROM vector_table_with_index
ORDER BY Vec_Cosine_Distance(embedding, '[1, 2, 3]')
LIMIT 10;

Best Practices

To ensure optimal performance, especially when indexes are “cold” (not recently accessed), it’s recommended to “warm up” the index by running similar queries beforehand. Additionally, managing the data set size by using fewer dimensions or compression techniques can help maintain high performance.
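To get a feel for why dimensionality matters, here is a back-of-the-envelope estimate of the raw size of the vector data alone. It assumes 4 bytes per component (single-precision floats) and ignores the index graph and any storage overhead, so treat the numbers as a rough lower bound; the row count and dimensions are made-up examples.

```python
def raw_vector_bytes(num_rows, dimensions, bytes_per_component=4):
    """Raw size of the vector data alone, assuming fixed-width components."""
    return num_rows * dimensions * bytes_per_component

# Hypothetical workload: one million rows at two different dimensionalities.
full = raw_vector_bytes(1_000_000, 1536)
reduced = raw_vector_bytes(1_000_000, 768)

print(f"1536 dimensions: {full / 1e9:.2f} GB")
print(f" 768 dimensions: {reduced / 1e9:.2f} GB")
```

Halving the number of dimensions halves the raw vector data, which leaves more of the working set able to stay in memory between queries.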

Conclusion

Vector Search Indexes in TiDB offer a robust solution for efficiently handling complex queries involving high-dimensional data. By leveraging these indexes, developers can significantly enhance the performance of their applications, making real-time data interaction more feasible.

Last updated June 3, 2024
