As the landscape of artificial intelligence evolves, leveraging large language models (LLMs) for semantic search and AI-driven applications has become increasingly prevalent. One critical component of this new wave is vector search, which allows for the retrieval of data based on its semantic meaning rather than simple keyword matching. This post explores how to store and query vectors in a MySQL-compatible SQL database, TiDB Serverless, which offers robust vector search capabilities.

Introduction to Vector Search

Vector search is a technique used to find similar items in a dataset based on their vector representations. These vectors, often referred to as embeddings, capture the semantic essence of the data. Whether you’re working with text, images, or other data types, embeddings enable the comparison and retrieval of similar items through distance metrics like cosine similarity or Euclidean distance.

Setting Up TiDB for Vector Search

TiDB Serverless is a distributed SQL database that integrates vector search functionalities, enabling the storage and querying of vector embeddings directly within the database. Here’s a step-by-step guide to getting started:

  1. Sign Up for TiDB Serverless:
    1. Join the waitlist for the private beta of built-in vector search in TiDB Serverless.
    2. Visit tidbcloud.com and sign up.
    3. Select the eu-central-1 region as vector search is currently available there.
    4. Follow the instructions to create a TiDB Serverless cluster with vector support enabled.
  2. Connect to Your TiDB Cluster:
    1. Navigate to the Clusters page on the TiDB Cloud console.
    2. Click on your cluster name to access its overview page.
    3. Use the “Connect” option to get the connection details.

Try TiDB Serverless with Vector Search

Join the waitlist for the private beta of built-in vector search.

Try TiDB Serverless with Vector Search

Join the waitlist for the private beta of built-in vector search.

Creating and Managing Vector Data in TiDB

To illustrate how to store and query vectors, let’s walk through creating a table, inserting data, and performing a vector search.

Create a Table with Vector Field:

CREATE TABLE vector_table (
    id INT PRIMARY KEY,
    doc TEXT,
    embedding VECTOR(3)
);

Insert Vector Data:

INSERT INTO vector_table (id, doc, embedding) VALUES
(1, 'apple', '[1,1,1]'),
(2, 'banana', '[1,1,2]'),
(3, 'dog', '[2,2,2]');

Querying for Nearest Neighbors:

To find vectors closest to a given vector [1,1,3], you can use cosine similarity:

SELECT * FROM vector_table ORDER BY vec_cosine_distance(embedding, '[1,1,3]') LIMIT 3;

Creating an HNSW Index:

To optimize vector searches, you can create an HNSW (Hierarchical Navigable Small World) index:

CREATE TABLE vector_table_with_index (
    id INT PRIMARY KEY,
    doc TEXT,
    embedding VECTOR(3) COMMENT 'hnsw(distance=cosine)'
);

Using TiDB Vector Functions

TiDB supports various vector functions for calculating distances and transforming vectors:

  • Vector Distance Functions:
SELECT vec_cosine_distance('[1,1,1]', '[1,2,3]');
SELECT vec_l1_distance('[1,1,1]', '[1,2,3]');
SELECT vec_l2_distance('[1,1,1]', '[1,2,3]');
  • Vector Transformation Functions:
SELECT vec_from_text('[1,1,1]');
SELECT vec_as_text('[1,1,1]');
  • Norm Calculation:
SELECT vec_l2_norm('[1,2,3]');

Integrating with AI Frameworks

TiDB seamlessly integrates with AI frameworks like LangChain and LlamaIndex, making it easier to implement advanced AI applications. These integrations allow for efficient vector storage and retrieval, essential for AI-driven workflows.

Conclusion

Storing and querying vectors in a MySQL-compatible database like TiDB offers a powerful way to leverage semantic search capabilities in your applications. With robust support for vector functions and integration with popular AI frameworks, TiDB provides a comprehensive solution for handling vector data in the age of LLMs.

Try to store vectors with MySQL SQL grammar in TiDB Serverless.

Join the Waitlist

 

Additional Resources:


Last updated May 21, 2024

Spin up a Serverless database with 25GiB free resources.

Start Right Away