
Integrating Azure OpenAI with TiDB’s Vector Search capability makes it possible to run semantic search and similarity search across various data types, such as text, images, and videos. This article explains how Azure OpenAI and TiDB Vector Search fit together and walks through a minimal setup: creating a cluster, generating embeddings, storing them in TiDB, and querying them by similarity.

Understanding TiDB Vector Search

TiDB Vector Search enables semantic search: finding related data based on meaning rather than exact keyword matches. It relies on vector embeddings, which represent each piece of data as a point in a high-dimensional semantic space. The distance between two points indicates how similar the underlying items are (a smaller distance means a closer match), which supports use cases such as image recognition and recommendation systems.
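
To make the idea of distance in a semantic space concrete, here is a minimal sketch of cosine distance in plain Python. The three-dimensional vectors are made-up values for illustration only; real embeddings have many more dimensions, and the function name `cosine_distance` is chosen here for the example rather than taken from any TiDB API.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by a real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related items sit closer together than unrelated ones.
print(cosine_distance(cat, kitten) < cosine_distance(cat, invoice))  # True
```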

Key Features of TiDB Vector Search:

  • Semantic Search: Search for data based on meaning, improving accuracy and relevance.
  • Versatility: Applicable to texts, images, videos, and other data types.
  • Unified Storage: Store data and their embeddings together in TiDB for seamless querying.

Azure OpenAI: A Brief Overview

Azure OpenAI provides access to OpenAI’s powerful language models, enabling developers to integrate advanced natural language processing (NLP) capabilities into their applications. These models can generate embeddings that are crucial for semantic search applications.

Integrating Azure OpenAI with TiDB Vector Search

By combining Azure OpenAI’s embedding generation with TiDB’s vector search, developers can create robust semantic search solutions. Here’s a step-by-step guide to setting up this integration.

1. Setting Up TiDB Serverless Cluster with Vector Search:

  • Sign Up: Create an account on TiDB Cloud.
  • Create Cluster: Follow the tutorial to create a TiDB Serverless cluster with vector support in the eu-central-1 region.
  • Connection Setup: Connect to the cluster using the provided connection details.

2. Generating Embeddings with Azure OpenAI:

  • Install Dependencies: Ensure you have Python 3.8 or later (the current openai client library does not support Python 3.6), then install the required libraries.
pip install openai peewee pymysql tidb_vector
  • Environment Configuration:
export OPENAI_API_KEY="your_openai_api_key"
export TIDB_HOST="your_tidb_host"
export TIDB_USERNAME="your_tidb_username"
export TIDB_PASSWORD="your_tidb_password"
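
When the embedding model is hosted on an Azure OpenAI resource, the `openai` package provides an `AzureOpenAI` client that takes the resource endpoint and an API version in addition to the key. The sketch below is one possible way to wire it up, not the only one. The environment variable names `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `OPENAI_API_VERSION`, the fallback API version, and the helper names `make_azure_client` and `embed_texts` are assumptions for this example; with Azure, the `model` argument is the name of your own deployment.

```python
import os


def make_azure_client():
    """Build an Azure OpenAI client from environment variables (assumed names)."""
    # Imported lazily so embed_texts() can be exercised without the SDK installed.
    from openai import AzureOpenAI

    return AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        # Assumed default; use the API version your resource supports.
        api_version=os.environ.get("OPENAI_API_VERSION", "2024-02-01"),
    )


def embed_texts(client, texts, deployment):
    """Return one embedding (a list of floats) per input text, in input order."""
    response = client.embeddings.create(input=list(texts), model=deployment)
    # Each result carries the index of the input it belongs to; sort to be safe.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

A call might look like `embed_texts(make_azure_client(), ["Example text 1"], deployment="my-embedding-deployment")`, where `my-embedding-deployment` is a placeholder for a deployment of an embedding model such as text-embedding-ada-002.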

3. Inserting and Querying Data in TiDB:

  • Table Creation:
CREATE TABLE vector_table (id INT PRIMARY KEY AUTO_INCREMENT, `text` TEXT, embedding VECTOR(1536));
  • Insert Data:
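import os
import openai
from peewee import Model, MySQLDatabase, SQL, TextField
from tidb_vector.peewee import VectorField

# Initialize the OpenAI client and TiDB connection from the environment variables set earlier.
# For an Azure OpenAI resource, use openai.AzureOpenAI (with azure_endpoint and api_version) instead.
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
db = MySQLDatabase(
    'test', host=os.environ["TIDB_HOST"], port=4000,
    user=os.environ["TIDB_USERNAME"], password=os.environ["TIDB_PASSWORD"],
    ssl_verify_cert=True, ssl_verify_identity=True,  # TiDB Serverless requires TLS
)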

# Define model
class DocModel(Model):
    text = TextField()
    embedding = VectorField(dimensions=1536)
    class Meta:
        database = db
        table_name = "vector_table"

db.connect()
db.create_tables([DocModel])

# Generate embeddings and insert data
documents = ["Example text 1", "Example text 2", "Example text 3"]
response = client.embeddings.create(input=documents, model="text-embedding-ada-002")
embeddings = [item.embedding for item in response.data]  # one vector per document
data_source = [{"text": doc, "embedding": emb} for doc, emb in zip(documents, embeddings)]
DocModel.insert_many(data_source).execute()
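
Because the column is declared as VECTOR(1536), every embedding is expected to have exactly 1536 dimensions, which is the output size of text-embedding-ada-002. A small check before the insert step can surface a mismatch (for example, after switching embedding models) with a clearer error than a failed write. This is a sketch only; `build_rows` and `EXPECTED_DIMENSIONS` are hypothetical names introduced for illustration.

```python
EXPECTED_DIMENSIONS = 1536  # matches VECTOR(1536) in the table definition


def build_rows(documents, embeddings, dimensions=EXPECTED_DIMENSIONS):
    """Pair each document with its embedding, rejecting mismatched input."""
    if len(documents) != len(embeddings):
        raise ValueError(
            f"got {len(documents)} documents but {len(embeddings)} embeddings"
        )
    rows = []
    for doc, emb in zip(documents, embeddings):
        if len(emb) != dimensions:
            raise ValueError(
                f"embedding for {doc!r} has {len(emb)} dimensions, expected {dimensions}"
            )
        rows.append({"text": doc, "embedding": list(emb)})
    return rows
```

The returned list has the same shape as `data_source` above, so it could be passed to `DocModel.insert_many(...)` unchanged.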

4. Performing Semantic Search:

  • Query for Similar Data:
question = "Find related examples"
question_embedding = client.embeddings.create(input=question, model="text-embedding-ada-002").data[0].embedding
related_docs = DocModel.select(DocModel.text, DocModel.embedding.cosine_distance(question_embedding).alias("distance")).order_by(SQL("distance")).limit(3)

for doc in related_docs:
    print(doc.distance, doc.text)
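
The same search can also be expressed as plain SQL using TiDB's `VEC_COSINE_DISTANCE` function, which is useful when an ORM is not involved. The sketch below only builds the query text and the vector parameter; it assumes that TiDB accepts a vector written as a bracketed, comma-separated string and that the driver uses `%s` placeholders, as pymysql does. The helper names `to_vector_literal` and `build_similarity_query` are hypothetical.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a vector string such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_similarity_query(table="vector_table", limit=3):
    """Return a parameterized query; the vector is supplied as the %s parameter."""
    # The table name is interpolated directly, so it must come from trusted code.
    return (
        f"SELECT `text`, VEC_COSINE_DISTANCE(embedding, %s) AS distance "
        f"FROM {table} ORDER BY distance LIMIT {int(limit)}"
    )


# With a pymysql cursor, the call would look roughly like:
#   cursor.execute(build_similarity_query(), (to_vector_literal(question_embedding),))
print(build_similarity_query())
```

Passing the vector as a bound parameter rather than formatting it into the SQL string keeps the statement reusable across different questions.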

Conclusion

Integrating Azure OpenAI with TiDB Vector Search unlocks powerful capabilities for semantic search and data retrieval. This combination enables developers to build advanced applications that understand and process data in a more meaningful way.

Ready to explore the potential of TiDB and Azure OpenAI? Start by creating your TiDB Serverless cluster today at TiDB Cloud and join the AI revolution in data management and semantic search.


Last updated June 26, 2024
