Introduction to NewSQL Databases

The Evolution from SQL to NewSQL: Meeting Modern Data Needs

Databases have been pivotal in the computing industry, enabling businesses and organizations to store, manage, and retrieve data efficiently. Traditional relational databases (RDBMS), like MySQL and PostgreSQL, have been the bedrock of data-centric applications, providing structured query language (SQL) support, ACID (Atomicity, Consistency, Isolation, Durability) transactions, and robust data integrity.

However, the exponential growth of data and the rise of distributed systems demanded a new approach to database management. This brought forth NoSQL databases, which offered scalability and flexibility by forgoing the traditional relational model. While NoSQL databases solved certain challenges related to scale and unstructured data, they compromised on aspects like strong consistency and the use of SQL, which remained industry standards for querying and manipulating data.

Enter NewSQL, a term coined to describe a new class of databases that inherit the consistency and transactional properties of traditional RDBMS but also provide the scalability demanded by modern applications. NewSQL databases aim to bridge the gap between the robust transaction guarantees of SQL databases and the horizontal scalability of NoSQL databases. They achieve this through innovative architectures, such as distributed computing, consensus algorithms like Raft, and hybrid storage engines.

A visual timeline showing the evolution from SQL to NewSQL, highlighting key developments and their impact on database technology.

Characteristics of NewSQL Databases

NewSQL databases distinguish themselves by combining the best of both SQL and NoSQL worlds. Here are some of the key characteristics:

  1. Scalability: NewSQL databases are designed to scale out horizontally across multiple nodes without sacrificing performance. This ensures that they can handle large-scale data and a high volume of transactions efficiently.
  2. Consistency: Unlike many NoSQL databases that adopt eventual consistency, NewSQL databases maintain strong consistency. This ensures that all operations appear to take place in a consistent and atomic manner.
  3. Availability: High availability is a core feature, achieved through replication and distributed architectures. Even if some nodes fail, the database remains operational.
  4. ACID Transactions: NewSQL databases support true ACID transactions across distributed environments, which is essential for applications requiring reliable and consistent data operations.
  5. SQL Compatibility: They maintain support for SQL, making it easier for developers to leverage their existing SQL knowledge and tools, ensuring a smoother transition from traditional RDBMS.

Key Players in the NewSQL Space

Several databases have emerged under the NewSQL umbrella, each bringing unique features and innovations to the table. Here are some notable NewSQL databases:

  • CockroachDB: Inspired by Google’s Spanner, CockroachDB offers horizontal scaling and strong consistency through the use of the Raft consensus algorithm. It is designed for global distribution, providing high availability and resilience to failures.
  • Google Spanner: A globally distributed and highly scalable database service. Spanner provides strong consistency and supports distributed ACID transactions. It uses TrueTime API to synchronize servers and manage timestamps globally.
  • YugabyteDB: An open-source distributed SQL database that provides high availability and resilience. YugabyteDB supports multi-model data, offering the flexibility to handle various data types and workloads.
  • TiDB: Developed by PingCAP, TiDB separates computing and storage. It supports hybrid transactional and analytical processing (HTAP) and provides strong consistency, horizontal scalability, and high availability.

Core Features of TiDB

Horizontal Scalability and High Availability

One of TiDB’s standout features is its ability to scale horizontally with ease. This means adding more nodes to the cluster can handle increased load without significant configuration changes or downtime. TiDB achieves this through its architecture, which separates computing from storage.

  • Separation of Compute and Storage: TiDB uses TiKV, a distributed key-value storage engine, along with TiDB servers handling SQL processing and query execution. This separation allows each layer to scale independently.
  • Automatic Sharding and Load Balancing: Data in TiDB is automatically divided into smaller chunks called Regions, which are then evenly distributed across TiKV nodes. Placement Driver (PD), as a cluster manager, ensures data is balanced and rebalance Regions as needed.
  • High Availability with Raft: TiDB uses the Raft consensus algorithm to replicate data across multiple nodes, ensuring high availability and fault tolerance. Even if some nodes fail, TiDB remains operational, maintaining data consistency and availability.

Example of creating a table and verifying scalable write operations:

CREATE TABLE users (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

-- Insert Records
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');

Distributed Transaction Support with Strong Consistency

Distributed transactions are critical for applications that require reliable and consistent operations across multiple nodes. TiDB provides robust support for distributed transactions, ensuring strong consistency through its transaction model.

  1. Two-Phase Commit Protocol:

    • TiDB employs a two-phase commit protocol inspired by Google Percolator, ensuring that distributed transactions are atomic and consistent.
  2. Optimistic and Pessimistic Transactions:

    • TiDB supports both optimistic and pessimistic transaction modes. Optimistic transactions are ideal for low-conflict environments, while pessimistic transactions prevent conflicts by locking resources during the transaction.
  3. Snapshot Isolation:

    • TiDB provides snapshot isolation to ensure that transactions read from a consistent snapshot and do not interfere with each other.

Example of a distributed transaction in TiDB:

BEGIN;
INSERT INTO orders (user_id, product_id, quantity, total) VALUES (1, 101, 2, 39.98);
UPDATE inventory SET quantity = quantity - 2 WHERE product_id = 101;
COMMIT;

HTAP (Hybrid Transactional and Analytical Processing) Capabilities

TiDB’s HTAP capabilities allow it to handle both transactional and analytical workloads efficiently, making it suitable for real-time data analytics.

  1. TiFlash for Analytical Processing:

    • TiFlash is TiDB’s columnar storage engine designed for analytical queries. It replicates data from TiKV in near real-time, ensuring that analytical processing can be done without impacting transactional performance.
  2. Real-Time Data Replication:

    • TiDB ensures that data is replicated in real-time between TiKV and TiFlash, maintaining strong consistency and enabling up-to-date data processing for analytical queries.
  3. Seamless Integration:

    • Users can interact with TiDB using standard SQL for both transactional and analytical queries, avoiding the need for separate systems for different workloads.

Example of an HTAP query leveraging TiFlash:

-- Analytical Query on TiFlash
SELECT product_id, SUM(quantity) as total_sold
FROM sales
GROUP BY product_id
ORDER BY total_sold DESC;

Compatibility with MySQL and Easy Integration

TiDB’s compatibility with MySQL makes it an attractive choice for organizations looking to migrate from MySQL without significant changes to their application code or ecosystem.

  1. MySQL 5.7 Protocol Compatibility:

    • TiDB supports the MySQL 5.7 protocol and most commonly used MySQL features, which simplifies migrations and integration with existing applications and tools.
  2. Data Migration Tools:

    • PingCAP provides a series of data migration tools to help transfer data from MySQL to TiDB seamlessly. These tools ensure data consistency and minimal downtime during migration.
  3. Standard SQL Support:

    • TiDB supports standard SQL, meaning developers can utilize their existing SQL knowledge to work with TiDB, making the transition smooth and reducing the learning curve.

Example of migrating a MySQL database to TiDB:

# Using the TiDB Data Migration (DM) tool
tiup dmctl --master-addr $master_ip:$master_port operate-source create ./source1.yaml
tiup dmctl --master-addr $master_ip:$master_port start-task ./task.yaml

Comparative Analysis: TiDB vs. Other NewSQL Databases

Comparison with CockroachDB

Deployment and Ease of Setup

Both TiDB and CockroachDB are designed to be highly scalable and easy to deploy. However, there are notable differences in their deployment and setup processes.

  • TiDB:

    • TiDB can be deployed using TiUP, an integrated deployment tool that simplifies the installation and management of TiDB clusters. Users can deploy, upgrade, and manage TiDB clusters with minimal configuration.
  • CockroachDB:

    • CockroachDB provides a straightforward setup process with a single binary, making it easy to install and configure. The deployment process is user-friendly, and CockroachDB offers various deployment options, including self-managed clusters and CockroachCloud.

Performance Metrics

Performance is a critical aspect when comparing databases, particularly in terms of read/write latency and throughput.

  • TiDB:

    • TiDB offers excellent performance, especially in environments requiring high transactional throughput and real-time analytics. The separation of compute and storage allows TiDB to scale efficiently.
  • CockroachDB:

    • CockroachDB is known for its low-latency transactions and strong consistency. It leverages the Raft consensus algorithm to ensure high availability and data integrity. CockroachDB’s performance is competitive, particularly in transactional workloads.

Backup and Restore Features

  • TiDB:

    • TiDB provides robust backup and restore features using tools like BR (Backup & Restore) and Dumpling. These tools allow users to perform logical and physical backups and support incremental backups and cloud storage integrations.
  • CockroachDB:

    • CockroachDB also offers robust backup and restore capabilities. It supports incremental backups, full backups, and point-in-time recovery. CockroachDB’s backup features are designed to ensure data integrity and minimize downtime during restoration.

Example: Performing a backup in TiDB

# Perform a full backup using BR (Backup & Restore) tool
br backup full --pd $pd_address --storage "local:///path/to/backup"

Comparison with Google Spanner

Cost-Efficiency and Pricing Models

  • TiDB:

    • TiDB provides a flexible pricing model, making it suitable for various budgets. Users can choose between self-managed deployments or using TiDB Cloud, which offers a managed service with predictable pricing.
  • Google Spanner:

    • Google Spanner is offered as a managed service on Google Cloud Platform (GCP). While it provides excellent performance and global distribution, the pricing can be relatively high, making it more suitable for organizations with larger budgets.

Schema Management and Flexibility

  • TiDB:

    • TiDB offers flexible schema management with support for online schema changes. This allows users to modify the schema without significant downtime, making it ideal for dynamic applications.
  • Google Spanner:

    • Google Spanner also supports online schema changes, but with some limitations. Schema changes in Spanner require careful planning to avoid performance degradation during the modification process.

Global Distribution and Data Locality

  • TiDB:

    • TiDB supports multi-region deployments, ensuring high availability and low-latency access across different geographical locations. Data locality is managed by PD, ensuring optimal data distribution.
  • Google Spanner:

    • Google Spanner is renowned for its global distribution capabilities. It provides strong consistency and low latency across regions, making it an excellent choice for globally distributed applications. Spanner’s TrueTime API ensures synchronized timestamps across data centers.

Comparison with YugabyteDB

Multi-Model Support and Versatility

  • TiDB:

    • TiDB primarily provides relational data modeling with strong transactional guarantees. Its design focuses on hybrid transactional and analytical processing (HTAP).
  • YugabyteDB:

    • YugabyteDB offers multi-model support, including both SQL (YSQL) and NoSQL (YCQL) interfaces. This versatility allows it to handle diverse workloads and provides flexibility in data modeling.

Resilience to Failures and Replication Strategies

  • TiDB:

    • TiDB ensures resilience through the Raft consensus algorithm, which guarantees data consistency and high availability. Replication strategies are designed to handle node failures gracefully.
  • YugabyteDB:

    • YugabyteDB leverages the RAFT protocol for replication and fault tolerance. It provides strong consistency and automatic failover, ensuring minimal downtime during node failures.

Community Support and Ecosystem

  • TiDB:

    • TiDB boasts an active open-source community and extensive ecosystem tools. The community contributes to the continuous improvement of TiDB, ensuring a robust and evolving platform.
  • YugabyteDB:

    • YugabyteDB also has a growing community and ecosystem. It offers comprehensive documentation and community-driven support, enabling users to leverage community expertise and resources.

Unique Advantages of TiDB

Unified SQL Layer for Mixed Workloads

TiDB’s architecture, which separates computing from storage, provides a unified SQL layer capable of handling both transactional and analytical workloads seamlessly. This eliminates the need for separate databases for different types of queries, simplifying infrastructure and data management.

  • Hybrid Workload Handling:

    • TiDB can process OLTP workloads efficiently with TiKV, while TiFlash, its columnar storage engine, caters to OLAP queries. This hybrid capability ensures that real-time analytics can be performed without impacting transactional performance.
  • Example: Unified SQL Query

-- Transactional Query
INSERT INTO orders (user_id, product_id, quantity, total) VALUES (2, 103, 1, 19.99);

-- Analytical Query on TiFlash
SELECT product_id, SUM(quantity) as total_sold
FROM sales
GROUP BY product_id
ORDER BY total_sold DESC;

Comprehensive Ecosystem and Active Community Contributions

TiDB’s ecosystem includes various tools and utilities that enhance its functionality and make it easier to manage and maintain.

  • Data Migration Tools:

    • Tools like DM (Data Migration) simplify the process of migrating from other databases to TiDB, ensuring data integrity and minimal downtime.
  • Backup and Restore:

    • Robust backup and restore tools like BR (Backup & Restore) and Dumpling provide comprehensive backup solutions, supporting full and incremental backups.
  • Community and Contributions:

    • The active open-source community contributes regularly, ensuring continuous improvement and innovation. The collaborative environment fosters knowledge sharing and support.

Advanced Backup and Recovery Solutions

TiDB provides advanced backup and recovery solutions that ensure data safety and integrity.

  • Backup Tools:

    • TiDB offers BR (Backup & Restore) for efficient full and incremental backups. Dumpling is another tool for logical backups, providing flexibility in backup strategies.
  • Cloud Storage Integration:

    • TiDB’s backup tools support integration with cloud storage providers like AWS S3, enabling scalable and cost-effective backup storage solutions.

Example of a cloud backup using BR:

# Perform a backup to S3 using BR
br backup full --pd $pd_address --storage "s3://bucket-name/path/to/backup"

Real-Time Analytics with Minimal Overhead

TiDB’s HTAP capabilities enable real-time analytics with minimal overhead, ensuring up-to-date insights without compromising transactional performance.

  • TiFlash for Analytical Processing:

    • TiFlash provides columnar storage specifically designed for analytical queries. It replicates data from TiKV in real-time, allowing analytical processing to be conducted efficiently.
  • Data Consistency:

    • TiDB ensures strong consistency between TiKV and TiFlash, making sure that transactional and analytical queries work on the same consistent data state.

Example of a real-time analytical query using TiFlash:

-- Analytical Query on TiFlash
SELECT country, COUNT(*) as user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;

Conclusion

TiDB represents a significant advancement in the realm of distributed databases, merging the transactional robustness of traditional RDBMSs with the scalability of modern distributed systems. Its unique combination of SQL compatibility, strong consistency, horizontal scalability, and HTAP capabilities makes it an ideal choice for a wide range of applications. Whether you’re dealing with high transactional workloads, real-time analytics, or a combination of both, TiDB offers a reliable and efficient solution, backed by a vibrant community and comprehensive ecosystem. As organizations continue to grapple with growing data requirements, TiDB stands out as a forward-looking, versatile database solution ready to meet the challenges of the future.


Last updated September 14, 2024

Experience modern data infrastructure firsthand.

Try TiDB Serverless