TiDB Platform

Transcript

When comparing TiDB to MySQL and discussing architecture, not everything is an apples-to-apples comparison. While TiDB speaks the MySQL protocol, it is important to clarify that it shares no source code with MySQL.

TiDB is a NewSQL database that was inspired by the design of Google's internal Spanner project. It is distributed by design and consists of several components that together make up what we call the TiDB Platform. So let's go over what each of those components does:

  • To start with, we have TiKV. TiKV is a distributed transactional key-value store. Its role in the TiDB Platform is to provide scale-out storage, with native replication for high availability. Using MySQL terminology, you can think of every table in the TiDB architecture as automatically partitioned by range. In TiDB terminology, we call each range a “Region”, and Regions are automatically re-partitioned as required to keep them at around 96 MB each (though this size is configurable depending on use case).

    TiKV keeps three copies of each Region by default, and uses Raft to ensure that this redundancy is always maintained. We won't go into Raft in this course but it is a popular consensus protocol used in distributed systems to maintain quorum.

    Also, don't let the KV name fool you; unlike many other key-value stores, it is an ACID-compliant store, and it also preserves key order, so you can perform range operations such as "between X and Y". It forms the storage foundation on which the TiDB Platform is built.

  • Sitting above TiKV is TiDB. TiDB is the component that speaks the MySQL protocol and converts incoming SQL into requests that are sent to TiKV. It doesn't store any data itself, which makes it stateless and very easy to scale. A simplified way of thinking of it is as an intelligent proxy that translates SQL into KV requests. It's worth emphasizing that we built this layer from scratch in order to take advantage of the distributed nature of TiKV, which I'll do a deep dive on when we talk about Coprocessor. Just remember for now that it's not a MySQL fork.

  • The last mandatory component to describe is the Placement Driver, or PD. PD is the cluster manager. It handles all TiKV Region rebalancing, for example when there are data hotspots or Regions that need to be merged. It is also responsible for metadata such as SQL DDL. Another critical service PD provides is time synchronization for the TiDB Platform. We will cover more on what that means and why it is required later.
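
To make the idea of range-partitioned Regions a little more concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions, not how TiKV is implemented: the RegionMap class, the row_key helper and its key format are hypothetical names made up for illustration, and the split threshold counts keys, whereas real Regions are limited by size. It shows three things from the description above: a key is routed to the Region whose range contains it, a Region splits once it grows too large, and because key order is preserved, a "between X and Y" scan only needs to visit the Regions that overlap the range.

```python
import bisect


class RegionMap:
    """Toy model of range partitioning: Region i owns keys in [starts[i], starts[i+1])."""

    def __init__(self, max_keys_per_region=4):
        # Hypothetical limit counted in keys; real Regions are limited by size.
        self.max_keys = max_keys_per_region
        self.starts = [""]    # sorted Region start keys; "" means "from the beginning"
        self.regions = [[]]   # regions[i] holds the sorted keys stored in Region i

    def region_for(self, key):
        # The owning Region is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key):
        i = self.region_for(key)
        keys = self.regions[i]
        pos = bisect.bisect_left(keys, key)
        if pos == len(keys) or keys[pos] != key:
            keys.insert(pos, key)
        if len(keys) > self.max_keys:
            self._split(i)

    def _split(self, i):
        # Split an oversized Region in half; the upper half becomes a new Region.
        keys = self.regions[i]
        mid = len(keys) // 2
        self.starts.insert(i + 1, keys[mid])
        self.regions.insert(i + 1, keys[mid:])
        self.regions[i] = keys[:mid]

    def scan(self, low, high):
        """Return keys with low <= key <= high, touching only overlapping Regions."""
        out = []
        for i in range(self.region_for(low), self.region_for(high) + 1):
            out.extend(k for k in self.regions[i] if low <= k <= high)
        return out


def row_key(table_id, row_id):
    # Hypothetical, simplified key format; zero-padding makes string order
    # match numeric order so that range scans behave as expected.
    return f"t{table_id:04d}_r{row_id:08d}"


rm = RegionMap(max_keys_per_region=4)
for rid in range(1, 11):
    rm.put(row_key(1, rid))

# Rows 3..6 of table 1, read as a single ordered range scan.
rows_3_to_6 = rm.scan(row_key(1, 3), row_key(1, 6))
```

In this sketch no Region ever ends up holding more than four keys after inserting ten rows, and the scan for rows 3 to 6 only reads the Regions whose ranges overlap that interval. Replication, Raft, and transactions are deliberately left out.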

So a quick recap on the mandatory components and their roles:

  • TiKV provides scale-out storage. It natively replicates data between nodes and maintains redundancy.
  • TiDB provides a stateless SQL layer that speaks MySQL.
  • PD provides cluster management.

In this course, we will use the term TiDB Platform to refer to all of the components collectively, and TiDB server to refer to the stateless SQL layer. While outside the scope of this course, it is technically possible to run TiKV without TiDB and use a key-value interface to access your data directly.