TiDB and MySQL: Behavior Differences

Resources

Transcript

In the previous video, we talked about TiDB compatibility with MySQL and how some features were not present. For example: stored procedures, triggers, views.

Let's now talk about something more subtle: the feature also exists in TiDB, but works differently. In many cases this is TiDB is a distributed database, and needs to work a little bit differently. Let's take a look at the TiDB manual and talk specifics.

Auto increment IDs

The auto-increment ID feature in TiDB is only guaranteed to be automatically incremental and unique but is not guaranteed to be allocated sequentially. Each TiDB server is allocated auto-increment IDs in batches of 30 thousand. It is not a good practice to rely on auto-increment values to be sequential, but there are some applications that do do it, for example - as a shortcut for sorting rows by date.

Performance schema

TiDB is instrumented for collecting various performance counters about its execution, but with TiDB nodes being stateless (and without having membership) you would only be able to see statistics for an individual server at runtime. So instead in TiDB, we use a combination of best-of-breed tools with Prometheus and Grafana to centralize metrics. We will cover this in more detail when we get to ‘monitoring’. For the time being, simply know that while you can read from performance_schema tables in TiDB most will return zero results.

Built-in functions

TiDB supports most of MySQL's built-in functions, such as those used when manipulating dates, manipulating strings and converting between formats. There are some exceptions to this such as geospatial functions which similar to not supporting spatial indexes, TiDB does not support yet. There are also some edge cases such as a SOUNDEX, where the function has not been added yet due to a relatively infrequent usage. At the time of recording, TiDB also does not support all of the JSON functions added in MySQL 5.7, and all aggregate functions. Both are in the process of being addressed.

DDL

DDL is a major advantage of TiDB over MySQL, one that is not spoken about enough. In TiDB DDL uses the online asynchronous schema change algorithm first written about in Google's F1. It ensures that your regular operations are not impacted while DDL is being performed on the TiDB Platform. We will cover DDL in more detail later in this course, but what is relevant for now is that TiDB does not allow you to create multiple indexes or columns at the same time, as you may be used to doing in MySQL. It also does not support adding, changing or dropping the primary key on a table.

Transaction Model

Internally TiDB uses optimistic locking for transactions, which means that some transaction conflicts will only be checked as part of a commit process. This is similar to Group Replication in MySQL (but not regular InnoDB + Replication), and it is important that your application check for errors from commit.

Large Transactions

Due to the distributed + 2-phase commit requirement of TiDB, large transactions that modify data can be particularly problematic. TiDB intentionally sets some limits on transaction sizes to limit impact:

  • Each Key-Value entry is no more than 6MB
  • The total number of Key-Value entry is no more than 300,000
  • The total size of Key-Value entry is no more than 100MB

(There are similar limits on Google Cloud Spanner, by which TiDB's architecture is inspired.)

Load data

Related to the point above, loading large files into a table in a single transaction is not ideally suited. TiDB commits rows from load data every 20,000 rows.

Storage engines

TiDB actually supports a similar architecture to MySQL, in which TiKV is the distributed storage engine that sits below TiDB. But in most cases, you will be using TiKV, and for compatibility reasons when you create a table it will look like it's InnoDB. TiKV does behave similarly to InnoDB in many respects since it is a transactional, strongly consistent engine. You can not mix-and-match engines, and there is for example no MyISAM-like equivalent in TiDB.

EXPLAIN

The output of the query execution plan returned from the EXPLAIN command differs from MySQL. We will walk through examples when we get to query optimization.

So that's it for major behavior differences. We covered a lot in this video, and if you have any lurking questions, I recommend giving the MySQL Compatibility documentation a read over before proceeding on.