Course Summary

Transcript

Thank you for joining me on our TiDB for MySQL DBAs course. I hope you had as much fun attending the course as I did creating it.

In this video, I want to go over some of the major takeaways from each section of the course. My goal is that if I mention something that is less familiar to you, you can use this as an opportunity to go back and watch the section again, or repeat the lab exercises.

Starting with the TiDB Platform:

Here we learned that the TiDB Platform consists of three major components: the TiDB server, which is stateless and handles SQL processing; the TiKV server, which is a transactional Key-Value store; and PD, which is the cluster manager. We also learned that there is an optional component called TiSpark, which can be used to expose your TiKV data to Apache Spark for complex analytical workloads.

We discussed using the KOST stack for deployment, which includes Kubernetes and Operator. Kubernetes is the container orchestrator and Operator helps manage the TiDB platform as a Kubernetes deployment.

MySQL Compatibility:

In this section we learned about what it means to be MySQL compatible. Is it the network protocol, the SQL syntax, or both? And with TiDB being a distributed database, are there exceptions where it doesn't make sense for some features to work the same way?

Query Optimization:

In query optimization we looked at optimizing SQL statements. Both the format of EXPLAIN and the execution strategies available to TiDB's query optimizer differ from MySQL, so this section is worth revisiting if you are stuck on a slow query. We also took a look at TiDB's coprocessor, where parts of queries can be pushed down to TiKV for efficient execution.

HTAP:

In this section we revisited one of the TiDB platform's major advantages: it can run both OLTP and OLAP queries on the same tech stack with minimal ETL. I tried to give some context on which queries should be sent to TiDB servers versus Spark, and on where that is heading.

Backup and Restore:

In this section we covered a little bit of theory on the differences between HA and DR, and that one of the common reasons to restore from backup is accidental deletes. In TiDB there is a handy feature called tidb_snapshot which allows you to view what the data looked like at an earlier point in time. We had a lab using this, as well as a full backup and restore.
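As a reminder of the shape of that lab, here is a minimal Python sketch that only builds the SQL statements for a historical read; it does not connect to a cluster. The helper name snapshot_statements, the timestamp and the orders table are hypothetical and chosen for illustration. The sketch assumes the session-level form SET @@tidb_snapshot = '<timestamp>', with an empty string returning the session to the latest data, and it assumes the timestamp is recent enough that the old versions have not yet been garbage collected.

```python
from datetime import datetime
from typing import List


def snapshot_statements(as_of: datetime, query: str) -> List[str]:
    """Build the statements for one historical read in a single session."""
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"SET @@tidb_snapshot = '{ts}';",  # read data as it was at this time
        query,                              # this query sees the old data
        "SET @@tidb_snapshot = '';",       # go back to reading the latest data
    ]


# Hypothetical example: look at a table as it was before an accidental delete.
for stmt in snapshot_statements(datetime(2019, 1, 1, 12, 0, 0), "SELECT * FROM orders;"):
    print(stmt)
```

The statements would be run in order on one connection, since the variable is set per session.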

Schema Changes:

DDL is a major advantage of TiDB over MySQL. This section described how TiDB's DDL stays online even though the database is distributed, using the asynchronous schema change algorithm first described in Google's F1 paper.
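To jog your memory on the core idea, here is a simplified Python sketch of the intermediate states the F1 paper describes for adding an index. It is a toy model, not TiDB's implementation: the names SchemaState, add_index_steps and can_coexist are made up for illustration, and the data reorganization (backfill) step that happens before the index becomes public is left out. The point it illustrates is that servers are only ever allowed to be one state apart, which is what keeps the change safe without taking the table offline.

```python
from enum import Enum


class SchemaState(Enum):
    ABSENT = 0       # the index does not exist yet
    DELETE_ONLY = 1  # index entries may be deleted, but not inserted or read
    WRITE_ONLY = 2   # all writes maintain the index, but reads cannot use it
    PUBLIC = 3       # the index is fully usable


def add_index_steps():
    """Return the one-step transitions an index goes through when it is added."""
    order = list(SchemaState)
    return list(zip(order, order[1:]))


def can_coexist(a: SchemaState, b: SchemaState) -> bool:
    """Two servers may run at the same time only if they are at most one state apart."""
    return abs(a.value - b.value) <= 1


for before, after in add_index_steps():
    print(before.name, "->", after.name)
```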

Scaling the TiDB Platform:

In this section we described how to scale the Kubernetes Cluster, TiDB servers and TiKV servers. We practiced it with a number of lab exercises too, so while the process is quite straightforward, I recommend that DBAs get some practice with it.

Updates, Upgrades and Migrations:

We first categorized changes as an Update, Upgrade or Migration and looked at the motivations for each. For updates in particular, a rolling update is quite straightforward in TiDB. For migrations and upgrades we talked about some of the typical risks and suggested some tools that could be used for closer evaluation.

RocksDB:

In this section we described the RocksDB storage engine, which has some technical advantages and different performance characteristics than what you may be used to with InnoDB.

Monitoring and Observability:

In monitoring and observability we discussed how PD and TiKV use Raft consensus to ensure high availability, and what the TiDB Platform uses instead of performance_schema for observability (hint: it's Prometheus and Grafana).
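As a quick refresher on why Raft gives you high availability, here is a small Python sketch of the majority arithmetic. The function names are made up for illustration, and the replica counts are just examples rather than a recommendation for any particular cluster: a Raft group can keep making progress as long as a majority of its members are reachable.

```python
def quorum(replicas: int) -> int:
    """Smallest number of members that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many members can be lost while a majority is still available."""
    return replicas - quorum(replicas)


for n in (3, 5):
    print(n, "replicas: quorum", quorum(n), "tolerates", tolerated_failures(n), "failure(s)")
```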

Okay, that's it! In the next section we will quickly go over some recommendations of where you can go from here.