Optional Components: TiSpark


In the last video, we talked about the three core components that make up the TiDB Platform:

  • TiKV for distributed storage
  • The TiDB server for SQL processing
  • PD for cluster management

There is one additional component of the TiDB Platform which you can consider optional, and that's TiSpark.

TiSpark allows you to bypass the SQL layer and have Apache Spark connect to TiKV directly. Why would you want to do this? You might be running complex OLAP queries or machine learning workloads that are better suited to Spark than they are to the TiDB server.

Here we can also see one of the strong benefits of TiDB using a component-based architecture: you can access the same data from both systems (Spark and the MySQL-compatible TiDB server) without having to move it between the two. You also have a third option, which is to access your data directly through a key-value interface.

Moving data between systems is referred to as ETL, or Extract-Transform-Load. With the TiDB Platform, you can run transactions and analytics at the same time on the same data, which means that any analytics you need can be run immediately, with little delay and without waiting on an ETL step.

In industry parlance, this is known as Hybrid Transactional and Analytical Processing, or HTAP. It is an emerging space, and many analysts believe that over time the convenience of being able to use one system for both workloads will win out over single-purpose systems (either Online Transaction Processing (OLTP) or Online Analytical Processing (OLAP)).

We have a section on HTAP later in this course, where we will try to answer which queries are better suited to TiSpark and which to the TiDB server.

In the meantime, to tie this back to MySQL: if you are currently using MySQL plus another system such as Hadoop, HBase or a column store, the benefit of the TiDB Platform is that you can consolidate and simplify your architecture.