Scaling Patterns


In just a moment, we will have a lab where we scale out both TiDB and TiKV. Before we get there, I want to cover at a high level when you would scale one rather than the other, as well as some other performance considerations.

So let's go over some scenarios:

  • TiDB servers are stateless. They make heavy use of CPU and memory, and if you notice you are constrained on either, it might be time to scale out and add servers. This is the most likely reason you would scale out TiDB, but it is not the only one.

    You might have a scenario where you have too much variation in query execution time. Since one of the advantages of TiDB is that it can run HTAP (Hybrid Transactional and Analytical Processing) workloads, you may choose to add servers to reduce query latency. Usually in this scenario you have more tolerance for variation in the analytics queries, so one small tweak you can make to the architecture is to have some of your TiDB servers responsible for OLTP queries and some responsible for OLAP. Again, because TiDB servers are stateless, you can manage them as if they were a single pool, but put some of the servers behind one load balancer and some behind another.

    To further reduce the variance introduced by potentially expensive analytics queries, you can also optionally configure the Analytics Processing servers to be low-priority. This priority setting is propagated to TiKV as well, so that OLTP queries will always take priority.

  • TiKV servers are usually constrained by the density of fast storage. That is, they do use some memory and CPU, but for large data sets the usual motivation for adding TiKV servers is to expand the storage available to the TiDB Platform.

    This is, however, workload dependent: a data set may be small but queried heavily relative to its size. Because TiKV's coprocessor also handles parts of query execution, there may be cases where you add TiKV servers, or change the servers' specifications, to accommodate that load.

  • In our deployment pattern (the KOST stack), we currently have 3 virtual machine instances in one Region which host our containers (or pods in Kubernetes parlance):

    • 2x TiDB servers
    • 3x TiKV servers
    • 3x PD servers

  • PD servers are mostly responsible for cluster management, and do not require scaling with cluster growth. There are 3 PD servers only for high availability reasons.

  • At some point after increasing the number of pods, we will also need to increase the size of our Kubernetes cluster. So in the following lab, we are going to test several types of scaling:

    • Scaling Kubernetes to 4 nodes
    • Scaling TiDB out to 5 pods
    • Scaling TiKV out to 4 pods
    • Scaling TiKV back in to 3 pods
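
Before the lab, here is a rough sketch of the decision logic described above: TiDB scales on CPU and memory pressure, TiKV usually scales on storage (and sometimes on CPU because of the coprocessor), and PD does not scale with cluster growth. This is only an illustration. The `ClusterMetrics` type, the `recommend_scaling` function, and both threshold values are hypothetical and are not part of TiDB or its tooling; you would feed in figures from your own monitoring and choose thresholds that suit your workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterMetrics:
    """Utilization figures as fractions from 0.0 to 1.0 (hypothetical inputs)."""
    tidb_cpu: float      # average CPU utilization across TiDB servers
    tidb_memory: float   # average memory utilization across TiDB servers
    tikv_storage: float  # fraction of TiKV disk capacity in use
    tikv_cpu: float      # average CPU utilization across TiKV servers


# Illustrative thresholds only (assumptions, not recommended values).
COMPUTE_THRESHOLD = 0.75
STORAGE_THRESHOLD = 0.70


def recommend_scaling(m: ClusterMetrics) -> List[str]:
    """Return the components that look constrained."""
    actions = []
    # TiDB is stateless: scale out when CPU or memory is the bottleneck.
    if m.tidb_cpu > COMPUTE_THRESHOLD or m.tidb_memory > COMPUTE_THRESHOLD:
        actions.append("scale out TiDB")
    # TiKV is usually storage-bound, but a small, heavily queried data set
    # can be CPU-bound because the coprocessor runs part of each query.
    if m.tikv_storage > STORAGE_THRESHOLD or m.tikv_cpu > COMPUTE_THRESHOLD:
        actions.append("scale out TiKV")
    # PD is intentionally absent: it does not need to grow with the cluster.
    return actions


if __name__ == "__main__":
    # TiDB CPU is high, everything else is comfortable.
    print(recommend_scaling(ClusterMetrics(0.90, 0.40, 0.30, 0.20)))
    # prints ['scale out TiDB']
```

Note that the two checks are independent, so a cluster that is short on both compute and storage gets both recommendations, which matches the lab, where we scale TiDB and TiKV separately.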

Sound good? Let's proceed.