Observability

Transcript

If you come from a MySQL background, the built-in system for observability is performance_schema. There are secondary observability commands, such as SHOW ENGINE INNODB STATUS, but in recent MySQL releases performance_schema has become the gold standard.

Performance Schema is implemented as a storage engine inside MySQL, and recent MySQL servers are very well instrumented with it: any time the server allocates memory, waits on a lock, or performs I/O, you can query that with regular SQL and find out!
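For example, here is one way to pull the top wait events and memory consumers straight out of Performance Schema (these summary tables ship with MySQL 5.7 and later):

```sql
-- Top wait events by total time spent waiting.
-- Performance Schema timers are measured in picoseconds.
SELECT event_name,
       count_star,
       sum_timer_wait / 1e12 AS total_wait_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE sum_timer_wait > 0
ORDER BY sum_timer_wait DESC
LIMIT 10;

-- Memory instrumentation works the same way:
SELECT event_name,
       current_number_of_bytes_used
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY current_number_of_bytes_used DESC
LIMIT 10;
```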

It does have a couple of drawbacks, though: Performance Schema keeps its data in memory, so any metrics collected are lost when the server restarts. History retention is also bounded, so on a busy server some key events will be cycled out after only a few minutes of activity.
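You can see those bounds for yourself; the history table sizes are exposed as server variables:

```sql
-- How many rows each history table retains before old rows are cycled out.
-- The *_history_size variables are per thread; *_history_long_size is global.
SHOW VARIABLES LIKE 'performance_schema%history%size';
```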

TiDB does not use performance_schema, and notwithstanding the drawbacks just mentioned, the real reason is that TiDB is a distributed system.

A single server's metrics in isolation are not always enough to observe the running system. So TiDB servers send their metrics to a centralized database that is specifically designed for metrics collection. I am of course talking about Prometheus.
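As a quick sketch of what ends up in Prometheus: on TiDB v4.0 and later, with Prometheus deployed, the metrics_schema database maps those collected metrics back into SQL views, so you can inspect them from a regular TiDB session (the table and column names here assume that version's layout):

```sql
-- Assumes TiDB v4.0+ with Prometheus deployed; metrics_schema views
-- are translated into Prometheus queries under the hood.
SELECT time, instance, quantile, value
FROM metrics_schema.tidb_query_duration
WHERE quantile = 0.99
ORDER BY time DESC
LIMIT 10;
```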

If you are not yet familiar with Prometheus, it is a popular monitoring system that, along with Kubernetes and TiKV, is a Cloud Native Computing Foundation (CNCF) project. It is often used in combination with Grafana, which adds a layer of dashboards on top.

All of the TiDB Platform components support Prometheus integration, and when you deploy the TiDB Platform with TiDB Operator, Prometheus and Grafana are also installed and configured out of the box. We even provide built-in Grafana dashboards.
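One way to confirm which components are up and reporting, assuming a TiDB v4.0+ cluster, is the cluster_info table (each listed component reports its metrics to Prometheus):

```sql
-- Assumes TiDB v4.0+: lists every TiDB, TiKV, and PD instance in the cluster.
SELECT type, instance, version, status_address
FROM information_schema.cluster_info;
```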

So in our lab, we will be logging into Grafana and observing key metrics about our TiDB Platform while we are performing operations.