RocksDB

Transcript

As I mentioned in our first introduction to TiKV, its storage is based on RocksDB. In fact, you could consider TiKV to be the equivalent of a “distributed RocksDB with built-in replication”.

So what is RocksDB? RocksDB is the equivalent of InnoDB in MySQL. It is an embedded database library developed by the team at Facebook, who forked an earlier database named LevelDB (developed by Google).

In MySQL you can choose between various storage engine options, such as InnoDB or MyISAM. As a point of comparison, this is also possible in TiDB (the API that it uses to speak to TiKV could be implemented by another technology in future). In practical terms though, unless you are a TiDB developer, the storage engine you will use is TiKV.

So the next logical question is how does RocksDB differ from InnoDB? I'm glad you asked.

In Facebook's case, they are large users of RocksDB and have moved systems from InnoDB to it (developing their own MySQL storage engine, called MyRocks, to do so). Let's look at their primary motivations:

  • They are predominantly space constrained. Their existing InnoDB-based systems support higher QPS than they currently require, so the limit on how much data a server can store is its flash capacity. RocksDB has efficient compression, which helps increase the density of each server, and they are able to spend extra CPU to achieve it.

  • Flash durability is limited by write cycles. A simplified comparison of RocksDB versus InnoDB is that InnoDB is optimized for read operations and RocksDB is optimized for write operations. The lifetime of flash is limited by the number of write cycles (not reads), so having data structures that are more efficient for writing helps increase the lifetime of the hardware.

So while this describes some of Facebook's motivations, let me also describe our motivations at PingCAP.

  • Storage engines, similar to filesystems, can take a long time to mature. By selecting from the best of existing technologies, we are able to go to market faster.

  • What was true of compression for Facebook is also true for many of PingCAP's customers. But as well as compressing well, another nice feature of RocksDB is that its performance can degrade much less than InnoDB's in cases where indexes no longer fit in memory. This helps make TiDB suitable for very large data sets.

Hopefully this helps provide some background context. In the next video we'll look into the structure in more detail.