Rust is a systems programming language sponsored by Mozilla Research. It has moved fast and steadily on a 6-week release cycle ever since version 1.0 shipped in May 2015.
See the following list for some of the features that most attract us:
A minimal runtime and efficient C bindings let Rust be as efficient as C and C++, which makes it well suited to systems programming, where high performance matters most.
The powerful type system and unique lifetime management move memory management to compile time, which ensures memory and thread safety and lets the compiled program run very fast.
Rust provides pattern matching and type inference like a functional programming language, which keeps the code simple and elegant.
Macros and traits allow a high level of abstraction and save a good deal of boilerplate in engineering work, especially in libraries.
Thanks to its excellent package management tool, Cargo, Rust has a broad range of libraries, such as Hyper for HTTP and Tokio and mio for asynchronous I/O: essentially all the libraries required to build a backend application.
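To make the features above more concrete, here is a minimal sketch that combines pattern matching, type inference, a trait, and a small macro. It is an illustration only and is not taken from TiKV: the names `Command`, `Store`, `MemStore`, `puts!`, and `run` are all hypothetical, and the "store" is just an in-memory `HashMap`.

```rust
use std::collections::HashMap;

// Hypothetical command type for a toy in-memory key-value store.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Get { key: String },
    Put { key: String, value: String },
    Delete { key: String },
}

// A trait abstracts over storage backends; `MemStore` below is one
// illustrative implementation.
trait Store {
    // Applies a command and returns the value read, replaced, or removed.
    fn apply(&mut self, cmd: Command) -> Option<String>;
}

#[derive(Default)]
struct MemStore {
    data: HashMap<String, String>,
}

impl Store for MemStore {
    fn apply(&mut self, cmd: Command) -> Option<String> {
        // Pattern matching destructures each variant; the compiler rejects
        // a `match` that forgets one of them.
        match cmd {
            Command::Get { key } => self.data.get(&key).cloned(),
            Command::Put { key, value } => self.data.insert(key, value),
            Command::Delete { key } => self.data.remove(&key),
        }
    }
}

// A small declarative macro that removes the boilerplate of building
// a list of `Put` commands.
macro_rules! puts {
    ($($k:expr => $v:expr),* $(,)?) => {
        vec![$(Command::Put { key: $k.to_string(), value: $v.to_string() }),*]
    };
}

// Applies a batch of commands to any type that implements `Store`.
fn run<S: Store>(store: &mut S, cmds: Vec<Command>) -> Vec<Option<String>> {
    // Type inference: the collection type is taken from the return type.
    cmds.into_iter().map(|c| store.apply(c)).collect()
}
```

Calling `run` on an empty `MemStore` with `puts!["a" => "1", "a" => "2"]` returns `[None, Some("1")]`, because `HashMap::insert` hands back the value it replaced. A different backend would only need its own `impl Store`; `run` and the macro stay unchanged.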
Generally speaking, at this stage Rust is mainly used to develop high-performance server-side applications. In addition, its innovations in the type system and syntax give it a unique edge in developing Domain-Specific Languages (DSLs).
As a new programming language, Rust is unique. To name just a few projects that are using Rust,
As one of the listed Friends of Rust, TiKV has been one of the top Rust projects on GitHub trending.
TiKV is a distributed key-value database. It is the core component of the TiDB project and is an open source implementation inspired by Google Spanner. We chose Rust to build such a large distributed storage project from scratch. In this post, I will explain the rationale.
For a long time, C and C++ have dominated the development of infrastructure software such as databases. Java and Golang have problems such as GC jitter, especially under heavy read/write pressure. On the one hand, the goroutine, Golang's lightweight thread and its most attractive feature, significantly reduces the complexity of developing concurrent applications, but at the cost of extra context-switching overhead in the goroutine runtime. For infrastructure software like a database, the importance of performance goes without saying. On the other hand, the system needs to retain its “certainty”, that is, predictable behavior, which makes performance tuning convenient. Introducing a GC and another runtime works against that. So for quite a long time, C/C++ seemed to be the only choice.
TiKV was started at the end of 2015. Our team was torn between several language choices: pure Go, Go + Cgo, C++11, or Rust.
Pure Go: Our core team has rich experience in Go. The SQL layer of TiDB is developed in Go, and we have benefited a great deal from the productivity Go brings. However, when it came to the storage layer, pure Go was the first option to rule out for one simple reason: we had decided to use RocksDB, which is written in C++, as the underlying storage engine. The existing LSM-tree implementations in Go (like goleveldb) were nowhere near as mature as RocksDB.
Cgo: If we had to use Go, we had to use Cgo as the bridge, but Cgo had its own problems. At the end of 2015, calling Cgo from Go code could greatly impact performance, because a Cgo call is not as cheap as an ordinary call made on the goroutine's thread. Besides, a database makes frequent calls to the underlying storage library, in our case RocksDB. Paying that extra overhead every time a RocksDB function was called would be highly inefficient. Of course, some workarounds could increase the throughput of Cgo calls, such as packing the calls made within a certain period into a single Cgo batch call, which amortizes the Cgo overhead at the price of higher latency for a single request. But the implementation might be very complex, and the GC problem would still not be entirely solved. At the storage layer, we want to use memory as efficiently as possible. Hacky workarounds such as extensive use of syscall.Mmap or object reuse might damage the readability of the code.
C++11: Technically, there ought to be no issue with C++11; RocksDB itself is developed in C++11. But given the team's background and what we wanted to do, we didn’t choose C++11. The reasons are as follows:
Rust: Version 1.0 of Rust was released in May 2015 with some attractive features:
The first and foremost reason is memory safety. As mentioned earlier, memory management and data race issues might seem easy to C++ veterans. But I believe the ultimate solution, which is what Rust does, is to put the constraints in the compiler and solve the problem from the very beginning. For large projects, never bet quality solely on human beings. To err is human. Though Rust is hard to get started with, I think it’s well worth the effort. Besides, Rust is a very modern programming language, with an extraordinary type system, pattern matching, powerful macros, traits, and so on. Once you are familiar with it, it can greatly improve productivity; counting the time spent debugging, the overall development speed might be the same as if we had chosen C++. In our experience, it takes a software engineer about one month to go from zero experience to writing Rust code. An experienced Rust engineer is almost as productive as a Golang engineer.
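As a small illustration of what "constraints in the compiler" means for data races, the sketch below shares one counter between several threads. It is not code from TiKV, and the function name `parallel_count` is hypothetical. The compiler only accepts cross-thread mutation if it goes through a type that makes it safe, here `Arc<Mutex<u64>>` from the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Sums `per_thread` increments from each of `threads` worker threads
// into one shared counter and returns the final value.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // `Arc` provides shared ownership across threads; `Mutex` guarantees
    // that only one thread at a time can touch the value inside.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The value is reachable only through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Note: replacing the `Arc<Mutex<u64>>` with a plain `let mut counter`
    // that the spawned closures mutate does not compile, so the unsynchronized
    // version of this program cannot be built in safe Rust.
    let total = *counter.lock().unwrap();
    total
}
```

With 4 threads and 1000 increments each, `parallel_count(4, 1000)` returns 4000 on every run. In C++ the equivalent unsynchronized code would compile and fail only intermittently, which is exactly the class of bug the paragraph above argues should not be left to human care.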
To sum up, Rust, as an emerging programming language, is still new to most developers in China, but it has become the most promising challenger to C/C++. Rust was also crowned the “most loved” technology in Stack Overflow’s 2016 developer survey. So in the long term, Rust will shine in scenarios where memory safety and performance matter most.
© 2018 PingCAP. All Rights Reserved.