Author: PingCAP
Transcreator: Fendy Feng; Editors: Tom Dewan, Wink Yao
TiDB Hackathon is a developers’ carnival held by the TiDB community for those who are passionate about hacking. All the participating developers have to create and complete a software project related to the TiDB ecosystem and demonstrate it within 48 hours. Developers who create especially innovative or exciting “star” projects within such a short time can get a hefty bonus.
In early January, we wrapped up our latest event, TiDB Hackathon 2021, whose theme was “Explore the Sky.” It was the fifth hackathon in the TiDB community and our largest ever, attracting 64 teams with 279 developers. They came from high-tech giants such as Tencent and TikTok and prestigious universities such as Peking University and the Royal Melbourne Institute of Technology. Ten teams won prizes totaling RMB 400,000 (about USD 63,000).
Hackathon 2021 produced many outstanding projects, including pCloud, a Software-as-a-Service (SaaS) product designed for data backup and restore, and TiGraph, a graph database built on top of TiDB and TiKV.
Let me briefly introduce some star projects produced at Hackathon 2021 and tell you what surprises they brought us.
TiDB tiered storage: lower storage cost
Built by Team He3
As the amount of user data in TiDB grows, so does the storage cost. Over time, storage accounts for a larger and larger proportion of the total database cost. A pressing issue for many users is how to reduce TiDB’s storage cost.
The He3 team built a tiered storage system that separates hot and cold data to reduce TiDB’s storage cost. In their design, hot data is stored on TiKV, a distributed transactional key-value database, and cold data, which serves fewer queries and analytical workloads, is stored on less expensive Amazon S3 storage. The S3 storage engine supports the pushdown of some TiDB operators, so TiDB can answer analytical queries directly from the cold data stored on S3.
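To make the hot/cold split concrete, here is a minimal sketch of how rows might be assigned to a tier. It is not the He3 team's implementation: the `Row` type, the `COLD_AFTER` threshold, and the idea of classifying by last access time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy: rows untouched for longer than this count as cold.
COLD_AFTER = timedelta(days=90)


@dataclass
class Row:
    key: str
    last_access: datetime


def choose_tier(row: Row, now: datetime) -> str:
    """Return "tikv" for hot rows and "s3" for cold rows."""
    return "s3" if now - row.last_access > COLD_AFTER else "tikv"


def partition_rows(rows, now):
    """Group row keys by the tier they would be stored on."""
    tiers = {"tikv": [], "s3": []}
    for row in rows:
        tiers[choose_tier(row, now)].append(row.key)
    return tiers
```

A real system would likely classify data at a coarser granularity, such as partitions or key ranges, rather than row by row.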
TiGraph: a distributed graph database on top of TiDB
Built by Team TiMatch
Graph technology is the foundation of modern data analytics, enabling users to discover the relationships between people, locations, items, and events across disparate data assets. Graph technology can also help you quickly answer complex business questions that were almost impossible to answer in the past.
By introducing a graph mode to TiDB with a new set of key-value encodings, TiGraph builds a distributed graph database with a complete syntax on top of TiDB and TiKV. It can handle graph data analytics that are difficult for traditional relational databases. TiGraph is also a step toward a mature and easy-to-use graph database on top of TiDB.
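One way to picture storing a graph in a key-value store is to encode each vertex and edge as an ordered key, so that all edges leaving a vertex sit next to each other and can be read with a prefix scan. The sketch below illustrates that general idea only. The prefixes, the key layout, and the function names are hypothetical and are not TiGraph's actual encoding; a plain dictionary stands in for TiKV.

```python
import struct

# Hypothetical one-byte prefixes that separate vertex keys from edge keys.
VERTEX_PREFIX = b"v"
EDGE_PREFIX = b"e"


def vertex_key(vertex_id: int) -> bytes:
    """Encode a vertex as prefix + big-endian 64-bit ID."""
    return VERTEX_PREFIX + struct.pack(">Q", vertex_id)


def edge_key(src: int, dst: int) -> bytes:
    """Encode an edge so that edges sharing a source sort together."""
    return EDGE_PREFIX + struct.pack(">QQ", src, dst)


def neighbors(store: dict, src: int) -> list:
    """Emulate a prefix scan over the store to list the targets of src."""
    prefix = EDGE_PREFIX + struct.pack(">Q", src)
    result = []
    for key in sorted(store):
        if key.startswith(prefix):
            (dst,) = struct.unpack(">Q", key[len(prefix):])
            result.append(dst)
    return result
```

Big-endian integers keep the byte order of keys consistent with the numeric order of IDs, which is what makes range scans over an ordered key-value store useful here.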
pCloud: the iCloud of databases
Built by Team pCloud
For database users, data backup and restore is a must. Losing data is almost equivalent to losing everything, especially for enterprises. That’s why we need a powerful backup and restore tool to prevent data loss. Currently, TiDB users can use the Backup & Restore (BR) tool to back up and restore all TiDB cluster data and the incremental data. The downside is that BR can only restore data to the time of the backup. That’s not enough.
pCloud is a SaaS project that is fully hosted in the cloud. It provides one-stop database backup and recovery, and it can recover data to any point in time. All the backup data is stored on Amazon S3, which is relatively inexpensive.
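Recovering to any point in time is commonly done by restoring the latest full backup taken before the target time and then replaying the logged changes up to that time. The sketch below shows that general technique on an in-memory key-value state. It does not describe how pCloud works internally; the `Change` tuple and the `restore_to` function are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (commit timestamp, key, new value).
# A value of None marks a delete.
Change = Tuple[int, str, Optional[str]]


def restore_to(
    snapshot: Dict[str, str],
    snapshot_ts: int,
    log: List[Change],
    target_ts: int,
) -> Dict[str, str]:
    """Rebuild the state as of target_ts from a snapshot plus a change log."""
    if target_ts < snapshot_ts:
        raise ValueError("target time precedes the snapshot")
    state = dict(snapshot)
    # Replay changes in commit order, skipping those already contained in
    # the snapshot and those committed after the target time.
    for ts, key, value in sorted(log, key=lambda change: change[0]):
        if ts <= snapshot_ts or ts > target_ts:
            continue
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state
```

With a snapshot alone, only `target_ts == snapshot_ts` is reachable, which is the limitation described above for restoring from a backup without a change log.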
TiLaker: direct data into data lakes with efficiency
Built by Team TiLaker
TiDB can be used as a data hub between applications and offline data lakes. It processes fresh data from applications to serve real-time queries, and it archives data in batches to the offline data lakes. The problem is that you have to use different tools in two separate processes to direct data from TiDB into data lakes: Dumpling to export the full data and TiCDC to replicate incremental data.
TiLaker solves this problem. It is a data export tool that streamlines the process of exporting both the full data and incremental data changes into data lakes. TiLaker also builds a faster, more efficient pipeline to connect and integrate TiDB with the big data ecosystem.
TiDB Visual Plan: making SQL execution plans visible
Built by Team TiVP
In the everyday work of database performance tuning, TiDB SQL tuning is in great demand. More than half of those demands are related to execution plans. SQL is a declarative language, and observing execution plans is the only way to check its execution efficiency. But the execution plans of slow SQL queries are complex and difficult to understand, and this hurts tuning efficiency. So, making execution plans more readable will make it easier to diagnose performance issues.
TiDB Visual Plan aims to visualize SQL execution plans. It collects and sorts SQL execution plans and their runtime information in the TiDB system. It then builds a display interface based on dalibo/pev2, an open source Vue.js component, making SQL execution plans visible and easier to understand. TiDB Visual Plan also helps users locate and highlight issues in TiDB such as slow SQL queries and wrong execution plans.
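Before a plan can be drawn, its text form has to become a tree. The sketch below converts the indented operator IDs of an `EXPLAIN` result into nested dictionaries that a front end could render. It is only an illustration and not TiDB Visual Plan's code: `parse_plan` is a hypothetical helper, and it assumes each nesting level adds two characters of `│`, `├─`, `└─`, or spaces in front of the operator name.

```python
from typing import List, Optional

# Characters assumed to draw the tree in front of each operator name.
TREE_CHARS = " │├└─"


def parse_plan(id_column: List[str]) -> Optional[dict]:
    """Turn indented operator IDs into a nested {"op", "children"} tree."""
    root = None
    stack = []  # stack[d] holds the most recent node seen at depth d
    for line in id_column:
        name = line.lstrip(TREE_CHARS)
        depth = (len(line) - len(name)) // 2
        node = {"op": name.strip(), "children": []}
        if depth == 0:
            root = node
        else:
            stack[depth - 1]["children"].append(node)
        # Drop nodes at this depth or deeper, then record the new node.
        del stack[depth:]
        stack.append(node)
    return root
```

A nested structure like this is easier to hand to a tree-rendering component than the flat, indented text, because parent and child relationships are explicit.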
Collie Diagnosing Platform: making TiDB observability possible
Built by Team “We are so dumb and will the judges be mad?”
Collie Diagnosing Platform helps you identify problems by integrating fault scene information collection, online UI observation and analysis, and machine learning-assisted diagnosis. It also explores how database administrators (DBAs) and operations engineers will probably work in the next three to five years.
Stay tuned
In this post, we took a quick look at a handful of star projects produced at Hackathon 2021. In future posts, we will introduce more outstanding projects and take a deep dive into them. Stay tuned.
How about you? Are you a hacker? If you are also interested in hacking and the TiDB Hackathon, you’re welcome to follow @PingCAP on Twitter, Facebook, GitHub, and Slack for the latest information.