Title: Log-structured Merge (LSM) Trees in the Cloud Era
Time: May 19th, 8 PM EDT (5 PM PDT); May 20th, 8 AM China Standard Time
Introduction
The log-structured merge (LSM) tree is the standard storage layer for write-intensive workloads in both production NoSQL data stores and relational systems. LSM-based systems serve a wide variety of applications and are deployed on shared infrastructure, such as public or private clouds. As a result, they must meet a range of requirements, including performance, cost, and privacy. They must also cope with sometimes challenging external requirements and unpredictable workloads.
In this meetup, we will discuss the latest developments in LSM trees, especially in the cloud era. The program includes a panel discussion on system design in the cloud. Professor Manos Athanassoulis and PhD candidate Andy Huynh, both from Boston University, will share their insights on LSM trees. Ed Huang and Xiaoguang Sun will discuss their experience with LSM trees while developing TiKV.
Agenda
- A Prelude to Robust LSM-Trees – Manos Athanassoulis
- Paper Reading: “ENDURE: A Robust Tuning Paradigm for LSM Trees” – Andy Huynh
- Remote compaction and LSM on tiered and cloud storage – Ed Huang and Xiaoguang Sun
- Panel discussion: Using heterogeneous hardware and machine learning to design data systems in the cloud
Speakers
Manos Athanassoulis
Assistant Professor of Computer Science, Boston University
Director and Founder of the BU Data-intensive Systems and Computing (DiSC) lab
Professor Athanassoulis’ research area is data management. He focuses on building data systems that efficiently exploit modern hardware (computing units, storage, and memory) and that are deployed in the cloud. These systems can adapt to the workload both at setup time and, dynamically, at runtime. Before joining Boston University, Manos was a postdoc at Harvard University. Before that, he obtained his PhD from EPFL, Switzerland, and spent one summer at IBM Research, Watson. Manos’ work has been recognized with awards such as “Best of SIGMOD” in 2016, “Best of VLDB” in 2010 and 2017, and “Most Reproducible Paper” at SIGMOD in 2017. His research has been supported by the National Science Foundation (NSF), including an NSF CAREER award. He has also received industry funding, including a Facebook Faculty Research Award and gifts from Cisco, Red Hat, and Meta. He has served or is serving as:
- Publicity Co-Chair for IEEE ICDE 2021
- Reproducibility Chair for ACM SIGMOD 2021
- Sponsorship Chair for ACM SoCC 2021
- Availability and Reproducibility Co-Chair for ACM SIGMOD 2022 & 2023
- Publicity Chair for VLDB 2022
- Area Chair for IEEE ICDE 2022
- Proceedings Chair for VLDB 2023
- VLDB Ambassador for Industry Relations for 2022 and 2023
Andy Huynh
PhD candidate and first author of the VLDB 2022 paper “ENDURE: A Robust Tuning Paradigm for LSM Trees”
Andy is a PhD student in the DiSC lab at Boston University, advised by Manos Athanassoulis. His research is in data systems and databases, with a focus on automatic tuning of data systems, optimal data systems under changing environments, and applications of machine learning for systems. He received a 2020 IBM PhD Fellowship.
Ed Huang
Co-founder and CTO, PingCAP
Ed Huang is a co-founder and the CTO of PingCAP, and one of the creators of the TiDB distributed database and the TiKV key-value store. While at Wandou Labs, Ed worked on clustering Redis and created and open-sourced Codis, a proxy-based, high-performance Redis cluster solution. Deciding to focus on this area, he co-founded PingCAP and created TiDB and TiKV.
Xiaoguang Sun
Head of Cloud Storage Engine Team, PingCAP
Xiaoguang leads PingCAP’s Cloud Storage Engine team. Before joining PingCAP, Xiaoguang was the director of Zhihu’s Infrastructure team. He has worked on distributed systems for many years and is interested in cloud-native technologies.