Pinterest, one of the world’s leading visual search and discovery platforms, recently undertook a significant overhaul of its graph storage service. At the HTAP Summit 2024, Pinterest Senior Software Engineer Ke Chen discussed the company’s transition from its legacy Zen system to a new architecture built on top of TiDB, an advanced distributed SQL database.
This post recaps Chen’s talk. It highlights the key challenges Pinterest faced, the motivations behind the migration, and the results achieved with PinGraph, its new TiDB-based graph service. Through this transformation, Pinterest streamlined data management, improved performance, and reduced costs while supporting its massive user base and diverse use cases.
The Legacy Graph Service: Challenges with Zen
Pinterest’s original graph service, Zen, was built in 2014 and powered by a combination of HBase and MySQL. Zen supported over 100 use cases and handled 1.5 petabytes (PB) of data with a peak of 8 million queries per second (QPS), but Chen said it faced several limitations:
- Data Inconsistencies: Zen relied on application-level secondary indexing, which caused data inconsistencies during failures or rollback scenarios. To address these issues, a daily MapReduce job, “Dr. Zen,” was used to reconcile data, adding operational overhead.
- Limited Query Capabilities: Chen noted that queries in Zen were constrained to basic patterns, such as fetching nodes and edges by a single property. Zen lacked support for advanced filtering, composite indexes, and multi-hop queries.
- High Technical Debt: With a 10-year-old codebase and outdated dependencies, Zen struggled to adapt to Pinterest’s evolving needs. This made it harder to support new business use cases.
Chen said these challenges prompted Pinterest to explore modernizing its graph service, leading to the creation of PinGraph.
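To make the first limitation concrete, here is a minimal sketch of how an application-maintained secondary index can drift from its primary data, and how a reconciliation pass repairs it. This is a toy in-memory model, not Zen's code: the class `AppLevelIndexedStore`, the exception `IndexWriteError`, and the `fail_before_index` flag are all hypothetical names invented for illustration.

```python
class IndexWriteError(Exception):
    """Simulates a failure between the primary write and the index write."""


class AppLevelIndexedStore:
    """Toy store whose secondary index is maintained by application code."""

    def __init__(self):
        self.nodes = {}  # primary data: node_id -> properties
        self.index = {}  # secondary index: owner -> set of node_ids

    def put_node(self, node_id, owner, fail_before_index=False):
        # Step 1: write the primary record.
        self.nodes[node_id] = {"owner": owner}
        # A crash or timeout here leaves the index stale, because the two
        # writes are separate operations with no shared transaction.
        if fail_before_index:
            raise IndexWriteError(f"index update lost for node {node_id}")
        # Step 2: write the index entry.
        self.index.setdefault(owner, set()).add(node_id)

    def find_by_owner(self, owner):
        return sorted(self.index.get(owner, set()))

    def reconcile(self):
        """Add index entries missing for primary rows; return repaired ids."""
        repaired = []
        for node_id, props in self.nodes.items():
            if node_id not in self.index.get(props["owner"], set()):
                self.index.setdefault(props["owner"], set()).add(node_id)
                repaired.append(node_id)
        return sorted(repaired)


store = AppLevelIndexedStore()
store.put_node(1, "alice")
try:
    store.put_node(2, "alice", fail_before_index=True)
except IndexWriteError:
    pass

print(store.find_by_owner("alice"))  # [1]  (node 2 exists but is not indexed)
print(store.reconcile())             # [2]
print(store.find_by_owner("alice"))  # [1, 2]
```

Until `reconcile` runs, index lookups silently miss node 2. A periodic repair job of the kind the talk describes plays that role, at the cost of a window of inconsistency and extra operational work.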
PinGraph: A Modern Graph Service Built with TiDB
Pinterest chose TiDB as the backbone of PinGraph to address Zen’s limitations and future-proof its graph service. Chen noted that TiDB’s distributed SQL architecture aligned with Pinterest’s goals of achieving consistency, scalability, and efficiency.
Pinterest selected TiDB for several reasons:
- Strong Consistency: TiDB’s native support for ACID transactions and secondary indexes eliminated the need for application-level indexing, simplifying data management and reducing errors.
- Scalability: As a distributed SQL database, TiDB enabled horizontal scaling without requiring manual sharding, allowing Pinterest to easily handle growing workloads.
- SQL Query Power: TiDB provided flexible and powerful query capabilities, supporting composite indexes, range queries, and multi-hop queries, essential for graph use cases.
- Operational Efficiency: By replacing the outdated infrastructure with TiDB, Pinterest expected to reduce maintenance overhead and cut costs.
- Open Source: TiDB’s active community and open-source nature aligned with Pinterest’s values, offering a collaborative ecosystem for innovation.
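The consistency point can be illustrated with a short sketch. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it does not connect to TiDB, and the table, index, and function names are hypothetical. The idea it shows is general to transactional SQL databases: when the database maintains the secondary index inside the same transaction as the row write, a failed write rolls back both together.

```python
import sqlite3


def open_graph_db():
    """Create an in-memory database with a node table and a secondary index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, kind TEXT NOT NULL)"
    )
    # The database, not the application, keeps this index in sync.
    conn.execute("CREATE INDEX idx_nodes_owner ON nodes (owner)")
    return conn


def insert_nodes_atomically(conn, rows):
    """Insert all rows or none; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO nodes (id, owner, kind) VALUES (?, ?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def ids_by_owner(conn, owner):
    """Look up node ids through the owner column, which the index covers."""
    cur = conn.execute(
        "SELECT id FROM nodes WHERE owner = ? ORDER BY id", (owner,)
    )
    return [row[0] for row in cur]


conn = open_graph_db()
insert_nodes_atomically(conn, [(1, "alice", "pin"), (2, "alice", "board")])
# The second batch reuses id 1, so the whole batch (including id 3) is undone.
insert_nodes_atomically(conn, [(3, "alice", "pin"), (1, "bob", "pin")])
print(ids_by_owner(conn, "alice"))  # [1, 2]
```

Because row and index changes commit or roll back as one unit, there is no state in which a lookup by owner disagrees with the table, and no separate repair job is needed for this class of failure.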
Inside PinGraph’s Graph Service Architecture
Chen walked through how PinGraph was designed to meet Pinterest’s diverse use cases while simplifying operations and improving performance. Built on top of Pinterest’s Structured Data Store (SDS) framework, PinGraph integrated seamlessly with TiDB.
PinGraph’s architecture consisted of the following components:
- Frontend and Backend: The PinGraph frontend handled graph-related logic and queries, while the SDS backend executed operations using TiDB’s SQL capabilities.
- Universal Query Language: A layer of abstraction over SQL simplified communication between PinGraph and TiDB.
- Type Schema: A metadata management component defined nodes and edges using a YAML-based schema, automating table creation and schema updates in TiDB.
- Caching Layer: Built into SDS, the caching layer reduced load on TiDB by handling read-heavy workloads efficiently.
Chen said PinGraph’s modular architecture enabled Pinterest to centralize graph operations, enhance query capabilities, and streamline infrastructure management.
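As a rough illustration of the Type Schema idea, the sketch below turns a declarative type definition into table and index DDL. Everything here is an assumption: the talk does not show PinGraph's schema format, so the `EDGE_TYPE` dictionary (standing in for a parsed YAML file), its field names, the table naming scheme, and the helpers `build_ddl` and `apply_ddl` are hypothetical. SQLite is used only so the generated statements can be executed locally.

```python
import sqlite3

# Hypothetical type definition, shaped like what a parsed YAML file might yield.
EDGE_TYPE = {
    "name": "follows",
    "kind": "edge",
    "properties": {"created_at": "INTEGER", "source": "TEXT"},
    "indexes": [["from_id", "created_at"]],
}


def build_ddl(type_def):
    """Return CREATE TABLE / CREATE INDEX statements for one type definition.

    Identifiers are interpolated directly, so definitions must come from
    trusted schema files, never from user input.
    """
    table = f"{type_def['kind']}_{type_def['name']}"
    if type_def["kind"] == "edge":
        columns = ["from_id INTEGER NOT NULL", "to_id INTEGER NOT NULL"]
        primary_key = "PRIMARY KEY (from_id, to_id)"
    else:
        columns = ["id INTEGER NOT NULL"]
        primary_key = "PRIMARY KEY (id)"
    columns += [f"{name} {sql_type}"
                for name, sql_type in type_def["properties"].items()]
    statements = [
        f"CREATE TABLE {table} ({', '.join(columns + [primary_key])})"
    ]
    for cols in type_def.get("indexes", []):
        index_name = f"idx_{table}_{'_'.join(cols)}"
        statements.append(
            f"CREATE INDEX {index_name} ON {table} ({', '.join(cols)})"
        )
    return statements


def apply_ddl(conn, type_def):
    """Execute the generated statements against an open connection."""
    for statement in build_ddl(type_def):
        conn.execute(statement)


for statement in build_ddl(EDGE_TYPE):
    print(statement)
```

Keeping type definitions declarative means a new node or edge type can be introduced by editing a schema file, with the table and its indexes derived mechanically rather than written by hand.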
Advanced Query Capabilities with TiDB
PinGraph addressed Zen’s query limitations by introducing the following features on top of TiDB:
- Composite Indexes: Queries could now filter on multiple properties simultaneously, improving flexibility and efficiency.
- Range Queries and Pagination: Support for range-based filtering allowed more granular data retrieval.
- Multi-Hop Queries: Although limited to three hops for performance reasons, this feature opened new possibilities for exploring relationships across nodes.
- Efficient Edge Counting: PinGraph optimized common queries like counting incoming or outgoing edges, reducing the strain on TiDB.
Chen mentioned these enhancements now allow Pinterest’s product teams to model complex relationships, such as user connections or content recommendations, more effectively.
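The features above can be sketched against a single edge table. This is an illustrative example under stated assumptions, not PinGraph's implementation: it runs on Python's built-in `sqlite3` rather than TiDB, the table layout and function names are hypothetical, and the multi-hop helper issues one query per hop, which is only one of several ways to bound traversal depth. The talk does not describe how edge counting was optimized, so the count here is a plain indexed `COUNT(*)`.

```python
import sqlite3

MAX_HOPS = 3  # the talk cites a three-hop limit; enforcing it here is our choice


def build_sample_graph():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (from_id INTEGER, to_id INTEGER, edge_type TEXT, "
        "created_at INTEGER, PRIMARY KEY (from_id, edge_type, to_id))"
    )
    # Composite index: serves filters on source, type, and a time range.
    conn.execute(
        "CREATE INDEX idx_edges_from_type_time "
        "ON edges (from_id, edge_type, created_at)"
    )
    rows = [
        (1, 2, "follows", 100), (1, 3, "follows", 200), (1, 4, "follows", 300),
        (2, 5, "follows", 150), (5, 6, "follows", 175), (6, 7, "follows", 180),
    ]
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def page_out_edges(conn, from_id, edge_type, after_time, limit):
    """Range query with keyset pagination, oldest first.

    Assumes created_at is unique per (from_id, edge_type); with ties, the
    cursor would need a tiebreaker column such as to_id.
    """
    cur = conn.execute(
        "SELECT to_id, created_at FROM edges "
        "WHERE from_id = ? AND edge_type = ? AND created_at > ? "
        "ORDER BY created_at LIMIT ?",
        (from_id, edge_type, after_time, limit),
    )
    return cur.fetchall()


def neighbors_within(conn, start_id, edge_type, hops):
    """Return ids reachable from start_id in 1..hops steps."""
    if not 1 <= hops <= MAX_HOPS:
        raise ValueError(f"hops must be between 1 and {MAX_HOPS}")
    frontier, seen = {start_id}, {start_id}
    for _ in range(hops):
        if not frontier:
            break
        placeholders = ",".join("?" * len(frontier))
        cur = conn.execute(
            "SELECT DISTINCT to_id FROM edges "
            f"WHERE edge_type = ? AND from_id IN ({placeholders})",
            (edge_type, *frontier),
        )
        frontier = {row[0] for row in cur} - seen
        seen |= frontier
    return sorted(seen - {start_id})


def count_out_edges(conn, from_id, edge_type):
    """Count outgoing edges of one type for a node."""
    row = conn.execute(
        "SELECT COUNT(*) FROM edges WHERE from_id = ? AND edge_type = ?",
        (from_id, edge_type),
    ).fetchone()
    return row[0]


conn = build_sample_graph()
print(page_out_edges(conn, 1, "follows", 0, 2))    # [(2, 100), (3, 200)]
print(page_out_edges(conn, 1, "follows", 200, 2))  # [(4, 300)]
print(neighbors_within(conn, 1, "follows", 3))     # [2, 3, 4, 5, 6]
print(count_out_edges(conn, 1, "follows"))         # 3
```

Passing the last `created_at` value of one page as `after_time` for the next avoids the growing cost of `OFFSET`, and capping `hops` keeps the frontier from expanding without bound on densely connected nodes.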
Pinterest’s Results: Performance Gains and Cost Savings
Since adopting TiDB, Pinterest has seen significant improvements in query latency, infrastructure costs, and operational efficiency.
The migration to PinGraph delivered the following results:
- Improved Latency: Compared to Zen, PinGraph reduced P99 latency by up to 10x, ensuring faster query responses and a more reliable user experience.
- Cost Reduction: By consolidating infrastructure and leveraging TiDB’s horizontal scalability, Pinterest achieved over 50% savings in infrastructure costs.
- Simplified Operations: PinGraph’s unified architecture reduced technical debt and operational complexity, freeing engineering resources for innovation.
- Future-Ready Platform: With advanced query support and a modular design, PinGraph provides a strong foundation for evolving business needs.
Future Plans: Evolving with Industry Standards
Chen said Pinterest plans to further enhance PinGraph in the coming months through:
- GQL Support: Implementing the Graph Query Language (GQL) will make queries more expressive and user-friendly, reducing reliance on Thrift APIs.
- Customer-Focused Features: Developing a streamlined interface for product teams will boost productivity and enable faster data exploration.
- Graph-Native Enhancements: Pinterest aims to introduce more graph-specific functionalities to unlock new use cases and improve performance.
Conclusion
Pinterest’s migration from Zen to PinGraph, powered by TiDB, highlights the transformative potential of distributed SQL databases for modern graph services. By addressing long-standing challenges, PinGraph improved scalability, performance, and operational efficiency, enabling Pinterest to support its massive user base with confidence.
For organizations facing similar challenges, Pinterest’s journey offers a compelling blueprint for leveraging distributed SQL to modernize infrastructure, streamline operations, and unlock new opportunities. As PinGraph continues to evolve, it sets a benchmark for scalable and feature-rich graph services in data-intensive environments.
Want to take the first steps in modernizing your legacy data infrastructure with distributed SQL? Register to watch this entire session from the event for additional insights. Happy viewing!