Experience modern data infrastructure firsthand.
Catalyst, a cutting-edge software-as-a-service (SaaS) platform, serves the needs of customer success teams, product managers, marketers, and data scientists. It stands out as the world’s premier Customer Success Platform (CSP), expertly crafted by a team of industry leaders.
Catalyst seamlessly integrates with your existing tools, delivering a unified hub for customer data. With Catalyst, customer success managers gain the ability to proactively address potential churn risks, receiving automated alerts when customers neglect critical features essential for their success.
The Challenge: SaaS with Mixed Data Workloads
Catalyst’s SaaS platform handles a large volume of data. It consolidates data from a wide range of sources, including Salesforce, Mixpanel, and PostgreSQL, and then moves that data into the Catalyst ecosystem for processing, analytics, and actionable insights.
This platform works with three major data types:
- Transactional objects: Includes internally-created notes and tasks, as well as externally-collected data objects from Salesforce, Zendesk, and others.
- Read-only objects: Ticket objects collected from platforms including Jira and Zendesk.
- Time series objects: The platform’s most important data type. When the Catalyst team was looking for a new database, one of their main requirements was whether it could handle these kinds of objects.
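To make the distinction concrete, the three categories above might be modeled as follows. This is a hypothetical sketch; the class and field names are illustrative, not Catalyst’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransactionalObject:
    """Read-write records, e.g. internally created notes and tasks."""
    id: int
    body: str

@dataclass
class ReadOnlyObject:
    """Records synced from external systems such as Jira or Zendesk."""
    id: int
    source: str

@dataclass
class TimeSeriesPoint:
    """Timestamped measurements, e.g. product usage per account."""
    account_id: int
    metric: str
    value: float
    ts: datetime

point = TimeSeriesPoint(account_id=1, metric="logins", value=3.0,
                        ts=datetime(2024, 1, 1))
```

Time series points accumulate far faster than the other two categories, which is why handling them well was a headline requirement.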
The Problem: Legacy Data Architecture (and Its Bottlenecks)
It took a while for Catalyst to find the best database and data architecture for its SaaS platform.
In the beginning, Catalyst used Ruby on Rails and PostgreSQL to manage all externally collected data from diverse sources. However, as its business grew and data sources multiplied rapidly, PostgreSQL struggled to meet its evolving demands. The company initially tried to remedy this by storing the data as JSON documents, but query performance suffered heavily.
From there, the team turned to pre-caching. They adopted Elasticsearch to store precomputed results so they could respond to customers’ queries more quickly. However, because Elasticsearch doesn’t support SQL-style JOINs, Catalyst had to precompute everything before storing it in Elasticsearch. Due to the increased amount of data being stored, costs skyrocketed.
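The storage blow-up follows directly from the lack of query-time JOINs: every account/event combination must be denormalized into its own document before indexing. A minimal sketch of that pre-caching pattern, with made-up sample data:

```python
# Hypothetical pre-caching step: because the search store cannot JOIN
# at query time, account records are joined with usage events ahead of
# time, producing one fully denormalized document per event.
accounts = [
    {"account_id": 1, "name": "Acme", "csm": "alice"},
    {"account_id": 2, "name": "Globex", "csm": "bob"},
]
events = [
    {"account_id": 1, "feature": "reports", "count": 12},
    {"account_id": 1, "feature": "alerts", "count": 3},
    {"account_id": 2, "feature": "reports", "count": 7},
]

def precompute_documents(accounts, events):
    """JOIN accounts with usage events before indexing."""
    by_id = {a["account_id"]: a for a in accounts}
    return [{**by_id[e["account_id"]], **e} for e in events]

docs = precompute_documents(accounts, events)
# Storage now grows with the number of events, and every document
# repeats the full account record.
print(len(docs))  # 3
```

Each new query shape customers needed meant another precomputed document set, which is why costs grew faster than the underlying data.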
Defining SaaS Requirements
As a result, Catalyst decided to rearchitect its tech stack with a new database at its heart that could:
- Support hybrid transactional/analytical processing (HTAP) workloads: The company deals with transactional objects, read-only objects, and time series data. They need a solution, either a single database or a combination of databases, that can handle both transactional and analytical workloads.
- Respond at speed: The new database solution must be more agile than Catalyst’s previous solution, especially in query speed and user interface performance. It must respond to queries at sub-second latency.
- Handle complicated and highly customized data: Catalyst’s customers can customize many settings including queries, data transformations, and relationships, both inside the platform and on Salesforce and Zendesk. The combination of custom objects integrated with many custom fields can be quite complicated. The new solution has to be able to handle such situations.
- Be highly available: Catalyst needs to be very responsive to their customers, and keeping the system up is their top priority. During past outages, customers reported the problem within tens of seconds. Therefore, the new database solution must be highly available and help Catalyst survive possible disasters.
- Be horizontally scalable: Scalability is another must-have. The company deals with a huge volume of data, and the volume will keep expanding. The new solution must easily scale to an enormous size.
- Be consistent: Data consistency is another requirement. While strong consistency was preferred, it is extremely hard to maintain throughout the entire system given the amount of data processing involved, so Catalyst was willing to accept eventual consistency where necessary.
The Solution: Re-architecting SaaS with TiDB to Scale for Future Growth
To address their existing issues, the Catalyst team redesigned their entire data processing and storage system. That’s when they discovered TiDB, the most advanced open-source distributed SQL database with HTAP capabilities.
TiDB Outperforms
The company was careful in choosing its new database. They investigated TiDB along with two other options: Amazon Aurora and YugabyteDB, each coupled with AWS Timestream. Both alternatives combine an online transactional processing (OLTP) database with a time series database.
Catalyst tested the three candidates by loading large real-world datasets from their internal Salesforce and Jira instances. Under this heavy load, they simultaneously and continuously ran a subset of queries. Query response speed was among the most important evaluation criteria.
TiDB responded to both representative and aggregation queries in seconds, much faster than the other options. For time series aggregation queries, TiDB was also agile enough to return results in approximately 7 seconds. The following table summarizes some key test results.
| Key indicators (query speed in seconds) | Amazon Aurora & AWS Timestream | YugabyteDB & AWS Timestream | TiDB |
| --- | --- | --- | --- |
| Representative queries | ~30 | | ~3 |
| Aggregation queries | ~120 | | ~2 |
| Time series aggregation queries | Did not test | Did not test | ~7 |
Figure 1. Key test results.
As shown above, the types of queries were:
- Representative queries: The queries that customers were most interested in.
- Aggregation queries: Primarily computations based on complicated JOINs.
- Time series aggregation queries: Queries that aggregate time series data. Catalyst did not test these on Amazon Aurora and YugabyteDB because of limited time and TiDB’s impressive performance.
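The evaluation described above, running a fixed query set concurrently against each candidate and recording per-query latency, could be sketched as follows. This is a hypothetical harness; `run_query` is a stand-in for a real database client call, not Catalyst’s actual benchmark code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(name):
    """Placeholder for a real round trip to the candidate database."""
    time.sleep(0.01)
    return name

def benchmark(queries, workers=4):
    """Run all queries concurrently and record each one's latency."""
    def timed(name):
        start = time.perf_counter()
        run_query(name)
        return name, time.perf_counter() - start

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, elapsed in pool.map(timed, queries):
            results[name] = elapsed
    return results

latencies = benchmark(["representative", "aggregation", "time-series"])
```

Running the same harness against each candidate under an identical data load keeps the comparison apples-to-apples, which is what makes the headline latency numbers meaningful.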
Rearchitecting SaaS for Success
The new data architecture powering Catalyst’s SaaS platform is organized into distinct data layers, as shown in the diagram below:
Figure 2. The new data architecture running Catalyst’s SaaS platform.
- Data ingestion layer: Raw data enters Fivetran, an automated data movement platform.
- Data lake layer: Databricks acts as a centralized repository that stores, processes, and secures this raw data.
- Spark layer: Apache Spark combines objects, performs pre-computations, and enriches the data.
- Data serving layer: A new element of the architecture. Catalyst selected TiDB to store preprocessed data for customer queries, which directly impact the end-user experience.
The layers below the data serving layer do not have to run in real time. At the data serving layer, however, Catalyst requires sub-second latency so customers can receive the data they need instantly.
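The flow through the layers can be sketched as a simple pipeline. This is an illustrative toy, not Catalyst’s code: each function stands in for the role one layer plays, and the field names and scoring rule are invented.

```python
def ingest(record):
    """Data ingestion layer (Fivetran's role): pull in raw records."""
    return {**record, "ingested": True}

def land_in_lake(record):
    """Data lake layer (Databricks' role): store and secure raw data."""
    return {**record, "stored": True}

def enrich(record):
    """Spark layer: combine objects and pre-compute derived values."""
    return {**record, "score": record["events"] * 2}

def serve(record, store):
    """Data serving layer (TiDB's role): hold query-ready data."""
    store[record["account_id"]] = record

store = {}
raw = {"account_id": 1, "events": 5}
serve(enrich(land_in_lake(ingest(raw))), store)
print(store[1]["score"])  # 10
```

Only the final `store` lookup sits on the customer-facing path, which is why sub-second latency matters there and batch latency is acceptable upstream.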
The Results
By adopting TiDB, Catalyst’s new data architecture now provides:
- Outstanding query response time: Depending on the query type, TiDB responded up to 60x faster than the competing options. This was the most important reason Catalyst selected TiDB.
- Online DDL and schema changes: TiDB supports online DDL without disrupting the online business. TiDB also offers worry-free schema changes and lets Catalyst add or drop indexes much faster, especially on large tables.
- A single database with HTAP capabilities: TiDB is a distributed SQL database with HTAP capabilities. Of the three options evaluated, TiDB could handle both object data and time series data in a single tech stack. Not only is this highly efficient, but it also saves Catalyst a lot of time, effort, and money.
- Horizontal scalability: This meets Catalyst’s business need to handle ever-expanding data volumes. TiDB also separates its compute and storage layers, allowing Catalyst to scale out each layer independently, which helps the company control costs.
- Fast and automatic failover: TiDB uses the Raft consensus algorithm to ensure data is highly available and safely replicated throughout storage. Additionally, TiDB offers a choice of multiple disaster recovery solutions, each of which applies to different scenarios with flexible costs.
- Multi-cloud capabilities: For flexibility, Catalyst straddles two clouds: some workloads run on Google Cloud Platform (GCP), while some run on Amazon Web Services (AWS). TiDB Dedicated, a fully-managed cloud TiDB-as-a-Service solution, was a perfect fit for their needs as it can be deployed across any cloud.
Catalyst’s SaaS platform now delivers a better customer experience with faster query responses, enhanced system resilience, expanded data storage, and improved processing and analytical capabilities. The company also cut costs across storage, operations, and maintenance.