Shifting to the cloud helps you get products to market faster, but it can be a double-edged sword. The ease of spinning up new cloud resources is great for innovation; the downside is that it can quickly lead to large cloud bills.
That’s especially true for cloud databases. When you don’t have to think about physical hard drive space, for example, the elasticity of cloud databases makes it almost too easy to store every last byte of data.
But there is good news. There are ways to optimize your cloud databases for cost-efficiency without sacrificing performance. In this post, we’ll look at five ways to keep your cloud database spending under control while maintaining the scalability and innovation you need.
1. Right-sizing
Some cloud databases ask you to make up-front decisions about how much RAM, CPU, and other resources your database will need. It’s tempting to over-specify, especially before you’re sure what your database usage pattern will look like.
But after launch, you’ll start to see the reality of your application’s usage patterns. In time, you might find that you’ve overprovisioned your cloud database by focusing only on peak demand.
Provisioning for peak workloads
Cloud providers often ask you to specify resource requirements based on percentile usage, with the 99th percentile being a common benchmark. This ensures your database can handle unexpected traffic surges. However, this approach can lead to significant idle resources during normal operation.
Imagine an ecommerce store that runs a daily flash sale. This predictable, short-term spike in activity could overwhelm a database provisioned for average daily traffic. But outside the sale period, a database provisioned for the spike would be over-provisioned, leading to wasted resources.
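To make that concrete, here’s a minimal sketch that quantifies the gap, using made-up hourly CPU samples shaped like the flash-sale pattern above:

```python
# A hypothetical day of hourly CPU utilization (%): a low baseline
# with one short flash-sale spike around midday.
hourly_cpu_pct = [12, 10, 9, 8, 8, 9, 15, 20, 22, 25, 24, 23,
                  95, 90, 30, 25, 24, 22, 20, 18, 16, 14, 12, 11]

mean_load = sum(hourly_cpu_pct) / len(hourly_cpu_pct)
p99_load = sorted(hourly_cpu_pct)[int(0.99 * len(hourly_cpu_pct)) - 1]

print(f"mean utilization: {mean_load:.1f}%")  # ~23%
print(f"p99 utilization:  {p99_load}%")       # 90%

# A fixed instance sized for the p99 leaves roughly three quarters of
# the capacity you pay for sitting idle outside the sale window.
```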
Right-sizing to improve cost efficiency
A database becomes cost-inefficient when there’s a mismatch between resource allocation and workload demands. Right-sizing is all about provisioning resources to match actual usage. One particularly helpful strategy is auto-scaling to meet demand. By switching to a serverless database that handles scaling seamlessly, you can be sure that you pay only for what you use rather than what you might need. In a more traditional cloud database, use historical data to inform your autoscaling.
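The scaling rule itself can be simple. Here’s a hedged sketch of one that turns a rolling window of historical utilization into a replica-count decision; the thresholds, the window size, and the way metric samples arrive are all hypothetical:

```python
from collections import deque

WINDOW = 12           # last 12 samples (e.g., one hour of 5-minute metrics)
SCALE_UP_AT = 75.0    # sustained utilization % that triggers a scale-up
SCALE_DOWN_AT = 25.0  # sustained utilization % that allows a scale-down

recent = deque(maxlen=WINDOW)

def on_metric_sample(cpu_pct: float, replicas: int) -> int:
    """Return the desired replica count after the latest sample."""
    recent.append(cpu_pct)
    if len(recent) < WINDOW:
        return replicas                   # not enough history yet
    avg = sum(recent) / len(recent)
    if avg > SCALE_UP_AT:
        return replicas + 1               # sustained pressure: add capacity
    if avg < SCALE_DOWN_AT and replicas > 1:
        return replicas - 1               # sustained idle: shed capacity
    return replicas
```

Averaging over a window rather than reacting to single samples avoids flapping on short blips, while a sustained surge like the flash sale still triggers a scale-up.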
2. Data tiering
While right-sizing focuses on efficiently allocating compute resources (CPU, RAM), data tiering tackles another crucial aspect of cloud database optimization: storage. Traditional databases place all data on the same type of storage, which might not always be the most cost-effective or performant approach.
Data tiering balances cost and performance, allowing you to choose the right storage method depending on how and when you need to access that data.
To understand how data tiering works, we need to look at data access patterns:
- Hot data: Frequently accessed data that is critical for daily operations.
- Warm data: This data is accessed less frequently than hot data but might still be needed occasionally (e.g., historical purchase data, older customer records).
- Cold data: Rarely accessed but still valuable, this data could be useful for regulatory compliance, historical analysis, or backups.

There are a few ways to handle data tiering. One is to manually move data between different storage types, perhaps shifting data once it hits a particular age or when usage drops off. Another approach is to use a database, such as TiDB, that automatically moves data between fast SSDs and lower-cost storage according to how often you need to access it.
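For the manual route, the core move is a transactional copy-then-delete. Here’s a minimal sketch using SQLite and a hypothetical orders schema; a production job would batch the rows and target a genuinely cheaper storage tier rather than a second table:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_hot  (id INTEGER PRIMARY KEY, placed_at TEXT, total REAL);
    CREATE TABLE orders_cold (id INTEGER PRIMARY KEY, placed_at TEXT, total REAL);
""")

# Age-based cutoff: anything older than a year moves to the cold tier.
cutoff = (datetime.now(timezone.utc) - timedelta(days=365)).isoformat()

with conn:  # one transaction, so a row is never in both tiers
    conn.execute(
        "INSERT INTO orders_cold SELECT * FROM orders_hot WHERE placed_at < ?",
        (cutoff,))
    conn.execute("DELETE FROM orders_hot WHERE placed_at < ?", (cutoff,))
```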
3. Denormalization
Right-sizing and data tiering focus on optimizing cloud databases for cost-efficiency without sacrificing performance. Denormalization takes a different approach, specifically targeting read performance for workloads that heavily rely on fast data retrieval.
Let’s take a moment to acknowledge that denormalization might make you and other members of your team a little uncomfortable. After all, it goes against one of the core principles of relational databases because it relies on data redundancy.
However, creating copies of certain items of data for use in different contexts can help your read performance by reducing expensive JOINs. In the example of an ecommerce store, you’d typically have separate tables for customers, products, and orders. Retrieving one customer’s order history would involve a query that:
- Joins the Orders table with the Customers table: Here we link a specific customer’s order data with their corresponding customer information.
- Joins the Orders table with the Products table: Then we link each order entry with its corresponding product information.
That’s great for data integrity, but multiplied across millions of queries, those JOINs add up, and there might be a more efficient approach. For example, when a customer places an order, the system might write the product and customer details directly into the orders table. That way, just one hit on a single row in a single table retrieves all the data you need for an order.
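Here’s a minimal sketch of the two read paths side by side, using SQLite and a hypothetical version of the schema above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INT, product_id INT);

    -- Denormalized variant: customer and product details are copied
    -- into the order row at write time.
    CREATE TABLE orders_wide (id INTEGER PRIMARY KEY, customer_name TEXT,
                              product_name TEXT, price REAL);

    INSERT INTO customers   VALUES (1, 'Ada');
    INSERT INTO products    VALUES (1, 'Keyboard', 49.0);
    INSERT INTO orders      VALUES (1, 1, 1);
    INSERT INTO orders_wide VALUES (1, 'Ada', 'Keyboard', 49.0);
""")

# Normalized read: two JOINs to assemble one order.
normalized = conn.execute("""
    SELECT o.id, c.name, p.name, p.price
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    WHERE o.id = 1
""").fetchone()

# Denormalized read: one hit on a single row in a single table.
denormalized = conn.execute(
    "SELECT id, customer_name, product_name, price FROM orders_wide WHERE id = 1"
).fetchone()

assert normalized == denormalized  # same answer, very different work
```

The trade-off is write-time complexity: every new order duplicates data, and any update to customer or product details must be propagated to the wide rows or accepted as stale.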
4. Simplify database structure
Over time, your database can become complex and even unwieldy. Although making major changes to your database can be nerve-wracking, you might be able to improve your cost efficiency by rethinking your database’s structure. Here are some approaches you might consider.
- Reduce redundancy: We just looked at how denormalization can improve read query performance at the expense of duplicating data. But there might be some data where redundancy is harming your efficiency. Identify and eliminate data elements that are duplicated across tables without a good reason. This can shrink your database footprint, reducing storage costs.
- Archive historical data: Not all data needs to reside in your active database. Consider archiving historical data (e.g., past orders, inactive customer records) to a separate, lower-cost storage tier. This frees up space in your primary database, potentially reducing compute resource costs as well.
- Decompose underutilized tables: If you mix frequently and infrequently accessed data in the same table, consider decomposing it into smaller, more focused tables. For example, an “Orders” table might contain both current and historical order data. Decomposing it into separate “Current Orders” and “Archived Orders” tables allows you to optimize the storage tier for each based on its access frequency (see the sketch after this list). This approach can streamline queries and potentially reduce storage costs for less frequently accessed data.
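As a rough illustration of that last decomposition, here’s a sketch in SQLite; the status column is a hypothetical stand-in for however you distinguish live orders from settled ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The original mixed-use table.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);

    -- Split by access pattern: open orders stay in the hot table,
    -- fulfilled ones move to a table that can sit on a cheaper tier.
    CREATE TABLE current_orders  AS SELECT * FROM orders WHERE status != 'fulfilled';
    CREATE TABLE archived_orders AS SELECT * FROM orders WHERE status  = 'fulfilled';
    DROP TABLE orders;
""")
```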
5. Use a serverless database
So far, we’ve looked at ways that you can restructure your database and your usage. But what if you consider changing the database management system itself?
With a serverless database, like TiDB Serverless, problems such as overprovisioning go away entirely. That’s because you no longer have to consider how your data is stored, optimized, and scaled. Instead, you put the data into the database, query it how you need, and everything else is taken care of. The serverless database dynamically adjusts resources based on actual demand, so you only pay for what you use. This frees your team from database management headaches, allowing them to focus on core application development and strategic data initiatives.
Experience TiDB Serverless firsthand by signing up for the generous free tier and spinning up a serverless database with 25 GiB of free resources.