You can't have your cake and eat it too
Storage device capacities keep growing with each passing day. It feels like it took only a few years to go from megabytes to gigabytes to terabytes, and capacities are still climbing.
But I’m guessing none of this will come as a surprise to you: We’re recording, storing and managing more and more data, whether it’s personal data, work data or corporate data.
So how do you strike a balance between storing massive amounts of data and accessing that data efficiently?
For work-related data, I have options available, even for very large data sets.
Setup
When scaling a database, whether it is on-premises or in the cloud, performance is a top priority. Without high performance, a large-scale database is nothing more than an active/semi-active archive.
If the entire data set is small enough to fit in memory (DRAM), the performance demands placed on the storage system are modest and its capacity matters little. With today's huge data growth, however, the share of a data set that can be held in memory affordably keeps shrinking. Coupled with the growing demand for faster and more detailed analysis, we have reached a data-driven crossroads: we need high performance, high capacity and affordability all at once.
Enterprise SATA SSDs can help. Building with these SSDs lets us future-proof our Apache Cassandra® deployment, keeping up as active data sets grow while scaling out storage capacity. Cassandra's massive scalability, combined with multi-terabyte, high-IOPS enterprise SATA SSDs, lets us build a high-volume NoSQL platform with massive capacity, agility and power.
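To make the discussion concrete, here is a minimal sketch of the kind of simple key-value reads and writes such a platform serves, using the DataStax Python driver (cassandra-driver). The node addresses, keyspace and table names are hypothetical examples, not part of Micron's test setup.

```python
# Minimal sketch (assumed setup): simple key-value reads/writes against a
# Cassandra cluster using the DataStax Python driver (pip install cassandra-driver).
# Node addresses, keyspace and table names below are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])  # example 4-node cluster
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.set_keyspace("demo")
session.execute("""
    CREATE TABLE IF NOT EXISTS usertable (
        key    text PRIMARY KEY,
        field0 text
    )
""")

# Prepared statements keep per-operation overhead low for high-IOPS workloads.
insert = session.prepare("INSERT INTO usertable (key, field0) VALUES (?, ?)")
select = session.prepare("SELECT field0 FROM usertable WHERE key = ?")

session.execute(insert, ("user1", "example payload"))
row = session.execute(select, ("user1",)).one()
print(row.field0)

cluster.shutdown()
```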
Note: Given the wide range of Cassandra deployments, Micron tested multiple workloads.
Application 1
Enterprise-class SSDs meet growing demands
When building Cassandra nodes on legacy hard disk storage, you scale out by adding more nodes to the cluster and scale up by upgrading to larger hard disks. Sometimes you need to do both.
Adding more legacy nodes worked (up to a point) but quickly became impractical. We gained capacity and a small performance boost, but as we added nodes the cluster grew larger and more complex, consuming more rack space and support resources.
Upgrading to larger hard disks also helps (to a certain extent), because you gain more capacity per node and per cluster, but the performance gains from such upgrades are limited.
Both approaches carry a high cost for the performance they deliver and do not scale effectively with data growth.
High-capacity, fast SSDs like the Micron® 5200 Series are changing the design game. With multi-terabyte (TB) capacities, throughput of hundreds of megabytes per second (MB/s) and tens of thousands of IOPS from a single SSD, high-capacity, ultra-fast SSDs open up new design opportunities and performance thresholds.
Application 2
SSD Clusters: Real Results from a Large Dataset
When planning the next generation of high-volume, high-demand Cassandra clusters, SSDs can provide amazing capacity and very attractive results. Figures 1a-1c summarize the storage configurations tested by Micron.
The tests used the Yahoo! Cloud Serving Benchmark (YCSB) workloads A–D and F to compare three 4-node Cassandra test cluster configurations:
- SSD Configuration 1: 1× Micron 5200 ECO (3.8TB) per node
- SSD Configuration 2: 2× Micron 5200 ECO (3.8TB each) per node
- Legacy configuration: 4× 15,000RPM hard disks (300GB each) per node
Note: Given the wide range of Cassandra deployments, Micron tested multiple thread counts.
With the same number of nodes and one SSD per node, the 1-SSD test cluster offers roughly 3 times the capacity of the legacy configuration (and the 2-SSD test cluster roughly 6 times). Measurements also showed significant performance improvements on every workload tested for each SSD test cluster, ranging from 1.7 times to 10.7 times, while latency was lower and more consistent.
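To make the capacity comparison concrete, the following sketch works through the raw-capacity arithmetic implied by the drive counts and sizes listed above; it ignores replication, compaction headroom and filesystem overhead, so the figures are raw rather than usable capacity.

```python
# Raw capacity arithmetic for the three 4-node test configurations (sketch only;
# replication factor, compaction headroom and filesystem overhead are ignored).
NODES = 4

configs = {
    "1x 5200 ECO per node": {"drives": 1, "tb_per_drive": 3.8},
    "2x 5200 ECO per node": {"drives": 2, "tb_per_drive": 3.8},
    "4x 15K HDD per node":  {"drives": 4, "tb_per_drive": 0.3},
}

legacy = configs["4x 15K HDD per node"]
legacy_per_node = legacy["drives"] * legacy["tb_per_drive"]

for name, c in configs.items():
    per_node = c["drives"] * c["tb_per_drive"]   # raw TB per node
    per_cluster = per_node * NODES               # raw TB per 4-node cluster
    ratio = per_node / legacy_per_node           # capacity multiple vs. legacy HDD config
    print(f"{name}: {per_node:.1f} TB/node, {per_cluster:.1f} TB/cluster, {ratio:.1f}x legacy")

# Expected output (approx.):
#   1x 5200 ECO per node: 3.8 TB/node, 15.2 TB/cluster, 3.2x legacy
#   2x 5200 ECO per node: 7.6 TB/node, 30.4 TB/cluster, 6.3x legacy
#   4x 15K HDD per node:  1.2 TB/node, 4.8 TB/cluster, 1.0x legacy
```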
Application 3
SSD clustering provides more consistent response
Read Response Consistency: Many Cassandra deployments rely heavily on fast and consistent responses, so Micron compared the 99th percentile read response times for each test cluster and workload. The 99th percentile read latency for each configuration is shown below.
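For readers less familiar with the metric, the 99th percentile (p99) read latency is the value below which 99% of individual read responses fall, so it captures consistency rather than the average. Here is a minimal sketch of how such a figure can be computed from raw measurements; the latency numbers generated below are fabricated for illustration and are not Micron's results.

```python
# Sketch: computing a 99th percentile ("p99") read latency from raw measurements.
# The latency list here is fabricated for illustration only.
import random

random.seed(0)
# Simulated read latencies in milliseconds: mostly fast, with occasional slow outliers.
latencies_ms = [random.uniform(0.5, 2.0) for _ in range(990)] + \
               [random.uniform(10.0, 50.0) for _ in range(10)]

def percentile(values, pct):
    """Return the pct-th percentile (0-100) using nearest-rank on sorted values."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, int(round(pct / 100.0 * len(ordered))) - 1))
    return ordered[rank]

print(f"average read latency: {sum(latencies_ms) / len(latencies_ms):.2f} ms")
print(f"p99 read latency:     {percentile(latencies_ms, 99):.2f} ms")
```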
- Workload A: An update-heavy workload in which 50% of the total I/O is writes. At the application level, this workload resembles recording recent session actions.
- Workload B: A read-dominated workload (95% reads). At the application level, this workload resembles adding metadata to existing content (for example, tagging an image or article).
- Workload C: A read-only workload. At the application level, this workload resembles reading user profiles or static data, where the profile is built elsewhere.
- Workload D: Reads of the latest entries (the newest records are accessed most frequently). At the application level, this workload resembles reading user status updates.
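The descriptions above follow the standard YCSB core workload definitions. For reference, the sketch below records their nominal operation mixes using YCSB's published defaults (not Micron's measured I/O ratios); workload F, which the tests also included, is YCSB's read-modify-write mix.

```python
# Nominal operation mixes of the YCSB core workloads used in the tests
# (standard YCSB defaults, recorded here for reference; not measured values).
YCSB_WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50,            "note": "update heavy, e.g. session store"},
    "B": {"read": 0.95, "update": 0.05,            "note": "read mostly, e.g. photo tagging"},
    "C": {"read": 1.00,                            "note": "read only, e.g. user profile cache"},
    "D": {"read": 0.95, "insert": 0.05,            "note": "read latest, e.g. status updates"},
    "F": {"read": 0.50, "read_modify_write": 0.50, "note": "read-modify-write, e.g. user DB"},
}

for name, mix in YCSB_WORKLOADS.items():
    ops = ", ".join(f"{op} {share:.0%}" for op, share in mix.items() if op != "note")
    print(f"Workload {name}: {ops}  ({mix['note']})")
```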
Conclusion
High-capacity, high-performance SSDs can help Cassandra achieve amazing results. Whether you are scaling an on-premises or cloud-based Cassandra deployment for higher performance or faster and more consistent read responses, SSDs are the ideal choice.
We can achieve high performance when the data set fits in memory, but the huge growth of data means that an ever-smaller share of it can be held in memory economically.
We are at a crossroads: business demands drive us to seek higher performance, while data growth drives us to seek affordable capacity. Taken together, the answer is clear: Enterprise SSDs deliver outstanding results, helping to meet both performance needs and data growth needs - you can have your cake and eat it too.
Deploying SSDs in your data center is a high-value option for reducing your total cost of ownership (TCO). To see how this configuration compares to other configurations, use Micron's Move2SSD TCO Tool to estimate the cost savings you can achieve by deploying SSDs compared to your existing architecture.