Apache HBase - Snappy Compression

#bigdata #apache #hbase #snappy

Overview

Apache HBase provides realtime random read/write access to large datasets. It is built on top of Apache Hadoop and can scale to billions of rows and millions of columns. HBase lets you enable a different type of compression for each column family. You should test against your own use case, but this post shows how Snappy compression can reduce storage needs while keeping the same query performance.
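As a rough illustration of what enabling compression on a column family looks like, the sketch below builds the HBase shell statements for it. The table name `test_table` and the column family `cf` are hypothetical placeholders, and the helper `snappy_alter_commands` is not part of HBase; it only formats the standard `alter`, `major_compact`, and `describe` shell commands. It assumes Snappy support is already available on the RegionServers.

```python
from typing import List


def snappy_alter_commands(table: str, family: str) -> List[str]:
    """Build HBase shell statements that switch one column family to Snappy.

    The names passed in are placeholders chosen by the caller; nothing here
    talks to a cluster, it only formats text for the HBase shell.
    """
    for name in (table, family):
        if "'" in name:
            raise ValueError(f"unexpected quote in name: {name!r}")
    return [
        # Change the compression setting on the column family.
        f"alter '{table}', {{NAME => '{family}', COMPRESSION => 'SNAPPY'}}",
        # Existing store files keep their old encoding until they are
        # rewritten, so a major compaction is requested to rewrite them.
        f"major_compact '{table}'",
        # Print the table schema so the new setting can be checked.
        f"describe '{table}'",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print("\n".join(snappy_alter_commands("test_table", "cf")))
```

The printed statements can be pasted into `hbase shell` on a test cluster. Trying this on a non-production table first fits the advice above to test against your own workload.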

Evidence

Below are charts from clusters where Snappy compression was tested. They cover a range of metrics, from storage size to system metrics.

Conclusion

The charts above show a storage saving of more than 80% with only a slight increase in mutate latencies. The test clusters were loaded with simulated data and simulated load. Production data matched these results once Snappy was deployed there as well. The storage savings also helped backups and disaster recovery, since we didn't need to move as much data across the wire. References for implementing this yourself, with more options for testing, are below.
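To check a storage saving figure on your own cluster, the percentage can be computed from the on-disk size before and after compression. This is a minimal sketch: the function name `storage_saving_percent` is hypothetical, and the sizes in the usage example are made-up numbers for illustration, not measurements from the clusters described here.

```python
def storage_saving_percent(uncompressed_bytes: float, compressed_bytes: float) -> float:
    """Return the storage saving as a percentage of the uncompressed size."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    if compressed_bytes < 0:
        raise ValueError("compressed size must not be negative")
    return 100.0 * (uncompressed_bytes - compressed_bytes) / uncompressed_bytes


if __name__ == "__main__":
    # Illustrative sizes only (in GB); substitute your own measurements.
    before_gb = 1000
    after_gb = 180
    print(f"saving: {storage_saving_percent(before_gb, after_gb):.1f}%")
```

One way to get the two sizes is to record the size of the table's directory in HDFS before the change and again after a major compaction has rewritten the data.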

References
