DEV Community: Speedb

Key-Value Store vs Storage Engine

bosmatt — Tue, 19 Dec 2023 09:55:06 +0000

As data management and storage solutions evolve, it's important to understand the different technologies available. Key-value stores and storage engines are both major players in data storage. The two terms sound similar, and their definitions and functionality are often confused.

So what are the differences between a key-value store and a storage engine?

Key-value store

A key-value store is a simple and flexible data storage paradigm that organizes data in a straightforward manner. It revolves around a key-value pair, where data is stored and accessed based on a unique key. Key-value stores offer excellent scalability, fast read/write operations, and flexible schema-less data modeling. They are widely used for caching, session management, user preferences, and other scenarios that require efficient data retrieval based on unique keys. Key-value stores include popular solutions like Redis and Memcached.

Key-value stores usually support a small set of basic operations such as put (or set), get, and delete.

Example:
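
A minimal sketch of these operations in Python, using an in-memory dict as a stand-in for a real store. The class name `KeyValueStore` and its method names are illustrative assumptions, not the API of Redis, Memcached, or any other product.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Insert a new pair or overwrite the value of an existing key.
        self._data[key] = value

    def get(self, key, default=None):
        # Return the value stored under key, or default if the key is absent.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present; deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42:theme", "dark")
theme = store.get("user:42:theme")
store.delete("user:42:theme")
missing = store.get("user:42:theme")
```

Every operation addresses exactly one key; there is no way to ask for "the key after this one", which is the limitation the rest of this post comes back to.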

Storage engine:

A storage engine, also known as a data engine, is a software component that interacts with the underlying hardware or file system to provide efficient and reliable data storage and retrieval capabilities. A storage engine is an extension of the key-value store. It also organizes data in key-value format but supports an extended set of operations and formats.

Storage engines handle the low-level details of data organization, indexing, and I/O operations, providing efficient storage and retrieval mechanisms. They can be tailored to different data models, such as document-oriented, columnar, or graph databases, and are often integrated with higher-level database systems. The main capabilities that differentiate storage engines from key-value stores are transaction support, snapshots, and the fact that the elements are ordered.

For example, Amazon S3 (Simple Storage Service) is a cloud storage service that implements a key-value store scheme. It can return the object for a given key, but it doesn’t support range queries. In other words, for a given key it can return the object, but it can’t return the next object, since the elements are not ordered.

A very common use of a storage engine is inside a database management system (DBMS), where it is used to create, read, update, and delete (CRUD) data in a database.

A storage engine serves as the underlying engine that interacts with the DBMS, translating high-level commands into low-level data operations. It determines how data is structured, indexed, and organized, ultimately impacting the performance, reliability, and functionality of the database system.

Storage engines can be implemented using different data structures, such as B-trees and LSM (log-structured merge) trees.
Some popular storage engine examples are InnoDB, MyISAM, Speedb, RocksDB, and WiredTiger.

Different storage engines have different characteristics that make them more or less suitable for different use cases. For example, RocksDB uses an LSM tree since it’s optimized for write-intensive workloads. InnoDB, on the other hand, is based on a B-tree to better support read-intensive workloads and range queries.
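
To illustrate why an LSM tree favors writes, here is a deliberately tiny sketch of the idea, not of how RocksDB or Speedb actually implement it. Writes land in an in-memory table and are flushed as immutable sorted runs; reads check the newest data first. The class `TinyLSM` and the `memtable_limit` parameter are hypothetical, and compaction, the write-ahead log, and deletes are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM layout: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # list of sorted [(key, value), ...] runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches memory; no sorted file is updated in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted run, as a stand-in for a file on disk.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # memtable is full, so it is flushed as the first run
db.put("a", "3")   # newer version of "a" lives in the memtable
newest_a = db.get("a")
flushed_b = db.get("b")
```

The cost shows up on the read side: a lookup may have to consult several runs, which is one reason B-tree based engines are often preferred for read-heavy workloads.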

Key-value stores and storage engines can both be used as embedded components.

Embedded or standalone?

Since storage engines and key-value stores can be used for different purposes, they can, and sometimes should, be embedded in the application’s software stack. An embedded component becomes a part of the application that can be replaced when you find a substitute that is more suitable for your needs. Think of it as outsourcing specific operations at your office: you don’t need to use your own employees for IT services, because you can hire external companies to do it.

A popular example of an embedded KVS is Apache Flink, which uses RocksDB to store state in key-value format. RocksDB can be replaced with any other compatible storage engine, for example to better support heavy write workloads, to reduce write amplification, or to solve any other issue it might have.

While storage engines can only be embedded, some key-value stores, such as Redis, are used as standalone applications. A storage engine needs a host application to reside in, and that application interacts with the storage engine.

So what are the differences between a storage engine and a key-value store?

Stand-alone vs embedded:
While a key-value store can refer to a stand-alone application, a storage engine must be embedded in another application.

Functionality:
A key-value store is mostly used for simple operations such as get and set, while a storage engine has more data management capabilities, such as transactions, snapshots, and iterators:

Transactions:
Transactions are very important for ensuring the consistency of the data when multiple operations are performed at the same time. They also allow recovery from a situation where the data has become inconsistent.
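
As a rough, single-threaded sketch of the all-or-nothing idea: writes are buffered inside the transaction and only become visible on commit. The `Transaction` class below is hypothetical; a real storage engine adds locking or conflict detection and a write-ahead log on top of this.

```python
class Transaction:
    """Toy transaction: buffers writes and applies them together on commit."""

    def __init__(self, store):
        self._store = store    # dict holding the committed state
        self._writes = {}      # pending writes, invisible to other readers

    def put(self, key, value):
        self._writes[key] = value

    def get(self, key):
        # A transaction sees its own pending writes before committed data.
        return self._writes.get(key, self._store.get(key))

    def commit(self):
        # Apply every buffered write to the committed state in one step.
        self._store.update(self._writes)
        self._writes = {}

    def rollback(self):
        # Discard the buffered writes; the committed state is untouched.
        self._writes = {}


accounts = {"alice": 100, "bob": 50}

txn = Transaction(accounts)
txn.put("alice", txn.get("alice") - 30)
txn.put("bob", txn.get("bob") + 30)
before_commit = dict(accounts)   # still the old balances
txn.commit()
after_commit = dict(accounts)    # both updates applied together
```

Had the transfer failed halfway, calling `rollback()` would have left both balances as they were, so the store never exposes a state where only one side of the transfer happened.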

Snapshots:
A snapshot is a point-in-time view of the data, allowing users to access and query the data as it existed at the time the snapshot was taken, regardless of subsequent modifications. Snapshots are useful for various purposes, such as creating consistent backups, facilitating data versioning, and enabling point-in-time analysis. Most key-value stores do not support snapshots, since their main focus is on simplicity, performance, or specific use cases where snapshot functionality is not a primary requirement.

Storage engines, on the other hand, commonly support snapshots as a fundamental feature, providing a consistent view of the database or file system as it existed at the time the snapshot was taken.
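
One common way to provide such a view is to tag every write with an increasing sequence number and let a snapshot remember the number that was current when it was taken. The sketch below assumes that approach; `VersionedStore` and its methods are illustrative names, not the API of any particular engine.

```python
class VersionedStore:
    """Toy multi-version store: a snapshot is just a remembered sequence number."""

    def __init__(self):
        self._versions = {}   # key -> list of (sequence, value), oldest first
        self._seq = 0

    def put(self, key, value):
        # Every write gets a new sequence number; old versions are kept.
        self._seq += 1
        self._versions.setdefault(key, []).append((self._seq, value))

    def snapshot(self):
        # Capture the current sequence number as the point in time.
        return self._seq

    def get(self, key, snapshot=None):
        # Return the newest version that is not newer than the snapshot.
        limit = self._seq if snapshot is None else snapshot
        for seq, value in reversed(self._versions.get(key, [])):
            if seq <= limit:
                return value
        return None


db = VersionedStore()
db.put("config", "v1")
snap = db.snapshot()
db.put("config", "v2")

at_snapshot = db.get("config", snapshot=snap)
latest = db.get("config")
```

Reads through `snap` keep returning the old value even though the key has since been overwritten, which is what makes consistent backups possible while writes continue.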

Organized data:

The data in a storage engine is kept in key order, so we can find not only an object but also the next object. This cannot be said for every key-value store.
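
A small sketch of what ordering buys you, using a sorted key list and Python's `bisect` module. `OrderedStore`, `seek`, and `range` are hypothetical names chosen for this illustration.

```python
import bisect


class OrderedStore:
    """Toy ordered store: keys are kept sorted so 'next key' is well defined."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def seek(self, key):
        # Yield (key, value) pairs in key order, starting at the first key >= key.
        i = bisect.bisect_left(self._keys, key)
        for k in self._keys[i:]:
            yield k, self._values[k]

    def range(self, start, end):
        # Return all pairs with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


db = OrderedStore()
for key in ["user:3", "order:9", "user:1", "user:2"]:
    db.put(key, key.upper())

first_two_users = db.range("user:1", "user:3")
next_after = next(db.seek("user:15"))
```

Here `seek("user:15")` lands on `user:2`, the next key in sort order, even though `user:15` itself was never stored. A plain hash map has no equivalent operation.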

Here is a table that summarizes the major differences between KVS and storage engine:

So, what is Speedb?

Speedb is an embedded key-value storage engine that writes data in key-value format. It can be used in any application as an embedded key-value store as well as a storage engine, since it supports the capabilities that distinguish storage engines: transactions, snapshots, and ordered elements, which enable range queries.

Since Speedb is fully compatible with RocksDB and LevelDB, it can easily be swapped in as a replacement. What are the benefits of doing so? Well, that is for another blog post. In the meantime, you can read more about Speedb innovation here

To summarize:

A storage engine is an extended implementation of a key-value store. It is used as an embedded component, the same as KVS, and can manage data and metadata efficiently.

Key-value store primarily addresses the question of "how" data is stored and the format of its structure, while the storage engine pertains to the question of "what" can be accomplished with the data. In other words, the storage engine handles the diverse data operations, while the key-value store serves as a specific means of organizing and storing the data.

Storage engines implement a key-value data structure, but sometimes this model is extended to serve other applications' needs.

Boosting Your Application with Speedb

bosmatt — Wed, 18 Oct 2023 07:09:36 +0000

Introduction:

Speedb is a modern key-value store that leverages advanced techniques to deliver exceptional performance. It is built upon RocksDB, a popular open-source storage engine, but incorporates additional optimizations to maximize performance and efficiency.
In this blog post, we will explore the problems Speedb solves and how it solves them, along with its benefits and performance comparisons between Speedb and RocksDB, to understand why it is becoming a popular choice for performance-critical applications.

What is Speedb?

Embedded key value storage engine
Open source
It has an enterprise version
Drop-in replacement for RocksDB (fully compatible)
Written in C++, API exposed in Java and C as well

What is Speedb NOT?

A standalone application
A database management system
Written in Go or Rust (but wrappers are available)

Which problems does Speedb solve?

Performance hiccups: One of the pain points for RocksDB users is the hiccups that affect the performance of production applications. Speedb improved the mechanism that delays writes when the write rate reaches a certain threshold. The delay is now applied gradually, taking into account the limit set by the user. This dramatically changed application behavior, and no more hiccups were observed while using it.

Unexpected memory usage: To run Speedb properly, the memory allocation should be well defined by the user. One of the parameters is the write buffer size, which determines how much data the application can hold in memory before flushing it to disk. This, of course, affects the application's behavior. If you define a size much lower than the application’s write rate requires, you can get into a situation where the storage engine can’t handle all of the writes and slows them down to zero, which means stalls. If, on the other hand, performance matters more to your application than a memory limit, you can observe high memory usage for dirty data.

Speedb eliminates this tradeoff with its new write buffer manager: your application performs well, with no hiccups and without exceeding the memory limit you have defined.

The graph below compares the RocksDB and Speedb write buffer managers with 32 column families and 4 memtables of 256MB max each. The write buffer manager is set to 4GB.
With RocksDB, actual memory consumption with allow stalls enabled reached 11GB and many stalls occurred, leading to unstable performance, while Speedb consumed only 4GB with no stalls at all.
The graph represents a test of a write-heavy workload (95% writes) and shows the stalls that occurred with RocksDB’s write buffer manager versus Speedb's.
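
To see why a global limit matters in this setup, it helps to work out how much memory the memtables could claim if nothing capped them. The short calculation below uses only the figures quoted above; the variable names are ours.

```python
MB = 1024 ** 2
GB = 1024 ** 3

column_families = 32
memtables_per_cf = 4
memtable_size = 256 * MB
write_buffer_manager_limit = 4 * GB

# Upper bound if every memtable of every column family filled up at once.
worst_case_bytes = column_families * memtables_per_cf * memtable_size

worst_case_gb = worst_case_bytes // GB
times_over_limit = worst_case_bytes // write_buffer_manager_limit

print(f"Uncapped worst case: {worst_case_gb} GB")
print(f"That is {times_over_limit}x the 4 GB write buffer manager limit")
```

The uncapped worst case is 32 GB, eight times the configured 4GB, so the per-memtable settings alone cannot keep memory in check and the write buffer manager has to do the enforcing.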

Read performance: RocksDB is designed for write-intensive workloads, but read performance should not be neglected. With the static pinning feature, you can enjoy the benefits of pinning filter and index blocks in the cache without risking an out-of-memory condition. Pinning has a major advantage over the LRU cache, yet users barely use it because of the risk of running out of memory. Speedb enforces a cap on the pinned data, letting you enjoy the performance without putting the application at risk.
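
The general idea of capped pinning can be sketched as a cache with two areas: a pinned area that is never evicted but has a hard byte budget, and an ordinary evictable area for everything else. This is only an illustration of the concept; `CappedPinningCache` and its parameters are hypothetical and do not reflect Speedb's actual implementation or options.

```python
from collections import OrderedDict


class CappedPinningCache:
    """Toy block cache: a pinned area with a hard byte cap plus an evictable area."""

    def __init__(self, lru_capacity, pin_capacity):
        self.lru = OrderedDict()   # key -> size in bytes, evictable
        self.pinned = {}           # key -> size in bytes, never evicted
        self.lru_capacity = lru_capacity
        self.pin_capacity = pin_capacity
        self.lru_bytes = 0
        self.pinned_bytes = 0

    def insert(self, key, size, want_pin=False):
        if key in self.pinned:
            return "pinned"
        # Pin only while the pinned area stays within its cap.
        if want_pin and self.pinned_bytes + size <= self.pin_capacity:
            self.pinned[key] = size
            self.pinned_bytes += size
            return "pinned"
        # Otherwise fall back to the evictable area.
        self.lru_bytes -= self.lru.pop(key, 0)
        self.lru[key] = size
        self.lru_bytes += size
        # Evict the oldest entries until the evictable area fits its budget.
        while self.lru_bytes > self.lru_capacity:
            _, evicted_size = self.lru.popitem(last=False)
            self.lru_bytes -= evicted_size
        return "lru"


cache = CappedPinningCache(lru_capacity=100, pin_capacity=50)
first = cache.insert("filter-block-1", 30, want_pin=True)
second = cache.insert("index-block-1", 30, want_pin=True)
```

The second block asks to be pinned, but pinning it would bring the pinned area to 60 bytes against a cap of 50, so it goes to the evictable area instead. Pinned memory therefore can never grow past the budget.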

Usability: RocksDB is a very smart but also very complex storage engine. You sometimes need to make some configuration changes when you realize the current configuration is not optimal.

1. Live configuration changes: Speedb improves the usability of this complex data engine. When you want to change the storage engine's configuration without any downtime, you can do so using the live configuration changes mechanism. This is like a backdoor that allows changing immutable options on the fly without affecting your application’s availability.

2. Tuning function: With Speedb’s tuning function, you can enable Speedb features and tune basic parameters for optimized performance, whether you run a single Speedb instance or multiple instances in a multi-DB environment. The tuning function also sets the write buffer manager size for optimized dirty data management across all your instances.

3. Log parser: This is a very useful Python tool for analyzing the storage engine logs. It gives high-level insight into your database structure and performance profile, and also provides a deep understanding of internal processes such as compaction in a clear and readable format.

Enables 100% parallel read/write workload:
The sorted hash memtable is a new memtable type that improves read and write performance from O(log n) to O(1) without compromising seek performance. It supports parallel reads and writes, since the data structure has changed from a skip list to a combination of a hash table and an array of vectors.
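
The following single-threaded sketch shows the general shape of such a structure: a hash table serves point reads and writes in O(1) on average, while a separately maintained sorted key list serves seeks. It is an assumption-laden illustration, not Speedb's implementation; the class name and the lazy "sort on seek" strategy are ours, and the concurrency aspects are omitted.

```python
import bisect


class HashPlusSortedMemtable:
    """Toy memtable: hash table for point access, sorted keys for seeks."""

    def __init__(self):
        self._table = {}     # hash table: O(1) average point reads and writes
        self._sorted = []    # keys that are already in sorted order
        self._pending = []   # newly added keys, not yet merged into _sorted

    def put(self, key, value):
        # Point writes never touch the sorted list, so they stay O(1) on average.
        if key not in self._table:
            self._pending.append(key)
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

    def seek(self, key):
        # Merge pending keys lazily, then return pairs from the first key >= key.
        if self._pending:
            self._sorted = sorted(self._sorted + self._pending)
            self._pending = []
        i = bisect.bisect_left(self._sorted, key)
        return [(k, self._table[k]) for k in self._sorted[i:]]


mem = HashPlusSortedMemtable()
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    mem.put(k, v)

point_read = mem.get("a")
from_b = mem.seek("b")
```

Compared with a skip list, where every insert and lookup walks O(log n) levels, the hash table path does constant work per point operation, and ordering is paid for only when an ordered scan is requested.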

Improved write flow: Speedb introduced major performance improvements and 100% parallel write support by changing the existing write flow. Instead of a global DB mutex, a read/write lock is used, and data is now written to the WAL (write-ahead log) and the memtables simultaneously.

Write amplification (Available on Speedb Enterprise):
Speedb Enterprise's revolutionary compaction method provides several benefits over traditional LSM trees. It significantly reduces write amplification, from 24 to 4, eliminates processing latency and throughput issues, and reduces CPU utilization and memory consumption, making it a cost-effective and efficient solution for enterprise use.
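
Write amplification is the ratio of bytes physically written to storage to bytes the application asked to write. To put the quoted factors of 24 and 4 in perspective, here is the arithmetic for a hypothetical 100 GB of application writes (the 100 GB figure is an assumption chosen for the example).

```python
def storage_writes_gb(user_writes_gb, write_amplification):
    """Bytes physically written = bytes the application wrote x amplification."""
    return user_writes_gb * write_amplification


user_writes_gb = 100   # hypothetical amount written by the application

with_wa_24 = storage_writes_gb(user_writes_gb, 24)
with_wa_4 = storage_writes_gb(user_writes_gb, 4)
reduction_factor = with_wa_24 // with_wa_4

print(f"WA 24: {with_wa_24} GB written to storage")
print(f"WA 4:  {with_wa_4} GB written to storage")
print(f"Reduction: {reduction_factor}x fewer physical writes")
```

For the same application workload, the device absorbs 400 GB instead of 2,400 GB, six times fewer physical writes.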

No Shards - Performance at Scale: (Available on Speedb Enterprise)
Speedb Enterprise introduces a multidimensional compaction method that maintains extremely high performance with much larger datasets, without the need to shard your database. This simplifies the day-to-day management of your environment, and you can get more with less.

Speedb also manages the biggest community of Speedb and RocksDB users.
Join us to learn more: Speedb Discord