Rajesh G

Posted on Jan 13

What are Object Stores - simplified

[Originally published on Medium]

We all know that object stores are the backbone of the cloud-era internet, and we also know they have certain behavioral characteristics, such as scalability, immutability, and eventual consistency.

But do we know why object stores behave this way? What are they, actually? Let’s find out.

Since I started using AWS S3 10 years ago, I have been amazed by everything that can be done with it, and by all the incidents that showed how dependent the internet is on S3. The other thing that made me curious is that there is no official architecture documentation for AWS S3. It’s kind of a “secret”, I guess.

First, let’s find out what an object store is. Per Wikipedia, below is the definition:

“Object storage (also known as object-based storage[1]) is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data as blocks within sectors and tracks.”

Let me try to explain it in plain English. To do that, we first need to understand what block storage is. I’m not going to pull that definition from Wikipedia; let me explain it in simple terms.

What is Block Storage?

Block storage is the fundamental storage mechanism of computer systems, where data is stored in “blocks”. The key here is that a central system/module keeps track of which data is stored in which specific block.

What that means is, when a file is stored, the filesystem breaks it down into blocks and stores the data across multiple blocks. Only the filesystem knows which part of the file is stored in which block, so any change to the file has to go through the filesystem. All of this data about the blocks is the “metadata”, and it’s centrally stored and managed by the filesystem.

For the sake of simplicity, I’m not going to differentiate file storage from block storage. In a way they are related to each other, but both are fundamentally different from object storage.

To understand it a little better, let’s try an analogy.

Note: Some of the operations and mechanics of databases and object storage have been overly simplified for easier understanding.

Block Storage Analogy: Storage Units based on item type, with one or more store keepers and no direct access to storage details.
In this “hypothetical” analogy, imagine multiple storage units, like a book storage unit, a furniture storage unit, an electronics storage unit, etc., where people store their items. You bring your items in a box (labelled with your name and address) and hand it over to the store keeper at the appropriate storage unit.

The store keepers at the storage units only understand and handle the items they are designed for: the store keeper at the book storage unit can only store books and nothing else, and the store keeper at the electronics storage unit can only store electronics.

How and where exactly each of these items is stored inside the storage unit is a black box to you. That information is known only to the store keeper and is kept in a place only the store keeper can access. The store keeper needs to know what the contents of the box are. He/she records each customer’s details (from the box) and each book’s details, along with exactly where inside the storage unit each of those books is stored. Let’s call this data the “book-storage-details”.

In real-world data storage, the equivalents of the hypothetical box, book, and page are the file/data, the record, and the block, respectively.

Below is an example of what book-storage-details can look like. This is the data about where each customer’s books are stored; it’s nothing but the metadata of the stored books. In the real world, it’s the metadata of the file/data stored in the filesystem.
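A minimal Python sketch of such a table, assuming hypothetical columns (customer, book, unit, shelf) and made-up rows; a real filesystem’s metadata (inodes, block maps) is far more involved:

```python
# Hypothetical book-storage-details: one row per stored book, kept centrally by the store keeper.
book_storage_details = [
    {"customer": 1001, "book": "Dune", "unit": "books", "shelf": "A3"},
    {"customer": 1001, "book": "Emma", "unit": "books", "shelf": "B7"},
    {"customer": 1002, "book": "Ulysses", "unit": "books", "shelf": "A3"},
]


def where_is(customer, book):
    """Return the shelf holding a customer's book, or None if it is not stored."""
    for row in book_storage_details:
        if row["customer"] == customer and row["book"] == book:
            return row["shelf"]
    return None


print(where_is(1001, "Emma"))  # B7
print(where_is(1002, "Dune"))  # None
```

Only the keeper consults this table, which is exactly why every request has to go through him/her.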

Scenarios — add or retrieve books.
If you need to store more books later, you give the new books to the store keeper, who makes a note of the details in the book-storage-details and then stores them on the shelves.

If you have to retrieve one or more books from the storage, you give the details of the book(s) you need and the store keeper will retrieve them for you. When you bring them back, the store keeper follows the same process of making note of all the metadata and storing them on one of the shelves.

In the real world, this is how you interact with the filesystem when you want to change saved file/data: you retrieve one or more records, change them at the record level, and save them back.

Advantage: consistency. Because every single storage/retrieval goes through the store keeper, you can look at the metadata (book-storage-details) at any given point in time and tell exactly which books are stored, by whom, and where. If an entry exists in the metadata, the book is stored; if there is no entry, it is not. Simple! The answers to questions like “is the book stored?”, “how many books are stored in total?”, and “list all the books stored” are precise and definite at any given point in time.
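The scenarios and the consistency advantage above can be modeled in a few lines of Python. This is a toy sketch, not how any real filesystem is built: the `StoreKeeper` class is hypothetical, and a `threading.Lock` stands in for the single keeper every request must pass through.

```python
import threading


class StoreKeeper:
    """Toy block-storage-style keeper: every request goes through one shared metadata table."""

    def __init__(self):
        self._lock = threading.Lock()   # only one request touches the metadata at a time
        self.book_storage_details = {}  # (customer, book) -> shelf
        self._shelves = {}              # shelf -> book contents
        self._next_shelf = 0

    def store(self, customer, book, contents):
        with self._lock:
            key = (customer, book)
            if key not in self.book_storage_details:
                self.book_storage_details[key] = f"shelf-{self._next_shelf}"
                self._next_shelf += 1
            # Record-level write: only this one book is replaced.
            self._shelves[self.book_storage_details[key]] = contents

    def retrieve(self, customer, book):
        with self._lock:
            return self._shelves[self.book_storage_details[(customer, book)]]

    def is_stored(self, customer, book):
        with self._lock:
            return (customer, book) in self.book_storage_details

    def total_books(self):
        with self._lock:
            return len(self.book_storage_details)


keeper = StoreKeeper()
keeper.store(1001, "Dune", "chapter 1 ...")
keeper.store(1001, "Dune", "chapter 1 (revised) ...")  # update in place, same shelf
print(keeper.retrieve(1001, "Dune"))                    # chapter 1 (revised) ...
print(keeper.is_stored(1001, "Emma"), keeper.total_books())  # False 1
```

Because every answer is read from the one table while holding the lock, “is the book stored?” and “how many books?” are exact at the moment they are asked.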

Advantage: efficiency. Because the exchanges happen at the record level, you can retrieve or write just the one or more records you need. This is a very efficient way to operate on data.

Disadvantage: scalability. If 100 customers want to store or retrieve their books at the same time, they have to wait while the store keeper processes them one after the other. Maybe multiple store keepers sharing the same book-storage-details could work faster and handle 10 customers at a time, but eventually there is a limit to the scalability, and the process slows down, because every single storage/retrieval has to be recorded in detail (about every single book) in the metadata store (book-storage-details).
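A back-of-the-envelope model of that limit, with made-up numbers (2 seconds of handling per customer, 0.5 seconds of exclusive bookkeeping per customer). The `min_total_seconds` function is a hypothetical lower bound, not a benchmark: more keepers shrink the parallel part, but the serialized metadata writes put a floor under the total.

```python
import math


def min_total_seconds(customers, keepers, handling_s, metadata_write_s):
    """Lower bound on the time to serve every customer in the block-storage analogy.

    Handling can be split across keepers, but every request also needs an
    exclusive write to the shared book-storage-details, and those writes queue up.
    """
    parallel_part = math.ceil(customers / keepers) * handling_s
    serialized_part = customers * metadata_write_s
    return max(parallel_part, serialized_part)


for keepers in (1, 10, 100):
    print(keepers, min_total_seconds(100, keepers, handling_s=2.0, metadata_write_s=0.5))
# 1 200.0
# 10 50.0
# 100 50.0
```

Going from 10 to 100 keepers buys nothing here: the shared metadata store, not the number of keepers, is the bottleneck.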

The key here is that the store keeper needs to know “what” you store. That is where the bottleneck is: they need to know “what” is inside the box, every single time.

This is how most databases work in the real world. A central metadata store keeps track of every record stored and its details (like size, location, block details, etc.). Every exchange with the database is “transactional” and “consistent”. All of the exchanges have to go through the centrally stored metadata, and that’s where the potential bottleneck lies. Databases are precise and consistent, but they do not scale linearly with load. Once the number of writes reaches a limit, the database stops scaling.

Now let’s look at how Object stores work.

What is Object storage?
Simply put, object storage is a storage mechanism designed to trade off the consistency of block storage in exchange for infinite scalability.

What this means is that object storage is designed from the ground up to scale infinitely (theoretically).

Now, why do we need infinite scalability? Because of the Volume and Velocity of big data. The world is generating data faster, and in greater amounts, than traditional databases can process. We could never keep up with, or even catch up on, the backlog and store all of the data produced without potentially losing some of it.

Let’s look at the same Book storage unit analogy to understand how it would play out in case of Object storage.

Object Storage Analogy: Box storage with one or more store keepers and direct access to storage details.
The key words here are “box storage”, “infinite store keepers”, and “direct access”. Imagine the same hypothetical storage unit we discussed earlier, but this time we deal with boxes, no longer with books. You can store anything in the box, and it doesn’t matter what. The store keepers do NOT know or care what you have inside the box. This removes a lot of their work: knowing what’s inside and recording everything every time. So, practically, the same number of store keepers can do more work, which is just moving boxes into or out of the storage unit.

What is this going to change? Well, it’s going to change the way you interact with the storage unit a lot. First, you can store anything inside the box. So, there is only one type of storage unit that is needed. Second, you can store/retrieve boxes much faster irrespective of how many customers are trying to store/retrieve boxes at the same time.

In this new model, you bring your contents in a box (labelled as before) and hand the box over to the store keeper. The store keeper assigns you a vacant aisle, names that aisle after you, and stores the box in it. The store keeper also records information about the box, like the aisle details, the name, weight, and size of the box, the time of storage, etc., in the “box-storage-details”. But this time, the information is no longer centrally managed or stored; it is stored along with your box. One copy of the information is attached to the box, so anyone can find out the details by looking at it, and another copy is kept at the front desk. Also, you get to see the information in the “box-storage-details”.
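A minimal sketch of the box model, assuming a hypothetical in-memory `aisles` dict and a `customer/box_name` key format chosen for illustration (not any real service’s API). The keeper treats the contents as opaque bytes and only records facts “about” the box, which travel with it:

```python
import time
from dataclasses import dataclass, field


@dataclass
class Box:
    """A stored object: opaque contents plus metadata that travels with it."""
    contents: bytes                          # the keeper never looks inside
    metadata: dict = field(default_factory=dict)


aisles = {}  # key -> Box; hypothetical stand-in for the storage floor


def put_box(customer, box_name, contents):
    key = f"{customer}/{box_name}"           # aisle named after the customer
    metadata = {
        "aisle": str(customer),
        "name": box_name,
        "size_bytes": len(contents),
        "stored_at": time.time(),
    }
    aisles[key] = Box(contents, metadata)    # one copy attached to the box
    return key, metadata                     # the customer keeps the details too


key, details = put_box(1001, "box25", b"books, toys, electronics")
print(key, details["size_bytes"])  # 1001/box25 24
```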

This helps you because the next time you need to interact with your box (add/remove contents), you can hand the details over to the store keeper and he/she can help you with your box.

With this new model, you always interact with the store keeper in terms of a “box”, no longer in terms of individual items. The store keeper has no details about which books/toys/electronics you have in the box, or anything else inside it. This removes the need to keep track of what is inside the box.

Scenarios — add or retrieve items.
If you have to add more items to the box or retrieve items from the box, you will have to retrieve the entire box, make whatever changes you need to make — add or remove books/electronics/toys and put the box back in the aisle.
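In code, the only operations are “get the whole object” and “put the whole object”. This sketch assumes hypothetical `get_object`/`put_object` helpers over an in-memory dict; adding a single item means a read-modify-write of the entire box:

```python
objects = {"1001/box25": b"Dune\nEmma\n"}  # hypothetical store: key -> whole object


def get_object(key):
    return objects[key]          # always returns the entire box


def put_object(key, data):
    objects[key] = bytes(data)   # always replaces the entire box


def add_item(key, item):
    """Add one item by fetching the whole box, changing it locally, and putting it all back."""
    box = get_object(key)
    updated = box + item.encode() + b"\n"
    put_object(key, updated)


add_item("1001/box25", "Ulysses")
print(get_object("1001/box25").decode().splitlines())  # ['Dune', 'Emma', 'Ulysses']
```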

Magically, the process of adding content or retrieving content is simplified. Any storekeeper can help you as you have all the details required.

Advantage: scalability. Now that a store keeper’s job is simply to store or retrieve the box, and the details about where to find your box are provided at the time of each interaction, the process becomes linearly scalable by simply adding more store keepers as the number of concurrent customers increases. There is no longer a central metadata store or “box-storage-details” to maintain.
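One common way distributed stores let any worker find an object without a central lookup is to hash the key to a partition. This is an assumption used for illustration, not a description of S3’s undocumented internals; `NUM_SHELF_GROUPS`, `partition_for`, and `Keeper` are hypothetical.

```python
import hashlib

NUM_SHELF_GROUPS = 8  # hypothetical number of storage partitions


def partition_for(key):
    """Any keeper computes where a box lives from its key alone; no shared lookup table."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHELF_GROUPS


class Keeper:
    def __init__(self, partitions):
        self.partitions = partitions  # shared storage floor, not shared bookkeeping

    def put(self, key, contents):
        self.partitions[partition_for(key)][key] = contents

    def get(self, key):
        return self.partitions[partition_for(key)][key]


floor = [dict() for _ in range(NUM_SHELF_GROUPS)]
keeper_a, keeper_b = Keeper(floor), Keeper(floor)
keeper_a.put("1001/box25", b"books")
print(keeper_b.get("1001/box25"))  # b'books' (found by a different keeper)
```

Adding a keeper means adding another stateless `Keeper`, since none of them holds bookkeeping the others depend on.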

Periodically, the store keepers might consolidate their storage details to come up with the total stats for the entire storage unit.

Disadvantage: consistency. How this process affects consistency is the key here. Now that the information about which boxes are stored, and where, is held by multiple store keepers in a distributed fashion, you cannot get a consistent answer to questions like “is the box stored?”, “how many boxes are stored in total?”, or “list all boxes weighing more than 5 lbs”.

Every time you ask “is box#25 from customer 1001 stored?”, the store keepers have to consolidate their records to get to an answer. And while they are consolidating, customer 1001 might add or retrieve box#25, and that change is not recorded yet. So the numbers are eventually consistent: if you give the store keepers enough time to consolidate their records after box#25 from customer 1001 is stored, you will get a consistent answer.

That is “eventual consistency”. And that is NOT a bug; it’s by design.
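A toy model of that behavior, assuming three keepers (replicas), a write acknowledged once the first keeper records it, and a hypothetical `sync()` step standing in for “consolidating their records”. Asking different keepers right after a write can give different answers until the pending updates are delivered:

```python
from collections import deque


class Replica:
    """One store keeper's own ledger of which boxes exist."""

    def __init__(self):
        self.ledger = set()


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key) updates not yet delivered

    def put(self, key):
        self.replicas[0].ledger.add(key)       # acknowledged after one keeper records it
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key))      # the others hear about it later

    def is_stored(self, key, replica_index):
        return key in self.replicas[replica_index].ledger

    def sync(self):
        """Deliver every pending update (the keepers consolidating their records)."""
        while self.pending:
            i, key = self.pending.popleft()
            self.replicas[i].ledger.add(key)


store = EventuallyConsistentStore()
store.put("1001/box25")
print(store.is_stored("1001/box25", 0), store.is_stored("1001/box25", 2))  # True False
store.sync()
print(store.is_stored("1001/box25", 2))  # True
```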

Disadvantage: efficiency. Now that all interactions are in terms of “boxes” rather than individual “items” or “contents”, even adding or removing a single item means retrieving the entire box. This is not very efficient, but with advances in cloud processing speeds, the near-unlimited availability of compute power, and faster network transfer speeds, the inefficiency makes only a small dent in the big picture.

Advantage: scalability. This is the big one. Because store keepers no longer need to record details about the contents of the boxes, the speed at which they can store/retrieve boxes increases dramatically. This gives rise to potentially infinite scalability, and it is the answer to our question: “How can we keep up with the speed at which data is being generated?”

The key here is that the store keeper does not know “what” you store. That is where the bottleneck is eliminated: they only need to know “about” the box, every single time.

This is how object stores work, and this is why object stores can scale infinitely and store anything: flat files, songs, pictures, movies, etc.

Now that we’ve solved the need for scalability, how can this be implemented in the shared-infrastructure world of the cloud, where providers have to host data from customers all over the world? How can they make sure people do not overwrite content created by others?

Did you know that the bucket name you pick for storing your data in AWS S3 has to be unique? And by unique, I mean unique across the world: no one else can have the same name for their bucket.

Interesting, right? The reason is the flat naming structure AWS uses to name objects.

When you store a file named “vacation_pic1.jpg” in the folder structure ///, the folders are there to make navigating and understanding the stored data easier for end users. But in reality, there are no folders at all in AWS S3.

The actual storage implementation works by flattening the path into one single name and hashing it.

So the object “vacation_pic1.jpg” is stored as bucket1_folder1_folder2_vacation_pic1.jpg, or as a hash of that name. Because the starting point of the object name (the bucket name) is unique, whatever logical folder structure you create after it doesn’t matter: in the end, object names are unique across the board.
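A sketch of the flattening idea described above, using the article’s underscore-joined example and SHA-256 as an assumed hash. AWS does not document how S3 actually names objects internally, so `flat_key` and `hashed_key` are purely illustrative:

```python
import hashlib


def flat_key(bucket, path):
    """Join bucket and 'folders' into one flat name (illustrative '_' separator)."""
    parts = [bucket] + [p for p in path.split("/") if p]
    return "_".join(parts)


def hashed_key(bucket, path):
    """Hash the flat name; a unique bucket prefix keeps the full names distinct."""
    return hashlib.sha256(flat_key(bucket, path).encode()).hexdigest()


print(flat_key("bucket1", "folder1/folder2/vacation_pic1.jpg"))
# bucket1_folder1_folder2_vacation_pic1.jpg
print(hashed_key("bucket1", "folder1/folder2/vacation_pic1.jpg")
      != hashed_key("bucket2", "folder1/folder2/vacation_pic1.jpg"))  # True
```

Note the underscore join is ambiguous (“folder1_folder2/x” and “folder1/folder2/x” flatten to the same name), so a real system would need a separator that cannot appear in a path segment, or would simply keep the full key string as-is.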

Now, can S3 be used to store anything? Yes. Then do we still need all the other databases like Postgres, Redshift, SQL Server, Teradata, Neo4j, ArangoDB, MongoDB, etc.?

Yes, we still need other databases. Let’s discuss why in a follow-up article. For now, let’s discuss the use cases for S3.

What are the best use cases for Object Stores?
As the design suggests, AWS S3 can write huge datasets quickly. You can read huge datasets as well, but the data is immutable: once you create an object, you cannot edit its contents. You have to rewrite the entire object with whatever changes you want.

That is why AWS S3 is not a good fit for frequently changing data. In other words, object stores are good for static data. If you produce the data once and read it many times, it’s the perfect fit: write once, read a million times.

How many times do you think Netflix edits a movie after producing it? Maybe it edits the movie a couple of times initially, but after that a movie is a movie. So you produce the movie, store it on S3, and stream it from S3 millions or even billions of times. The same goes for website content. How often do you change the logo, pictures, or other static content of a website? Not very often, so the static content of almost all websites can be stored on S3.

What’s the future of object stores?
The reliability, cost-effectiveness, and infinite scalability of object stores scream lower total cost of ownership for storing data. But eventual consistency and immutability stop them just short of being the only storage we would ever need. Or do they?

Note: AWS S3 was eventually consistent at the time this article was written (June 2020); it has been strongly consistent since December 2020.

That is where very interesting ideas came in from companies like Netflix, Google, Snowflake, Databricks, etc. These companies created virtual ACID layers on top of object stores, making the eventual consistency and immutability of object stores virtually non-existent.

How far have these virtual ACID layers come in making the eventual consistency and immutability of object stores non-existent, and how do they work? How did open source solutions (like Netflix’s s3mper) try to solve eventual consistency using OLTP database systems? How does Google solve eventual consistency using Google Spanner?

Let’s discuss these in detail in my next article.

DEV Community
