DEV Community

Cover image for Storing Files In Today’s Cloud: What Is Object Storage
Bunnyshell
Bunnyshell

Posted on

Storing Files In Today’s Cloud: What Is Object Storage

Simply put, object storage, or object-based storage, is a data storage architecture that manages data as distinct units called objects. Each object includes the data itself, a variable amount of metadata, and a globally unique identifier.

Further, object storage devices can be aggregated into larger storage pools, and these pools can be distributed across multiple locations. This enables not only unlimited scale but also data resiliency and disaster recovery.

Because of this, object storage is used to retain and manage massive amounts of unstructured data, such as Facebook photos, Spotify songs, or Dropbox files.

How object storage works

Let’s suppose you want to store all the books in the world in a very large library system on a single platform. There are two things you need to store:

  • the contents of the books (data)
  • and the associated information like author, publisher, publication date, etc. (metadata).

Traditionally, you would do this using a relational database, organized in folders under a hierarchy of directories and subdirectories. But because we’re talking about a vast amount of books, the search and retrieval process becomes cumbersome and time-consuming when you want to access a specific book.

Instead, object storage is a better choice because the data is static or fixed (neither the contents of the books or the associated information will change).

In this case, the objects (data, metadata, and ID) will be stored as “packages” in a flat structure that’s easily located and retrieved with a single API call. In the future, as the number of books will grow, you can aggregate storage devices into larger storage pools and distribute these storage pools for unlimited scale.

Object storage use cases

Besides the storage of unstructured data, object storage can be used for:

  • storage of backup files, database dumps, and log files;
  • large data sets;
  • archiving files in place of local take drives.

Object storage vs. block storage vs. file storage

There are fundamental differences between file storage and object storage. File storage organizes data hierarchically inside directories, sub-directories, and files. It works great when the number of files is not very large and you know exactly where they’re stored.

Object storage, on the other hand, typically presents itself via a RESTful API – there’s no concept of a file system. Instead, applications save objects (files + additional metadata such as the file name, the date it was created, and the date it was last modified) to the object store using the PUT or POST API, and the object storage saves the object somewhere in the system.

Then, the application receives a unique key (think of it like the number you receive when you leave your coat in the cloakroom) for that object from the object storage platform which will be saved in the application database. If an application wants to fetch that object, all they would need to do is give the key as part of the GET API and the object would be fetched by the object storage.

Some good examples of object storage APIs are those used by Amazon S3 (the default standard for object storage access), OpenStack Swift, and Azure Blob Service REST API.

Because objects are stored in a flat address space, this makes it easier to locate and retrieve data across regions. This is why object storage has become the preferred solution for data archiving and backup.

Files in a filesystem have a broader set of functions that can be applied to them, including appending data and updating data in place. The programming model is more complex than an object store and is, nowadays at least, almost always accessed programmatically via a “POSIX” style of interface. Generally, the use of CPU and memory is very efficient and there’s a mindset that the filesystem is a private local resource.

Although NFS and SMB do allow for a filesystem to be made available as a multi-tenanted resource, developers usually avoid them because, sometimes, there are subtle differences in how they react compared to “local” filesystems despite their full support for POSIX semantics.

Another big difference between file storage and object storage is that file systems are designed with consistency in mind, and are usually accessed over low to moderate latency (50 microseconds – 50 milliseconds) networks. In contrast, object stores are distributed over a shared infrastructure, connected together over low-bandwidth high-latency wide area networks, and their time to first byte can sometimes be measured in multiples of whole seconds. Performing lots of small (4K – 16K) random reads from an object store is likely to cause frustration and performance problems.

Last but not least, maybe the biggest advantage object storage has is that you can be reasonably sure that anything you put in an object store will remain there until you ask for it again, and that it will never run out of space as long as you pay for what you use. These resources generally run at large scale with built-in replication, version control, automated recovery, etc., and only natural disasters will make the data disappear (but even then, you have easy options to make another copy in another location).

When it comes to filesystems, especially those you’re expecting you or your local team to manage, you can only hope that everything is getting backed up and that it doesn’t fill up accidentally and cause everything to melt down when you can’t update your data anymore.

Block storage is an alternative to file storage. It takes a file apart into equally-sized data blocks and then stores these blocks as separate pieces of data. Each piece of data has a different address, so they don’t need to be stored in a file structure. To access a file, the operating system uses the unique addresses to reassemble it.

Want to find out more about the benefits of Object Storage? Check out our blog!

Top comments (0)