DEV Community

JoelBonetR 🥇

CS fundamentals: How Data Storage actually works

Have you ever wondered how the digital files, photos, and documents you store on your devices are actually kept safe and accessible? Let's dive into it.

Understanding Bits and Bytes:

At the core of data storage is the concept of bits and bytes. A bit is the smallest unit of digital information, representing either a 0 or a 1.

Binary Code:

It is often useful to think of a bit as a Boolean (true or false). A single bit can only hold one such value, but by grouping bits together we can represent numbers, text, images, and sounds.

All the data you see on your screen is ultimately represented in binary code - a series of 0s and 1s.

Your computer, smartphone or any other device, reads this code and translates it into the text, images, and sounds you interact with daily.

When eight bits are grouped together, they form a byte. A byte can take 256 different values (0 to 255), enough to represent a small number, a basic character, or another small piece of data.
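
To make this concrete, here is a small sketch in Python (my choice of language for the examples in this post) that reads the same eight bits as a number, as a character, and as a raw byte. The bit pattern and the text "Hi" are just illustrative values.

```python
# The same eight bits, interpreted in different ways.
bits = "01000001"

value = int(bits, 2)            # as an unsigned integer
char = chr(value)               # as an ASCII character
raw = value.to_bytes(1, "big")  # as a single raw byte

print(value)  # 65
print(char)   # A
print(raw)    # b'A'

# One byte has 2**8 possible bit patterns, i.e. the values 0..255.
print(2 ** 8)  # 256

# Going the other way: the bits behind each byte of a short text.
text_bits = [format(b, "08b") for b in "Hi".encode("ascii")]
print(text_bits)  # ['01001000', '01101001']
```

The bits never change; only the interpretation we apply to them does.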

Fractions

The most common way to represent fractions in computers is floating-point representation. A floating-point number consists of three components: the sign bit, which indicates whether the number is positive or negative; the exponent, which scales the value by a power of two; and the fraction (also called the mantissa or significand), which holds the significant digits of the value.

E.g. take the decimal fraction 0.75. In binary this is 0.11 (one half plus one quarter).
In floating-point form it is stored normalized as 1.1 × 2^-1, and the sign bit is 0 because the number is positive.

Not all fractions can be represented exactly in binary. For example, 0.1 has no finite binary expansion, just as 1/3 has no finite decimal expansion. This leads to rounding errors and imprecision, especially when performing arithmetic operations involving fractions.
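
A quick way to see this is with Python's built-in floats, which are IEEE 754 double-precision numbers on common platforms: 64 bits split into a sign bit, an 11-bit exponent, and a 52-bit fraction field. The helper name `float_bits` is my own, just for illustration.

```python
import math
import struct
from fractions import Fraction

def float_bits(x: float) -> str:
    """Return the 64 bits of a double as 'sign exponent fraction'."""
    (packed,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(packed, "064b")
    return f"{bits[0]} {bits[1:12]} {bits[12:]}"

# 0.75 = 1/2 + 1/4, so it is stored exactly.
print(Fraction(0.75))    # 3/4
print(float_bits(0.75))

# 0.1 has no finite binary expansion, so the stored value is only close.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(0.1 + 0.2 == 0.3)                  # False

# Compare with a tolerance instead of exact equality.
print(math.isclose(0.1 + 0.2, 0.3))      # True
```

This is why comparing floats with `==` is risky, and why money is usually handled with integers or a decimal type instead.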

File Systems:

Think of a file system as the organization structure for your data. It manages how data is stored, retrieved, and organized on a storage device.

Without a file system, data placed on a storage medium would be a large body of data with no way of knowing where one piece of data ends and the next begins.
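
As a rough sketch of what the file system does for us, the Python snippet below writes a few bytes under a name and then asks the file system where and how big that data is. The file name `note.txt` and its content are made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "note.txt"  # hypothetical file name
    path.write_bytes(b"hello storage")

    # The file system keeps track of metadata such as the size,
    # so it knows where this piece of data ends.
    info = os.stat(path)
    size = info.st_size

    # It also organizes files into directories we can list.
    names = sorted(p.name for p in Path(folder).iterdir())

    # And it retrieves the exact bytes we stored under that name.
    content = path.read_bytes()

print(size)     # 13
print(names)    # ['note.txt']
print(content)  # b'hello storage'
```

We never told the program which physical blocks to use; mapping names to locations on the device is exactly the file system's job.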

Some examples that may ring a bell are:

  • NTFS (New Technology File System), mainly used on Windows systems
  • ext4 (Fourth Extended File System), mainly used on Linux systems
  • APFS (Apple File System), mainly used on macOS and other Apple systems
  • FAT32 (File Allocation Table), an older file system that is gradually being replaced, though it still appears on USB drives and memory cards.

Storage Devices:

Your computer, smartphone, or external hard drive uses various storage devices to keep your data safe. The two main types are:

Hard Disk Drives (HDD): A stack of spinning disks with magnetic surfaces. Data is stored on these disks as magnetized areas, and a tiny arm reads or writes data as the disks spin.

Solid State Drives (SSD): These use flash memory chips to store data. Since there are no moving parts, SSDs are faster than HDDs and more resistant to physical shock.

M.2 SSDs are simply SSDs in the M.2 form factor, which plug into an M.2 slot on the motherboard. Depending on the drive, that slot connects it over PCI Express (usually with the NVMe protocol) or over SATA.

Data compression:

Because storage capacity is limited, reducing the size of our data is often useful.
There are two kinds of data compression:

  • Lossless Compression: This retains all the original data when decompressed. Common formats include ZIP and GZIP, both typically based on the DEFLATE algorithm. It's ideal for text files and documents.
  • Lossy Compression: This sacrifices some data to achieve higher compression ratios. It's often used for multimedia files like images, audio, and video. Examples include JPEG for images, MP3 for audio, and H.264 for video (usually stored in MP4 files).

But how does data compression work?

1- Identifying Redundancy: Compression algorithms exploit redundancies in data, which can be either spatial (repeating patterns within the data) or temporal (repeating patterns over time).

2- Encoding and Replacing: Once redundancies are identified, the algorithm encodes them more efficiently. For example, it might replace repeated patterns with shorter codes or use mathematical representations.
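
The two steps above can be sketched with run-length encoding, one of the simplest lossless schemes. It is not what ZIP or JPEG actually use; I picked it only because it shows "find the repetition, then store it more compactly" in a few lines. The function names are my own.

```python
from itertools import groupby

def rle_encode(text):
    """Step 1 and 2: find runs of repeated characters, store (char, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text by expanding every (char, count) pair."""
    return "".join(ch * count for ch, count in pairs)

original = "A" * 6 + "B" * 3 + "C" * 8 + "D"
encoded = rle_encode(original)

print(original)  # AAAAAABBBCCCCCCCCD
print(encoded)   # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
print(rle_decode(encoded) == original)  # True
```

Note that data without repetition (like the single `D`) does not shrink at all, which is why real algorithms combine several techniques.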

Decompression

If we have compression, decompression must exist as well.
Compressing and decompressing data requires processing power, so it has a cost in electricity and time. For that reason, not everything is compressed, and not everything is compressed the same way or to the same extent.

Following the same logic, we have two techniques:

  • Lossless Decompression: In lossless compression, the original data is fully restored during decompression, ensuring no loss of information.

  • Lossy Decompression: In lossy compression, the discarded data cannot be recovered, but the decompressed file should be perceptually similar to the original.
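
Here is a minimal sketch of the difference. The lossless half uses Python's standard `zlib` module (an implementation of DEFLATE). The lossy half is only a toy: it rounds 8-bit samples into 16 buckets, which is my own simplification and far cruder than what JPEG or MP3 do.

```python
import zlib

# Lossless: every byte comes back exactly as it was.
original = b"the quick brown fox " * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # True

# Lossy (toy illustration): store only which of 16 buckets a sample is in.
samples = [3, 18, 37, 200, 255]
quantized = [s // 16 for s in samples]    # 4 bits per sample instead of 8
approx = [q * 16 + 8 for q in quantized]  # rebuild using bucket midpoints

print(approx)  # [8, 24, 40, 200, 248]
```

The rebuilt samples are close to the originals but not identical, and there is no way to get the exact values back from `quantized` alone.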

For developers: you might decide whether to GZIP a JSON response from a microservice (or any other service) depending on the amount of data returned. Enabling compression on your server usually takes only a few minutes. It takes somewhat longer to benchmark whether the download time saved by the gzipped JSON outweighs the time the client device spends decompressing it.
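
A rough way to start that benchmark, assuming a made-up payload in place of your real response, is to measure the size and timing locally with Python's standard `gzip` module. The numbers will vary by machine and by data, so treat this as a starting point rather than a verdict.

```python
import gzip
import json
import time

# Hypothetical payload standing in for a microservice response.
payload = [
    {"id": i, "name": f"user-{i}", "active": i % 2 == 0}
    for i in range(5000)
]
raw = json.dumps(payload).encode("utf-8")

start = time.perf_counter()
packed = gzip.compress(raw)
compress_s = time.perf_counter() - start

start = time.perf_counter()
unpacked = gzip.decompress(packed)
decompress_s = time.perf_counter() - start

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
print(f"compress: {compress_s * 1000:.2f} ms, "
      f"decompress: {decompress_s * 1000:.2f} ms")
```

Repetitive JSON like this usually shrinks a lot, but a real decision should also account for network speed and the client's hardware.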

Cloud Storage:

In addition to local storage on your devices, many of us use cloud storage. This involves storing your data on servers maintained by companies like Google, Dropbox, or Microsoft, to name a few.

Ultimately, that data is still stored on devices like the ones described above, organized by file systems like the ones described above, and encoded in binary.
The difference is that you don't need the physical device, only an internet connection, to interact with your data.

Backups and Redundancy:

To ensure your precious data is safe, it's crucial to make backups. This involves creating duplicate copies of your files, either on an external drive or in the cloud.
Some storage systems also use redundancy, where data is duplicated across multiple drives so that the failure of a single drive does not cause data loss.
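
As a minimal sketch of a backup, the snippet below copies a file and then compares SHA-256 checksums to confirm the duplicate matches the original. The file names and the `backup` folder are hypothetical, and a real backup would of course go to a different drive or to the cloud rather than the same temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file's content as hex."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "photo.raw"            # hypothetical file
    backup = Path(folder) / "backup" / "photo.raw"

    source.write_bytes(b"\x00\x01\x02" * 1000)     # stand-in data
    backup.parent.mkdir()
    shutil.copy2(source, backup)                   # copies data and metadata

    # Equal checksums mean the copy holds the same bytes as the original.
    verified = sha256_of(source) == sha256_of(backup)

print(verified)  # True
```

Verifying the copy matters: a backup you cannot read back is not much of a backup.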

If you want to configure a system with redundancy, you might be interested in researching topics like RAID (Redundant Array of Independent Disks) and NAS (Network Attached Storage).

Closing

Remember, every time you save a document or snap a photo, you're engaging with the intricate world of data storage. It's a mix of hardware, software, and clever organization that keeps your digital life intact and easily accessible.

Feel free to ask if you have any questions or want more details on a specific aspect!

Cheers,

Top comments (2)

Eckehard

If you are using images on a web page, things are often more complicated. Data may come from a database or from a file storage. As the internet is still a bottleneck, file access may be different depending on the file size. Smaller packages are stored directly in a database, while larger files are stored in the file system.

To get better performance, large files are often delivered asynchronously, so the user does not have to wait for all data to be loaded. Modern CMSs often resize images before delivery, so you can store your images in maximum resolution but deliver a much smaller file if needed. So, the browser can ask for a certain file size and will get only what it needs.

We should also mention that safety is crucial if you store or retrieve data over the internet. You would not want anybody to store dangerous data (like viruses) on your server, and you also would not want anybody to get data from your server that should not be visible to the public. So, you always need a layer above the file storage that controls file access.

Anthony Fung

Great overview!

It's worth noting that while a byte can represent a character, it's from a (relatively) small set; for the extended character sets in Unicode, we may need more than one byte.