DEV Community

Cover image for Building a Fault-Tolerant File Storage over JPEG Images
Artem
Artem

Posted on

Building a Fault-Tolerant File Storage over JPEG Images

TL;DR: The idea is to use a set of ordinary JPEG images as a distributed medium for an encrypted container. The data is split into redundant fragments and distributed across the images, allowing the container to be recovered even if some of the JPEG files are lost. We will not examine the code; instead, we will focus only on the general principle behind the approach.

Where the idea came from

Most steganographic systems follow a "one image — one file" model. If the image is lost or damaged, the embedded data is lost with it. The idea was to distribute the data container across multiple JPEG images while preserving the ability to recover the data even if some of those images are lost.

I should point out right away that this is not steganography in the traditional sense. The additional data is written to JPEG images after the EOI (End Of Image) marker, where it can be easily detected by even the simplest analysis. The goal is not to completely hide the presence of the data, but to use ordinary JPEG images as a distributed medium for an encrypted container.

At the same time, in everyday use the JPEG images remain fully functional, look like ordinary photographs, and do not attract attention. Even if the additional data is detected, it does not reveal whether it is an encrypted container, write artifacts, or simply an arbitrary sequence of bytes.

Why JPEG

First, JPEG is the most widely used image format. Photographs in this format naturally accumulate on computers and phones and are shared between people.

Second, this format has one important property: a JPEG file ends with a special EOI marker. Everything written after it is ignored during image decoding, so the additional data does not affect how the image is displayed. This is the area we will use:

      JPEG     JPEG     Additional
      data     EOI      data
       │        │        │
...░░░░░░░░░░░░░▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒
Enter fullscreen mode Exit fullscreen mode

At first glance, the solution looks extremely simple: just split the encrypted container into fragments and write one fragment to the end of each JPEG file.

However, several questions immediately arise. How do we determine which JPEG files belong to the same container? How do we determine the order of the fragments during reconstruction? Where do we store the encryption keys? How do we ensure that the loss of any single JPEG file does not make the container unrecoverable? And how do we make the entire JPEG tail indistinguishable from a random sequence of bytes?

Beyond the EOI marker

It naturally follows that, in addition to the container fragment itself, each JPEG file must contain two auxiliary blocks: an encryption key block and a metadata block. This immediately leads to several design requirements.

First, the user password must not be used to encrypt the container or the metadata. They should be encrypted with a randomly generated master key, while the master key itself should be encrypted with the user's password.

Second, the master key cannot be stored in only one JPEG file. Otherwise, losing that file would make it impossible to recover the container. Therefore, every JPEG file must contain its own encryption key block.

Third, every JPEG file must also contain its own metadata block. This makes it possible to determine which JPEG files belong to the same container, which fragment is stored in each file, and in what order the fragments should be reconstructed.

Fourth, none of the service data should stand out from the rest of the data. Therefore, both the encryption keys and the metadata must be stored only in encrypted form.

Thus, when creating a container, we first ask for the user's password and generate a random master key. The container and the metadata are encrypted with the master key, while the master key itself is encrypted with the user's password. After the EOI marker, we sequentially write the encryption key block, the metadata block, and the encrypted container fragment:

      JPEG     JPEG      Encryption     Metadata     Container
      data     EOI       keys           block        fragment
       │        │         │              │            │
...░░░░░░░░░░░░░▓░░░░░░░░░░░░░░░░░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░
Enter fullscreen mode Exit fullscreen mode

The encryption key block and the metadata block must always have a fixed size. This makes it possible to open the container using any JPEG file with a valid tail: decrypt the first block with the user's password to recover the master key, then use that key to decrypt the second block and obtain the information required to locate and reconstruct the container.

Encryption keys

Although each JPEG file contains the same master key, the encryption key blocks must be different. Otherwise, identical byte sequences would be repeated across all JPEG files.

If we simply encrypt the master key in every JPEG file using the same password, the key blocks will be identical. Therefore, we use AEAD encryption with random salt and nonce values generated separately for each JPEG file. As a result, each encryption key block has a different encrypted representation.

We will use a 32-byte master key. After AEAD encryption, its size increases to 48 bytes due to the 16-byte authentication tag. Together with the salt and nonce, the total size of the key block is 76 bytes:

salt                  16 bytes
key_nonce             12 bytes
encrypted_master_key  48 bytes
------------------------------
                      76 bytes
Enter fullscreen mode Exit fullscreen mode

The master key provides another advantage: to change the user's password, it is enough to re-encrypt the master key with the new password and update the encryption key block in each JPEG file. The entire container does not need to be re-encrypted.

Metadata

After recovering the master key, we need to know which JPEG files belong to the container, which fragment is stored in each file, and which recovery threshold was used when splitting the container into fragments:

container_uuid         16 bytes  - container identifier (UUID)
container_generation    4 bytes  - container generation (uint32)
container_threshold     2 bytes  - recovery threshold for reconstruction
shard_index             2 bytes  - index of the current fragment
shard_total             2 bytes  - total number of fragments in the container
-------------------------------
                       26 bytes
Enter fullscreen mode Exit fullscreen mode

The recovery threshold defines the minimum number of container fragments required for successful reconstruction (specified when the container is initialized). The container identifier is used to determine which JPEG files belong to which container (if files from multiple containers are accidentally mixed together). The container generation is used to select the latest recoverable version of the container in case failures occur while modifying its contents.

Thus, the total size of the metadata in plaintext is only 26 bytes. As with the key block, we use AEAD encryption. A separate random nonce is generated for each JPEG file, so even identical metadata produces a different encrypted representation.

After encryption, a 16-byte authentication tag is added to the 26 bytes of metadata, increasing the size of the encrypted metadata to 42 bytes. Together with the nonce, the total size of the block is 54 bytes:

metadata_nonce      12 bytes
encrypted_metadata  42 bytes
----------------------------
                    54 bytes
Enter fullscreen mode Exit fullscreen mode

JPEG tail

We have covered the format of the service data. It remains to decide how the user files themselves will be stored. Once that is defined, the structure of the JPEG tail will be complete.

Fragmentation algorithms operate on a single stream of bytes rather than on a collection of individual files. Therefore, before encryption, the user files (the ones that will be stored inside the container) must be combined into a single binary object. At the same time, it must be possible to obtain the list of stored files without extracting their contents. There are several possible approaches:

  1. A custom binary object format and a manifest file. It provides full control over the data structure but requires a manual implementation.
  2. A TAR archive. It packs the files into a single binary object, but a separate manifest file is required to quickly obtain information about the contents.
  3. A ZIP archive without compression. It packs the files into a single binary object while also storing all the necessary information about the contents. This is the approach we will use.

In other words, the user files are packed in memory into a ZIP archive, which is then encrypted and split into redundant fragments. Each container fragment is written after the encryption key block and the metadata block:

       Encryption Key Block            Metadata Block    Container Fragment
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
└─────────┘└───────┘└─────────────┘└───────┘└───────────┘└────────────────┘
 16 bytes  12 bytes    48 bytes    12 bytes   54 bytes        N bytes
 random    random      encrypted   random     encrypted       encrypted
 salt      nonce       master key  nonce      metadata        data shard
Enter fullscreen mode Exit fullscreen mode

The container fragment occupies all the remaining space until the end of the file. Its size is not fixed and depends on the size of the encrypted container, the number of JPEG files, and the selected recovery threshold.

Redundancy and recovery

From this point on, the scheme resembles a RAID array: the container fragments (together with the keys and metadata) are distributed across multiple independent JPEG files with a specified recovery threshold. As a result, the container can be recovered from any set of JPEG files of the current generation, provided the number of valid fragments is not below the threshold:

                      Container
                      Shards: 5
┌──────────────────────────────────────────────────┐
▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▓▓
  JPEG       JPEG       JPEG       JPEG       JPEG
└────────────────────────────┘   └─────────────────┘
        Threshold: 3/3              Redundancy: 2
Enter fullscreen mode Exit fullscreen mode

If some JPEG files are lost, but the number of remaining files is not below the recovery threshold, the container can not only be reconstructed but also restored to its original level of redundancy. To do this, it is enough to add new JPEG files and redistribute the container across all available media. In the project implementation, this operation is provided as a separate command that rebuilds the container and restores full redundancy relative to the current set of JPEG files.

For a Python implementation, the zfec library is an excellent choice. It provides an erasure coding mechanism — a practical implementation of the Reed–Solomon algorithm.

Container initialization

Now that the storage format is fully defined, we can move on to container initialization. To get started, we need a set of JPEG files, a user password, and a recovery threshold.

First, we verify that the files are valid JPEG files (they start with SOI), are not already being used as carriers for another container, and that the selected recovery threshold does not exceed the number of files.

Next, we generate the container identifier and the master key, create an empty ZIP archive in memory, encrypt it, and split it into fragments using erasure coding.

After that, for each JPEG file independently, we generate an encryption key block and a metadata block, append the corresponding container fragment, and write the resulting JPEG tail after the EOI marker.

The write operation is performed in two stages. First, temporary copies are created for all JPEG files, and the JPEG tails are added to them. Each temporary file is written and synchronized to disk. Only after all temporary files have been successfully prepared are they used to replace the original files one by one using POSIX semantics (an atomic rename/replace). The directory is then synchronized.

This approach does not make the group of replacements a single transaction, but it eliminates the possibility of an individual JPEG file being left partially written. After a failure, each carrier will be either in the old state or in the new state.

Container removal from the carrier files of the current recoverable generation follows the same principle: temporary versions of all JPEG files are first prepared without the JPEG tails, after which they atomically replace the original carriers.

Modifying the contents

After the container has been created, working with the files inside it reduces to updating the internal ZIP archive. All intermediate operations are performed in memory — data is written to disk only after the new version of the container has been created.

Regardless of whether we are adding a new file, deleting an existing one, or modifying its contents, the sequence of operations remains the same. First, we recover the master key using the user's password, decrypt the metadata, reconstruct the container from its fragments, and decrypt it. After making the necessary changes, we encrypt the container again, split it into a new set of fragments, update the metadata, and write the new JPEG tails using the same two-phase write procedure as during container initialization.

Each write creates a new container generation (the container_generation value in the metadata is incremented by one). This allows us to distinguish the current version of the container from previous ones in the event of an interrupted operation and restore consistency, if possible.

Inconsistent state and redundancy recovery

Suppose the container is distributed across four JPEG files, and the recovery threshold is set to two fragments:

               Generation 1
               Shards: 4
┌───────────────────────────────────────┐
░░░░░░▓▓   ░░░░░░▓▓   ░░░░░░▓▓   ░░░░░░▓▓
  JPEG       JPEG       JPEG       JPEG
└─────────────────┘
  Threshold: 2/2
Enter fullscreen mode Exit fullscreen mode

Now suppose that a modification operation is interrupted after the second fragment has been overwritten. After the failure, the available JPEG files will contain two container generations at the same time:

    Generation 1          Generation 2
    Shards: 2             Shards: 2
┌─────────────────┐   ┌─────────────────┐
░░░░░░▓▓   ░░░░░░▓▓   ▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▓▓
  JPEG       JPEG       JPEG       JPEG
└─────────────────┘   └─────────────────┘
  Threshold: 2/2        Threshold: 2/2
Enter fullscreen mode Exit fullscreen mode

Since both generations still satisfy the minimum recovery threshold, either of them can be reconstructed. This is where the generation number becomes important: when opening the container, the generation with the highest number is selected as the most recent one. After opening the container, it is enough to run the recovery command, which redistributes the container across all available JPEG files, restores full redundancy for the current set of available JPEG files, and creates a new container generation.

Now consider a less fortunate scenario. Suppose the modification operation is again interrupted after the second fragment has been overwritten, but one of the second-generation fragments is also lost for external reasons (for example, the corresponding JPEG file is accidentally overwritten in an image editor):

    Generation 1          Generation 2
    Shards: 2             Shards: 1
┌─────────────────┐   ┌─────────────────┐
░░░░░░▓▓   ░░░░░░▓▓   ▒▒▒▒▒▒▓▓   ▒▒▒▒▒▒▒▒
  JPEG       JPEG       JPEG       JPEG
└─────────────────┘   └─────────────────┘
  Threshold: 2/2        Threshold: 1/2
Enter fullscreen mode Exit fullscreen mode

In this case, the latest generation can no longer be reconstructed: only one of the required two fragments is available. However, the previous generation is still fully recoverable because it still has enough fragments. Therefore, container reconstruction will roll back to that generation. After opening the container, it is again sufficient to run the recovery command to restore the lost redundancy.

Unfortunately, recovery after failures is not always possible. If no generation has the minimum required number of fragments available, the container cannot be recovered.

For this reason, the recovery threshold should be chosen with some margin. The lower the threshold, the more losses the container can tolerate, but the more redundant data must be stored. It is always a trade-off between fault tolerance and the amount of redundancy (I usually set the threshold slightly below half of the total number of JPEG files).

Problems and limitations

Although the described approach works, it has a number of important limitations.

First, as I mentioned at the very beginning, the project is not a steganographic system. The additional data after the EOI marker can be easily detected using any editor. However, the images themselves remain unchanged, and the JPEG tail is a cryptographically random sequence of bytes from which its purpose cannot be determined.

Second, image editors and some online services completely rewrite the JPEG file, discarding the entire JPEG tail. Therefore, this container is intended primarily for local storage and file transfer without re-encoding.

Third, all operations are performed entirely in memory. When working with large containers, the amount of memory used during an operation may significantly exceed the size of the container itself. If there is not enough available RAM, the operating system may start using swap space, causing parts of the decrypted data to be temporarily written to disk.

Conclusion

The result is a small open source project implementing the idea described above. Of course, it is certainly not meant to be a serious data protection system or a steganographic tool. But as an engineering experiment, I found it quite interesting.

I tried to keep the implementation as simple as possible. The project has only two external dependencies: cryptography and zfec. You can take a look at the code or try the project yourself:

GitHub: github.com/artabramov/jpegfs
PyPI: pypi.org/project/jpegfs

Top comments (0)