[Side A] Why BytesIO Isn't Enough — Building a Python In-Memory FS Library

Author's Note:
I am a software engineer based in Japan. This article is an English translation (with AI assistance for clarity) of a practical technical guide originally written for the Japanese developer community platform, Qiita.
Recently, I introduced my project, D-MemFS, on Reddit (r/Python), where it sparked a lively discussion. The response confirmed that in-memory I/O bottlenecks and OOM crashes are universal pain points for developers everywhere, so I decided to cross the language barrier and share these concrete solutions globally.

🧭 About this Series: The Two Sides of Development

In Japan, I publish this series across two distinct platforms to serve different developer needs. To provide the complete picture here on Dev.to, I've brought them together as two "Sides":

  • Side A (Practical / originally on Qiita): Focuses on the "How". Implementation details, benchmarks, and concrete solutions for practical use cases.
  • Side B (Philosophy / originally on Zenn): Focuses on the "Why". The development war stories, design decisions, and how I collaborated with AI through Specification-Driven Development (SDD).

Introduction

I kept running into the exact same problem across different projects.

Whether I was trying to mock a file system for testing, or handle temporary files in a CI pipeline without touching the physical disk, my first instinct was always to reach for io.BytesIO. It feels like the standard, "Pythonic" way to handle in-memory data.

But let's be honest: the moment you try to do something even slightly complex—like handling multiple files or directories—BytesIO quickly shows its limitations and starts making your life miserable.

In this article, I will categorize exactly why BytesIO falls short for these use cases, and introduce D-MemFS—a pure Python in-memory file system library I built to solve these exact frustrations.

The Limitations of BytesIO

It's Just a Single Buffer

io.BytesIO provides a read/write interface for a single mutable byte sequence. While it feels like handling a file, it is ultimately just "a single buffer."

from io import BytesIO

buf = BytesIO()
buf.write(b"hello, world")
buf.seek(0)
print(buf.read())  # b'hello, world'

Where this falls short is when you want to handle multiple files.

Attempting to Substitute with dict[str, BytesIO]...

When developers want to handle multiple files in memory, a common stopgap is a dictionary of buffers. I've seen this workaround in countless codebases (and yes, I've been guilty of writing it myself).

from io import BytesIO

# A simple in-memory "file system"
vfs: dict[str, BytesIO] = {}

def vfs_write(path: str, data: bytes) -> None:
    buf = BytesIO(data)
    vfs[path] = buf

def vfs_read(path: str) -> bytes:
    buf = vfs[path]
    buf.seek(0)
    return buf.read()

vfs_write("config/settings.json", b'{"debug": true}')
vfs_write("data/input.csv", b"id,name\n1,Alice\n")
print(vfs_read("config/settings.json"))

This looks like it works at first glance, but problems arise immediately.

| Desired feature | Situation with `dict[str, BytesIO]` |
| --- | --- |
| Directory creation & listing | Must pseudo-implement using key prefixes |
| Deleting a sub-tree | Must manually implement `{k: v for k, v in vfs.items() if not k.startswith(prefix)}` |
| Memory limits (quota) | None; memory piles up indefinitely |
| Thread-safe reads/writes | None; you must write your own locks |
| File stat (size, modified time) | Must be tracked manually |
| Append mode | Must manually `buf.seek(0, 2)` |

If you seriously try to implement a directory structure on top of this, you end up essentially building your own file system, and that hand-rolled code inevitably grows subtle bugs.
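One classic example of such a subtle bug is sub-tree deletion by key prefix. A naive `startswith` check on the raw path silently deletes unrelated entries; here is a minimal sketch using the dict-based `vfs` idea from above:

```python
from io import BytesIO

vfs: dict[str, BytesIO] = {
    "data/input.csv": BytesIO(b"..."),
    "data/output.csv": BytesIO(b"..."),
    "database.sqlite": BytesIO(b"..."),  # NOT inside "data/"
}

# Naive deletion: "database.sqlite" also starts with "data",
# so it gets wrongly removed along with the real sub-tree.
naive = {k: v for k, v in vfs.items() if not k.startswith("data")}
print(sorted(naive))  # [] -- everything matching "data*" is gone

# Correct deletion must anchor the prefix on the path separator.
prefix = "data" + "/"
safe = {k: v for k, v in vfs.items() if not k.startswith(prefix)}
print(sorted(safe))   # ['database.sqlite']
```

A real file system API makes this class of bug impossible by operating on path components rather than raw string prefixes.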

Another Pitfall: Memory Grows Unbounded

When you process large amounts of data in memory, BytesIO offers no way to cap memory usage. You won't notice a problem until the process dies with an Out-Of-Memory (OOM) error. This is a classic source of failures that never reproduce in the test environment and only crash in production.
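To make this concrete, here is a rough sketch (with hypothetical helper names) of what bolting a quota onto the dict approach yourself looks like. The plain version has no such check and simply grows until the OS kills the process:

```python
from io import BytesIO

MAX_BYTES = 1 * 1024 * 1024  # 1 MiB budget
vfs: dict[str, BytesIO] = {}

def used_bytes() -> int:
    # Current total payload size across all buffers
    return sum(buf.getbuffer().nbytes for buf in vfs.values())

def vfs_write_checked(path: str, data: bytes) -> None:
    # Hand-rolled "hard quota": reject BEFORE storing the new buffer.
    # (Note: even this sketch double-counts when overwriting an existing
    # path -- exactly the kind of subtle bug this approach invites.)
    if used_bytes() + len(data) > MAX_BYTES:
        raise MemoryError(
            f"quota exceeded: {used_bytes()} + {len(data)} > {MAX_BYTES}"
        )
    vfs[path] = BytesIO(data)

vfs_write_checked("a.bin", b"x" * (512 * 1024))      # fits within 1 MiB
try:
    vfs_write_checked("b.bin", b"x" * (768 * 1024))  # 512 KiB + 768 KiB > 1 MiB
except MemoryError as e:
    print(e)
```

Getting the accounting right for overwrites, appends, and concurrent writers is precisely the work a library should do for you.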

I Built D-MemFS

To solve all of these problems in one place, I built D-MemFS.

  • Zero external dependencies (standard library only)
  • Hierarchical directory structure
  • Hard Quotas (rejects writes before memory is allocated)
  • Thread-safe via RW locks
  • Asynchronous wrappers (AsyncMemoryFileSystem)
pip install D-MemFS

Basic Usage

mkdir / open / write / read

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()

# Create a directory
mfs.mkdir("/work/data")

# Write to a file
with mfs.open("/work/data/hello.txt", "wb") as f:
    f.write(b"Hello, D-MemFS!\n")

# Read from a file
with mfs.open("/work/data/hello.txt", "rb") as f:
    print(f.read())  # b'Hello, D-MemFS!\n'

# stat information
st = mfs.stat("/work/data/hello.txt")
print(st["size"])         # 16
print(st["is_dir"])       # False
print(st["modified_at"])  # Unix timestamp (float)

Directory Operations

# List directory contents
for entry in mfs.listdir("/work/data"):
    print(entry)  # 'hello.txt'

# Delete an entire sub-tree
mfs.rmtree("/work")

# Check existence
print(mfs.exists("/work"))  # False

Handling Text Files

File handles are binary-only by default, but you can wrap them in MFSTextHandle for text I/O.

from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/logs")

# Write text
with mfs.open("/logs/app.log", "wb") as f:
    # Wrap the binary handle with the text wrapper
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("Started\n")
    th.write("Processing completed\n")

# Read text
with mfs.open("/logs/app.log", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")

Preventing Runaways with Quotas

Pass the max_quota parameter and D-MemFS rejects any write that would exceed the limit before the memory is allocated.

from dmemfs import MemoryFileSystem, MFSQuotaExceededError

# Set a 1 MiB quota
mfs = MemoryFileSystem(max_quota=1 * 1024 * 1024)

mfs.mkdir("/data")

try:
    with mfs.open("/data/big.bin", "wb") as f:
        # Raises an exception here if it tries to exceed 1 MiB
        f.write(b"x" * (2 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(f"Quota exceeded: {e}")
    # -> Quota exceeded: quota exceeded (limit=1048576, used=0, requested=2097152)

MFSQuotaExceededError is raised before the write is executed, so files are never left polluted in a half-written state.

You can also set a limit on the maximum number of files (nodes).

from dmemfs import MemoryFileSystem, MFSNodeLimitExceededError

mfs = MemoryFileSystem(max_nodes=4)
mfs.mkdir("/data")

mfs.open("/data/a.txt", "xb").close()
mfs.open("/data/b.txt", "xb").close()

try:
    mfs.open("/data/c.txt", "xb").close()
except MFSNodeLimitExceededError as e:
    print(f"Node limit exceeded: {e}")

Batch Operations with import_tree / export_tree

D-MemFS can also import and export an entire tree of files at once.

import zipfile
import io
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()

# Import in bulk using dict[str, bytes] format
mfs.import_tree({
    "/snapshot/config.json": b'{"debug": true}',
    "/snapshot/data.csv": b"id,name\n1,Alice\n",
})

# Processing...
with mfs.open("/snapshot/config.json", "rb") as f:
    config = f.read()

# Export all at once and write to a ZIP file
exported = mfs.export_tree("/snapshot")
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for path, data in exported.items():
        zf.writestr(path.lstrip("/"), data)
with open("snapshot.zip", "wb") as f:
    f.write(buf.getvalue())

It naturally supports the pattern: "Do all processing in memory, and export only at the very end."

Comparison Summary

| Feature | BytesIO | dict[str, BytesIO] | D-MemFS |
| --- | --- | --- | --- |
| Single-file I/O | ✅ | ✅ | ✅ |
| Hierarchical directories | ❌ | △ Pseudo-implementation | ✅ |
| Directory listing | ❌ | △ Manual implementation | ✅ |
| Subtree deletion | ❌ | △ Manual implementation | ✅ |
| Append mode | △ Manual seek | △ Manual seek | ✅ |
| Hard quota | ❌ | ❌ | ✅ |
| stat (size, time) | ❌ | ❌ | ✅ |
| Thread safety | ❌ | ❌ | ✅ |
| External dependencies | None | None | None (stdlib only) |

Installation

pip install D-MemFS

Python 3.11+ is required. Zero external dependencies.

Conclusion

BytesIO is excellent as a single buffer, but it is not a substitute for a file system. If you try to roll your own on top of dict[str, BytesIO], you will run into the same subtle bugs every time.

D-MemFS is designed for anywhere an in-memory file system is needed: testing, CI, and temporary data-processing pipelines (for processing that stays within a single Python process). Give pip install D-MemFS a try and see whether it can replace TemporaryDirectory in your code.
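As a rough sketch of what that swap looks like: the disk-backed half below is pure standard library, and the in-memory half (shown as comments) follows the D-MemFS API from the examples above.

```python
import tempfile
from pathlib import Path

# Disk-backed version: every write and read touches the physical file system.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "report.txt"
    out.write_bytes(b"result: ok\n")
    data = out.read_bytes()
print(data)

# In-memory version with D-MemFS (same shape, no disk I/O):
#   from dmemfs import MemoryFileSystem
#   mfs = MemoryFileSystem()
#   mfs.mkdir("/work")
#   with mfs.open("/work/report.txt", "wb") as f:
#       f.write(b"result: ok\n")
#   with mfs.open("/work/report.txt", "rb") as f:
#       data = f.read()
```

Because no temporary directory is created or cleaned up, tests using the in-memory version also avoid flakiness from leftover files and slow disks on CI runners.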


🔗 Links & Resources

If you find this project interesting, a ⭐ on GitHub would be the best way to support my work!
