Author's Note:
I am a software engineer based in Japan. This article is an English translation (with AI assistance for clarity) of a practical technical guide originally written for the Japanese developer community platform, Qiita.
Recently, I introduced my project, D-MemFS, on Reddit (r/Python), where it sparked a lively discussion. The response confirmed that in-memory I/O bottlenecks and OOM crashes are universal pain points for developers everywhere, so I decided to cross the language barrier and share these concrete solutions globally.
🧭 About this Series: The Two Sides of Development
In Japan, I publish this series across two distinct platforms to serve different developer needs. To provide the complete picture here on Dev.to, I've brought them together as two "Sides":
- Side A (Practical / originally on Qiita): Focuses on the "How". Implementation details, benchmarks, and concrete solutions for practical use cases.
- Side B (Philosophy / originally on Zenn): Focuses on the "Why". The development war stories, design decisions, and how I collaborated with AI through Specification-Driven Development (SDD).
Introduction
I kept running into the exact same problem across different projects.
Whether I was trying to mock a file system for testing, or handle temporary files in a CI pipeline without touching the physical disk, my first instinct was always to reach for io.BytesIO. It feels like the standard, "Pythonic" way to handle in-memory data.
But let's be honest: the moment you try to do something even slightly complex—like handling multiple files or directories—BytesIO quickly shows its limitations and starts making your life miserable.
In this article, I will categorize exactly why BytesIO falls short for these use cases, and introduce D-MemFS—a pure Python in-memory file system library I built to solve these exact frustrations.
The Limitations of BytesIO
It's Just a Single Buffer
io.BytesIO provides a read/write interface for a single mutable byte sequence. While it feels like handling a file, it is ultimately just "a single buffer."
```python
from io import BytesIO

buf = BytesIO()
buf.write(b"hello, world")
buf.seek(0)
print(buf.read())  # b'hello, world'
```
Where this falls short is when you want to handle multiple files.
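Even within a single buffer there are sharp edges. Appending, for instance, is not the default: a `BytesIO` constructed over existing data starts at position 0, and a write there silently overwrites instead of appending. A minimal stdlib-only illustration:

```python
from io import BytesIO

buf = BytesIO(b"hello")
# The initial position is 0, so writing now would clobber "hello".
# Appending requires an explicit seek to the end of the buffer:
buf.seek(0, 2)  # 2 == os.SEEK_END
buf.write(b", world")
print(buf.getvalue())  # b'hello, world'
```

Forget the `seek` and you corrupt your data with no error raised.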
Attempting to Substitute with dict[str, BytesIO]...
When developers want to handle multiple files in memory, a common stopgap measure is using a dictionary. I've seen this workaround in countless codebases (and yes, I'm fully guilty of writing it myself in the past).
```python
from io import BytesIO

# A simple in-memory "file system"
vfs: dict[str, BytesIO] = {}

def vfs_write(path: str, data: bytes) -> None:
    buf = BytesIO(data)
    vfs[path] = buf

def vfs_read(path: str) -> bytes:
    buf = vfs[path]
    buf.seek(0)
    return buf.read()

vfs_write("config/settings.json", b'{"debug": true}')
vfs_write("data/input.csv", b"id,name\n1,Alice\n")
print(vfs_read("config/settings.json"))
```
This looks like it works at first glance, but problems arise immediately.
| Desired Feature | Situation with `dict[str, BytesIO]` |
|---|---|
| Directory creation & listing | Must pseudo-implement using key prefixes |
| Deleting a sub-tree | Must manually implement `{k: v for k, v in vfs.items() if not k.startswith(prefix)}` |
| Memory limits (quota) | None; memory piles up indefinitely |
| Thread-safe reads/writes | None; must write your own locks |
| File stat (size, modified time) | Must manage it manually |
| Append mode | Must manually `buf.seek(0, 2)` |
If you seriously try to implement a directory structure on top of this, you end up essentially building your own file system, and whoever does that will inevitably introduce subtle bugs along the way.
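To make "subtle bugs" concrete: even the obvious prefix-based sub-tree deletion is already wrong. Here is a stdlib-only illustration (the names are mine, not part of any library):

```python
vfs: dict[str, bytes] = {
    "/data/a.txt": b"a",
    "/database/b.txt": b"b",  # a sibling that merely shares the prefix
}

def rmtree_naive(prefix: str) -> None:
    # Buggy: "/data" is a string prefix of "/database/b.txt" too
    for key in [k for k in vfs if k.startswith(prefix)]:
        del vfs[key]

rmtree_naive("/data")
print(sorted(vfs))  # [] -- "/database/b.txt" was deleted as collateral damage
```

The fix is to match against `prefix.rstrip("/") + "/"` instead, and that is only the first of many such edge cases.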
Another Pitfall: Memory Swells Infinitely
When processing massive amounts of data in memory, BytesIO offers no way to set a memory limit. You won't notice anything is wrong until the process crashes with an Out-Of-Memory (OOM) error. This breeds the kind of trouble that never reproduces in the test environment and only surfaces as a crash in production.
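To see what a "hard" limit has to look like, here is a minimal stdlib sketch of quota enforcement that rejects a write *before* storing anything. The class and error wording are mine, purely for illustration, not D-MemFS internals:

```python
class QuotaStore:
    """Dict-backed byte store with a hard quota, checked before storing."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        # Size of the file being replaced, if any
        old = len(self.files.get(path, b""))
        if self.used - old + len(data) > self.max_bytes:
            raise MemoryError(
                f"quota exceeded (limit={self.max_bytes}, "
                f"used={self.used}, requested={len(data)})"
            )
        self.files[path] = data
        self.used += len(data) - old

store = QuotaStore(max_bytes=16)
store.write("/a", b"x" * 10)
try:
    store.write("/b", b"x" * 10)  # would bring the total to 20 > 16
except MemoryError as e:
    print(e)
```

The key point is the order of operations: the check happens before the bytes are kept, so the store is never left over its limit.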
I Built D-MemFS
To solve all of the above problems entirely, I built D-MemFS.
- Zero external dependencies (standard library only)
- Hierarchical directory structure
- Hard Quotas (rejects writes before memory is allocated)
- Thread-safe via RW locks
- Asynchronous wrappers (`AsyncMemoryFileSystem`)
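For readers unfamiliar with the term, an RW (readers-writer) lock allows many concurrent readers while giving writers exclusive access. Python's stdlib does not ship one, so here is a classic minimal sketch of the concept (an illustration only, not D-MemFS's actual implementation):

```python
import threading

class RWLock:
    """Readers-preference RW lock: many readers, exclusive writers."""

    def __init__(self) -> None:
        self._readers = 0
        self._count_lock = threading.Lock()  # guards the reader count
        self._write_lock = threading.Lock()  # held while readers or a writer are active

    def acquire_read(self) -> None:
        with self._count_lock:
            self._readers += 1
            if self._readers == 1:      # first reader blocks writers
                self._write_lock.acquire()

    def release_read(self) -> None:
        with self._count_lock:
            self._readers -= 1
            if self._readers == 0:      # last reader lets writers in
                self._write_lock.release()

    def acquire_write(self) -> None:
        self._write_lock.acquire()

    def release_write(self) -> None:
        self._write_lock.release()

lock = RWLock()
lock.acquire_read()
lock.acquire_read()   # multiple readers may hold the lock at once
lock.release_read()
lock.release_read()
lock.acquire_write()  # writer gets exclusive access
lock.release_write()
```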
```shell
pip install D-MemFS
```
Basic Usage
mkdir / open / write / read
```python
from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()

# Create a directory
mfs.mkdir("/work/data")

# Write to a file
with mfs.open("/work/data/hello.txt", "wb") as f:
    f.write(b"Hello, D-MemFS!\n")

# Read from a file
with mfs.open("/work/data/hello.txt", "rb") as f:
    print(f.read())  # b'Hello, D-MemFS!\n'

# stat information
st = mfs.stat("/work/data/hello.txt")
print(st["size"])         # 16
print(st["is_dir"])       # False
print(st["modified_at"])  # Unix timestamp (float)
```
Directory Operations
```python
# List directory contents
for entry in mfs.listdir("/work/data"):
    print(entry)  # 'hello.txt'

# Delete an entire sub-tree
mfs.rmtree("/work")

# Check existence
print(mfs.exists("/work"))  # False
```
Handling Text Files
By default it's binary mode only, but you can use MFSTextHandle for text I/O.
```python
from dmemfs import MemoryFileSystem, MFSTextHandle

mfs = MemoryFileSystem()
mfs.mkdir("/logs")

# Write text
with mfs.open("/logs/app.log", "wb") as f:
    # Wrap the binary handle with the text wrapper
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("Started\n")
    th.write("Processing completed\n")

# Read text
with mfs.open("/logs/app.log", "rb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    for line in th:
        print(line, end="")
```
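If you have used the stdlib's `io.TextIOWrapper` over a `BytesIO`, this wrap-the-binary-handle pattern will feel familiar; here is the stdlib equivalent for comparison:

```python
import io

buf = io.BytesIO()
# TextIOWrapper layers text encoding/decoding over a binary stream,
# the same role MFSTextHandle plays over a D-MemFS binary handle.
text = io.TextIOWrapper(buf, encoding="utf-8")
text.write("Started\n")
text.flush()  # TextIOWrapper buffers; flush before reading the raw bytes
print(buf.getvalue())  # b'Started\n'
```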
Preventing Runaways with Quotas
By simply passing the max_quota parameter, you can restrict memory usage before writing.
```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

# Set a 1 MiB quota
mfs = MemoryFileSystem(max_quota=1 * 1024 * 1024)
mfs.mkdir("/data")

try:
    with mfs.open("/data/big.bin", "wb") as f:
        # Raises an exception here if it tries to exceed 1 MiB
        f.write(b"x" * (2 * 1024 * 1024))
except MFSQuotaExceededError as e:
    print(f"Quota exceeded: {e}")
    # -> Quota exceeded: quota exceeded (limit=1048576, used=0, requested=2097152)
```
MFSQuotaExceededError is raised before the write executes, so files are never left in a half-written state.
You can also set a limit on the maximum number of files (nodes).
```python
from dmemfs import MemoryFileSystem, MFSNodeLimitExceededError

mfs = MemoryFileSystem(max_nodes=4)
mfs.mkdir("/data")
mfs.open("/data/a.txt", "xb").close()
mfs.open("/data/b.txt", "xb").close()

try:
    mfs.open("/data/c.txt", "xb").close()
except MFSNodeLimitExceededError as e:
    print(f"Node limit exceeded: {e}")
```
Batch Operations with import_tree / export_tree
D-MemFS also lets you import and export files in bulk.
```python
import io
import zipfile

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem()

# Import in bulk using dict[str, bytes] format
mfs.import_tree({
    "/snapshot/config.json": b'{"debug": true}',
    "/snapshot/data.csv": b"id,name\n1,Alice\n",
})

# Processing...
with mfs.open("/snapshot/config.json", "rb") as f:
    config = f.read()

# Export all at once and write to a ZIP file
exported = mfs.export_tree("/snapshot")
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for path, data in exported.items():
        zf.writestr(path.lstrip("/"), data)

with open("snapshot.zip", "wb") as f:
    f.write(buf.getvalue())
```
It naturally supports the pattern: "Do all processing in memory, and export only at the very end."
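As a sanity check, an archive built this way can be read straight back from memory. This stdlib-only snippet mirrors the ZIP step above with the export dict hard-coded, so it runs without D-MemFS installed:

```python
import io
import zipfile

# Stand-in for the result of export_tree("/snapshot")
exported = {
    "/snapshot/config.json": b'{"debug": true}',
    "/snapshot/data.csv": b"id,name\n1,Alice\n",
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for path, data in exported.items():
        zf.writestr(path.lstrip("/"), data)

# Read the archive back without ever touching the disk
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    print(zf.namelist())
    # ['snapshot/config.json', 'snapshot/data.csv']
    print(zf.read("snapshot/config.json"))  # b'{"debug": true}'
```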
Comparison Summary
| Feature | BytesIO | dict[str, BytesIO] | D-MemFS |
|---|---|---|---|
| Single-file I/O | ✅ | ✅ | ✅ |
| Hierarchical directories | ❌ | △ Pseudo-implementation | ✅ |
| Directory listing | ❌ | △ Manual implementation | ✅ |
| Subtree deletion | ❌ | △ Manual implementation | ✅ |
| Append mode | △ Manual seek | △ Manual seek | ✅ |
| Hard Quota | ❌ | ❌ | ✅ |
| stat (size, time) | ❌ | ❌ | ✅ |
| Thread safety | ❌ | ❌ | ✅ |
| External dependencies | None | None | None (stdlib only) |
Installation
```shell
pip install D-MemFS
```
Python 3.11+ is required. Zero external dependencies.
Conclusion
BytesIO is excellent as a "single buffer," but it is not a substitute for a file system. If you try to build your own with dict[str, BytesIO], you will step on the same bugs every time.
D-MemFS is designed for any situation that calls for an in-memory file system: testing, CI, and temporary data-processing pipelines (as long as the data stays within a single Python process). Start with pip install D-MemFS and try swapping it in where you currently reach for TemporaryDirectory.
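For reference, this is the on-disk TemporaryDirectory pattern being replaced (stdlib only); an in-memory file system keeps the same shape of code without touching the physical disk:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    p = pathlib.Path(tmp) / "work" / "data"
    p.mkdir(parents=True)
    (p / "hello.txt").write_bytes(b"Hello!\n")
    data = (p / "hello.txt").read_bytes()
    print(data)  # b'Hello!\n'
# Every operation above hits the disk, and the whole
# directory tree is deleted when the with-block exits.
```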
🔗 Links & Resources
- GitHub: https://github.com/nightmarewalker/D-MemFS
- Original Japanese Article: BytesIO じゃダメな理由 — Python インメモリ FS ライブラリを作った話
If you find this project interesting, a ⭐ on GitHub would be the best way to support my work!