Posted on Apr 8 • Originally published at zenn.dev

[Side B] Should a Binary-Only FS Support Text Mode? Redrawing the Architecture Boundary

#python #opensource #architecture #design

From the Author:
D-MemFS was featured in Python Weekly Issue #737 (March 19, 2026) under Interesting Projects, Tools and Libraries. Being picked up by one of the most widely-read Python newsletters confirmed that in-memory I/O bottlenecks and memory management are truly universal challenges for developers everywhere. This series is my response to that interest.

🧭 About this Series: The Two Sides of Development

In Japan, I publish this series across two distinct platforms to serve different developer needs. To provide the complete picture here on Dev.to, I've brought them together as two "Sides":

Side A (Practical / originally on Qiita): Focuses on the "How". Implementation details, benchmarks, and concrete solutions for practical use cases.
Side B (Philosophy / originally on Zenn): Focuses on the "Why". The development war stories, design decisions, and how I collaborated with AI through Specification-Driven Development (SDD).

What the Design Document Said

D-MemFS's design principles include a clause in Article 3:

MFS is dedicated to "pure byte-sequence virtual hierarchy management and resource control." Text encoding, encryption, and physical persistence are delegated to upper-layer boundary controllers.

And in the open() mode specification:

Text mode (r, w, etc. — modes without the binary suffix b) shall raise ValueError.

In other words, text mode was explicitly excluded by design.
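As a rough illustration of that rule, a mode check could look like the sketch below. The names `ALLOWED_MODES` and `validate_mode` are hypothetical and not taken from the D-MemFS source; the four accepted modes are the ones this article lists later (rb, wb, ab, r+b).

```python
# Hypothetical sketch of a binary-only mode check; not the D-MemFS implementation.
ALLOWED_MODES = frozenset({"rb", "wb", "ab", "r+b"})


def validate_mode(mode: str) -> str:
    """Return the mode unchanged if it is binary, otherwise raise ValueError."""
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}; accepted modes: {sorted(ALLOWED_MODES)}"
        )
    return mode
```

With this shape, `validate_mode("rb")` passes through, while `validate_mode("r")` or `validate_mode("w")` raises ValueError.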

The reasoning is straightforward and consistent. A memory filesystem's job is to manage byte sequences. Encoding is an application-layer concern — the filesystem layer has no business managing it. Unix VFS doesn't know about text either. Introducing text mode would drag in encoding selection, newline translation, and multibyte boundary handling — a significant expansion of responsibility. There was no good reason to make that trade-off at the outset.

Design Correctness Doesn't Stop Reality From Punching Back

The design reviews were thorough. The implementation was solid. 346 test cases passed. Coverage exceeded 97%. We were in final polish mode before release.

Then I paused and asked myself:

"What will users actually think of this?"

Imagine being a user who wants to work with log files or config files (.ini, .json, .yaml) in memory with D-MemFS. There is no text mode. How do you read a config?

import json

with mfs.open("/config/settings.json", "rb") as f:
    text = f.read().decode("utf-8")  # decode by hand: there is no text mode
    config = json.loads(text)

…It works. But am I really going to write .decode() every single time? And the writes too?

data = json.dumps(config, ensure_ascii=False).encode("utf-8")
with mfs.open("/config/settings.json", "wb") as f:
    f.write(data)

It works, but it's not natural.

This matters most for the ZIP-in-memory use case. One of D-MemFS's primary advertised strengths is "extract a ZIP into memory and process its contents." If every text file read after extraction requires an explicit .decode(), that feels like a lack of care toward the user.

"Just Use `io.TextIOWrapper`?" — Not That Simple

Python's standard library provides io.TextIOWrapper, which wraps a binary stream to provide text I/O — used everywhere under the hood in normal file operations.

"Just wrap the handle with TextIOWrapper(handle) — problem solved," I initially thought. But trying it revealed three distinct problems.

Problem 1: The `readinto()` Requirement

io.TextIOWrapper requires readinto() on its target for internal buffering. MemoryFileHandle was designed with a clean interface of read() / write() / seek() / tell() — no readinto().

Adding it is technically possible, but readinto() is tightly coupled to memoryview and the buffer protocol, which would ripple through the memory management design. Modifying the internal buffer architecture just to support text mode felt like the tail wagging the dog.

Problem 2: Buffering vs. Quota Incompatibility

TextIOWrapper maintains an internal buffer. Buffered writes are not reflected in the quota system until flush() is called. This creates a window where "the quota is technically exceeded, but writes are still succeeding."

D-MemFS has a hard quota. "Reject before writing" is the cornerstone of its design. Buffering-induced delayed accounting directly contradicts this principle.
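The delayed accounting can be reproduced with the standard library alone. In the sketch below, `QuotaBytesIO` and `QuotaExceeded` are hypothetical stand-ins for a quota-enforcing binary handle (they are not D-MemFS classes), and the quota check is deliberately simplified.

```python
import io


class QuotaExceeded(Exception):
    """Hypothetical error type; the real D-MemFS exception may differ."""


class QuotaBytesIO(io.BytesIO):
    """Stand-in for a quota-enforcing binary handle (not a D-MemFS class)."""

    def __init__(self, quota: int) -> None:
        super().__init__()
        self._quota = quota

    def write(self, data) -> int:
        # Reject before writing. Simplified: assumes append-only writes.
        if len(self.getvalue()) + len(data) > self._quota:
            raise QuotaExceeded(f"quota of {self._quota} bytes exceeded")
        return super().write(data)


def demo_delayed_quota():
    raw = QuotaBytesIO(quota=4)
    text = io.TextIOWrapper(raw, encoding="utf-8")
    written = text.write("hello world")   # 11 bytes, well over the quota
    stored_before_flush = raw.getvalue()  # nothing has reached the handle yet
    try:
        text.flush()                      # the quota check only happens here
        rejected_on_flush = False
    except QuotaExceeded:
        rejected_on_flush = True
    return written, stored_before_flush, rejected_on_flush
```

The write() call reports success even though the data can never fit; the rejection only surfaces once the wrapper flushes its pending bytes to the underlying handle.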

Problem 3: The `seek()` Cookie Problem

TextIOWrapper's seek() uses opaque "cookies" — not byte offsets. You can only seek() to positions previously returned by tell(). This is fundamentally incompatible with MemoryFileHandle, which navigates freely by byte offset.
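A small standard-library experiment shows the restriction. It uses io.BytesIO as the underlying stream purely for illustration; the function name is made up for this sketch.

```python
import io


def demo_text_seek():
    raw = io.BytesIO("こんにちは".encode("utf-8"))
    text = io.TextIOWrapper(raw, encoding="utf-8")

    first = text.read(1)      # one character, three bytes in UTF-8
    cookie = text.tell()      # opaque value, only meaningful to seek()
    rest = text.read()

    text.seek(cookie)         # allowed: the value came from tell()
    again = text.read()

    try:
        text.seek(1, io.SEEK_CUR)   # arbitrary relative movement
        relative_ok = True
    except io.UnsupportedOperation:
        relative_ok = False
    return first, rest, again, relative_ok
```

Seeking back to a value returned by tell() works, but a non-zero relative seek is refused, which is the opposite of how a byte-addressed handle is expected to behave.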

Solution: Build a Lightweight Wrapper — Without a Buffer

The conclusion was to build MFSTextHandle, a lightweight custom wrapper. Its design is guided by three principles:

No internal buffer. Every write() is immediately delegated to the binary handle. Quota checks are always immediate. The hard quota contract is never broken.

Encoding/decoding is the only responsibility. MFSTextHandle does exactly one thing: convert between strings and bytes. It holds no file operation logic, no lock management. Everything is delegated to the underlying MemoryFileHandle.

No close() responsibility. Handle lifecycle is managed by MemoryFileHandle's with statement. MFSTextHandle is a thin adapter — nothing more.

with mfs.open("/data/hello.txt", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("Hello, world\n")

This is a helper used exclusively within a with mfs.open(...) scope. No separate lifecycle management of MFSTextHandle is required.
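To make the three principles concrete, here is a minimal sketch of a buffer-less text adapter. It is not the actual MFSTextHandle source: the class name `UnbufferedTextAdapter` is hypothetical, and io.BytesIO stands in for the binary handle that mfs.open() would return.

```python
import io


class UnbufferedTextAdapter:
    """Illustrative str <-> bytes adapter; holds no buffer and never closes."""

    def __init__(self, handle, encoding: str = "utf-8") -> None:
        self._handle = handle        # lifecycle stays with the caller
        self._encoding = encoding

    def write(self, text: str) -> int:
        # Encode and delegate at once, so any quota check in the
        # underlying handle runs during this very call.
        self._handle.write(text.encode(self._encoding))
        return len(text)

    def read(self) -> str:
        # Read everything that remains and decode it in one step.
        return self._handle.read().decode(self._encoding)


raw = io.BytesIO()                   # stand-in for a D-MemFS binary handle
adapter = UnbufferedTextAdapter(raw)
adapter.write("Hello, world\n")
stored_immediately = raw.getvalue()  # bytes are already in the handle
raw.seek(0)
round_trip = adapter.read()
```

Because nothing is held back in the adapter, the bytes are visible in the underlying handle as soon as write() returns.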

`read(size)` Is an Approximation in Characters

There is one limitation that deserves honest acknowledgment.

It concerns the size argument to read(size). In Python's standard text mode, size means a character count. But because MFSTextHandle has no internal buffer, it passes size to handle.read(size) as a byte count.

For UTF-8 multibyte characters (e.g. Japanese), this means fewer characters than requested may be returned.

# In UTF-8, one Japanese character = 3 bytes
th.read(3)  # Reads 3 bytes → returns "こ" (1 character) — fine
th.read(1)  # Reads 1 byte → potential decode error mid-multibyte sequence

This limitation is explicitly documented in the docstring:

Note that this is an approximation in characters, not bytes.

It is not perfect. But for the most common use case, read() with no argument to read the entire file, it is not an issue. And readline(), which detects newlines byte by byte, is also unaffected by multibyte boundaries in UTF-8, because the bytes for \n and \r never appear inside a multibyte sequence.
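The byte-versus-character mismatch is easy to reproduce outside D-MemFS. The helper below is a hypothetical stand-in that mimics "read size bytes, then decode"; how the real MFSTextHandle reacts to a split character is not shown here.

```python
import io


def read_text_by_bytes(handle, size: int, encoding: str = "utf-8") -> str:
    """Read `size` BYTES and decode them, as described above."""
    return handle.read(size).decode(encoding)


handle = io.BytesIO("こんにちは".encode("utf-8"))   # 5 characters, 15 bytes

whole_char = read_text_by_bytes(handle, 3)   # exactly one 3-byte character

try:
    read_text_by_bytes(handle, 1)            # first byte of the next character
    split_failed = False
except UnicodeDecodeError:
    split_failed = True
```

Reading on a character boundary works; reading a single byte lands inside the next character and the strict decoder refuses it.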

`readline()` and Three-Way Newline Handling

One more detail that required careful implementation: newline detection.

POSIX uses \n, classic Mac OS used \r, and Windows uses \r\n. MFSTextHandle.readline() recognizes all three.

The implementation is deliberately simple. Read one byte at a time, stop at \n or \r. If \r is found, peek at the next byte: if it's \n, consume it (treating \r\n as a single newline); otherwise, seek back.

if b == b"\r":
    next_b = self._handle.read(1)
    if next_b == b"\n":
        buf.extend(next_b)    # treat \r\n as a single newline
    elif next_b:
        self._handle.seek(self._handle.tell() - 1)  # put it back

Is reading one byte at a time slow? In theory, yes. But this is memory, not disk I/O: a one-byte read from memory is orders of magnitude cheaper than a one-byte read from disk. In practice, this straightforward implementation is not a bottleneck.
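Putting the fragment above into a complete loop, a byte-at-a-time line reader might look like the following sketch. The function name `readline_bytes` is hypothetical, io.BytesIO stands in for the binary handle, and returning the raw line ending untouched is an assumption; the real implementation may decode or normalize it.

```python
import io


def readline_bytes(handle) -> bytes:
    """Read one line, accepting \\n, \\r, or \\r\\n as the terminator."""
    buf = bytearray()
    while True:
        b = handle.read(1)
        if not b:                      # end of data
            break
        buf.extend(b)
        if b == b"\n":
            break
        if b == b"\r":
            next_b = handle.read(1)    # peek at the following byte
            if next_b == b"\n":
                buf.extend(next_b)     # treat \r\n as a single newline
            elif next_b:
                handle.seek(handle.tell() - 1)   # put it back
            break
    return bytes(buf)


handle = io.BytesIO(b"a\nb\rc\r\nd")
lines = [readline_bytes(handle) for _ in range(5)]
```

Each call returns one line including its original terminator, and an empty bytes object once the data is exhausted.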

Did We Violate the Design Principle?

This is the question I thought hardest about.

Design principle Article 3 says "text encoding is delegated upward." MFSTextHandle handles text encoding. Isn't that a contradiction?

I concluded it isn't. Here's why.

MFSTextHandle is not a method of MemoryFileSystem. open() did not start accepting text modes. open() still accepts only rb, wb, ab, and r+b — pass a text mode and you get ValueError, same as before.

MFSTextHandle is an optional utility class. Users import it explicitly when they want it. It does not intrude into the filesystem layer's design at all.

In other words, the "MFS shall" scope of the design principle is fully preserved. Text processing was provided as a separate class outside of MFS. The design philosophy was not bent — the boundary was redrawn more precisely, and usability was added on top.

The Boundary Between Design Purity and Practicality

When building a library, there will always be a moment where "design purity" and "ease of use" collide.

Faithfulness to principle matters. But if following a principle forces unnecessary friction on users, it is worth re-examining where that principle applies — not whether it applies.

The key, I believe, is not to "bend" the principle but to "redraw the boundary accurately":

The filesystem layer (MemoryFileSystem) remains binary-only → maintained
Text conversion is provided as a separate class (MFSTextHandle) → added
No internal buffer; quota immediacy is preserved → consistent with design principles
open() mode specification is unchanged → API contract maintained

I did not change the principle written in the design document. I defined more precisely where that principle applies.

Closing

Adding MFSTextHandle is a small feature. _text.py is only 136 lines.

But the decision to add those 136 lines was where I spent the most time. Because it wasn't merely a feature addition — it was a dialogue with the design philosophy.

I wrote the design document. I reviewed it with AI repeatedly. I implemented, tested, and made 346 test cases pass. At the very end, I asked myself: "Am I violating the principles I've defended throughout this entire process?" And I needed to be able to answer — to myself — "No, here's why."

That ability to explain yourself is, I believe, the point of writing a design document in the first place.

Without one, this decision would have ended at "seemed useful so I added it." With one, I could ask "how does this align with the principles?" and actually answer it.

A design document isn't only useful for deciding what to include. It's useful for explaining why inclusion is justified.

🔗 Links & Resources

GitHub: https://github.com/nightmarewalker/D-MemFS
PyPI: https://pypi.org/project/D-MemFS/
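Original Japanese Article: バイナリ専用 FS にテキストモードを足すべきか ― 設計原則とアーキテクチャ境界の再定義 ― D-MemFS 開発戦記４ (in English: "Should a Binary-Only FS Add Text Mode? Redefining Design Principles and Architecture Boundaries", D-MemFS Development War Stories 4)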

If you find this project interesting, a ⭐ on GitHub would be the best way to support my work!
