Posted on • Originally published at zenn.dev

[Side B] Should a Binary-Only FS Support Text Mode? Redrawing the Architecture Boundary

From the Author:
D-MemFS was featured in Python Weekly Issue #737 (March 19, 2026) under Interesting Projects, Tools and Libraries. Being picked up by one of the most widely-read Python newsletters confirmed that in-memory I/O bottlenecks and memory management are truly universal challenges for developers everywhere. This series is my response to that interest.

🧭 About this Series: The Two Sides of Development

In Japan, I publish this series across two distinct platforms to serve different developer needs. To provide the complete picture here on Dev.to, I've brought them together as two "Sides":

  • Side A (Practical / originally on Qiita): Focuses on the "How". Implementation details, benchmarks, and concrete solutions for practical use cases.
  • Side B (Philosophy / originally on Zenn): Focuses on the "Why". The development war stories, design decisions, and how I collaborated with AI through Specification-Driven Development (SDD).

What the Design Document Said

D-MemFS's design principles include a clause in Article 3:

MFS is dedicated to "pure byte-sequence virtual hierarchy management and resource control." Text encoding, encryption, and physical persistence are delegated to upper-layer boundary controllers.

And in the open() mode specification:

Text mode (r, w, etc., i.e. modes without the binary suffix b) shall raise ValueError.

In other words, text mode was explicitly excluded by design.

The reasoning is straightforward and consistent. A memory filesystem's job is to manage byte sequences. Encoding is an application-layer concern; the filesystem layer has no business managing it. Unix VFS doesn't know about text either. Introducing text mode would drag in encoding selection, newline translation, and multibyte boundary handling, which is a significant expansion of responsibility. There was no good reason to make that trade-off at the outset.
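
As a rough illustration of that rule, a mode check could look like the sketch below. The function name validate_mode and the constant ALLOWED_MODES are hypothetical, and the accepted modes are the four binary modes this article lists for open(); this is not D-MemFS's actual code.

```python
# Hypothetical sketch: not D-MemFS's real implementation.
ALLOWED_MODES = frozenset({"rb", "wb", "ab", "r+b"})


def validate_mode(mode: str) -> str:
    """Return the mode unchanged if it is binary; raise ValueError otherwise."""
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}: only binary modes "
            f"{sorted(ALLOWED_MODES)} are accepted"
        )
    return mode
```

With a check like this, "r" and "w" fail fast at open() time instead of producing a handle whose text semantics would have to be defined.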

Design Correctness Doesn't Stop Reality From Punching Back

The design reviews were thorough. The implementation was solid. 346 test cases passed. Coverage exceeded 97%. We were in final polish mode before release.

Then I paused and asked myself:

"What will users actually think of this?"

Imagine a user who wants to work with log files or config files (.ini, .json, .yaml) in memory with D-MemFS. There is no text mode. How do you read a config?

with mfs.open("/config/settings.json", "rb") as f:
    text = f.read().decode("utf-8")
    config = json.loads(text)

…It works. But am I really going to write .decode() every single time? And the writes too?

data = json.dumps(config, ensure_ascii=False).encode("utf-8")
with mfs.open("/config/settings.json", "wb") as f:
    f.write(data)

It works, but it's not natural.

This matters most for the ZIP-in-memory use case. One of D-MemFS's primary advertised strengths is "extract a ZIP into memory and process its contents." If every text file read after extraction requires an explicit .decode(), that feels like a lack of care toward the user.
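
To make that friction concrete, here is a small sketch that uses only the standard library's zipfile and io.BytesIO as a stand-in for D-MemFS, whose extraction API is not shown here. The member name and its contents are made up for the example.

```python
import io
import zipfile

# Build a tiny ZIP archive entirely in memory (stand-in data).
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("logs/app.log", "started\nstopped\n")

# Reading a member back yields bytes, not str.
with zipfile.ZipFile(archive) as zf:
    raw = zf.read("logs/app.log")

# Text processing therefore starts with an explicit decode step.
lines = raw.decode("utf-8").splitlines()
```

Every text file pulled out of the archive goes through the same decode step before it can be treated as text.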

"Just Use io.TextIOWrapper?" Not That Simple

Python's standard library provides io.TextIOWrapper, which wraps a binary stream to provide text I/O. It is used under the hood whenever a file is opened in text mode.

"Just wrap the handle with TextIOWrapper(handle) and the problem is solved," I initially thought. But trying it revealed three distinct problems.

Problem 1: The readinto() Requirement

io.TextIOWrapper expects its target to follow the io.BufferedIOBase interface, which includes readinto(). MemoryFileHandle was designed with a clean interface of read() / write() / seek() / tell(), with no readinto().

Adding it is technically possible, but readinto() is tightly coupled to memoryview and the buffer protocol, which would ripple through the memory management design. Modifying the internal buffer architecture just to support text mode felt like the tail wagging the dog.

Problem 2: Buffering vs. Quota Incompatibility

TextIOWrapper maintains an internal buffer. Buffered writes are not reflected in the quota system until flush() is called. This creates a window where "the quota is technically exceeded, but writes are still succeeding."

D-MemFS has a hard quota. "Reject before writing" is the cornerstone of its design. Buffering-induced delayed accounting directly contradicts this principle.
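
The delayed accounting is easy to reproduce with the standard library alone. In the sketch below, QuotaBytesIO is a hypothetical stand-in for a quota-enforcing binary handle (it is not MemoryFileHandle), and the behavior shown is what I would expect from CPython's TextIOWrapper with default settings and a small write.

```python
import io


class QuotaBytesIO(io.BytesIO):
    """Hypothetical stand-in for a quota-enforcing binary handle."""

    def __init__(self, quota: int) -> None:
        super().__init__()
        self.quota = quota

    def write(self, data) -> int:
        # Hard quota: reject the write before any byte is stored.
        if self.tell() + len(data) > self.quota:
            raise OSError("quota exceeded")
        return super().write(data)


raw = QuotaBytesIO(quota=4)
text = io.TextIOWrapper(raw, encoding="utf-8")

# 11 bytes against a 4-byte quota, yet this call returns normally:
# the data is only queued inside TextIOWrapper.
written = text.write("hello world")
stored_before_flush = raw.getvalue()

# The quota check finally runs when the pending data is flushed.
try:
    text.flush()
    rejected_on_flush = False
except OSError:
    rejected_on_flush = True
```

The caller sees a successful write() and only learns about the quota violation at flush() time, which is exactly the window a "reject before writing" design cannot tolerate.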

Problem 3: The seek() Cookie Problem

TextIOWrapper's seek() uses opaque "cookies," not byte offsets. You can only seek() to positions previously returned by tell(). This is fundamentally incompatible with MemoryFileHandle, which navigates freely by byte offset.

Solution: Build a Lightweight Wrapper Without a Buffer

The conclusion was to build MFSTextHandle, a lightweight custom wrapper. Its design is guided by three principles:

No internal buffer. Every write() is immediately delegated to the binary handle. Quota checks are always immediate. The hard quota contract is never broken.

Encoding/decoding is the only responsibility. MFSTextHandle does exactly one thing: convert between strings and bytes. It holds no file operation logic, no lock management. Everything is delegated to the underlying MemoryFileHandle.

No close() responsibility. Handle lifecycle is managed by MemoryFileHandle's with statement. MFSTextHandle is a thin adapter and nothing more.

with mfs.open("/data/hello.txt", "wb") as f:
    th = MFSTextHandle(f, encoding="utf-8")
    th.write("Hello, world\n")

This is a helper used exclusively within a with mfs.open(...) scope. No separate lifecycle management of MFSTextHandle is required.
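
The three principles can be sketched in a few lines. The class below is a simplified, hypothetical adapter named TextAdapter, not the real MFSTextHandle, and io.BytesIO stands in for the handle returned by mfs.open().

```python
import io


class TextAdapter:
    """Hypothetical unbuffered str<->bytes adapter (not the real MFSTextHandle)."""

    def __init__(self, handle, encoding: str = "utf-8") -> None:
        self._handle = handle      # binary handle; its owner closes it
        self._encoding = encoding

    def write(self, text: str) -> int:
        # Encode and hand the bytes straight to the binary handle, so any
        # quota check there runs on this very call.
        self._handle.write(text.encode(self._encoding))
        return len(text)

    def read(self) -> str:
        # Whole-file read: decode everything from the current position.
        return self._handle.read().decode(self._encoding)


with io.BytesIO() as f:          # stand-in for mfs.open(path, "wb")
    th = TextAdapter(f)
    count = th.write("Hello, world\n")
    stored = f.getvalue()        # bytes are already in the handle
    f.seek(0)
    round_trip = th.read()
```

Because nothing is held back in the adapter, an error raised by the underlying write() surfaces on the same call that caused it, and the adapter has no state of its own to close.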

read(size) Is an Approximation in Characters

There is one limitation that deserves honest acknowledgment.

It concerns the size argument to read(size). In Python's standard text mode, size means character count. But because MFSTextHandle has no internal buffer, it passes size to handle.read(size) as a byte count.

For UTF-8 multibyte characters (e.g. Japanese), this means fewer characters than requested may be returned.

# In UTF-8, one Japanese character = 3 bytes
th.read(3)  # Reads 3 bytes → returns "こ" (1 character), which is fine
th.read(1)  # Reads 1 byte → potential decode error mid-multibyte sequence

This limitation is explicitly documented in the docstring:

Note that this is an approximation in characters, not bytes.

It is not perfect. But for the most common use case, read() with no argument to read the entire file, it is not an issue. And readline(), which detects newlines byte by byte, is also unaffected by multibyte boundaries.
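
The byte-level effect can be seen without D-MemFS at all. The sketch below only slices and decodes a UTF-8 byte string; it illustrates why a byte count that lands inside a character is a problem, and makes no claim about how MFSTextHandle reports that case.

```python
data = "こんにちは".encode("utf-8")    # 5 characters, 15 bytes

whole = data.decode("utf-8")           # decoding the full content is safe
first_char = data[:3].decode("utf-8")  # exactly one 3-byte character

# Cutting inside a character leaves an incomplete sequence.
try:
    data[:1].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
```

Here split_ok ends up False: a single byte of a three-byte character cannot be decoded on its own, while the whole-content read and the 3-byte read both succeed.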

readline() and Three-Way Newline Handling

One more detail that required careful implementation: newline detection.

POSIX uses \n, old Mac used \r, Windows uses \r\n. MFSTextHandle.readline() recognizes all three.

The implementation is deliberately simple. Read one byte at a time, stop at \n or \r. If \r is found, peek at the next byte: if it's \n, consume it (treating \r\n as a single newline); otherwise, seek back.

if b == b"\r":
    next_b = self._handle.read(1)
    if next_b == b"\n":
        buf.extend(next_b)    # treat \r\n as a single newline
    elif next_b:
        self._handle.seek(self._handle.tell() - 1)  # put it back

Is reading one byte at a time slow? In theory, yes. But this is memory, not disk I/O: a one-byte read from memory is orders of magnitude cheaper than one from disk. In practice, this straightforward implementation is not a bottleneck.
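
For readers who want to see the whole loop, here is a self-contained sketch of the same idea. The function read_line is hypothetical and uses io.BytesIO in place of MemoryFileHandle; it assumes UTF-8, where the bytes for \n and \r never occur inside a multibyte character, and it returns the line terminator untranslated, which may differ from what MFSTextHandle does.

```python
import io


def read_line(handle, encoding: str = "utf-8") -> str:
    """Read one line, accepting \\n, \\r, or \\r\\n as the terminator."""
    buf = bytearray()
    while True:
        b = handle.read(1)
        if not b:                      # end of data
            break
        buf.extend(b)
        if b == b"\n":
            break
        if b == b"\r":
            next_b = handle.read(1)
            if next_b == b"\n":
                buf.extend(next_b)     # \r\n counts as one newline
            elif next_b:
                handle.seek(handle.tell() - 1)  # not part of the newline
            break
    return buf.decode(encoding)


stream = io.BytesIO("posix\nmac\rwindows\r\nこんにちは".encode("utf-8"))
lines = [read_line(stream) for _ in range(4)]
```

Each call stops at the first terminator it meets, so the decode at the end always sees complete characters even when the line contains multibyte text.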

Did We Violate the Design Principle?

This is the question I thought hardest about.

Design principle Article 3 says "text encoding is delegated upward." MFSTextHandle handles text encoding. Isn't that a contradiction?

I concluded it isn't. Here's why.

MFSTextHandle is not a method of MemoryFileSystem. open() did not start accepting text modes. open() still accepts only rb, wb, ab, and r+b. Pass a text mode and you get ValueError, same as before.

MFSTextHandle is an optional utility class. Users import it explicitly when they want it. It does not intrude into the filesystem layer's design at all.

In other words, the "MFS shall" scope of the design principle is fully preserved. Text processing was provided as a separate class outside of MFS. The design philosophy was not bent; the boundary was redrawn more precisely, and usability was added on top.

The Boundary Between Design Purity and Practicality

When building a library, there will always be a moment where "design purity" and "ease of use" collide.

Faithfulness to principle matters. But if following a principle forces unnecessary friction on users, it is worth re-examining where that principle applies, not whether it applies.

The key, I believe, is not to "bend" the principle but to "redraw the boundary accurately":

  • The filesystem layer (MemoryFileSystem) remains binary-only → maintained
  • Text conversion is provided as a separate class (MFSTextHandle) → added
  • No internal buffer; quota immediacy is preserved → consistent with design principles
  • open() mode specification is unchanged → API contract maintained

I did not change the principle written in the design document. I defined more precisely where that principle applies.

Closing

Adding MFSTextHandle is a small feature. _text.py is only 136 lines.

But the decision to add those 136 lines was where I spent the most time, because it wasn't merely a feature addition. It was a dialogue with the design philosophy.

I wrote the design document. I reviewed it with AI repeatedly. I implemented, tested, and made 346 test cases pass. At the very end, I asked myself: "Am I violating the principles I've defended throughout this entire process?" And I needed to be able to answer, to myself, "No, here's why."

That ability to explain yourself is, I believe, the point of writing a design document in the first place.

Without one, this decision would have ended at "seemed useful so I added it." With one, I could ask "how does this align with the principles?" and actually answer it.

A design document isn't only useful for deciding what to include. It's useful for explaining why inclusion is justified.


🔗 Links & Resources

If you find this project interesting, a ⭐ on GitHub would be the best way to support my work!
