DEV Community

Cover image for Python: open().read()->str , but for big files
Tai Kedzierski
Tai Kedzierski

Posted on • Updated on

Python: open().read()->str , but for big files

Image (C) Tai Kedzierski

We just had a use case where we needed to POST a file over to a server. The naive implementation for posting with requests is to do

with open("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})
Enter fullscreen mode Exit fullscreen mode

Job done! Well. If the file is reeeally big, that .read() operation will attempt to load the entire file into memory, before passing the loaded bytes to requests.post(...)

Clearly, this is going to hurt. A lot.

Use mmap

A quick search yielded a solution using mmap to create a "memory mapped" object, which would behave like a string, whilst being backed by a file that only gets read in chunks as needed.

As ever, I like making things re-usable, and easy to slot-in. I adapted the example into a contextual object that can be used in-place of a normal call to open()

# It's a tiny snippet, but go on.
# Delviered to You under MIT Expat License, aka "Do what you want"
# I'm not even fussy about attribution.

import mmap

class StringyFileReader:

    def __init__(self, file_name, mode):
        if mode not in ("r", "rb"):
            raise ValueError(f"Invalid mode '{mode}'. Only read-modes are supported")

        self._fh = open(file_name, mode)
        # A file size of 0 means "whatever the size of the file actually is" on non-Windows
        # On Windows, you'll need to obtain the actual size, though
        fsize = 0
        self._mmap = mmap.mmap(self._fh.fileno(), fsize, access=mmap.ACCESS_READ)


    def __enter__(self):
        return self


    def read(self):
        return self._mmap


    def __exit__(self, *args):
        self._mmap.close()
        self._fh.close()
Enter fullscreen mode Exit fullscreen mode

Which then lets us simply tweak the original naive example to:

with StringyFileReader("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})
Enter fullscreen mode Exit fullscreen mode

Job. Done.

EDIT: we've discovered through further use that requests is pretty stupid. It sill tries to read the entire file into memory - possibly by doing a copy of the "string" it receives during one of its internal operations. So this solution seems to only stand in limited cases...

Top comments (4)

Collapse
 
xtofl profile image
xtofl

That's so terribly simple!

I was anticipating some multipart chunked transferring, but this makes excellent use of the machinery offered by the operating system.

Ever considered using contextlib?

@contextlib.contextmanager
def readmm(filename):
      _fh = open(file_name, mode)
      fsize = 0
     _mmap = mmap.mmap(self._fh.fileno(), fsize, access=mmap.ACCESS_READ)
     try:
        yield _mmap
     finally:
      _fh.close()
      _mmap.close()

with readmm("my_file.bin", "rb") as data:
  requests.post(url, data={"bytes": data})
Enter fullscreen mode Exit fullscreen mode
Collapse
 
taikedz profile image
Tai Kedzierski

That does make it even more concise !

That said, I'm not sure how I feel about the enter/exit context being implicit behind this; as in, it reduces the amount of code, but I can feel like reading it back feels unintuitive.

Collapse
 
xtofl profile image
xtofl

Mind, it's not implicit! It's extracted into the @contextmanager function.

I respect that you phrase it as unintuitive, since intuition is learned. Indeed, to (very) many, the extracted form of the code sandwich is intuitive. You can observe the movement from explicit sandwiches to extracted in many languages (e.g. Scope.Exit, using in C#).

The power it brings is that the developer cannot possibly forget to cleanup, so the reader doesn't have to wonder whether they did. Assuming code is read 10 times more than it is written, inner peace will be your part after growing this intuition.

Thread Thread
 
taikedz profile image
Tai Kedzierski

Indeed. I guess it's something I just have to get used to - can be regarded as analogous to the with keyword which, unless you've learned and used it properly, can look oddly incomplete.