DEV Community

Discussion on: How to Open A File in Python Like A Pro

Vedran Čačić

Or simply

with open(path) as file:
    for line in file: print(line)

Don't reinvent the wheel.

And why do you think swallowing the exception of a non-existent file is a good idea? Read that Zen again. Errors should never pass silently. (Not to mention a race condition with your code.)
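A minimal sketch of the point about silent errors and the race condition, assuming two hypothetical helpers named `print_lines` and `main`: instead of checking that the file exists and then opening it, just open it, and handle `FileNotFoundError` only where something useful can be done with it.

```python
import sys


def print_lines(path):
    """Print every line of a text file (hypothetical helper for illustration)."""
    # No os.path.exists() check beforehand: the file could appear or vanish
    # between the check and the open() call. Opening it is the only real test.
    with open(path) as file:
        for line in file:
            print(line, end="")  # each line already carries its own newline


def main(path):
    try:
        print_lines(path)
    except FileNotFoundError as exc:
        # Report the error instead of letting it pass silently.
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return 1
    return 0
```

If the caller cannot do anything sensible about a missing file, dropping the `try` and letting the exception propagate is just as reasonable.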

It would be more useful if you mentioned local file encoding, and utf8 as a new sensible default. That would really be "like a pro".
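As a rough illustration of the encoding point (the helper name `read_text` is made up for this sketch): when `open()` is called without an `encoding` argument, the result generally depends on the platform's locale settings, so passing `encoding="utf-8"` explicitly makes the behaviour the same everywhere.

```python
import locale


def read_text(path, encoding="utf-8"):
    """Read a whole text file with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as file:
        return file.read()


# The locale's preferred encoding; this is platform-dependent, which is
# exactly why relying on it implicitly is fragile.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```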

mirage2032

Your code is not perfect either: what if a single line is a few GB?
You assumed that each line will be small, which might not always be the case.

Vedran Čačić

No code is perfect, especially in Python. :-) But if we are more explicit about what exactly we are doing, we can produce code that is robust and good enough. For a start, is your file textual or binary? They are not the same, although on various UNIXes you can often pretend they are. [Text files are sequences of characters, binary files are sequences of bytes. Bytes aren't characters, characters aren't bytes.]
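To make the bytes-versus-characters distinction concrete, here is a small sketch that writes one short string to a temporary file and reads it back in both modes. The sample text and variable names are arbitrary choices for illustration.

```python
import os
import tempfile

# Write the same content once, then read it back in both modes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as file:
    file.write("čaj\n".encode("utf-8"))  # 4 characters, but 5 bytes in UTF-8

with open(path, "rb") as binary_file:
    raw = binary_file.read()   # a bytes object: b'\xc4\x8daj\n'

with open(path, encoding="utf-8") as text_file:
    text = text_file.read()    # a str object: 'čaj\n'

os.remove(path)

print(type(raw).__name__, len(raw))    # bytes 5
print(type(text).__name__, len(text))  # str 4
```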

From the context, I realized you're probably talking about text files, although the "gimme bunch of random bits" approach is just wrong there. And for a good reason: in 33 years of working with computers in all forms, I never had to read a text file whose line didn't fit in memory. Have you? It's an honest question.

Zhiyue Yi

Thanks for sharing these nice suggestions about encoding and exception handling! They are definitely worth exploring further. I’m still quite new to Python and there is still a lot to learn.

As for an example of reading large files, this is probably the case: imagine a server with a single API that processes a file, under a load of thousands of QPS. Even though each request only processes a file of a few MB, with thousands of requests coming in, the memory consumption adds up. Not to mention servers with more functionality.

I hope I could have a real-life example for you but currently I don’t :(

Vedran Čačić

It's not a reason for ":(", it's a reason for ":)". Because it means you can write normal-looking, easy code and it will work.

Your example, even though fictional, has nothing to do with my comment: if you have to process files as a whole, then you do, and no amount of black magic will help you. If you don't, then the question is whether it's a text or a binary file, as I said. And then you should use line or block buffering as needed.

If you're really strapped for memory, the first optimization I'd suggest is not using Python. Python is so dynamic that common data structures easily take up many times more memory than in "normal" languages with value semantics.
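A rough way to see the overhead on CPython (the numbers are implementation-specific, and `sys.getsizeof` is only an approximation): compare a list of float objects with the same values packed into an `array`, which stores plain 8-byte doubles much like a language with value semantics would.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

# A list stores pointers to separate float objects, so count both the
# list itself and every object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("d") stores the raw 8-byte doubles contiguously.
packed = array("d", values)
array_bytes = sys.getsizeof(packed)

print(list_bytes, array_bytes)
```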

Zhiyue Yi

I think I got what you mean here.

with open(path) as file:
    for line in file: print(line)

is fast and uses little memory. (I just learnt it from you and tried it myself, thanks.)

And I agree with you that we should keep the code as simple as possible in most cases, because having black magic here makes the code less readable.

But I still think this technique is worth mentioning and good to know, in case somebody needs it for some extreme cases.

Vedran Čačić

Like what? Like "I have a binary file 4GiB in size and I'm just gonna spit it to stdout 1KiB at a time"? Not to mention that you don't do any decoding at all, so bytes objects are written to your screen raw, which isn't what you want, no matter the use case. And not to mention "if it doesn't exist at the moment I check, I won't do anything, even though it might exist later when I'd actually try to read from it"?

Sorry, I know you're trying to salvage your post, but "like a pro" doesn't mean that. A pro should know the terrain of possibilities they might encounter, and this is something you won't encounter. Ever. If you do, I'll eat my hat. :-P

Now, if you actually need to process a binary file in chunks (not "pretend it's text and write it on the screen"), that's what block buffering is for. Learn to use it: docs.python.org/3/library/io.html#... You're in fact implementing another buffer on top of a built-in one, which helps neither your memory nor your speed.
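One possible sketch of chunked binary processing that actually does something with the bytes: hashing a file without loading it whole. The helper name `sha256_of_file` and the 64 KiB chunk size are assumptions made for this example, not anything from the original post.

```python
import hashlib


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a binary file chunk by chunk (hypothetical helper)."""
    digest = hashlib.sha256()
    # Binary mode gives a buffered reader by default; the `buffering`
    # argument of open() can change the buffer size if that ever matters.
    with open(path, "rb") as file:
        while chunk := file.read(chunk_size):  # b"" at EOF ends the loop
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by `chunk_size` regardless of the file size, and no bytes are ever treated as text.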

Zhiyue Yi • Edited

You are right. Thanks a lot for these helpful comments. It’s definitely a good lesson learnt.

Vedran Čačić

Let me just tell you one more thing. I do this all the time around the Net (explainxkcd.com/wiki/index.php/SIW...). Usually people stick to their guns and refuse to admit they are wrong. DEV is the only community where people thank me for correcting them. Kudos for that! B-)

Zhiyue Yi

Nobody is perfect. We are all learning to be better :D Though the post itself is not good enough, at least we had a meaningful conversation here and I got a better solution.