At some point in your program, you may want to work with files stored locally on your machine or from a network. Python represents files as file (or file-like) objects. This representation provides you with functions that enable the manipulation of files.
Table of Contents
- File Object APIs
- Opening/Creating a File
- Writing to Files
- Reading from Files
- Example: Copying a File
- Working with JSON Files
- Working with Structured Datasets
- Working with Temporary Files
- Working with File-like Objects
- Getting File Attributes
- Working with Directories
- Example: Back Up Recently Modified Files
- Summary
Typically, these file (or file-like) objects include:
- Text file objects, which let you read/write text strings. Examples include:
  - files opened in text mode (r or w),
  - standard input, represented by sys.stdin,
  - standard output, represented by sys.stdout, and
  - an in-memory text stream created with io.StringIO().
- Binary file objects, which let you read/write buffered byte strings. Examples include:
  - files opened in binary mode ('rb', 'wb' or 'rb+'), and
  - byte streams created with io.BytesIO() and gzip.GzipFile.
- Raw binary file objects, which are used behind the scenes to build binary and text streams. An example is a file opened with mode='rb' and buffering=0, e.g. open("myfile.jpg", "rb", buffering=0).
The Python documentation also calls file objects streams, so you can use the two terms interchangeably.
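To make the three layers concrete, here is a minimal sketch that opens a scratch file and inspects the type of each layer. The scratch file is created with tempfile purely for illustration; the variable names are hypothetical and not part of the original article.

```python
import io
import os
import tempfile

# Create a scratch file so the example is self-contained (illustrative only)
fd, scratch_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(scratch_path, mode='w', encoding='utf-8') as text_file:
    text_file.write('hello')
    text_layer = type(text_file)             # the text stream
    buffered_layer = type(text_file.buffer)  # the buffered binary stream underneath
    raw_layer = type(text_file.buffer.raw)   # the raw binary stream at the bottom

# With buffering=0, open() hands back the raw layer directly
with open(scratch_path, mode='rb', buffering=0) as raw_file:
    unbuffered_type = type(raw_file)
    raw_bytes = raw_file.read()

os.remove(scratch_path)

print(text_layer)       # <class '_io.TextIOWrapper'>
print(buffered_layer)   # <class '_io.BufferedWriter'>
print(raw_layer)        # <class '_io.FileIO'>
print(unbuffered_type)  # <class '_io.FileIO'>
print(raw_bytes)        # b'hello'
```

In everyday code you rarely touch the raw layer; it is shown here only to illustrate how text and binary streams are stacked.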
File Object APIs
The table below shows some of the properties of file objects.
Property | Type | Description |
---|---|---|
buffer | io.BufferedReader, io.BufferedWriter or io.BufferedRandom | The underlying binary buffer that a text file object reads from and writes to. |
closed | bool | True if the file is closed, False if it is still open. |
encoding | str | Applies only to text files. The name of the encoding used to decode or encode the text. Always specify the encoding when you open a text file for reading/writing; otherwise, Python uses the default locale encoding, which might not be what you want. |
errors | str | How Python handles encoding and decoding errors. Defaults to strict. |
line_buffering | bool | If True, Python flushes the file's internal buffer when a call to write() contains a newline character (\n) or a carriage return (\r). |
mode | str | Whether the file is opened for reading (r), writing (w) and so on. |
name | str | The name of the file. |
newlines | str, tuple or None | The line endings translated so far while reading: None, a string such as '\n', '\r' or '\r\n', or a tuple of strings. To control how line endings are handled, pass the newline parameter to open(); it accepts None, '', '\n', '\r' and '\r\n'. |
write_through | bool | If True, calls to write() are guaranteed not to be buffered. Any data written to a text file is directly handed to its underlying binary buffer. |
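Use locale.getencoding() to get the locale encoding that open() falls back on when you do not pass an encoding. sys.getdefaultencoding() returns the default string encoding used by Python itself, which is utf-8 in Python 3. Note that locale.getencoding() works in Python >=3.11.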
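As a quick illustration of the properties listed above, the sketch below opens a scratch text file and collects a few of its attributes. The scratch file and the attributes dictionary are assumptions made for this example only.

```python
import os
import tempfile

# Create a scratch file so the example is self-contained (illustrative only)
fd, scratch_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(scratch_path, mode='w', encoding='utf-8') as f:
    # Collect a few properties while the file is still open
    attributes = {
        'name': f.name,
        'mode': f.mode,
        'encoding': f.encoding,
        'errors': f.errors,
        'closed': f.closed,
        'line_buffering': f.line_buffering,
        'write_through': f.write_through,
    }

# The with block has ended, so the file is closed now
closed_after_with = f.closed
os.remove(scratch_path)

for key, value in attributes.items():
    print(f'{key}: {value!r}')
print(f'closed after the with block: {closed_after_with}')
```

The name printed will differ on every run because tempfile picks it; the other values depend only on how open() was called.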
The table below shows some text/binary file or stream methods.
Method | Description |
---|---|
close() | Flush and close the stream. |
flush() | Flush the write buffers of the stream, if applicable. |
read() | Read and return all (parameter size=-1) or n characters/bytes from a stream. |
readable() | Return True if you can read the file. If False, read() will raise an error. |
readline() | Read and return one line from the stream. |
readlines() | Read and return a list of lines from the stream. |
reconfigure() | Reconfigure this text stream using new settings for encoding, errors, newline, line_buffering and write_through. |
seek() | Move the stream position to a given offset. Useful for random file access. |
seekable() | Return True if the stream supports random access. If False, seek(), tell() and truncate() will raise an error. |
tell() | Return the current stream position. |
truncate() | Resize the stream to the given size in bytes (or the current position if size is not specified). |
writable() | Return True if the stream supports writing. |
write() | Write the given bytes/text string to the stream, and return the number of bytes/characters written. |
writelines() | Write a list of lines to the stream. Add \n yourself if you need it. |
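Here is a small sketch that exercises several of these methods on an in-memory binary stream. io.BytesIO is used only so that the example needs no file on disk; the same calls work on a file opened in binary mode.

```python
import io

stream = io.BytesIO()

# write() returns the number of bytes written
written = stream.write(b'0123456789')
position_after_write = stream.tell()

# Jump back to the start and read the first four bytes
stream.seek(0)
first_four = stream.read(4)

# Seek relative to the end of the stream and read what is left
stream.seek(-3, io.SEEK_END)
last_three = stream.read()

# Keep only the first five bytes, then read everything back
stream.truncate(5)
stream.seek(0)
after_truncate = stream.read()

capabilities = (stream.readable(), stream.writable(), stream.seekable())
stream.close()

print(written, position_after_write)  # 10 10
print(first_four)                      # b'0123'
print(last_three)                      # b'789'
print(after_truncate)                  # b'01234'
print(capabilities)                    # (True, True, True)
```

In text mode, tell() returns an opaque number rather than a character count, which is why the sketch uses a binary stream.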
Opening/Creating a File
The open() function is the most common way to open a file object for reading/writing. For example:
# Open a file for reading in text mode
# Be explicit about the mode and encoding
text_file = open('your_text_file.txt', mode='rt', encoding='utf-8')
# <class '_io.TextIOWrapper'>
print(type(text_file))
# Open a file for reading in binary mode
binary_file = open('your_binary_file', mode='rb')
# <class '_io.BufferedReader'>
print(type(binary_file))
Calling open() in write mode creates a new file if it does not exist. If the file already exists, the open() call will clear its content. For example:
# Open a file for writing in text mode
text_file = open('lyrics.txt', mode='wt', encoding='utf-8')
# Open a file for writing in binary mode
binary_file = open('your_binary_file', mode = 'wb')
The table below lists the available file modes and their description.
Mode | Description |
---|---|
r | Open for reading (default). |
w | Open for writing, truncating the file first. |
x | Create a new file and open it for writing. Raises an error if the file already exists. |
a | Open for writing, appending to the end of the file if it exists. |
b | Binary mode. |
t | Text mode (default). |
+ | Open a disk file for updating (reading and writing). For example, r+t opens a file for reading and writing in text mode. |
Files are stored as raw bytes. Files opened in binary mode return contents as bytes objects without any decoding. In text mode, the underlying raw bytes are decoded into text (using the default locale encoding or the encoding you specified) and then returned.
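The sketch below shows a few of these modes in action: x refusing to overwrite an existing file, the same file read back in binary and text mode, and a appending to the end. The scratch directory and the file name greeting.txt are made up for this example.

```python
import os
import tempfile

# Work in a scratch directory so nothing else is touched (illustrative only)
scratch_dir = tempfile.mkdtemp()
scratch_path = os.path.join(scratch_dir, 'greeting.txt')

# 'x' creates the file and opens it for writing
with open(scratch_path, mode='x', encoding='utf-8') as f:
    f.write('café')

# A second 'x' on the same path fails because the file now exists
try:
    open(scratch_path, mode='x', encoding='utf-8')
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# Binary mode returns the raw bytes; text mode decodes them
with open(scratch_path, mode='rb') as f:
    as_bytes = f.read()
with open(scratch_path, mode='rt', encoding='utf-8') as f:
    as_text = f.read()

# 'a' keeps the existing content and writes at the end
with open(scratch_path, mode='a', encoding='utf-8') as f:
    f.write('!')
with open(scratch_path, mode='r', encoding='utf-8') as f:
    appended = f.read()

os.remove(scratch_path)
os.rmdir(scratch_dir)

print(second_create_failed)  # True
print(as_bytes)              # b'caf\xc3\xa9'
print(as_text)               # café
print(appended)              # café!
```

Note how the single character é takes two bytes on disk in UTF-8, which is why the byte count and the character count of a text file can differ.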
Writing to Files
To write to a file, open it for writing with any of w, w+, a, a+, x, x+, or r+ in text (t) or binary (b) mode and use either .write() or .writelines(). For example:
# Open the file for writing.
# Add + so that we can also read from the file
lyrics = open('lyrics.txt', mode='w+', encoding='utf-8')
# Write something nice
lyrics.write('Best day of my life\n')
lyrics.write('I had a dream a dream so big and loud\n')
lyrics.write('I jumped so high I touched the clouds\n')
lyrics.write('Wo-o-o-o-o-oh, wo-o-o-o-o-oh\n')
# Persist the content into the file without closing it
lyrics.flush()
# Check the current position to confirm that the write succeeded
print(lyrics.tell())
# Rewind to first position
lyrics.seek(0)
# Read the content
print(lyrics.read())
more_lines = [
    'I stretched my hands out to the sky\n',
    'We danced with monsters through the night\n',
    'Wo-o-o-o-o-oh, wo-o-o-o-o-oh\n'
]
# Add more lines using .writelines()
lyrics.writelines(more_lines)
# Flush and close the file
lyrics.close()
Reading From Files
To read a file, open it for reading with any of r, r+, w+, a+, or x+ in text (t) or binary (b) mode and use either .read(), .readline() or .readlines(). For example:
lyrics = open('lyrics.txt', mode='rt', encoding='utf-8')
# Read all contents
# The file cursor position goes to the end
all_content = lyrics.read()
print(all_content)
# Reset the cursor/position to the beginning
lyrics.seek(0)
# Read the first ten characters
first_10_chars = lyrics.read(10)
print(first_10_chars)
# Read 20 characters more
another_20_chars = lyrics.read(20)
print(another_20_chars)
# Where is the file cursor position
print("The file cursor is at position", lyrics.tell())
# Read the rest of the content, starting from where read(20) stopped
the_rest_content = lyrics.read()
print(the_rest_content)
# Reset the cursor/position to the beginning again
lyrics.seek(0)
# Read only the first line
first_line = lyrics.readline()
print(first_line)
# Read the rest lines
the_rest_lines = lyrics.readlines()
print(the_rest_lines)
# Reset
lyrics.seek(0)
# Print each line in a for loop
for line in lyrics.readlines():
    print(line)
# Reset
lyrics.seek(0)
# Read from the file object directly
for line in lyrics:
    print(line)
# Close the file
lyrics.close()
Example: Copying a File
# copy.py
"""A script for copying files. Usage: python copy.py file1 file1_copy"""
import sys


def copier(source: str, destination: str):
    """Copy the contents of the source to the destination

    Args:
        source - The source file.
        destination - The destination file.
    """
    with open(source, mode='r', encoding='utf-8') as srcfile:
        with open(destination, mode='w', encoding='utf-8') as destfile:
            for line in srcfile:
                destfile.write(line)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print('Usage: python copy.py file1 file1_copy')
    else:
        _, src, dest = sys.argv
        copier(src, dest)
The with statement treats the file object as a "context manager" and closes the file automatically when the block ends, even if an error occurs inside it. So there is no need to call .close() yourself.
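To see that the file is closed even when something goes wrong inside the block, consider this minimal sketch. It uses io.StringIO as a stand-in for a real file, and the ValueError is raised on purpose for the demonstration.

```python
import io

# io.StringIO stands in for a real file; it supports the same with protocol
stream = io.StringIO()

try:
    with stream as f:
        f.write('partial data')
        open_inside_block = not f.closed
        # Simulate a failure in the middle of the work
        raise ValueError('something went wrong mid-write')
except ValueError:
    pass

# The with statement closed the stream on the way out
closed_after_error = stream.closed

print(open_inside_block)   # True
print(closed_after_error)  # True
```

Without the with statement, you would need a try/finally block that calls .close() to get the same guarantee.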
Working with JSON Files
You can read/write to a JSON file in the following ways:
- To write, use json.dump(python_object, file_opened_for_writing) to write a Python object to a file opened for writing, or use file_opened_for_writing.write(json.dumps(python_object)).
- To read, use json.load(file_opened_for_reading) to read from a binary or text file object opened for reading.
Note that you should open a JSON file with encoding='utf-8' when reading or writing it in text mode.
import json
# Some menu I'd love to try :)
menu = [
    dict(name='alien_fish', price=2000),
    dict(name='alien_vegetables', price=100),
    dict(name='some_alien_dish', price=200_000)
]
# Open a JSON file for writing and dump the menu there so
# that I won't ever forget
with open('menu.json', mode='w', encoding='utf-8') as jsonfile:
    json.dump(menu, jsonfile)
# Open a JSON file for reading
with open('menu.json', mode='r', encoding='utf-8') as jsonfile:
    same_menu = json.load(jsonfile)
# Sure? yes
assert menu == same_menu, "Oops! The menu isn't the same"
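The other two routes mentioned earlier, writing with json.dumps() and reading from a file opened in binary mode, can be sketched as follows. The settings dictionary and the scratch file are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile

# Hypothetical data for the demonstration
settings = {'theme': 'dark', 'font_size': 12}

# Create a scratch file so the example is self-contained
fd, scratch_path = tempfile.mkstemp(suffix='.json')
os.close(fd)

# Serialize to a string first, then write the string yourself
with open(scratch_path, mode='w', encoding='utf-8') as jsonfile:
    jsonfile.write(json.dumps(settings, indent=2))

# json.load() also accepts a file opened in binary mode
with open(scratch_path, mode='rb') as jsonfile:
    loaded = json.load(jsonfile)

os.remove(scratch_path)

print(loaded)  # {'theme': 'dark', 'font_size': 12}
```

The indent argument only affects how the text is laid out in the file; the loaded object is the same either way.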
Working with Structured Datasets
You've already seen how to read and write CSV files in the previous article on modules. For larger CSV datasets and other types of structured data such as Excel, Stata, SPSS, databases, and so on, you'll want to use an external package like pandas.
The pandas IO API lists several methods for reading/writing various data files.
Working with Temporary Files
You can also securely create one-off files or file-like objects for transfer or demo purposes. For example, you can store a user-uploaded file as a temporary one, process it, and then copy the processed contents to a new one for permanent storage.
To create a temporary file, use tempfile.TemporaryFile(). For example:
import tempfile
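# TemporaryFile() creates a temporary file-like object
with tempfile.TemporaryFile(mode='w+b') as tpfile:
    print(tpfile)
    # On POSIX, tpfile is already a true file object and has no .file
    # attribute, so fall back to tpfile itself there
    print(getattr(tpfile, 'file', tpfile))
    tpfile.write(b'I will not stay long')
    tpfile.seek(0)
    print(tpfile.read())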
The tempfile.TemporaryFile() call returns a file object (on POSIX platforms) or a file-like object on other platforms. The default mode parameter is 'w+b'. The binary mode helps to maintain consistency on all platforms regardless of the data stored.
If you need a temporary file to have a name and if you also need to control the file deletion, use tempfile.NamedTemporaryFile() instead. For example:
import tempfile
# NamedTemporaryFile() always returns a file-like object
# whose file attribute is the underlying true file object.
# Note: the delete_on_close parameter requires Python 3.12 or later.
with tempfile.NamedTemporaryFile(mode='w+b', delete=True, delete_on_close=True) as named_tpfile:
    print(named_tpfile)
    print(named_tpfile.name)
    print(named_tpfile.file)
    named_tpfile.write(b'Set delete=False or delete_on_close=False ')
    named_tpfile.write(b'to control when I\'m deleted\n')
    named_tpfile.seek(0)
    print(named_tpfile.read())
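To illustrate controlling the deletion yourself, here is a minimal sketch that passes delete=False, reopens the file by name after the with block, and then removes it by hand. The variable names are made up for this example.

```python
import os
import tempfile

# delete=False keeps the file on disk after it is closed
with tempfile.NamedTemporaryFile(mode='w+b', delete=False) as named_tpfile:
    named_tpfile.write(b'I outlive the with block')
    kept_path = named_tpfile.name

# The file is closed now, but it still exists and can be reopened by name
exists_after_close = os.path.exists(kept_path)
with open(kept_path, mode='rb') as f:
    kept_content = f.read()

# Cleaning up is now your responsibility
os.remove(kept_path)
exists_after_remove = os.path.exists(kept_path)

print(exists_after_close)   # True
print(kept_content)         # b'I outlive the with block'
print(exists_after_remove)  # False
```

This pattern is handy when another function or program needs to open the file by its path after you have finished writing it.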
Working with File-like Objects
StringIO and BytesIO
You can also create a file-like object directly from text strings or bytes using io.StringIO() or io.BytesIO(), respectively. For example:
import io
text_message = 'Hello this is a file created from raw strings'
file_from_raw_strings = io.StringIO(text_message)
print(type(file_from_raw_strings)) # <class '_io.StringIO'>
# print the text contents
print(file_from_raw_strings.getvalue())
byte_message = b'Hello! This is a file created from bytes'
file_from_bytes = io.BytesIO(byte_message)
print(type(file_from_bytes)) # <class '_io.BytesIO'>
# print the byte contents
print(file_from_bytes.getvalue())
This post on Stack Overflow highlights the uses of io.StringIO(): anywhere you need a file-like object for processing, data transfer over the network, demo purposes, etc. For example, to make a CSV 'file' for testing with pandas:
# Copied from Stackoverflow :)
# https://stackoverflow.com/questions/7996479/what-is-stringio-in-python-used-for-in-reality
import io
import pandas as pd
f = io.StringIO("id,name\n1,brian\n2,amanda\n3,zoey\n")
# Pandas takes a file path or a file-like object
df = pd.read_csv(f)
Network Response
The urllib.request.urlopen() function returns an HTTP response as a binary file-like object. Therefore, you can treat the response like a binary file. For example:
import urllib.request
url = 'https://pandas.pydata.org/docs/user_guide/index.html'
with urllib.request.urlopen(url) as response:
    print(type(response))  # <class 'http.client.HTTPResponse'>
    content = response.read()
    print(type(content))  # <class 'bytes'>
# Convert the bytes to text
text_content = content.decode('utf-8')
print(type(text_content))  # <class 'str'>
Here's an example from the Python documentation. The HTTP response is copied directly to a named temporary file, which is kept for further processing because delete is set to False.
import shutil
import tempfile
import urllib.request
with urllib.request.urlopen('http://python.org/') as response:
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        shutil.copyfileobj(response, tmp_file)
# delete was set to False, so this is allowed
with open(tmp_file.name) as html:
    pass
The shutil module offers several high-level operations on files and directories. Do check it out!
The Terminal (Standard Input and Output)
Python provides the standard input and standard output as text streams through sys.stdin and sys.stdout.
The example below shows a program that reads from sys.stdin and writes to both sys.stdout and a file.
import sys
stop_signals = {'q', 'quit', 'exit', 'stop'}
with open('from_stdin.txt', mode='w', encoding='utf-8') as file:
    # Read continuously from the terminal
    for line in sys.stdin:
        line = line.strip()
        if line in stop_signals:
            break
        # Print writes to sys.stdout by default
        # Therefore being explicit here is redundant
        print(line, file=sys.stdout)
        # Write to a given file using print instead of .write()
        # Flush: persist immediately
        print(line, file=file, flush=True)
You can use the fileinput module to replicate the example above by replacing sys.stdin with fileinput.input(encoding="utf-8"). Check out the module documentation for more information.
You can also use the input() function to take user input from the terminal or sys.stdin. For example:
name = input('what is your name: ')
print(f'Your name is {name}')
Getting File Attributes
To access file attributes such as size, last access time, last modified time, and so on, use the os.stat() function. The required argument is either the file path or the file descriptor (returned by a call to .fileno()). For example:
import os
from datetime import datetime
# Using the file path
file_stats = os.stat('eggs.txt')
# Using the file descriptor of an opened file
with open('eggs.txt', mode='r', encoding='utf-8') as file:
    file_stats_2 = os.stat(file.fileno())
# They are the same
assert file_stats == file_stats_2
print(file_stats)
print(f'The file size is {file_stats.st_size} bytes')
print(f'The file was last accessed on {datetime.fromtimestamp(file_stats.st_atime)}')
print(f'The file was last modified on {datetime.fromtimestamp(file_stats.st_mtime)}')
Use datetime.fromtimestamp() to convert timestamps (in seconds) into a human-readable format.
- st_size is the file size in bytes.
- st_atime is the time of the most recent access, expressed in seconds.
- st_mtime is the time of the most recent content modification, expressed in seconds.
- st_ctime is the time of the most recent metadata change, expressed in seconds (on Windows, it is the creation time instead).
Note that the os.path module provides similar functions such as os.path.getsize(), os.path.getatime(), os.path.getmtime(), and os.path.getctime().
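As a quick check that the os.path helpers report the same values as os.stat(), here is a minimal sketch. The scratch file and its five-byte content are assumptions made for the example.

```python
import os
import tempfile

# Create a scratch file containing exactly five bytes (illustrative only)
fd, scratch_path = tempfile.mkstemp()
os.write(fd, b'12345')
os.close(fd)

stats = os.stat(scratch_path)
size = stats.st_size

# The os.path helpers return the matching fields of os.stat()
size_matches = os.path.getsize(scratch_path) == stats.st_size
mtime_matches = os.path.getmtime(scratch_path) == stats.st_mtime

os.remove(scratch_path)

print(size)           # 5
print(size_matches)   # True
print(mtime_matches)  # True
```

Use os.stat() when you need several attributes at once, and the os.path helpers when you need only one.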
Working with Directories
The os module provides several functions to work with directories. Some of these include:
- os.getcwd() - Get the current working directory.
- os.listdir(directory) - List the contents of a given directory. Defaults to the current directory ('.').
- os.chdir(directory) - Change into a given directory.
- os.mkdir(directory) - Make a directory.
- os.remove(path) - Remove the file at the given path.
- os.unlink(path) - Remove the file at the given path (identical to os.remove()).
- os.rmdir(directory) - Remove a directory. Raises an error if the directory does not exist or is not empty.
- os.removedirs(name) - Remove a directory, then remove each parent directory that is left empty.
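The functions above can be tried out safely inside a throwaway directory, as in the sketch below. The names reports and note.txt are hypothetical; tempfile.TemporaryDirectory() is used so that everything is cleaned up afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Make a sub-directory and put one file in it
    reports_dir = os.path.join(base, 'reports')
    os.mkdir(reports_dir)
    note_path = os.path.join(reports_dir, 'note.txt')
    with open(note_path, mode='w', encoding='utf-8') as f:
        f.write('hello')

    listing_before = os.listdir(reports_dir)

    # os.rmdir() refuses to remove a directory that is not empty
    try:
        os.rmdir(reports_dir)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # Remove the file first, then the now-empty directory
    os.remove(note_path)
    os.rmdir(reports_dir)
    listing_after = os.listdir(base)

print(listing_before)  # ['note.txt']
print(rmdir_failed)    # True
print(listing_after)   # []
```

To delete a directory together with everything inside it, look at shutil.rmtree() instead of emptying it by hand.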
Example: Back Up Recently Modified Files
The program below backs up files modified within the last 24 hours. The explanation of the code is in the comments :)
#!/usr/bin/env python3
"""A Python script to backup files that have been created/modified in the last 24 hours
Adapted from a bash script from the Linux Commands & Shell Scripting Course on EDX
"""
import sys
import os
from os import path
from datetime import datetime, timedelta
import tarfile
import shutil

if __name__ == '__main__':
    # Make sure we get exactly two arguments
    # The first item in sys.argv is the script name, so we check for three (3) items
    args = sys.argv
    if len(args) != 3:
        print('Usage: python backup.py target destination')
        sys.exit(1)
    # Unpack the arguments.
    _, target, destination = args
    # Make sure both target and destination are valid directories
    if not path.isdir(target) or not path.isdir(destination):
        print('Invalid directories provided')
        sys.exit(1)
    # Let's save some variables
    # Get the current time and yesterday's
    current_ts = datetime.now()
    yesterday_ts = current_ts - timedelta(hours=24)
    # The absolute path to the base/root directory
    orig_abs_path = path.abspath(path.curdir)
    # The absolute path to the destination directory
    dest_abs_path = path.join(orig_abs_path, destination)
    # List of files to back up
    to_backup = []
    # Go into the target directory
    os.chdir(target)
    for file in os.listdir(path.curdir):
        # Get the file stats
        file_stat = os.stat(file)
        # If the file was last modified after this time yesterday,
        # add it to the backup list
        if datetime.fromtimestamp(file_stat.st_mtime) > yesterday_ts:
            print(f'Adding file: {file} to list')
            to_backup.append(file)
    # Open a tar.gz file for writing
    backup_file_name = f'backup-{current_ts.timestamp()}.tar.gz'
    with tarfile.open(backup_file_name, "w:gz") as tar:
        for name in to_backup:
            tar.add(name)
    # Move the tar.gz from target to destination directory
    print(f'moving {backup_file_name} to destination')
    shutil.move(backup_file_name, dest_abs_path)
You can extend the program to use zip instead of tar.gz by replacing the appropriate lines with the following:
# ...
import zipfile
backup_file_name = f'backup-{current_ts.timestamp()}.zip'
# Write to a zip file
with zipfile.ZipFile(backup_file_name, mode='w') as zf:
    for name in to_backup:
        zf.write(name)
# ...
Do check the Python documentation to learn more about working with zip files, gzip files and shutil.
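One detail worth knowing is that zipfile.ZipFile stores files without compression unless you ask for it. The sketch below, which uses a made-up notes.txt in a throwaway directory, writes a compressed archive with compression=zipfile.ZIP_DEFLATED and reads the content back to confirm the round trip.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as base:
    # A small file to archive (hypothetical name and content)
    source_path = os.path.join(base, 'notes.txt')
    with open(source_path, mode='w', encoding='utf-8') as f:
        f.write('remember to back up')

    archive_path = os.path.join(base, 'backup.zip')

    # ZIP_DEFLATED turns compression on; the default is ZIP_STORED
    with zipfile.ZipFile(archive_path, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname sets the name stored inside the archive
        zf.write(source_path, arcname='notes.txt')

    # Open the archive for reading and pull the file back out
    with zipfile.ZipFile(archive_path, mode='r') as zf:
        archived_names = zf.namelist()
        restored = zf.read('notes.txt').decode('utf-8')

print(archived_names)  # ['notes.txt']
print(restored)        # remember to back up
```

ZIP_DEFLATED relies on the zlib module, which ships with standard Python builds.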
Summary
In this article, you saw how to read and write to files whether permanent or temporary, saved on disk, from the network, or the terminal. You also caught a glimpse of how to work with directories and how to compress files using the tarfile and zipfile modules. Thank you for reading!