Mark Edosa

Posted on Nov 2, 2023

Introduction to Python Programming - Working With Files

#python #programming #beginners #tutorial

At some point in your program, you may want to work with files stored locally on your machine or from a network. Python represents files as file (or file-like) objects. This representation provides you with functions that enable the manipulation of files.

File Object APIs
Opening/Creating a File
Writing to Files
Reading from Files
Example: Copying a File
Working with JSON Files
Working with Structured Datasets
Working with Temporary Files
Working with File-like Objects
Getting File Attributes
Working with Directories
Example: Back Up Recently Modified Files
Summary

Typically, these file (or file-like) objects include:

Text file objects which let you read/write text strings. Examples include:
- files opened in text mode (r or w)
- standard input represented by sys.stdin
- standard output represented by sys.stdout, and
- an in-memory text stream created with io.StringIO()
Binary file objects that let you read/write buffered byte strings. Examples include:
- files opened in binary mode ('rb', 'wb' or 'rb+') and
- a byte object created with io.BytesIO() and gzip.GzipFile
Raw binary file objects are used behind the scenes to build binary and text streams. An example is a file opened with mode='rb' and buffering=0, e.g. open("myfile.jpg", "rb", buffering=0).

The terms stream and file-like object are often used interchangeably with file object.

File Object APIs

The table below shows some of the properties of file objects.

Property	type	Description
`buffer`	binary stream	The underlying binary buffer that a text file wraps, for example io.BufferedReader, io.BufferedWriter, or io.BufferedRandom depending on the mode.
`closed`	bool	`True` if the file is closed.
`encoding`	str	The file encoding applicable only to text files, gives the name of the encoding that the text will be decoded or encoded with. Always specify the encoding when you open a text file for reading/writing. Else, Python will use the default locale encoding and that might not be what you want.
`errors`	str	How Python handles encoding and decoding errors. Defaults to `strict`.
`line_buffering`	bool	If `True`, Python flushes the file's internal buffer when a call to `write()` contains a newline character or a carriage return.
`mode`	str	Whether the file is opened for reading(r), writing(w) and so on.
`name`	str	The name of the file.
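`newlines`	str, tuple or None	The line endings encountered so far while reading: `None`, a string, or a tuple of strings. To control how line endings are handled, pass the `newline` parameter to `open()`. It can be `None, '', '\n', '\r', and '\r\n'`.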
`write_through`	bool	If `True`, calls to `write()` are guaranteed not to be buffered. Any data written to a text file is directly handed to its underlying binary buffer.

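Use locale.getencoding() to get your locale's default encoding, which open() falls back on when you omit the encoding. Note that locale.getencoding() works in Python >=3.11. sys.getdefaultencoding() returns the encoding Python itself uses for string conversions, which is always 'utf-8' in Python 3.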
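
As a rough sketch of the two calls above, the snippet below prints both encodings. The fallback to locale.getpreferredencoding(False) for Python versions older than 3.11 is an assumption added here for illustration; it is not part of the original text.

```python
import locale
import sys

# Encoding Python uses internally for str <-> bytes conversions
python_default = sys.getdefaultencoding()
print(python_default)

# locale.getencoding() exists only on Python 3.11+,
# so fall back to getpreferredencoding() on older versions (assumption)
if hasattr(locale, 'getencoding'):
  locale_encoding = locale.getencoding()
else:
  locale_encoding = locale.getpreferredencoding(False)

print(locale_encoding)
```

The second value depends on your operating system and locale settings, which is why passing encoding='utf-8' explicitly to open() is the safer habit.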

The table below shows some text/binary file or stream methods.

Method	Description
`close()`	Flush and close the stream.
`flush()`	Flush the write buffers of the stream if applicable.
`read()`	Read and return all (parameter `size=-1`) or `n` characters/bytes from a stream.
`readable()`	`True` if you can read the file. If `False`, `read()` will raise an error.
`readline()`	Read and return one line from the stream.
`readlines()`	Read and return a list of lines from the stream.
`reconfigure()`	Reconfigure this text stream using new settings for encoding, errors, newline, line_buffering and write_through.
`seek()`	Useful for random file access.
`seekable()`	Return `True` if the stream supports random access. If False, `seek()`, `tell()` and `truncate()` will raise an error.
`tell()`	Return the current stream position.
`truncate()`	Resize the stream to the given size in bytes (or the current position if size is not specified).
`writable()`	Return `True` if the stream supports writing.
`write()`	Write the given bytes/text string to the stream, and return the number of bytes/characters written.
`writelines()`	Write a list of lines to the stream. Add `\n` yourself if you need it.
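
Here is a small sketch of a few of the methods in the table. It uses io.BytesIO as a stand-in for a binary file so that it runs without touching the disk; a file opened in a '+' binary mode should behave the same way.

```python
import io

# An in-memory binary stream standing in for a real file
stream = io.BytesIO(b'0123456789')

print(stream.readable(), stream.writable(), stream.seekable())

stream.seek(4)               # jump to byte offset 4
position = stream.tell()     # 4
middle = stream.read(3)      # b'456'
print(position, middle)

stream.truncate(5)           # keep only the first five bytes
stream.seek(0)               # rewind before reading again
remaining = stream.read()    # b'01234'
print(remaining)

stream.close()
print(stream.closed)         # True
```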

Opening/Creating a File

The open() function is the most common way to open a file object for reading/writing. For example

# Open a file for reading in text mode
# Be explicit about the mode and encoding
text_file = open('your_text_file.txt', mode='rt', encoding='utf-8')

# <class '_io.TextIOWrapper'>
print(type(text_file))

# Open a file for reading in binary mode
binary_file = open('your_binary_file', mode='rb')

# <class '_io.BufferedReader'>
print(type(binary_file))

Calling open() in write mode creates a new file if it does not exist. If the file already exists, open() clears its content. For example

# Open a file for writing in text mode
text_file = open('lyrics.txt', mode='wt', encoding='utf-8')

# Open a file for writing in binary mode
binary_file = open('your_binary_file', mode = 'wb')

The table below lists the available file modes and their description.

Mode	Description
`r`	Open for reading (default).
`w`	Open for writing, truncating the file first.
`x`	Create a new file and open it for writing. Raises an error if the file already exists.
`a`	Open for writing, appending to the end of the file if it exists.
`b`	Binary mode.
`t`	Text mode (default)
`+`	Open a disk file for updating (reading and writing). For example, `r+t` opens a file for reading and writing in text mode.
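
The x and a modes from the table can be sketched as follows. The file name notes.txt is made up for the example, and the file lives in a throwaway directory so nothing on your machine is overwritten.

```python
import os
import tempfile

already_exists = False

# A throwaway directory so the sketch does not touch your own files
with tempfile.TemporaryDirectory() as tmpdir:
  notes_path = os.path.join(tmpdir, 'notes.txt')  # hypothetical file name

  # 'x' creates the file and fails if it already exists
  with open(notes_path, mode='x', encoding='utf-8') as notes:
    notes.write('first line\n')

  try:
    open(notes_path, mode='x', encoding='utf-8')
  except FileExistsError:
    already_exists = True
    print('x mode refuses to overwrite an existing file')

  # 'a' keeps the existing content and writes at the end
  with open(notes_path, mode='a', encoding='utf-8') as notes:
    notes.write('second line\n')

  with open(notes_path, mode='r', encoding='utf-8') as notes:
    lines = notes.readlines()

print(lines)
```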

Files are stored as raw bytes. Files opened in binary mode return contents as bytes objects without any decoding. In text mode, the underlying raw bytes are decoded into text (using the default locale encoding or the encoding you specified) and then returned.
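
To make the difference between the two modes concrete, here is a minimal sketch that writes a word containing a non-ASCII character and reads it back in both modes. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
  path = os.path.join(tmpdir, 'cafe.txt')  # hypothetical file name

  # '\u00e9' is the letter e with an acute accent
  with open(path, mode='w', encoding='utf-8') as f:
    f.write('caf\u00e9')

  # Binary mode: raw bytes, no decoding
  with open(path, mode='rb') as f:
    raw = f.read()

  # Text mode: bytes decoded with the given encoding
  with open(path, mode='r', encoding='utf-8') as f:
    text = f.read()

print(raw)                   # b'caf\xc3\xa9'
print(len(raw), len(text))   # 5 4
```

The accented letter takes two bytes in UTF-8, so the byte count and the character count differ.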

Writing to Files

To write to a file, open it for writing with any of w, w+, a, a+, x, x+, or r+ in a text (t) or binary (b) mode and use either .write() or .writelines(). For example

# Open the file for writing. 
# Add + so that we can also read from the file
lyrics = open('lyrics.txt', mode='w+', encoding='utf-8')

# Write something nice
lyrics.write('Best day of my life\n')
lyrics.write('I had a dream a dream so big and loud\n')
lyrics.write('I jumped so high I touched the clouds\n')
lyrics.write('Wo-o-o-o-o-oh, wo-o-o-o-o-oh\n')

# Persist the content into the file without closing it
lyrics.flush()

# Check the cursor position (non-zero after the writes above)
print(lyrics.tell())

# Rewind to first position
lyrics.seek(0)

# Read the content
print(lyrics.read())

more_lines = [
  'I stretched my hands out to the sky\n',
  'We danced with monsters through the night\n',
  'Wo-o-o-o-o-oh, wo-o-o-o-o-oh\n'
]

# Add more lines using .writelines()
lyrics.writelines(more_lines)

# Flush and close the file
lyrics.close()

Reading from Files

To read a file, open it for reading with any of r, r+, w+, a+, or x+ in a text (t) or binary (b) mode and use either .read(), .readline() or .readlines(). For example

lyrics = open('lyrics.txt', mode='rt', encoding='utf-8')

# Read all contents
# The file cursor position goes to the end
all_content = lyrics.read()
print(all_content)

# Reset the cursor/position to the beginning
lyrics.seek(0)

# Read the first ten characters
first_10_chars = lyrics.read(10)
print(first_10_chars)

# Read 20 characters more
another_20_chars = lyrics.read(20)
print(another_20_chars)

# Where is the file cursor position
print("The file cursor is at position", lyrics.tell())

# Read the rest of the content starting from where read(20) stopped
the_rest_content = lyrics.read()
print(the_rest_content)


# Reset the cursor/position to the beginning again
lyrics.seek(0)

# Read only the first line
first_line = lyrics.readline()
print(first_line)

# Read the rest lines
the_rest_lines = lyrics.readlines()
print(the_rest_lines)

# Reset
lyrics.seek(0)


# Print each line in a for loop
# Each line already ends with '\n', so suppress print's own newline
for line in lyrics.readlines():
  print(line, end='')

# Reset
lyrics.seek(0)


# Read from the file object directly
for line in lyrics:
  print(line, end='')


# Close the file
lyrics.close()
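
Besides reading everything at once or line by line, you can call read(n) in a loop to process a file in fixed-size chunks, which helps with large binary files. The sketch below is an illustration only: io.BytesIO stands in for a file opened with mode='rb', and the chunk size is deliberately tiny.

```python
import io

CHUNK_SIZE = 4  # tiny on purpose; real code would use a much larger value

# io.BytesIO stands in for a file opened with mode='rb'
source = io.BytesIO(b'abcdefghij')

chunks = []
while True:
  chunk = source.read(CHUNK_SIZE)
  if not chunk:  # read() returns b'' at the end of the stream
    break
  chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```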

Example: Copying a File

# copy.py
"""A script for copying files. Usage: python copy.py file1 file1_copy"""
import sys

def copier(source: str, destination: str):
  """Copy the contents of the source to the destination
    Args:
      source - The source file.
      destination - The destination file.
  """
  with open(source, mode='r', encoding='utf-8') as srcfile:
    with open(destination, mode='w', encoding='utf-8') as destfile:
      for line in srcfile:
        destfile.write(line)


if __name__ == '__main__':
  if len(sys.argv) != 3:
    print('Usage: python copy.py file1 file1_copy')
  else:
    _, src, dest = sys.argv
    copier(src, dest)

The with statement uses the file object as a "context manager" and closes the file automatically when the block ends, even if an error occurs inside it. So there is no need to call .close() yourself.
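
As a rough illustration of what the with statement does for you, the sketch below compares it with a hand-written try/finally. The file name demo.txt is made up, and the try/finally version is only an approximation of the behaviour, not how Python implements it internally.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
  path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name

  # Using with: the file is closed when the block ends
  with open(path, mode='w', encoding='utf-8') as managed:
    managed.write('hello\n')
  print(managed.closed)  # True

  # Roughly the same thing written by hand
  manual = open(path, mode='r', encoding='utf-8')
  try:
    content = manual.read()
  finally:
    manual.close()
  print(manual.closed)  # True
```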

Working with JSON Files

You can read/write to a JSON file in the following ways:

To write, use json.dump(python_object, file_opened_for_writing) to write a python object to a file opened for writing or use file_opened_for_writing.write(json.dumps(python_object)).
To read, use json.load(file_opened_for_reading) to read from a binary file or text file object opened for reading.

Note that JSON files should be encoded in UTF-8, so pass encoding='utf-8' when you open one in text mode for reading or writing.

import json

# Some menu I'd love to try :)
menu = [
  dict(name='alien_fish', price=2000), 
  dict(name='alien_vegetables', price=100),
  dict(name='some_alien_dish', price=200_000)
]

# Open a JSON for writing and dump the menu there so 
# that I won't ever forget
with open('menu.json', mode='w', encoding='utf-8') as jsonfile:
  json.dump(menu, jsonfile)

# Open a JSON file for reading
with open('menu.json', mode='r', encoding='utf-8') as jsonfile:
  same_menu = json.load(jsonfile)

# Sure? yes
assert menu == same_menu, "Oops!, The menu isn't the same"
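
The other options mentioned earlier, writing with .write(json.dumps(...)) and loading from a binary file object, can be sketched like this. The in-memory io.StringIO and io.BytesIO objects stand in for real files, and the order dictionary is made-up data.

```python
import io
import json

order = {'name': 'alien_fish', 'quantity': 2}  # made-up data

# json.dumps() returns a str, which you can pass to write()
text_file = io.StringIO()
text_file.write(json.dumps(order, indent=2))

# json.loads() parses a str back into Python objects
same_order = json.loads(text_file.getvalue())

# json.load() also accepts a binary file object
binary_file = io.BytesIO(json.dumps(order).encode('utf-8'))
order_from_bytes = json.load(binary_file)

print(same_order == order == order_from_bytes)  # True
```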

Working with Structured Datasets

You've already seen how to read and write CSV files in the previous article on modules. For other types of structured data such as Excel, Stata, SPSS, databases, and so on, you'll want to use an external package like pandas (which also handles CSVs).

The pandas io API lists several methods for reading/writing various data files.

Working with Temporary Files

You can also securely create one-off files or file-like objects for transfer or demo purposes. For example, you can store a user-uploaded file as a temporary one, process it, and then copy the processed contents to a new one for permanent storage.

To create a temporary file, use tempfile.TemporaryFile(). For example

import tempfile

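# TemporaryFile() creates a temporary file (or file-like) object
with tempfile.TemporaryFile(mode='w+b') as tpfile:
  print(tpfile)
  # Only the file-like wrapper (non-POSIX platforms) has a .file attribute
  print(getattr(tpfile, 'file', tpfile))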

  tpfile.write(b'I will not stay long')
  tpfile.seek(0)
  print(tpfile.read())

The tempfile.TemporaryFile() call returns a file object (on POSIX platforms) or a file-like object on other platforms.
The default mode parameter is 'w+b'.
The binary mode helps to maintain consistency on all platforms regardless of the data stored.

If you need a temporary file to have a name and if you also need to control the file deletion, use tempfile.NamedTemporaryFile() instead. For example

import tempfile

# NamedTemporaryFile() always returns a file-like object 
# whose file attribute is the underlying true file object.
# Note: the delete_on_close parameter requires Python 3.12 or later.
with tempfile.NamedTemporaryFile(mode='w+b', delete=True, delete_on_close=True) as named_tpfile:
  print(named_tpfile)
  print(named_tpfile.name)
  print(named_tpfile.file)

  named_tpfile.write(b'Set delete=False or delete_on_close=False ')
  named_tpfile.write(b'to control when I\'m deleted\n')

  named_tpfile.seek(0)

  print(named_tpfile.read())
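
To see what controlling the deletion looks like in practice, here is a minimal sketch that uses delete=False, reopens the file by name after it is closed, and then removes it manually. The variable names are made up for the example.

```python
import os
import tempfile

# delete=False keeps the file on disk after the with block ends
with tempfile.NamedTemporaryFile(mode='w+b', delete=False) as kept:
  kept.write(b'still here after close')
  kept_path = kept.name

exists_after_close = os.path.exists(kept_path)
print(exists_after_close)  # True

# Reopen by name, then clean up manually
with open(kept_path, mode='rb') as reopened:
  data = reopened.read()
print(data)

os.remove(kept_path)
print(os.path.exists(kept_path))  # False
```

With delete=False, cleaning up is your responsibility, so remember the os.remove() call once you are done.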

Working with File-like Objects

StringIO and BytesIO

You can also create a file-like object directly from text strings or bytes using io.StringIO() or io.BytesIO(), respectively. For example

import io

text_message = 'Hello this is a file created from raw strings'
file_from_raw_strings = io.StringIO(text_message)

print(type(file_from_raw_strings)) # <class '_io.StringIO'>

# print the text contents
print(file_from_raw_strings.getvalue())

byte_message = b'Hello! This is a file created from bytes'
file_from_bytes = io.BytesIO(byte_message)

print(type(file_from_bytes)) # <class '_io.BytesIO'>

# print the byte contents
print(file_from_bytes.getvalue())
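
These in-memory objects are also writable and support the same seek() and read() calls as real files. The sketch below lets csv.writer write into an io.StringIO buffer and then reads the result back. Passing lineterminator='\n' is a choice made here to keep the output simple.

```python
import csv
import io

buffer = io.StringIO()

# csv.writer only needs an object with a write() method
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['id', 'name'])
writer.writerow([1, 'brian'])

# The position is at the end after writing, so rewind before reading
buffer.seek(0)

first_line = buffer.readline()
rest = buffer.read()
print(repr(first_line))  # 'id,name\n'
print(repr(rest))        # '1,brian\n'
```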

This post on Stack Overflow highlights the uses of io.StringIO(): anywhere you need a file for processing, data transfer over the network, demo purposes, etc. For example, to make a CSV 'file' for testing with pandas:

# Copied from Stackoverflow :)
# https://stackoverflow.com/questions/7996479/what-is-stringio-in-python-used-for-in-reality
import io
import pandas as pd

f = io.StringIO("id,name\n1,brian\n2,amanda\n3,zoey\n")

# Pandas takes a file path or a file-like object 
df = pd.read_csv(f)

Network Response

The urllib.request.urlopen() function returns an HTTP response as a binary file-like object. Therefore, you can treat the response like a binary file. For example

import urllib.request

url = 'https://pandas.pydata.org/docs/user_guide/index.html'

with urllib.request.urlopen(url) as response:
  print(type(response)) # <class 'http.client.HTTPResponse'>

  content = response.read()

  print(type(content)) # <class 'bytes'>

  # Convert the bytes to text
  text_content = content.decode('utf-8')

  print(type(text_content)) # <class 'str'>
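
Instead of decoding everything at once, you can wrap a binary file-like object in io.TextIOWrapper and iterate over decoded lines. The sketch below uses io.BytesIO as a stand-in for the response so that it runs without a network connection; applying the same wrapper to the object returned by urlopen() is a common idiom, but that part is an assumption and is not exercised here.

```python
import io

# Stand-in for a response: any readable binary file-like object would do
fake_response = io.BytesIO('<h1>Caf\u00e9</h1>\n<p>Hello</p>\n'.encode('utf-8'))

# Wrap the binary stream so that iteration yields decoded text lines
text_stream = io.TextIOWrapper(fake_response, encoding='utf-8')

lines = [line.rstrip('\n') for line in text_stream]
print(len(lines))  # 2
```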

Here's an example from the Python documentation. The HTTP response is copied directly to a named temporary file, which survives for further processing because delete is set to False.

import shutil
import tempfile
import urllib.request

with urllib.request.urlopen('http://python.org/') as response:
  with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    shutil.copyfileobj(response, tmp_file)

# delete was set to False, so this is allowed
with open(tmp_file.name) as html:
  pass

The shutil module offers several high-level operations on files and directories. Do check it out!

The Terminal (Standard Input and Output)

Python provides the standard input and standard output as text streams through sys.stdin and sys.stdout.

The example below shows a program that reads from sys.stdin and writes to sys.stdout and a file.

import sys

stop_signals = {'q', 'quit', 'exit', 'stop'}

with open('from_stdin.txt', mode='w', encoding='utf-8') as file:
  # Read continuously from the terminal
  for line in sys.stdin:
    line = line.strip()

    if line in stop_signals:
      break

    # Print writes to sys.stdout by default
    # Therefore being explicit here is redundant
    print(line, file=sys.stdout)

    # Write to a given file using print instead of .write()
    # Flush: persist immediately
    print(line, file=file, flush=True)

You can use the fileinput module to replicate the example above by replacing sys.stdin with fileinput.input(encoding="utf-8"). Check out the module documentation for more information.

You can also use the input() function to take a user input from the terminal or sys.stdin. For example

name = input('what is your name: ')

print(f'Your name is {name}')

Getting File Attributes

To access file attributes such as size, last access time, last modified time, and so on, use the os.stat() function. The required argument is either the file path or the file descriptor (returned by a call to .fileno()). For example

import os
from datetime import datetime

# Using the file path
file_stats = os.stat('eggs.txt')

# Using the file descriptor of an opened file
with open('eggs.txt', mode='r', encoding='utf-8') as file:
  file_stats_2 = os.stat(file.fileno())

# They are the same
assert file_stats == file_stats_2

print(file_stats)
print(f'The file size is {file_stats.st_size} bytes')
print(f'The file was last accessed on {datetime.fromtimestamp(file_stats.st_atime)}')
print(f'The file was last modified on {datetime.fromtimestamp(file_stats.st_mtime)}')

Use datetime.fromtimestamp() to convert timestamps (in seconds) into a human-readable format.

st_size is the file size in bytes
st_atime is the time of most recent access expressed in seconds.
st_mtime is the time of the most recent content modification expressed in seconds.
st_ctime is the time of the most recent metadata change expressed in seconds.

Note that the os.path module provides similar functions such as os.path.getsize(), os.path.getatime(), os.path.getmtime(), and os.path.getctime().
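
As a quick check that the os.path helpers report the same values as os.stat(), here is a small sketch. It creates its own eggs.txt in a temporary directory, so the file content is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
  path = os.path.join(tmpdir, 'eggs.txt')
  with open(path, mode='w', encoding='utf-8') as f:
    f.write('spam and eggs')  # 13 ASCII characters, so 13 bytes

  stats = os.stat(path)
  size = os.path.getsize(path)
  mtime = os.path.getmtime(path)

  print(size == stats.st_size)    # True
  print(mtime == stats.st_mtime)  # True
```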

Working with Directories

The os module provides several functions to work with directories. Some of these include:

os.getcwd() - Get the current working directory
os.listdir(directory) - List the content of a given directory. Defaults to the current directory ('.').
os.chdir(directory) - change/enter into a given directory
os.mkdir(directory) - Make a directory
os.remove(path) - Remove a file in the given path
os.unlink(path) - Remove a file in the given path
os.rmdir(directory) - Remove a directory. Throws an error if the directory does not exist or is not empty.
os.removedirs(name) - Remove the leaf directory, then each empty parent directory in turn, stopping at the first one that cannot be removed.
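
A few of the functions above can be sketched together as follows. Everything happens inside a temporary directory, and the names reports and jan.txt are made up for the example.

```python
import os
import tempfile

rmdir_failed = False

with tempfile.TemporaryDirectory() as base:
  reports = os.path.join(base, 'reports')  # hypothetical directory name
  os.mkdir(reports)

  # Create an empty file inside the new directory
  with open(os.path.join(reports, 'jan.txt'), mode='w', encoding='utf-8'):
    pass

  contents = os.listdir(reports)
  print(contents)  # ['jan.txt']

  # rmdir() refuses to remove a directory that still has content
  try:
    os.rmdir(reports)
  except OSError:
    rmdir_failed = True

  os.remove(os.path.join(reports, 'jan.txt'))
  os.rmdir(reports)  # succeeds now that the directory is empty
  remaining = os.listdir(base)
  print(remaining)  # []
```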

Example: Back Up Recently Modified Files

The program below backs up files modified within the last 24 hours. The explanation of the code is in the comments :)

#!/usr/bin/env python3

"""A Python script to backup files that have been created/modified in the last 24 hours 
   Adapted from a bash script from the Linux Commands & Shell Scripting Course on EDX
"""

import sys
import os
from os import path
from datetime import datetime, timedelta
import tarfile
import shutil

if __name__ == '__main__':
  # Make sure we get exactly two arguments.
  # sys.argv[0] is the script name, so we check for three (3) items
  args = sys.argv
  if len(args) != 3:
    print('Usage: python backup.py target destination')
    sys.exit(1)

  # Unpack the arguments. 
  _, target, destination = args

  # Make sure both target and destination are valid directories
  if not path.isdir(target) or not path.isdir(destination):
    print('Invalid directories provided')
    sys.exit(1)

  # Let's save some variables

  # Get the current time and yesterday's
  current_ts = datetime.now()
  yesterday_ts = current_ts - timedelta(hours=24)

  # The absolute path to the base/root directory
  orig_abs_path = path.abspath(path.curdir)

  # The absolute path to the destination directory
  dest_abs_path = path.join(orig_abs_path, destination)

  # list of files to backup 
  to_backup = []


  # Go into the target directory
  os.chdir(target)

  for file in os.listdir(path.curdir):
    # Get the file stats
    file_stat = os.stat(file)

    # If the last modified date is greater than yesterday
    # add it to the backup list
    if datetime.fromtimestamp(file_stat.st_mtime) > yesterday_ts:
      print(f'Adding file: {file} to list')
      to_backup.append(file)

  # Open a tar.gz file for writing
  backup_file_name = f'backup-{current_ts.timestamp()}.tar.gz'
  with tarfile.open(backup_file_name, "w:gz") as tar:
    for name in to_backup:
      tar.add(name)

  # Move the tar.gz from target to destination directory
  print(f'moving {backup_file_name} to destination')
  shutil.move(backup_file_name, dest_abs_path)

You can extend the program to use zip instead of tar.gz by replacing the appropriate lines with the following:

# ...

import zipfile

backup_file_name = f'backup-{current_ts.timestamp()}.zip'

# Write to a zip file
with zipfile.ZipFile(backup_file_name, mode='w') as zf:
  for name in to_backup:
    zf.write(name)

# ...

Do check the Python documentation to learn more about working with zip files, gzip files and shutil.
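
To verify a backup, you can reopen the archive for reading. The sketch below builds a tiny tar.gz in a temporary directory and inspects it. The file names are made up, and the arcname argument is a choice made here to store the file under a short name.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
  note_path = os.path.join(tmpdir, 'note.txt')  # hypothetical file name
  with open(note_path, mode='w', encoding='utf-8') as f:
    f.write('back me up')

  archive_path = os.path.join(tmpdir, 'backup.tar.gz')
  with tarfile.open(archive_path, 'w:gz') as tar:
    # arcname stores the file under a short name instead of its full path
    tar.add(note_path, arcname='note.txt')

  # Reopen the archive for reading and inspect it
  with tarfile.open(archive_path, 'r:gz') as tar:
    names = tar.getnames()
    restored = tar.extractfile('note.txt').read()

print(names)     # ['note.txt']
print(restored)  # b'back me up'
```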

Summary

In this article, you saw how to read and write to files whether permanent or temporary, saved on disk, from the network, or the terminal. You also caught a glimpse of how to work with directories and how to compress files using tarfile and zipfile modules. Thank you for reading!
