Martin Heinz

Posted on Feb 28, 2022 • Originally published at martinheinz.dev

Optimizing Memory Usage of Python Applications


When it comes to performance optimization, people usually focus only on speed and CPU usage. Rarely is anyone concerned with memory consumption, at least until they run out of RAM. There are many reasons to limit memory usage beyond keeping your application from crashing with out-of-memory errors.

In this article we will explore techniques for finding which parts of your Python applications consume too much memory, analyze the reasons for it, and finally reduce the memory footprint using simple tricks and memory-efficient data structures.

Why Bother, Anyway?

But first, why should you bother saving RAM anyway? Is there really any reason to save memory other than avoiding the aforementioned out-of-memory errors and crashes?

One simple reason is money. Resources, both CPU and RAM, cost money, so why waste memory by running inefficient applications if there are ways to reduce the memory footprint?

Another reason is the notion that "data has mass": if there's a lot of it, it moves around slowly. If data has to be stored on disk rather than in RAM or fast caches, it takes a while to load and process, which impacts overall performance. Therefore, optimizing for memory usage might have the nice side effect of speeding up the application.

Lastly, in some cases performance can be improved by adding more memory (if application performance is memory-bound), but you can't do that if you don't have any memory left on the machine.

Find Bottlenecks

It's clear that there are good reasons to reduce the memory usage of our Python applications. Before we do that, though, we first need to find the bottlenecks, that is, the parts of the code that are hogging all the memory.

The first tool we will introduce is memory_profiler. This tool measures the memory usage of a specific function on a line-by-line basis:

# https://github.com/pythonprofilers/memory_profiler
pip install memory_profiler psutil
# psutil is needed for better memory_profiler performance

python -m memory_profiler some-code.py
Filename: some-code.py

Line #    Mem usage    Increment  Occurrences   Line Contents
============================================================
    15   39.113 MiB   39.113 MiB            1   @profile
    16                                          def memory_intensive():
    17   46.539 MiB    7.426 MiB            1       small_list = [None] * 1000000
    18  122.852 MiB   76.312 MiB            1       big_list = [None] * 10000000
    19   46.766 MiB  -76.086 MiB            1       del big_list
    20   46.766 MiB    0.000 MiB            1       return small_list

To start using it, we install it with pip along with the psutil package, which significantly improves the profiler's performance. In addition to that, we also need to mark the function we want to profile with the @profile decorator. Finally, we run the profiler against our code using python -m memory_profiler. This shows memory usage and allocation on a line-by-line basis for the decorated function, in this case memory_intensive, which intentionally creates and deletes large lists.
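The source of some-code.py isn't shown above, so the following is only a hypothetical reconstruction based on the "Line Contents" column of the profiler output. The try/except fallback is an assumption of this sketch, added so the file also runs under plain python, where the profile name would otherwise be undefined:

```python
# Hypothetical reconstruction of some-code.py, based on the profiler output above.

try:
    profile  # made available by memory_profiler when run via `python -m memory_profiler`
except NameError:
    # Fallback (an assumption of this sketch): a no-op decorator for plain `python` runs.
    def profile(func):
        return func


@profile
def memory_intensive():
    small_list = [None] * 1000000   # modest allocation
    big_list = [None] * 10000000    # large allocation, shows up as a big increment
    del big_list                    # released again, shows up as a negative increment
    return small_list


if __name__ == "__main__":
    memory_intensive()
```

Running it with python -m memory_profiler should produce a table similar to the one above, although the exact numbers will differ between machines and Python versions.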

Now that we know how to narrow down our focus and find the specific lines that increase memory consumption, we might want to dig a little deeper and see how much memory each variable is using. You might have seen sys.getsizeof used to measure this before. This function, however, gives questionable information for some types of data structures. For integers or bytearrays you get the real size in bytes, but for containers such as list you only get the size of the container itself, not its contents:

import sys
print(sys.getsizeof(1))
# 28
print(sys.getsizeof(2**30))
# 32
print(sys.getsizeof(2**60))
# 36

print(sys.getsizeof("a"))
# 50
print(sys.getsizeof("aa"))
# 51
print(sys.getsizeof("aaa"))
# 52

print(sys.getsizeof([]))
# 56
print(sys.getsizeof([1]))
# 64
print(sys.getsizeof([1, 2, 3, 4, 5]))
# 96, yet empty list is 56 and each value inside is 28.

We can see that with plain integers, every time we cross a threshold (in the examples above, at 2**30 and 2**60), 4 bytes are added to the size. Similarly, with plain strings, every additional character adds one byte. With lists, however, this doesn't hold up: sys.getsizeof doesn't "walk" the data structure and only returns the size of the parent object, in this case the list.
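To make the idea of "walking" a data structure concrete, here is a minimal sketch of a recursive size function built on sys.getsizeof. The name deep_getsizeof is made up for this illustration, and it only handles the built-in containers listed in the code, so treat it as a rough approximation rather than a general-purpose tool:

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Approximate the total size of obj, including the objects it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # count shared objects only once
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
            for key, value in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


numbers = [1, 2, 3, 4, 5]
print(sys.getsizeof(numbers))   # container only
print(deep_getsizeof(numbers))  # container plus the integers it references
```

The exact numbers depend on the Python version and platform, but the second value should always be larger than the first.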

A better approach is to use a tool designed specifically for analyzing memory behaviour. One such tool is Pympler, which can help you get a more realistic idea of Python object sizes:

# pip install pympler
from pympler import asizeof
print(asizeof.asizeof([1, 2, 3, 4, 5]))
# 256

print(asizeof.asized([1, 2, 3, 4, 5], detail=1).format())
# [1, 2, 3, 4, 5] size=256 flat=96
#     1 size=32 flat=32
#     2 size=32 flat=32
#     3 size=32 flat=32
#     4 size=32 flat=32
#     5 size=32 flat=32

print(asizeof.asized([1, 2, [3, 4], "string"], detail=1).format())
# [1, 2, [3, 4], 'string'] size=344 flat=88
#     [3, 4] size=136 flat=72
#     'string' size=56 flat=56
#     1 size=32 flat=32
#     2 size=32 flat=32

Pympler provides the asizeof module with a function of the same name, which correctly reports the size of the list as well as all the values it contains. Additionally, this module has the asized function, which gives us a further size breakdown of the individual components of the object.

Pympler has many more features, though, including tracking class instances and identifying memory leaks. If your application might need these, I recommend checking out the tutorials available in the docs.

Saving Some RAM

Now that we know how to look for all kinds of potential memory issues, we need to find a way to fix them. Often the quickest and easiest solution is switching to more memory-efficient data structures.

Python lists are one of the more memory-hungry options when it comes to storing arrays of values:

from memory_profiler import memory_usage

def allocate(size):
    some_var = [n for n in range(size)]

usage = memory_usage((allocate, (int(1e7),)))  # `1e7` is 10 to the power of 7
peak = max(usage)
print(f"Usage over time: {usage}")
# Usage over time: [38.85546875, 39.05859375, 204.33984375, 357.81640625, 39.71484375]
print(f"Peak usage: {peak}")
# Peak usage: 357.81640625

The simple function above (allocate) creates a Python list of numbers of the specified size. To measure how much memory it takes up, we can use memory_usage from the memory_profiler package shown earlier, which samples memory usage at regular intervals (0.1 seconds by default) while the function executes. We can see that generating a list of 10 million numbers requires more than 350MiB of memory. That seems like a lot for a bunch of numbers. Can we do any better?

import array

def allocate(size):
    some_var = array.array('l', range(size))

usage = memory_usage((allocate, (int(1e7),)))
peak = max(usage)
print(f"Usage over time: {usage}")
# Usage over time: [39.71484375, 39.71484375, 55.34765625, 71.14453125, 86.54296875, 101.49609375, 39.73046875]
print(f"Peak usage: {peak}")
# Peak usage: 101.49609375

In this example we used Python's array module, which can store primitives, such as integers or characters. We can see that in this case memory usage peaked at just over 100MiB. That's a huge difference in comparison to list. You can further reduce memory usage by choosing appropriate precision:

import array
help(array)

#  ...
#  |  Arrays represent basic values and behave very much like lists, except
#  |  the type of objects stored in them is constrained. The type is specified
#  |  at object creation time by using a type code, which is a single character.
#  |  The following type codes are defined:
#  |
#  |      Type code   C Type             Minimum size in bytes
#  |      'b'         signed integer     1
#  |      'B'         unsigned integer   1
#  |      'u'         Unicode character  2 (see note)
#  |      'h'         signed integer     2
#  |      'H'         unsigned integer   2
#  |      'i'         signed integer     2
#  |      'I'         unsigned integer   2
#  |      'l'         signed integer     4
#  |      'L'         unsigned integer   4
#  |      'q'         signed integer     8 (see note)
#  |      'Q'         unsigned integer   8 (see note)
#  |      'f'         floating point     4
#  |      'd'         floating point     8

One major downside of using array as a data container is that it doesn't support that many types.
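As a small illustration of choosing precision, the sketch below builds the same values with a few of the type codes from the table and prints the per-item size reported by itemsize. The sizes in the table are minimums, so the values printed for some codes (notably 'l') may differ between platforms:

```python
import array

values = range(100)

for typecode in ("b", "h", "l", "q"):
    arr = array.array(typecode, values)
    # itemsize is the number of bytes used per element on this platform
    print(typecode, arr.itemsize, arr.itemsize * len(arr))

# Values that don't fit the chosen type are rejected rather than silently truncated.
try:
    array.array("b", [200])  # a signed char only holds -128..127
except OverflowError as error:
    print(f"OverflowError: {error}")
```

The trade-off is that a smaller type code only helps if you know the range of your data up front.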

If you plan to perform a lot of mathematical operations on the data, then you're probably better off using NumPy arrays instead:

import numpy as np

def allocate(size):
    some_var = np.arange(size)

usage = memory_usage((allocate, (int(1e7),)))
peak = max(usage)
print(f"Usage over time: {usage}")
# Usage over time: [52.0625, 52.25390625, ..., 97.28515625, 107.28515625, 115.28515625, 123.28515625, 52.0625]
print(f"Peak usage: {peak}")
# Peak usage: 123.28515625

# More type options with NumPy:
data = np.ones(int(1e7), np.complex128)
# Useful helper functions:
print(f"Size in bytes: {data.nbytes:,}, Size of array (value count): {data.size:,}")
# Size in bytes: 160,000,000, Size of array (value count): 10,000,000

We can see that NumPy arrays also perform pretty well when it comes to memory usage, with peak usage of ~123MiB. That's a bit more than array, but with NumPy you can take advantage of fast mathematical functions as well as types that are not supported by array, such as complex numbers.
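As with array, the element type has a direct effect on the footprint of a NumPy array. The following sketch compares a few integer dtypes using the nbytes attribute shown above; the array size is arbitrary, and it assumes the values actually fit into the smaller types:

```python
import numpy as np

size = 1_000_000

as_int64 = np.arange(size, dtype=np.int64)
as_int32 = np.arange(size, dtype=np.int32)
as_uint8 = np.zeros(size, dtype=np.uint8)

for arr in (as_int64, as_int32, as_uint8):
    print(f"{arr.dtype}: {arr.itemsize} bytes per value, {arr.nbytes:,} bytes total")
```

Halving the width of the dtype halves the memory needed for the data buffer, so it's worth checking whether the default precision is really required.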

The above optimizations help with the overall size of arrays of values, but we can also reduce the size of individual objects defined by Python classes. This can be done using the __slots__ class attribute, which explicitly declares the attributes that instances may have. Declaring __slots__ on a class also has the nice side effect of preventing creation of the per-instance __dict__ and __weakref__ attributes:

from pympler import asizeof

class Normal:
    pass

class Smaller:
    __slots__ = ()

print(asizeof.asized(Normal(), detail=1).format())
# <__main__.Normal object at 0x7f3c46c9ce50> size=152 flat=48
#     __dict__ size=104 flat=104
#     __class__ size=0 flat=0

print(asizeof.asized(Smaller(), detail=1).format())
# <__main__.Smaller object at 0x7f3c4266f780> size=32 flat=32
#     __class__ size=0 flat=0

Here we can see how much smaller the Smaller class instance actually is. The absence of __dict__ removes a whole 104 bytes from each instance, which can save a huge amount of memory when instantiating millions of objects.
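The classes above are empty, so here is a minimal sketch of what __slots__ looks like on a class that actually stores data. The Point class and its attribute names are made up for this illustration:

```python
class Point:
    # Only these attributes can exist on instances; no per-instance __dict__ is created.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(1, 2)
has_dict = hasattr(point, "__dict__")
print(has_dict)

# The flip side: attributes that were not declared in __slots__ can't be added later.
try:
    point.z = 3
except AttributeError as error:
    print(f"AttributeError: {error}")
```

This restriction is the price of the smaller footprint, so __slots__ fits best on simple data-holding classes that are instantiated in large numbers.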

The above tips and tricks should be helpful in dealing with numeric values as well as class objects. What about strings, though? How you should store those generally depends on what you intend to do with them. If you're going to search through a huge number of string values, then using a list is a very bad idea: as we've seen, lists are memory-hungry, and searching one means scanning every element. A set might be more appropriate if execution speed is important, but it will probably consume even more RAM. The best option might be an optimized data structure such as a trie, especially for static data sets that you use, for example, for querying. As is common with Python, there's already a library for that, as well as for many other tree-like data structures, some of which you will find at https://github.com/pytries.

Not Using RAM At All

The easiest way to save RAM is to not use it in the first place. You obviously can't avoid using RAM completely, but you can avoid loading the full data set at once and instead work with the data incrementally where possible. The simplest way to achieve this is with generators, which return a lazy iterator that computes elements on demand rather than all at once.
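A minimal sketch of the difference, comparing a list comprehension with the equivalent generator expression (the size is arbitrary, and sys.getsizeof is used here only to compare the two container objects themselves):

```python
import sys

size = 1_000_000

squares_list = [n * n for n in range(size)]  # all values held in memory at once
squares_gen = (n * n for n in range(size))   # values computed one at a time, on demand

print(sys.getsizeof(squares_list))  # grows with the number of elements
print(sys.getsizeof(squares_gen))   # small and independent of size

# The generator can still be consumed by anything that accepts an iterable.
total = sum(squares_gen)
```

Keep in mind that a generator can only be iterated once, so this works best for single-pass processing such as aggregations or streaming through a file line by line.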

A stronger tool that you can leverage is memory-mapped files, which allow us to load only parts of the data from a file. Python's standard library provides the mmap module for this, which can be used to create memory-mapped files that behave both like files and like bytearrays. You can use them with file operations like read, seek or write as well as with string operations:

import mmap

with open("some-data.txt", "r") as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as m:
        print(f"Read using 'read' method: {m.read(15)}")
        # Read using 'read' method: b'Lorem ipsum dol'
        m.seek(0)  # Rewind to start
        print(f"Read using slice method: {m[:15]}")
        # Read using slice method: b'Lorem ipsum dol'

Loading and reading a memory-mapped file is very simple. We first open the file for reading as we usually do. We then use the file's descriptor (file.fileno()) to create a memory-mapped file from it. From there we can access its data both with file operations such as read and with string operations like slicing.
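Beyond slicing, the mapped file also supports searching with find and rfind, much like a bytes object. The sketch below generates its own temporary file so that it can run anywhere; the file contents are made up for the example:

```python
import mmap
import os
import tempfile

# Create a throwaway data file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Lorem ipsum dolor sit amet\n" * 1000)
    path = tmp.name

with open(path, "rb") as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as m:
        first = m.find(b"dolor")      # offset of the first occurrence
        last = m.rfind(b"dolor")      # offset of the last occurrence
        word = m[first:first + 5]     # slicing returns bytes
        print(first, last, word)

os.remove(path)
```

Note that all offsets and results are in bytes, not characters, which matters for text that isn't plain ASCII.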

Most of the time, you will probably be more interested in reading the file as shown above, but it's also possible to write to the memory-mapped file:

import mmap
import re

with open("some-data.txt", "r+") as file:
    with mmap.mmap(file.fileno(), 0) as m:
        # Words starting with capital letter
        pattern = re.compile(rb'\b[A-Z].*?\b')

        for match in pattern.findall(m):
            print(match)
            # b'Lorem'
            # b'Morbi'
            # b'Nullam'
            # ...

        # Delete first 10 bytes
        start = 0
        end = 10
        length = end - start
        size = len(m)
        new_size = size - length
        m.move(start, end, size - end)
        m.flush()
    file.truncate(new_size)

The first difference you will notice in the code is the change of access mode to r+, which denotes both reading and writing. To show that we can indeed perform both reading and writing operations, we first read from the file, using a regular expression to search for all the words that start with a capital letter. After that we demonstrate deleting data from the file. This is not as straightforward as reading and searching, because we need to adjust the size of the file when we delete some of its contents. To do so, we use the move(dest, src, count) method of the mmap object, which copies size - end bytes of data from index end to index start. We then truncate the file to its new size, which in this case amounts to deleting the first 10 bytes.
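The same move-then-truncate steps can be wrapped in a small helper to see the effect on a known input. The function name delete_range and the sample data are assumptions made for this sketch:

```python
import mmap
import os
import tempfile


def delete_range(path, start, end):
    """Remove bytes [start, end) from the file at path using a memory map."""
    with open(path, "r+b") as file:
        with mmap.mmap(file.fileno(), 0) as m:
            size = len(m)
            # Shift everything after `end` back so it begins at `start`.
            m.move(start, end, size - end)
            m.flush()
        # The mapping is closed here; now cut off the leftover tail.
        file.truncate(size - (end - start))


# Throwaway file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

delete_range(path, 0, 10)
with open(path, "rb") as file:
    remaining = file.read()
print(remaining)  # b'abcdef'

os.remove(path)
```

Truncating only after the mapping has been closed mirrors the structure of the code above and avoids resizing a file that is still mapped.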

If you're doing computations in NumPy, then you might prefer its memmap feature, which is suitable for NumPy arrays stored in binary files.
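A minimal sketch of np.memmap, assuming a raw binary file of 64-bit integers. The file name, dtype and shape are arbitrary choices for the example, and they have to be supplied again when re-opening because the raw file stores no metadata:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "values.dat")  # hypothetical file name

# Create a file-backed array and fill it.
writer = np.memmap(path, dtype=np.int64, mode="w+", shape=(1_000_000,))
writer[:] = np.arange(1_000_000)
writer.flush()
del writer  # release the mapping

# Re-open read-only; data is read from disk as slices are accessed.
reader = np.memmap(path, dtype=np.int64, mode="r", shape=(1_000_000,))
chunk_sum = int(reader[:10].sum())
last_value = int(reader[-1])
print(chunk_sum, last_value)

del reader
os.remove(path)
```

Because the result behaves like a regular NumPy array, it can be sliced and passed to most NumPy functions without loading the whole file first.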

Closing Thoughts

Optimizing applications is a difficult problem in general. It also depends heavily on the task at hand as well as on the type of data itself. In this article we looked at common ways to find memory usage issues and some options for fixing them. There are, however, many other approaches to reducing the memory footprint of an application. These include trading accuracy for storage space by using probabilistic data structures such as Bloom filters or HyperLogLog. Another option is using tree-like data structures such as DAWG or MARISA trie, which are very efficient at storing string data.
