
Reducing your Python app's memory footprint

This article was originally written by Michael Barasa on the Honeybadger Developer Blog.

Many developers focus on building core application functionality and pay little or no attention to memory management until they run out of memory and their apps start crashing, freezing, or suffering random performance degradation.

Computers have limited RAM, and it’s always best to make effective use of allocated resources. Trying to run a memory-hungry app on a low-spec computer can cause it to crash, negatively impacting the user experience. Furthermore, a high memory footprint may also affect the performance of other apps and background services. And when you run a memory-hungry app in the cloud, where resources are metered and charged for, you will likely end up with an expensive bill.

A high memory footprint can lead to undesirable consequences. Keep reading to learn what memory management entails and discover tips on lowering your Python app's memory footprint.

What is memory management, and why is it important?

Memory management is a complex process involving freeing and allocating computer memory to different programs, ensuring that the system operates efficiently. For example, when you launch a program, the computer has to allocate enough memory, and when the application is closed, the system frees memory and allocates it to another program.

Memory management has numerous benefits. First, it ensures that applications have the resources they need to operate. The computer allocates memory to active processes and reclaims it from inactive ones, which keeps memory utilization efficient.

Second, proper memory management contributes to system stability. Since the computer handles memory allocation automatically, applications will always have access to the required memory, which reduces issues, such as random crashes and shutdowns. Memory management techniques, such as garbage collection, can assist in preventing memory leaks.

Third, memory management leads to better performance optimization. By continuously releasing and allocating memory, applications always have access to resources, which means they can quickly launch and execute.

Each application has a memory footprint, which refers to the amount of memory it consumes. A high memory footprint indicates an app is using a lot of memory, while a low footprint means it has low consumption.

Although computer systems can manage memory automatically, as a developer, you still have to keep your app's memory footprint in check. Using memory-intensive functions and inefficient data structures could cause your software to run out of memory, freeze, and even crash.

In the following section, we’ll explain how to measure your app's memory consumption. Later, we’ll discuss tips for lowering your memory footprint.

How to measure memory usage in Python

You can use any of the following methods to measure the amount of memory your application is using.

The psutil library

psutil is a Python library for fetching useful information about system utilization and active processes. Among other uses, the psutil library allows you to monitor memory, CPU, disk, and network usage.

To demonstrate how the psutil library works, we will use the following Python program that checks whether an integer is a prime number.

number = int(input("Please enter a number: "))

if number == 1:
    print(number, "is not a prime number")
elif number > 1:
    # check for factors
    for i in range(2, number):
        if number % i == 0:
            print(number, "is not a prime number")
            print(i, "times", number // i, "is", number)
            break
    else:
        print(number, "is a prime number")
else:
    # if the input number is less than 1, it is not prime
    print(number, "is not a prime number")

We can check the above program's memory footprint by importing the psutil module and adding the following function to the code.

import psutil  # import the psutil module

def memory_usage():
    process = psutil.Process()         # the current Python process
    usage = process.memory_info().rss  # resident set size, in bytes
    return usage

print(f"Memory used: {memory_usage() / 1024:.0f} KB")  # convert bytes to kilobytes

When you add this snippet to the prime-number program and run it, it reports that the program uses roughly 25,886 KB of memory (the exact figure varies between machines and runs).

Resource module

We can also use the resource module, specifically the getrusage() function, to check the amount of memory a program is using. We use it as follows:

import resource  # note: the resource module is only available on Unix-like systems

def memory_usage():
    # ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return usage

The sys module

The sys module also provides the getsizeof() and getallocatedblocks() functions. getsizeof() returns the size of an individual object in bytes, while getallocatedblocks() returns the number of memory blocks the interpreter currently has allocated. Both can provide valuable insight for debugging purposes.

Here is how you can use the sys module in your code.

import sys

def memory_usage():
    # getsizeof() reports the size of a single object, in bytes;
    # here it measures an empty list
    usage = sys.getsizeof([])
    return usage
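sys.getsizeof() only measures one object at a time. For a broader, interpreter-wide view, you can pair it with sys.getallocatedblocks(), as in this minimal sketch:

import sys

data = [i for i in range(1000)]

# Size of one specific object, in bytes (references are not followed)
print(f"The list object itself takes {sys.getsizeof(data)} bytes")

# Number of memory blocks currently allocated by the interpreter;
# useful as a rough, interpreter-wide indicator when debugging
print(f"The interpreter currently holds {sys.getallocatedblocks()} allocated blocks")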

Third-party libraries

Apart from built-in modules, you can also use third-party libraries, such as memory_profiler, pympler, or objgraph, to measure an app's memory footprint.
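For instance, pympler's asizeof() walks an object's references, so, unlike sys.getsizeof(), it reports the total size of a container plus everything it points to. A minimal sketch (install with pip install pympler):

from pympler import asizeof

numbers = [i for i in range(1000)]

# asizeof() also counts the integer objects the list references,
# so the figure is much larger than sys.getsizeof(numbers)
print(f"Total size, including referenced objects: {asizeof.asizeof(numbers)} bytes")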

Common causes of high memory usage

A large memory footprint can lead to undesirable consequences, including random freezes, crashes, and, ultimately, a bad user experience. We’ll cover the common causes of high memory usage in the following sections.

Memory leaks

The term memory leak refers to a situation where memory is allocated for a particular task but is never released after the task completes. As leaked memory accumulates, your application runs less efficiently and the amount of memory available to the rest of the system shrinks.

Memory leaks can lead to performance degradation. Apart from your application freezing or crashing, other background services may become inoperable. Furthermore, as more apps demand memory, the operating system may be forced to kill certain processes.
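In Python, a common source of such leaks is state that grows for the lifetime of the process, for example, a module-level cache that is never pruned. A simplified illustration (the cache and function names are hypothetical):

# A module-level cache that is only ever written to; entries are never
# evicted, so memory grows for as long as the process runs.
_report_cache = {}

def fetch_report(report_id, payload):
    _report_cache[report_id] = payload  # every call adds another entry
    return _report_cache[report_id]

# A bounded cache avoids the leak, e.g. functools.lru_cache with a maxsize:
from functools import lru_cache

@lru_cache(maxsize=1024)  # old entries are evicted once the limit is reached
def compute_report(report_id):
    return report_id * 2  # placeholder for an expensive computation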

External dependencies

Although third-party libraries allow us to add numerous functionalities to our applications without creating everything from scratch, they may cause high memory consumption in an app.

For example, some libraries do not free memory when a task completes, or they keep running unnecessary background processes, which strains the available resources.

Large datasets

Python is a popular programming language for data analysis, machine learning, and artificial intelligence. Training AI models requires considerable amounts of data and memory. If you train a model on an underpowered machine, the training process may crash or cause your computer to freeze.

Unoptimized code

Not using the garbage collector effectively, keeping too many objects alive in memory, and choosing inefficient data types can all increase your app's memory footprint.
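As one illustration of the "too many objects" point, a class that defines __slots__ drops the per-instance __dict__, which can save a significant amount of memory when a program keeps millions of small objects alive. A minimal sketch:

import sys

class PointDict:
    # regular class: every instance carries its own __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    # __slots__ removes the per-instance __dict__
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

a, b = PointDict(1, 2), PointSlots(1, 2)

# getsizeof() does not follow references, so include the __dict__ for the
# regular instance; the slotted instance has no __dict__ at all.
print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))  # larger
print(sys.getsizeof(b))                              # smaller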

Tips to lower your app’s memory footprint

Now that we know the common causes of high memory usage and how to measure memory consumption, let's look at how to lower your app's memory footprint.

Use generators instead of lists

Although extremely useful, lists can consume a lot of memory, especially when they hold many values, because every element is created and kept in memory at once. Generators are like lists, but with one key distinction: they are lazy. Values are produced one at a time, only when they are needed.

Let's compare the memory consumption of lists and generators.

Here is a list that stores values between 0 and 999:

import sys

numbers = [i for i in range(1000)]  # stores values from 0 to 999
print(numbers)                      # print the values in the list
print(sum(numbers))                 # calculate the sum of the values in the list
print(f"The list consumes {sys.getsizeof(numbers)} bytes")  # size of the list object itself

When you run the above code, sys.getsizeof() reports a few kilobytes for the list (roughly 8-9 KB on a typical 64-bit CPython build). Note that this figure covers only the list object itself; the integer objects it references take additional memory.

In the following code sample, we use a generator instead of a list.

import sys

generatorlist = (i for i in range(1000))  # produces values from 0 to 999 lazily
print(generatorlist)                      # prints the generator object, not its values
print(sum(generatorlist))                 # consumes the generator to calculate the sum
print(f"The generator consumes {sys.getsizeof(generatorlist)} bytes")  # size of the generator object

When the above code is executed, the generator object consumes only around 104 bytes of memory (the exact figure varies slightly between Python versions). Generators are therefore far more memory efficient than lists, provided you only need to iterate over the values once.

Read data in smaller chunks

As discussed, dealing with large datasets can be memory intensive. The computer has to allocate enough resources to process and store the entire file's contents, which means your application may slow down, freeze, or even crash.

You can lower an app's memory footprint by reading data in smaller chunks instead of loading the entire dataset into memory. This technique allows you to analyze data quickly without running into major performance issues.
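The same idea applies to plain text files: iterating over a file object yields one line at a time instead of loading everything with read(). A minimal sketch, assuming a hypothetical log file named app.log:

def count_error_lines(path):
    errors = 0
    with open(path) as f:
        # the file object is a lazy iterator, so only one line
        # is held in memory at a time
        for line in f:
            if "ERROR" in line:
                errors += 1
    return errors

print(count_error_lines("app.log"))  # hypothetical file name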

For example, the following code is not memory efficient, since we load the entire dataset ('employees.csv') into memory.

import pandas as pd

def readEmployeeData():
    # loads the whole CSV file into memory at once
    df = pd.read_csv('employees.csv')['FIRST_NAME']
    print(df.value_counts())

readEmployeeData()

We can save memory by defining a chunksize, or the number of rows our program should read from the dataset in one go, as demonstrated below.

import pandas as pd

def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=200):  # read 200 rows at a time
        employees = chunk["FIRST_NAME"]
        chunk_result = employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)

    result.sort_values(ascending=False, inplace=True)
    print(result)

readDataInChunks()

In the above code, we read and process one smaller dataframe (chunk) at a time, which is more memory efficient. The counts from each chunk are accumulated in the running result Series before the next chunk is read, until the entire dataset has been analyzed.

Use memory-efficient dependencies

Before importing and using a third-party library in your project, research its key features and reliability. Ask questions, such as how much memory the library uses, and determine whether there are possible memory leaks. Being involved in online tech communities, such as Stack Overflow, can help you access valuable information much faster.

Use memory-profiling tools

It's a good idea to use memory-profiling tools, such as memory_profiler, valgrind, and pympler, to measure an app's memory footprint before pushing your application to production. This step ensures you're not caught off-guard and helps you avoid negatively impacting the user experience.

For example, let's see how we can use memory_profiler to analyze memory consumption.

We can simply install memory_profiler with the following command.

$ pip install -U memory_profiler

Once the dependency is installed, add the @profile decorator to the function you wish to analyze.

import pandas as pd

@profile  # marks this function for line-by-line memory profiling
def readDataInChunks():
    result = None
    for chunk in pd.read_csv("employees.csv", chunksize=20):
        employees = chunk["FIRST_NAME"]
        chunk_result =  employees.value_counts()
        if result is None:
            result = chunk_result
        else:
            result = result.add(chunk_result, fill_value=0)

    result.sort_values(ascending=False, inplace=True)
    print(result)

readDataInChunks()

We can then execute the program (saved here as example.py) with the following command.

python -m memory_profiler example.py

Alongside the program's own output, you should see a report similar to the following. You can use this information to optimize specific portions of your code.

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     9   56.156 MiB   56.156 MiB           1   @profile
    10                                         def readDataInChunks():
    11   56.160 MiB    0.004 MiB           1       result = None
    12   57.566 MiB    1.133 MiB           4       for chunk in pd.read_csv("employees.csv", chunksize=20):
    13   57.555 MiB    0.062 MiB           3           employees = chunk["FIRST_NAME"]
    14   57.555 MiB    0.090 MiB           3           chunk_result =  employees.value_counts()
    15   57.555 MiB    0.000 MiB           3           if result is None:
    16   57.152 MiB    0.000 MiB           1               result = chunk_result
    17                                                 else:
    18   57.562 MiB    0.121 MiB           2               result = result.add(chunk_result, fill_value=0)
    19
    20   57.570 MiB    0.004 MiB           1       result.sort_values(ascending=False, inplace=True)
    21   57.621 MiB    0.051 MiB           1       print(result)

Conclusion

As you work on a software project, keeping the memory footprint low should be at the top of your list, not just an afterthought. Applications with low memory consumption experience fewer crashes and freezes and thus provide a better overall user experience.

Using generators instead of lists, avoiding memory-intensive libraries, and reading data in smaller chunks are some helpful tips for lowering your app's memory footprint.
