DEV Community

Dmitrii

Performance Optimisation of Python Applications

Python's popularity stems from its easy-to-grasp syntax, a vibrant community offering plenty of resources, a wealth of libraries and frameworks for nearly any project, and its impressive versatility. It's no wonder everyone from beginners to tech giants is drawn to it! However, while it is extremely beginner-friendly, more seasoned users are likely to run into performance obstacles at some point. The same flexibility that makes Python pleasant to write can leave applications with performance bottlenecks, which makes optimisation an important skill.

This article is a comprehensive list of tips and tricks for Python performance enhancement based on my own experience, which, I hope, will come in handy for you.

What Might Be The Source of the Problem?

Python is widely recognised for its simplicity, readability, and versatility. However, with all its conveniences, it is important to understand that Python may not always be the optimal choice for all scenarios, especially when it comes to performance-intensive tasks.

Dynamically Typed Nature:
One of the defining features of Python is that it's a dynamically typed rather than statically typed language. This means variables don't require an explicit type declaration before they're used. While this boosts flexibility and development speed, it can also lead to inefficiencies, because types have to be checked at run time. Statically typed languages have the advantage when it comes to performance, as type-checking at compile time lets the compiler generate more optimised machine code.

Interpreted Language:
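Python is an interpreted language: the reference implementation, CPython, compiles source code to bytecode and then executes that bytecode instruction by instruction in a virtual machine. While this provides greater flexibility and ease of debugging, it can be slower than compiled languages that convert code into machine language before execution.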
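As a small illustration of what the interpreter actually runs, the standard-library dis module can list the bytecode instructions CPython generates for a function. The function add_one below is a made-up example, and the exact instruction names vary between Python versions.

```python
import dis


def add_one(x):
    """Hypothetical function used only to inspect its bytecode."""
    return x + 1


# Collect the names of the bytecode instructions CPython generated.
opnames = [instr.opname for instr in dis.get_instructions(add_one)]

# Each of these instructions is dispatched by the interpreter loop at run time.
print(opnames)
```

Even a one-line function expands into several instructions, each of which costs interpreter time.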

Global Interpreter Lock (GIL):
Python uses the Global Interpreter Lock (GIL) to handle multi-threading. This means that only one thread executes Python bytecode at a time, even on multi-core systems. This can be a bottleneck for CPU-bound applications that could benefit from true multi-threading. In practice, however, the GIL is the real limiting factor only in fairly rare cases. Most of the time, how the code is written, and therefore the experience of the developer, matters far more.
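A rough way to see the effect is to run the same CPU-bound function sequentially and then in several threads. This is only a sketch: count_down and the worker count are arbitrary choices for illustration, and the timings depend on your machine and on whether you run a standard (GIL) build of CPython.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop."""
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


sequential_time = run_sequential(200_000, 4)
threaded_time = run_threaded(200_000, 4)

# On a build with the GIL, the threaded run is usually not much faster,
# because only one thread can execute bytecode at any moment.
print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

For I/O-bound work the picture is different, since threads release the GIL while they wait.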

Flexibility Over Performance:
Python's design has often prioritised developer convenience and readability over raw performance. For example, Python’s lists and dictionaries are incredibly versatile, allowing for mixed types and dynamic resizing. However, this flexibility can come at a performance cost compared to more restrictive but optimised data structures in other languages.

Memory Consumption:
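Python's objects and data structures tend to consume more memory than their counterparts in lower-level languages, because every value is a full object that carries a type pointer and a reference count. This can be a limitation in memory-bound tasks.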
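The per-object overhead is easy to check with sys.getsizeof. The figures it reports depend on the interpreter version and platform, so treat the output as indicative rather than exact.

```python
import sys

# Size in bytes of a few small built-in objects, as reported by the interpreter.
int_size = sys.getsizeof(1)
float_size = sys.getsizeof(1.0)
empty_list_size = sys.getsizeof([])
empty_dict_size = sys.getsizeof({})

print(f"int: {int_size} bytes")
print(f"float: {float_size} bytes")
print(f"empty list: {empty_list_size} bytes")
print(f"empty dict: {empty_dict_size} bytes")
```

A raw 64-bit number needs 8 bytes. The extra bytes here are the object header that makes dynamic typing possible.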

Optimisation Tools

Fine-tuning applications helps make them more efficient, responsive, and, most importantly, user-friendly. In Python, this task is made easier by a wide range of tools and techniques designed specifically for optimisation. So, before getting to the techniques themselves, let's dig into some of the most prominent tools for speeding up your Python code.

Profiling Tools
Profiling is the first step towards optimisation. It gives us a clear picture of where our code spends most of its time.

Profilers come in different types, each serving specific needs. Sampling profilers collect statistical data at regular intervals, offering a broad overview of where an application spends its time while adding minimal overhead to its execution. Tracing, on the other hand, provides accurate information about how frequently methods are executed but can impact performance and take longer to analyse after collection. Instrumentation profiling dives deeper by either injecting code or using callback hooks to gather precise timing and call-count details, at the cost of higher overhead than sampling.

Let’s look at a few specific profilers.

cProfile:
cProfile is a part of Python's standard library and is an instrumentation profiler that gives a comprehensive breakdown of function calls, including the number of calls and the time spent in each function. It can be invoked directly from the command line or within your Python script. The output can be sorted based on different parameters, which offers flexibility in analysing the results.
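Here is a minimal sketch of using cProfile from inside a script and sorting the results with pstats. The functions slow_sum and workload are invented stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately plain loop so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum(10_000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the report to a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The same profiler can be run on a whole script from the command line with python -m cProfile -s cumulative your_script.py, where your_script.py is a placeholder for your own file.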

Py-Spy:
Py-Spy is a sampling profiler for Python applications. It runs outside the target program and periodically samples its call stack to provide insights.

It can be used with any running Python program without any code changes. Its strength lies in the real-time visualisation of profiling data. Also, since it's a sampling profiler, it has minimal performance impact on the target program.

memory-profiler:
As suggested by the name, this tool is specifically designed to profile memory usage. It can be incorporated into Python scripts using the @profile decorator. Memory-profiler provides line-by-line memory usage of a Python program, helping identify memory leaks or areas of inefficiency in terms of memory consumption. Memory-profiler can be seen as a hybrid: it mainly uses a sampling approach for general memory usage statistics, but offers tracing-like granularity for specific functions if requested.

Benchmarking Tools

Once you've profiled and identified potential bottlenecks, measuring the performance and comparing the impact of your optimisations is essential. Benchmarking tools come in handy here.

timeit:
timeit is a small utility in the standard library, available both as a command-line tool and as a Python module. It can be used interactively or within a script to measure the execution time of small code snippets.

It runs the snippet many times and reports the total, which smooths out many of the external factors that could otherwise distort the measurement.
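As a quick sketch, the snippet below compares two ways of building the same list. The statements and repetition counts are arbitrary examples, and taking the minimum of several repeats is a common way to reduce noise.

```python
import timeit

LOOP_STMT = (
    "result = []\n"
    "for i in range(1000):\n"
    "    result.append(i * 2)"
)
COMPREHENSION_STMT = "result = [i * 2 for i in range(1000)]"

# Run each statement 200 times per repeat and keep the best of 3 repeats.
loop_time = min(timeit.repeat(LOOP_STMT, number=200, repeat=3))
comprehension_time = min(timeit.repeat(COMPREHENSION_STMT, number=200, repeat=3))

print(f"explicit loop:      {loop_time:.4f}s")
print(f"list comprehension: {comprehension_time:.4f}s")
```

The command-line form is equivalent, for example python -m timeit "[i * 2 for i in range(1000)]".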

pyperf:
pyperf is the go-to tool if you're looking to benchmark your Python code more rigorously. It's packed with features, from detecting unstable results to tracking memory use, and it lets you compare different bits of code or dive deep into the statistics for a single benchmark. Plus, it's the same tool used by pyperformance, the Python Performance Benchmark Suite.

This list is far from exhaustive. There are plenty of other instruments, such as profilehooks, line_profiler, and airspeed velocity, some already well established and others only now gaining traction.

Visualising Tools

Visualising performance metrics is crucial for understanding the complexity of any system, especially for Python, where performance bottlenecks can arise in unexpected places.

For example, profilers generate comprehensive data that captures a wide array of metrics. While this data is rich and detailed, it can be overwhelming and challenging to interpret in its unprocessed form. Trying to manually go through raw profiler output is not only tedious but can also lead to missed insights or misinterpretations.

FlameGraph
FlameGraphs are a visualisation of profiled software, allowing the most frequent code paths to be identified quickly and accurately.

Each "flame" or bar in the graph represents a function. The width of the bar indicates the total time spent in that function, including calls to other functions. Modern FlameGraph tools are often interactive, allowing users to zoom in on specific parts of the stack trace or to get more information about individual functions.

Figure: Example of a Flame Graph illustrating the CPU-intensive codepaths in MySQL. Source: Roman Imankulov’s blog

FlameGraphs provide a quick and clear overview of which parts of the codebase are consuming the most resources. This makes them an invaluable tool when trying to identify bottlenecks or performance issues.

SnakeViz
SnakeViz is a browser-based graphical viewer specifically designed for the output from Python's cProfile module. It provides an interactive sunburst or icicle visualisation of profiled Python code. You can click on segments to zoom in, allowing for a detailed investigation of how time is distributed among functions.

Being browser-based, SnakeViz provides a platform-independent way to view profiling results. This is especially useful for teams or projects where developers might be using different operating systems or environments.

Figure: An example of Icicle in SnakeViz. Source: Matt Davis’ (@jiffyclub) GitHub

Figure: An example of Sunburst in SnakeViz. Source: Matt Davis’ (@jiffyclub) GitHub

Given the complexity of raw cProfile output, SnakeViz offers an intuitive way to make sense of the data. By translating textual data into visual form, it lets developers easily identify performance hotspots and understand the flow of program execution.

Improving Memory Use

When writing software, especially in languages like Python, it's easy to overlook how much memory is being consumed. Here are some strategies to keep your Python application's memory footprint in check.

Using Arrays Rather Than Lists for Large Numerical Datasets

Lists are versatile in Python but can be memory hogs, especially with large numerical datasets. Consider this:

  • Arrays, unlike lists, store raw values of a single type in one contiguous block of memory, rather than references to separate Python objects. This makes them more memory-efficient for storing large volumes of numerical data.
  • The array module in Python lets you create dense arrays of a uniform type. If you're working with vast amounts of data, switching to arrays can save significant memory.
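To make the difference concrete, this sketch stores the same floats in a list and in an array of type code "d" (C double) and compares their approximate sizes. The element count is arbitrary, and the list total includes the float objects it refers to.

```python
import sys
from array import array

n = 100_000
values_list = [float(i) for i in range(n)]
values_array = array("d", values_list)  # "d" = C double, 8 bytes per item

# A list holds references, so count the float objects it points to as well.
list_bytes = sys.getsizeof(values_list) + sum(
    sys.getsizeof(v) for v in values_list
)
array_bytes = sys.getsizeof(values_array)

print(f"list of floats: ~{list_bytes / 1024:.0f} KiB")
print(f"array('d'):     ~{array_bytes / 1024:.0f} KiB")
```

The array supports indexing, slicing, and iteration much like a list, so switching is often a small change.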

Utilising Appropriate Data Structures
Picking the right data structure can be the difference between an efficient application and a slow, memory-bloated one. For example:

  • Data structures like sets are faster for membership tests compared to lists.
  • dicts, although efficient for lookups, can be memory-intensive. Consider alternatives like collections.namedtuple if you only need to store a few fields.
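Both points can be checked with a short experiment. The sizes and repetition counts below are arbitrary, Point is a hypothetical record type, and the absolute numbers will differ between machines and Python versions.

```python
import sys
import timeit
from collections import namedtuple

n = 50_000
items_list = list(range(n))
items_set = set(items_list)
target = n - 1  # worst case for the list: the last element

# Membership test: linear scan in a list versus hash lookup in a set.
list_time = timeit.timeit(lambda: target in items_list, number=100)
set_time = timeit.timeit(lambda: target in items_set, number=100)
print(f"list membership: {list_time:.4f}s, set membership: {set_time:.6f}s")

# Small fixed record: namedtuple versus dict.
Point = namedtuple("Point", ["x", "y", "z"])
point_tuple = Point(1.0, 2.0, 3.0)
point_dict = {"x": 1.0, "y": 2.0, "z": 3.0}

tuple_bytes = sys.getsizeof(point_tuple)
dict_bytes = sys.getsizeof(point_dict)
print(f"namedtuple: {tuple_bytes} bytes, dict: {dict_bytes} bytes")
```

The namedtuple keeps attribute-style access (point_tuple.x) while avoiding a per-record hash table.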

Lazy loading and generators
Not everything needs to be loaded all at once. Lazy loading techniques help in two ways:

  • Memory is consumed only when it's actually needed.
  • Generators in Python, which produce items one by one using the yield keyword, are perfect for this. Instead of loading an entire list into memory, a generator produces items on the fly, which can dramatically reduce memory usage.
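A minimal sketch of the idea: the generator function squares below (an invented example) yields the same values as the list comprehension, but never holds them all in memory at once.

```python
import sys


def squares(n):
    """Yield square numbers one at a time instead of building a list."""
    for i in range(n):
        yield i * i


n = 100_000
squares_list = [i * i for i in range(n)]  # every item exists up front
squares_gen = squares(n)                  # nothing computed yet

list_bytes = sys.getsizeof(squares_list)  # container only, items not counted
gen_bytes = sys.getsizeof(squares_gen)

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)  # items are produced and discarded one by one

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")
```

The trade-off is that a generator can be consumed only once and does not support indexing.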

Garbage collection tuning
Python reclaims most memory through reference counting, and its cyclic garbage collector cleans up objects that reference each other. The defaults work well in most cases, but a few adjustments can help:

  • After deleting large object graphs, especially ones that contain reference cycles, it can be beneficial to run the garbage collector manually using gc.collect().
  • If you're sure about object lifetimes, consider tweaking the collector's thresholds with gc.set_threshold(), or even temporarily disabling it with gc.disable() for specific code blocks for performance gains.
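The sketch below shows both ideas together under simple assumptions: the collector is paused while a batch of short-lived reference cycles is created, then re-enabled and run once by hand. Node and make_cycles are hypothetical helpers written for this example.

```python
import gc


class Node:
    """Tiny object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycles(count):
    """Create pairs of nodes that point at each other, then drop them."""
    for _ in range(count):
        a, b = Node(), Node()
        a.partner, b.partner = b, a


gc.collect()  # start from a clean slate

gc.disable()  # no automatic cyclic collection inside this block
try:
    make_cycles(1000)
finally:
    gc.enable()  # always restore the collector

# Reference counting alone cannot free the cycles, so collect them now.
unreachable = gc.collect()
print(f"unreachable objects found: {unreachable}")
```

Whether pausing the collector is a net win depends on the allocation pattern, so it is worth measuring before and after.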

A Few More Ways to Speed Up

NumPy for Numerical Computations
NumPy, Python's numerical computing library, stores data more efficiently than standard Python lists by using a contiguous memory block, ensuring faster access and a smaller memory footprint. Unlike lists, where each element is an individual Python object, a NumPy array is a single object pointing to a block of values of the same data type. This structure not only streamlines memory usage but also enables vectorised operations, which run in compiled code and can use SIMD instruction sets such as Advanced Vector Extensions (AVX) to process several values at once, yielding remarkable performance boosts in numerical operations.

Figure: The distinction between a dynamically typed list and a statically typed array (NumPy-style)
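As a brief sketch, assuming NumPy is installed (it is a third-party package), the snippet below squares the same numbers with a list comprehension and with a single vectorised expression.

```python
import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # one contiguous block of 64-bit integers

# Pure Python: one interpreted multiplication per element.
squared_list = [v * v for v in values]

# NumPy: the whole multiplication runs in compiled code.
squared_arr = arr * arr

print(f"array data size: {arr.nbytes} bytes")
print(f"contiguous: {arr.flags['C_CONTIGUOUS']}")
```

Timing the two versions with timeit is a good way to see the difference on your own machine.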

Writing C-Extensions

When working in C, developers can give up on some of Python's dynamic features to gain speed. For instance, static typing in C can offer faster execution times compared to Python's dynamic typing.

Thus, developers can write parts of their Python program in C to achieve faster performance. These C modules can then be imported into Python code as if they were regular Python modules. This method provides performance benefits and also lets developers manage the GIL manually at the C level. By releasing and reacquiring the GIL as required, one can achieve better concurrency in certain scenarios.

This is where Cython comes in: a programming language that makes it easy to write C extensions for Python. It's essentially Python with additional syntax allowing for static type declarations, which can be compiled into C and then into machine code, offering performance improvements.

Just-In-Time Compilation (JIT) with Numba
Numba is a JIT compiler that translates a subset of Python and NumPy code into fast machine code using the industry-standard LLVM compiler library. By using decorators, Python functions can be optimised for performance without extensive code changes.
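The decorator approach looks roughly like this. Numba is a third-party package, so this sketch falls back to a do-nothing decorator when it is not installed. That fallback is an assumption made here to keep the example runnable, not something Numba provides. The function sum_of_squares is an invented example.

```python
try:
    from numba import njit  # third-party; compiles the function on first call
except ImportError:
    def njit(func):
        """Fallback for this sketch only: leave the function uncompiled."""
        return func


@njit
def sum_of_squares(n):
    # A plain numeric loop: the kind of code Numba handles well.
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
print(result)
```

With Numba installed, the first call includes compilation time, so benchmark the second and later calls.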

Conclusion

It's important to recognise the efforts the community has invested in improving Python's performance over the years. The Faster CPython project is one such initiative aimed at enhancing the speed of the default Python interpreter. As a result, before looking into optimisations or rewriting parts of your code, you can check the Python version you're using. The latest versions may offer performance improvements that could suffice for your needs.

Optimising Python applications always comes down to striking a balance between readability and performance. By keeping an eye on performance and regularly refining the code, you can ensure your Python applications remain both fast and ready to adapt.
