DEV Community

Pradeep Chhetri
Pradeep Chhetri

Posted on • Originally published at Medium on

Extending Python with Rust

Introduction:

Python is a great programming language but sometimes it can be a bit of slowcoach when it comes to performing certain tasks. That’s why developers have been building C/C++ extensions and integrating them with Python to speed up the performance. However, writing these extensions is a bit difficult because these low-level languages are not type-safe, so doesn’t guarantee a defined behavior. This tends to introduce bugs with respect to memory management. Rust ensures memory safety and hence can easily prevent these kinds of bugs.

Slow Python Scenario:

One of the many cases when Python is slow is building out large strings. In Python, the string object is immutable. Each time a string is assigned to a variable, a new object is created in memory to represent the new value. This contrasts with languages like Perl where a string variable can be modified in place. That’s why the common operation of constructing a long string out of several short segments is not very efficient in Python. Each time you append to the end of a string, the Python interpreter must allocate a new string object and copy the contents of both the existing string and the appended string into it. As the string under manipulation become large, this process can become increasingly slow.

Problem: Write a function which accepts a positive integer as argument and returns a string concatenating a series of integers from zero to that integer.

So let’s try solving the above problem in python and see if we can improve the performance by extending it via Rust.

Python Implementations:

Method I: Naive appending

This is the most obvious approach. Using the concatenate operator (+=) to append each segment to the string.

Method II: Build a list of strings and then join them

This approach is commonly suggested as a very pythonic way to do string concatenation. First a list is built containing each of the component strings, then in a single join operation a string is constructed containing all of the list elements appended together.

Method III: List comprehensions

This version is extremely compact and is also pretty understandable. Create a list of numbers using a list comprehension and then join them all together. This is just an abbreviated version of last approach and it consumes pretty much the same amount of memory.

Let’s measure the performance of each of these three approaches and see which one wins. We are going to do this using pytest-benchmark module.

Here is the result of the above benchmarks. Lower the value, better is the approach.

Just by looking at the Mean column, one can easily justify that the list comprehension approach is definitely the winner among three approaches.

Rust Implementations:

After trying out basic implementation of the above problem in Rust, and doing some rough benchmarking using cargo-bench, the result definitely looked promising. Hence, I decided to port the rust implementation as shared library using rust-cpython project and call it from python program.

To achieve this, I had create a rust crate with the following src/lib.rs.

Building the above crate created a . dylib file which needs to be rename .so.

Then, we ran the same benchmark including the rust one as before.

This time the result is more interesting.

The rust extension is definitely the winner. As you increase the number of iterations to even more, the result is even more promising.

Eg. for iterations = 1000, following are the benchmark results

Code:

You can find the code used in the post:

Conclusion:

I am very new to Rust but these results definitely inspires me to learn Rust more. If you know better implementation of above problem in Rust, do let me know.

PyO3 started a fork of rust-cpython but definitely has lot more active development and hence on my todo-list of experimentation.

Distributing of your python module will demand the rust extension to be compiled on the target system because of the variation of architecture. Milksnake is a extension of python-setuptools that allows you to distribute dynamic linked libraries in Python wheels in the most portable way imaginable.

Top comments (9)

Collapse
 
idkravitz profile image
Dmitry Kravtsov

Btw, you can do a single allocation approach with bytearray in Python. The benchmark should be interesting in comparison with compiled multi-allocation versions.

Collapse
 
p_chhetri profile image
Pradeep Chhetri

Sure, I will try that out. Thank you for the suggestions.

Collapse
 
idkravitz profile image
Dmitry Kravtsov • Edited

In addition to my suggestion, you don't actually need to calculate sum(log10(i)). The idea is simple: the concatenation of numbers 0..9 is 10*1 chars in length, for 0..100 is 10*1 + 90*2 chars in length, for 0..1000 it is 10*1 + 90*2 + 900*3, and so on. So you can tell what your resulting string length almost in O(log10(n)) calculations, which is super fast.

Thread Thread
 
p_chhetri profile image
Pradeep Chhetri

Awesome thanks a lot Dmitry.

Thread Thread
 
idkravitz profile image
Dmitry Kravtsov

You're wellcome. One last thing - as far as I know Rust doesn't have a C-equivalent of itoa, in other words there is no function in Rust stdlib to convert number to string in preallocated buffer. But there is an itoa crate which does exactly that, you can read about it here: docs.rs/itoa/0.4.5/itoa/

I feel strangely excited about messing with these benchmarks, so I may share my results in a few days when I'll try these things.

Thread Thread
 
p_chhetri profile image
Pradeep Chhetri

After reading all your comments, i am very excited. I am new to rust but your feedback is very motivating. I'll work on implementing your feedbacks. Thank you very much. Looking forward to your benchmarks too.

Thread Thread
 
idkravitz profile image
Dmitry Kravtsov

So I implemented a single allocation version in Rust with itoa crate. The results exceeded my expectations:

Name (time in us) Mean
test_rust_concat_mutable 7.9843 (1.0)
test_rust_concat2 90.8439 (11.38)
test_rust_concat 92.4594 (11.58)
test_concat_comprehension 202.4457 (25.36)
test_concat_join 238.2960 (29.85)
test_concat_basic 360.8961 (45.20)

Mutable implementation can be found in this gist

Thread Thread
 
p_chhetri profile image
Pradeep Chhetri

Nice Dmitry, this looks awesome.

Collapse
 
idkravitz profile image
Dmitry Kravtsov • Edited

Well, you benefit nothing from string mutability in your implementation, since basically you're constantly appending to a string. Which would result in O(n) or O(log(n)) allocations, depending on actual Rust string implementation. The much faster approach (if allocations are O(n), for log(n) the difference would be much smaller) would be to estimate the resulting string length (it's a sum i = from 0 to n floor(log_10(i))). Allocate string just once and fill it with data.