DEV Community

Donovan


Investigating Pydantic v2's Bold Performance Claims

Motivating Example

The other day I was experimenting with an application I wrote about a year ago that relies on a set of rules to process some data. Due to some concerns with ease of use and extensibility, I decided to redesign the rule schema and use Pydantic to parse them instead of Python's builtin dataclass.

Once the new schema was finished, I ran some performance tests, which showed the new schema caused a 70% decrease in application performance. Part of this was due to changes in the structure of the rules, but the rest came from switching to Pydantic, so I decided to benchmark how much of a difference that switch made on its own.

Pydantic Overview

If you work with backend APIs in Python, you've probably used or heard of Pydantic, perhaps from FastAPI. The library advertises itself as "data validation and settings management using Python type annotations" and it makes (de)serialization of data a breeze.[1]

Consider the following example using Python's builtin dataclass:

from dataclasses import dataclass

@dataclass 
class User:
    id: int
    name: str

If you pass data to this object's constructor that doesn't match the specified types, Python will still gladly create the object for you:

user = User(**{"id": "ABC", "name": 39})
print(user)
# > User(id='ABC', name=39)
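By contrast, getting any type enforcement out of a plain dataclass means writing the checks yourself, for example in `__post_init__`. Here's a minimal sketch (the `CheckedUser` class is my own illustration, not from the original example, and it only handles simple non-generic annotations like `int` and `str`):

```python
from dataclasses import dataclass, fields

@dataclass
class CheckedUser:
    id: int
    name: str

    def __post_init__(self):
        # Manually compare each field's value against its annotation.
        # Only works for plain types like int/str, not generics like list[int].
        for f in fields(self):
            if not isinstance(getattr(self, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")

CheckedUser(id=1, name="Alice")  # OK

try:
    CheckedUser(id="ABC", name=39)
except TypeError as exc:
    print(exc)
# > id must be int
```

This kind of boilerplate (and its edge cases) is exactly what Pydantic takes off your plate.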

Using Pydantic instead, we will get an error[2] if we try the same thing:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

user = User(**{"id": "ABC", "name": 39})
# > ValidationError: 1 validation error for User
# > id
# >   value is not a valid integer (type=type_error.integer)

Hopefully this illustrates one of the many benefits of using Pydantic. I invite you to read the documentation to see the full capabilities of the library beyond this contrived example.
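Pydantic also doesn't just reject bad data; where a value can be sensibly converted, it coerces it. As a quick sketch (with default settings, this behaves the same on both v1 and v2), a numeric string passed for an `int` field comes out as an actual `int`:

```python
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

# "123" is a valid integer string, so Pydantic coerces it rather than erroring.
user = User(id="123", name="Alice")
print(repr(user.id))
# > 123
```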

New Performance Claims

Pydantic's capabilities can make your programs more resilient and easier to read and write, but you may sacrifice some performance for these benefits. To address these concerns, the creators of Pydantic have rewritten the backend "with validation and serialisation logic implemented in Rust" to boost performance by "5-50x" over v1 in the new major version coming soon.[3]

For those unfamiliar, Pydantic is currently implemented in Python and this rewrite shifts most of the code to Rust, a systems programming language touted as "blazingly fast" and safe.

If you're familiar with Rust, then these kinds of gains may not seem unreasonable, but "5-50x" is still a big claim, especially in the notoriously slow world of Python. I'm a big fan of Rust, but I wanted to verify these claims for myself and to understand how they compare to Python's builtin functionality.

Benchmarking

In my original example, the main bottleneck turned out to be data deserialization: converting a "rule" into a Pydantic object.

To test this, we'll set up some benchmarks using pytest-benchmark with some sample data and a simple schema, and compare results between Python's dataclass, Pydantic v1, and Pydantic v2.

Setup

Here is the test setup which uses a simple model of a user:

from dataclasses import dataclass

import pytest
from pydantic import BaseModel


@dataclass
class UserDC:
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


class UserPY(BaseModel):
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


@pytest.fixture
def test_user():
    # Sample payload shared by both benchmarks (values are illustrative).
    return {
        "id": 1,
        "first_name": "John",
        "last_name": "Doe",
        "age": 39,
        "email": "john.doe@example.com",
    }


@pytest.mark.benchmarks
def test_dc_parse_bench(test_user, benchmark):
    user = benchmark.pedantic(UserDC, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1


@pytest.mark.benchmarks
def test_py_parse_bench(test_user, benchmark):
    user = benchmark.pedantic(UserPY, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1

We are using Python's dataclass as a baseline for comparison since I will need to run these tests with two different versions of Pydantic installed.

All benchmarks are run on a 2021 MacBook Pro with M1 Pro and 32GB RAM with the following environment:

Test session starts (platform: darwin, Python 3.11.3, pytest 7.3.1, pytest-sugar 0.9.7)
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)

Benchmark Results

Using Pydantic v1


Results reproduced for visibility:

| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Kops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 166.5991 (1.0) | 1,791.5998 (1.0) | 178.0631 (1.0) | 19.0275 (1.0) | 179.0992 (1.0) | 4.1997 (1.0) | 210;1391 | 5,615.9867 (1.0) | 50000 | 10 |
| test_py_parse_bench | 3,170.7998 (19.03) | 9,162.4999 (5.11) | 3,227.0666 (18.12) | 69.3922 (3.65) | 3,216.6994 (17.96) | 20.7001 (4.93) | 2192;2766 | 309.8789 (0.06) | 50000 | 10 |

Using Pydantic v2


Reproduced for visibility:

| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Mops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 170.8002 (1.0) | 1,754.0995 (1.0) | 183.2414 (1.0) | 14.7975 (1.0) | 183.3003 (1.0) | 4.2011 (1.0) | 542;2882 | 5.4573 (1.0) | 50000 | 10 |
| test_py_parse_bench | 741.6995 (4.34) | 4,591.6997 (2.62) | 768.6778 (4.19) | 21.8820 (1.48) | 766.6997 (4.18) | 8.3994 (2.00) | 2424;3224 | 1.3009 (0.24) | 50000 | 10 |

Key Takeaways

  1. Both Pydantic v1 and v2 perform significantly slower than dataclass
  2. Pydantic's performance varies more widely than dataclass
  3. Pydantic v2 performs significantly faster than v1
  4. Pydantic v2's performance varies less than v1

Analysis

The results here confirm my suspicion that switching from dataclass to Pydantic was a significant factor in the performance degradation that sparked this investigation. We can see that, on average, Pydantic v1 is about 18x slower than dataclass.

We can also see that Pydantic v2 is, on average, about 4x faster than v1 for this particular use case. If you check another post from Pydantic, you may see the range "4-50x" instead of the aforementioned "5-50x", which technically means these results meet their claims, even if just barely. I won't split hairs over it, since this example is incredibly simple and not the likely target of optimization.
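These ratios fall straight out of the mean times in the tables above (note that the dataclass baseline and the two Pydantic versions come from separate runs, so the cross-run comparison is approximate):

```python
# Mean construction times in nanoseconds, copied from the benchmark tables.
dc_mean = 178.0631    # dataclass (from the v1 run)
v1_mean = 3227.0666   # Pydantic v1
v2_mean = 768.6778    # Pydantic v2

print(f"v1 vs dataclass: {v1_mean / dc_mean:.1f}x slower")
print(f"v2 vs v1:        {v1_mean / v2_mean:.1f}x faster")
# > v1 vs dataclass: 18.1x slower
# > v2 vs v1:        4.2x faster
```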

What's important to note here is that even "just" a 4x performance improvement can be a major win, especially if it can be achieved with little to no code changes.[4] For me, this is an easy way to recoup some of the performance losses I was facing in my initial example.

I encourage you to check out the official benchmarks for more realistic and detailed examples, and, as always, YMMV.

Conclusions

Will Pydantic's new major release live up to the hype? In most cases you will probably see improvements at the lower bound of their estimates, but, as mentioned, even that can be a big win. In my opinion, Pydantic brings enough enhancements to Python applications to more than make up for the performance cost.

It's worth noting that these improvements will also impact other libraries and frameworks that rely on Pydantic, such as FastAPI and AWS Lambda Powertools, which could deliver some transitive performance improvements to various projects that don't directly depend on Pydantic themselves.

Footnotes
[1]
[2] This example will actually coerce 39 into a string using Pydantic v1 if you resolve the type error on id. Using Pydantic v2 will instead report a validation error for both fields.
[3]
[4] See the migration guide for specifics on upgrading

Top comments (1)

Samuel Colvin

Interesting, did you try with a more complex model? E.g. with compound types? You should see a bigger improvement with that.