TL;DR —
pip install python-dateutil-rs, change one import, get 5x–94x faster date parsing, recurrence rules, and timezone lookups. It's a line-by-line Rust port via PyO3 — no code changes required. GitHub | PyPI
python-dateutil Is Everywhere — and It's Pure Python
python-dateutil is one of the most depended-on packages in the Python ecosystem. With 300M+ monthly downloads on PyPI, it powers date parsing, relative deltas, recurrence rules, and timezone handling across countless applications.
But it's written in pure Python. For hot paths — parsing thousands of ISO timestamps in a data pipeline, expanding recurring calendar events, computing relative dates in a loop — that means leaving significant performance on the table.
What if you could make it up to 94x faster without changing a single line of your application code?
```shell
pip install python-dateutil-rs
```
That's python-dateutil-rs — a Rust-backed drop-in replacement I built using PyO3 and maturin. Same API, same behavior, dramatically faster.
And here's the thing: this is a naive, line-by-line port. The Rust code mirrors the original Python logic almost 1:1, with no Rust-specific optimizations yet. The speedups come purely from Rust's compiled nature — no dynamic dispatch, minimal GC pressure, no interpreter overhead.
The Approach: Module-by-Module Rewrite via PyO3
A full rewrite of a battle-tested library is risky. Instead, I took an incremental approach: rewrite each module in Rust independently, validate against the original test suite (~13,000 lines of tests, all passing), and ship value at every step.
The architecture uses maturin's mixed layout — a Rust extension module and Python wrapper live in the same package:
```
dateutil_rs (Python package)
├── _native (Rust extension via PyO3/maturin)  ← all modules
└── Python wrappers                            ← API compatibility layer
```
Every module is now implemented in Rust: parser, relativedelta, rrule, tz, easter, utils, and weekday constants. The Python layer handles API compatibility — for example, serializing custom parserinfo lookup tables and forwarding them to the Rust parser:
```python
from dateutil_rs._native import isoparse
from dateutil_rs._native import parse as _parse_rs

def parse(timestr, parserinfo=None, **kwargs):
    if parserinfo is not None:
        # Serialize custom lookup tables → forward to Rust
        config = parserinfo._to_rust_config()
        return _parse_rs(timestr, parserinfo_config=config, **kwargs)
    return _parse_rs(timestr, **kwargs)
```
The API is identical to python-dateutil — just change the import:
```python
# Before (python-dateutil)
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, MONTHLY
from dateutil.tz import gettz, tzutc

# After (python-dateutil-rs) — same code
from dateutil_rs.parser import parse
from dateutil_rs.relativedelta import relativedelta
from dateutil_rs.rrule import rrule, MONTHLY
from dateutil_rs.tz import gettz, tzutc
```
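For libraries that want to adopt the faster backend opportunistically, a guarded import also works. This shim is a sketch, not part of either package; the stdlib fallback handles only strict ISO 8601 strings:

```python
# Hypothetical compatibility shim: prefer the Rust-backed package when
# available, fall back to stock python-dateutil, then to the stdlib.
try:
    from dateutil_rs.parser import parse  # Rust-backed drop-in
except ImportError:
    try:
        from dateutil.parser import parse  # pure-Python original
    except ImportError:
        # Last-resort stdlib fallback: strict ISO 8601 only.
        from datetime import datetime
        parse = datetime.fromisoformat
```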
Benchmark Results
All benchmarks: Python 3.13, macOS Apple Silicon M3 (arm64), Rust 1.86 stable, release build (--release), measured with pytest-benchmark. Compared against python-dateutil 2.9.0.
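The project's suite uses pytest-benchmark; as a rough illustration of the methodology, here is a minimal stdlib `timeit` harness for this kind of micro-benchmark (numbers on your machine will differ from the tables below):

```python
import timeit
from datetime import datetime

def bench(fn, number=10_000):
    """Best-of-5 per-call time in microseconds for a zero-arg callable."""
    runs = timeit.repeat(fn, number=number, repeat=5)
    return min(runs) / number * 1e6

# Example: time stdlib ISO parsing as a stand-in workload.
us = bench(lambda: datetime.fromisoformat("2024-01-15T10:30:00.123456"))
print(f"stdlib fromisoformat: {us:.2f} µs/call")
```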
At a Glance
| Module | Speedup Range | Highlight |
|---|---|---|
| Timezone | 1.7x – 94.3x | Batch gettz() with cache: 94.3x |
| ISO Parser | 5.1x – 23.5x | With microseconds: 23.5x |
| RRule | 3.1x – 20.0x | rrulestr with dtstart: 20.0x |
| RelativeDelta | 4.9x – 18.7x | Multiply by scalar: 18.7x |
| General Parser | 1.3x – 3.5x | Fuzzy parsing: 3.5x |
| Easter | 3.2x – 6.2x | 1000 years batch: 6.2x |
Note on the 94.3x headline number: The timezone benchmark includes a Rust-side `RwLock<HashMap>` cache (similar to python-dateutil's `_TzFactory`). Without caching, pure timezone operations still see 1.7x–3.8x speedups. The most representative single-operation speedup is 23.5x on ISO parsing.
ISO Parsing — Up to 23.5x Faster
| Benchmark | python-dateutil | dateutil-rs | Speedup |
|---|---|---|---|
| Date only | 0.81 µs | 0.08 µs | 10.7x |
| Datetime | 2.06 µs | 0.10 µs | 20.9x |
| Datetime + TZ | 3.11 µs | 0.61 µs | 5.1x |
| With microseconds | 2.67 µs | 0.11 µs | 23.5x |
| 7 various formats | 23.10 µs | 3.10 µs | 7.4x |
RelativeDelta — Up to 18.7x Faster
| Benchmark | python-dateutil | dateutil-rs | Speedup |
|---|---|---|---|
| Create simple | 0.94 µs | 0.19 µs | 4.9x |
| Add months | 1.62 µs | 0.13 µs | 12.7x |
| Subtract | 3.03 µs | 0.18 µs | 16.5x |
| Multiply by scalar | 1.49 µs | 0.08 µs | 18.7x |
| Sequential add ×12 | 19.41 µs | 1.70 µs | 11.4x |
| Month-end overflow | 1.74 µs | 0.13 µs | 13.3x |
RRule (Recurrence Rules) — Up to 20x Faster
| Benchmark | python-dateutil | dateutil-rs | Speedup |
|---|---|---|---|
| Weekly ×52 | 105.33 µs | 34.44 µs | 3.1x |
| Monthly ×120 | 448.46 µs | 114.12 µs | 3.9x |
| Yearly ×100 | 2,144.58 µs | 235.02 µs | 9.1x |
| rrulestr simple | 3.14 µs | 0.53 µs | 6.0x |
| rrulestr complex | 6.60 µs | 0.90 µs | 7.4x |
| rrulestr with dtstart | 15.61 µs | 0.78 µs | 20.0x |
Timezone — Up to 94.3x Faster (Cache-Inclusive)
| Benchmark | python-dateutil | dateutil-rs | Speedup |
|---|---|---|---|
| gettz various (×10) | 714.93 µs | 7.59 µs | 94.3x |
| gettz offset | 20.20 µs | 5.26 µs | 3.8x |
| resolve_imaginary | 6.54 µs | 3.19 µs | 2.0x |
| datetime_exists | 3.15 µs | 1.62 µs | 1.9x |
| convert chain | 6.74 µs | 4.08 µs | 1.7x |
The 94.3x speedup on batch gettz() comes from a process-global RwLock<HashMap> cache — similar to python-dateutil's _TzFactory, but with Rust's near-zero-cost Arc clone for cached timezone objects. Even without the cache hit, individual timezone operations see 1.7x–3.8x improvements from compiled execution alone.
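In Python terms, the caching strategy looks roughly like this. It is a sketch of the idea only, not the actual implementation (which lives in Rust behind a `RwLock`), and `_load_tz` is a placeholder for the expensive TZif load:

```python
import threading

_tz_cache: dict[str, object] = {}
_tz_lock = threading.Lock()  # stands in for Rust's RwLock<HashMap>

def _load_tz(name: str) -> object:
    # Placeholder for the slow part: reading and parsing tzdata from disk.
    return ("tzdata", name)

def gettz_cached(name: str) -> object:
    # Fast path returns the shared cached object; the Rust version hands
    # back a cheap Arc clone instead of reconstructing the timezone.
    with _tz_lock:
        tz = _tz_cache.get(name)
        if tz is None:
            tz = _load_tz(name)
            _tz_cache[name] = tz
        return tz
```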
Easter — Up to 6.2x Faster
| Benchmark | python-dateutil | dateutil-rs | Speedup |
|---|---|---|---|
| Single call | 0.51 µs | 0.13 µs | 3.9x |
| 1000 years | 437.88 µs | 70.46 µs | 6.2x |
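For reference, Western Easter comes down to a handful of integer operations, which is why this module ports to Rust so directly. Here is the standard anonymous Gregorian computus (Meeus/Jones/Butcher), shown as a generic reference algorithm rather than a copy of dateutil's code:

```python
from datetime import date

def easter_western(year: int) -> date:
    """Anonymous Gregorian computus: pure integer arithmetic."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)
```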
Why Is It Fast? It's Just Rust Being Rust
Here's the surprising part: the Rust code is a faithful, almost line-by-line translation of the Python source. I didn't design clever algorithms or exploit Rust-specific data structures. The speedups come from the fundamental differences between compiled and interpreted execution.
Profiling revealed that Python's overhead isn't in any single place — it's everywhere. Every function call, every attribute access, every integer comparison, every object allocation pays an interpreter tax. In a tight loop, these micro-costs compound dramatically. That's why even a naive port sees 5x–94x improvements on batch operations.
Let's look at three modules to see exactly where the performance comes from:
isoparse: Byte-Level Parsing (23.5x)
The original isoparser in Python uses string slicing and int() conversion. The Rust version does the same logic — but on byte arrays instead of Python strings:
```rust
fn parse_isodate_common(&self, dt_str: &[u8]) -> Result<([i32; 3], usize), String> {
    let mut c = [1i32; 3]; // [year, month, day]

    // Year: parse 4 ASCII bytes directly
    c[0] = parse_int(&dt_str[..4])?;
    let mut pos = 4;

    let has_sep = dt_str[pos] == b'-';
    if has_sep { pos += 1; }

    // Month: 2 bytes
    c[1] = parse_int(&dt_str[pos..pos + 2])?;
    pos += 2;

    // Day: 2 bytes
    // ...
}
```
Same algorithm as the Python version. No string allocation, no regex, no dynamic dispatch — just the inherent overhead of interpretation removed. The parse_int helper converts ASCII bytes to integers with a multiply-and-add loop. For a microsecond field like 123456, this is dramatically faster than Python's int("123456").
Result: "2024-01-15T10:30:00.123456+09:00" goes from 2.67 µs to 0.11 µs.
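The multiply-and-add conversion described above is simple enough to sketch in Python (the real helper operates on Rust byte slices; this version exists only to show the loop):

```python
def parse_int(digits: bytes) -> int:
    """ASCII digit bytes -> int via multiply-and-add, no string allocation."""
    value = 0
    for byte in digits:
        if not 0x30 <= byte <= 0x39:  # outside b'0'..b'9'
            raise ValueError(f"non-digit byte: {byte!r}")
        value = value * 10 + (byte - 0x30)
    return value
```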
RelativeDelta: Fixed Struct vs Dynamic Attributes (18.7x)
Python's relativedelta stores fields as instance attributes with getattr/setattr patterns. The Rust version uses a fixed struct — same fields, same logic:
```rust
pub struct RelativeDelta {
    // Relative (plural) fields — offsets
    years: i32,
    months: i32,
    days: f64,
    hours: f64,
    minutes: f64,
    seconds: f64,
    microseconds: f64,

    // Absolute (singular) fields — "set to this value"
    year: Option<i32>,
    month: Option<i32>,
    day: Option<i32>,
    // ...
}
```
Multiplication is just multiplying each field — no dictionary lookups, no attribute resolution, no type coercion overhead. Same algorithm, 18.7x faster.
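A Python dataclass with the same fixed-field shape makes the contrast concrete. This is a toy with a few fields, not dateutil's full semantics (the real `relativedelta.__mul__` also normalizes fractional fields after multiplying):

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Delta:
    years: int = 0
    months: int = 0
    days: float = 0.0
    hours: float = 0.0

    def __mul__(self, factor: float) -> "Delta":
        # One multiply per field. The Rust struct does this with direct
        # field access; here we iterate the dataclass fields for brevity.
        return Delta(**{f.name: getattr(self, f.name) * factor
                        for f in fields(self)})
```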
RRule: Same Algorithm, Less Overhead (9.1x–20x)
Recurrence rules (RFC 5545) are the most complex module — they expand patterns like "every 3rd Tuesday of the month" into concrete dates. The Rust implementation follows the same expansion algorithm as python-dateutil:
- Stack-allocated date arithmetic via `chrono` instead of Python `datetime` objects
- No per-iteration object allocation or GC pressure
- Integer comparisons instead of Python object protocol (`__lt__`, `__eq__`)
For yearly rules over 100 occurrences, the overhead compounds: 2,144 µs → 235 µs. For rrulestr parsing, the gap is even wider — 20x faster because string-to-rule conversion involves heavy tokenization that benefits enormously from compiled execution.
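For a flavor of what the expansion does, here is a pure-Python sketch of the pattern mentioned above, "every 3rd Tuesday of the month". It is a toy illustration, not dateutil's general BYDAY machinery:

```python
from datetime import date, timedelta

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the n-th given weekday (Mon=0 .. Sun=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days to first occurrence
    return first + timedelta(days=offset + 7 * (n - 1))

# Expand "every 3rd Tuesday" across a year:
third_tuesdays = [nth_weekday(2024, m, 1, 3) for m in range(1, 13)]
```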
What's Next: v1 — Actually Optimized Rust
Everything above is v0 — a compatibility-first port that proves Rust's floor is Python's ceiling. The Rust code deliberately mirrors Python's logic to ensure behavioral fidelity against 13,000 lines of inherited tests.
v1 is where things get interesting. It will be a clean Rust-native implementation with real optimizations:
| Optimization | Technique | Target Impact |
|---|---|---|
| Zero-copy parsing | `&str` slices instead of `String` clones | Parser: 5x–50x |
| Compile-time hash maps | `phf` perfect hash for weekday/month lookup | Parser: faster tokenization |
| Buffer reuse | Pre-allocated buffers cleared across iterations | RRule: 5x–15x |
| Minimal allocations | `SmallVec`, stack buffers, arena patterns | All modules |
| Streamlined API | Drop rarely-used features (fuzzy parse, POSIX tz) | Smaller, faster core |
v1 will ship as:
- `dateutil` on crates.io — pure Rust crate, no PyO3 dependency
- `dateutil` on PyPI — thin PyO3 binding layer wrapping the Rust core
The v0 → v1 migration path:
- v0.x: faithful python-dateutil port (current — 1.3x–94x faster)
- v1.x: Rust-optimized core (target — 5x–100x faster)
If a naive port already delivers up to 94x speedups, imagine what happens when the Rust code is actually written like Rust.
What I Learned Building This
PyO3 Is Remarkably Ergonomic
PyO3's derive macros make exposing Rust types to Python straightforward:
```rust
#[pyclass(name = "relativedelta")]
pub struct RelativeDelta { /* ... */ }

#[pymethods]
impl RelativeDelta {
    #[new]
    fn new(/* 15+ optional kwargs */) -> Self { /* ... */ }

    fn __add__(&self, other: &Bound<'_, PyAny>) -> PyResult<PyObject> {
        // Handle datetime + relativedelta
    }
}
```
The trickiest part was datetime ↔ chrono conversion. PyO3's chrono feature handles the basics, but timezone-aware datetimes require careful handling of the Python tzinfo protocol.
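The tzinfo protocol the binding has to honor is small but strict: `utcoffset`, `dst`, and `tzname`. This minimal fixed-offset example uses only the standard `datetime` API, nothing project-specific:

```python
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """Minimal tzinfo: the three methods any aware-datetime consumer calls."""
    def __init__(self, minutes: int):
        self._offset = timedelta(minutes=minutes)
    def utcoffset(self, dt): return self._offset
    def dst(self, dt): return timedelta(0)
    def tzname(self, dt): return f"UTC+{self._offset}"

dt = datetime(2024, 1, 15, 10, 30, tzinfo=FixedOffset(540))  # UTC+09:00
```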
A Faithful Port Is a Valid Strategy
The instinct when rewriting in Rust is to redesign everything. I resisted that. By keeping the logic 1:1 with Python, I could validate correctness against the original test suite — and still ship massive speedups. The optimization pass (v1) comes later, built on a foundation that is already proven correct.
Where the Overhead Actually Is
The takeaway is clear: CPython's per-operation overhead is small in isolation, but it compounds multiplicatively in loops. A date parser that touches 10 fields per call, called 10,000 times, pays the interpreter tax 100,000 times. Rust pays it zero times.
Try It
```shell
pip install python-dateutil-rs
```
- GitHub: wakita181009/dateutil-rs
- PyPI: python-dateutil-rs
- Python: 3.10–3.14 on Linux and macOS (Windows support planned)
The project is open source (MIT). If your application uses python-dateutil, switching is a one-line import change. v0 is stable, fully tested, and API-compatible. v1 — the Rust-optimized core with zero-copy parsing and phf hash maps — is coming.
The numbers speak for themselves. And once zero-copy parsing and arena allocation land in v1, they should get a lot bigger.