I ran benchmarks on a 10M row CSV file. Polars finished in 0.3 seconds. Pandas took 12 seconds.
That's a 40x speedup with zero code complexity increase.
Here's what's happening and why you should care.
The Benchmark
Dataset: 10M rows, 8 columns (sales data). Operations: read CSV, filter, groupby, sort.
| Operation | Pandas | Polars | Speedup |
|----------------|----------|----------|---------|
| Read CSV | 8.2s | 0.9s | 9x |
| Filter rows | 0.4s | 0.02s | 20x |
| GroupBy + Agg | 2.8s | 0.15s | 19x |
| Sort | 1.1s | 0.08s | 14x |
| Total pipeline | 12.5s | 0.32s | 39x |
Polars is written in Rust. It uses Apache Arrow memory format. It parallelizes everything automatically.
The Code Comparison
Pandas:
import pandas as pd
df = pd.read_csv("sales.csv")
result = (
df[df["revenue"] > 100]
.groupby("category")
.agg({"revenue": "sum", "quantity": "mean"})
.sort_values("revenue", ascending=False)
)
Polars:
import polars as pl
df = pl.read_csv("sales.csv")
result = (
df.filter(pl.col("revenue") > 100)
.group_by("category")
.agg(
pl.col("revenue").sum(),
pl.col("quantity").mean(),
)
.sort("revenue", descending=True)
)
Almost identical syntax. 40x faster.
Why Polars Is Faster
- Rust backend — compiled, not interpreted
- Apache Arrow — zero-copy memory format
- Lazy evaluation — query optimizer (like a database)
- Automatic parallelism — uses all CPU cores
- No GIL — Rust threads, not Python threads
When to Use Polars vs Pandas
Use Polars when:
- Dataset > 100MB
- You need speed (ETL pipelines, data engineering)
- You're starting a new project
Stick with Pandas when:
- Team knows Pandas and migration cost is high
- You need specific Pandas ecosystem tools (some ML libraries)
- Dataset < 100MB (both are fast enough)
The Lazy API (Secret Weapon)
# Polars lazy — optimizer plans the best execution
result = (
pl.scan_csv("sales.csv") # Lazy — doesn't read yet
.filter(pl.col("revenue") > 100)
.group_by("category")
.agg(pl.col("revenue").sum())
.collect() # Now it reads + processes optimally
)
scan_csv + collect() = Polars reads only the columns/rows it needs. On a 10GB file, this is the difference between "runs in 2 seconds" and "crashes with OOM."
Getting Started
pip install polars
That's it. No compilation. No C dependencies. Just works.
Full Data Engineering Toolkit
If you're building data pipelines, check out:
- Awesome Data Engineering 2026 — 150+ tools
- Python Web Scraper Template — Production scraper
Have you switched from Pandas to Polars? What was your speedup? Drop your numbers in the comments.
I write about data engineering and Python performance. Follow for weekly deep dives.
Top comments (0)