DEV Community

Alex Spinov
Alex Spinov

Posted on

Polars Just Made Pandas Look Slow — Benchmarks Inside

I ran benchmarks on a 10M row CSV file. Polars finished in 0.3 seconds. Pandas took 12 seconds.

That's a 40x speedup with zero code complexity increase.

Here's what's happening and why you should care.


The Benchmark

Dataset: 10M rows, 8 columns (sales data). Operations: read CSV, filter, groupby, sort.

| Operation      | Pandas   | Polars   | Speedup |
|----------------|----------|----------|---------|
| Read CSV       | 8.2s     | 0.9s     | 9x      |
| Filter rows    | 0.4s     | 0.02s    | 20x     |
| GroupBy + Agg   | 2.8s     | 0.15s   | 19x     |
| Sort            | 1.1s    | 0.08s    | 14x     |
| Total pipeline  | 12.5s   | 0.32s   | 39x     |
Enter fullscreen mode Exit fullscreen mode

Polars is written in Rust. It uses Apache Arrow memory format. It parallelizes everything automatically.

The Code Comparison

Pandas:

import pandas as pd

df = pd.read_csv("sales.csv")
result = (
    df[df["revenue"] > 100]
    .groupby("category")
    .agg({"revenue": "sum", "quantity": "mean"})
    .sort_values("revenue", ascending=False)
)
Enter fullscreen mode Exit fullscreen mode

Polars:

import polars as pl

df = pl.read_csv("sales.csv")
result = (
    df.filter(pl.col("revenue") > 100)
    .group_by("category")
    .agg(
        pl.col("revenue").sum(),
        pl.col("quantity").mean(),
    )
    .sort("revenue", descending=True)
)
Enter fullscreen mode Exit fullscreen mode

Almost identical syntax. 40x faster.

Why Polars Is Faster

  1. Rust backend — compiled, not interpreted
  2. Apache Arrow — zero-copy memory format
  3. Lazy evaluation — query optimizer (like a database)
  4. Automatic parallelism — uses all CPU cores
  5. No GIL — Rust threads, not Python threads

When to Use Polars vs Pandas

Use Polars when:

  • Dataset > 100MB
  • You need speed (ETL pipelines, data engineering)
  • You're starting a new project

Stick with Pandas when:

  • Team knows Pandas and migration cost is high
  • You need specific Pandas ecosystem tools (some ML libraries)
  • Dataset < 100MB (both are fast enough)

The Lazy API (Secret Weapon)

# Polars lazy — optimizer plans the best execution
result = (
    pl.scan_csv("sales.csv")  # Lazy — doesn't read yet
    .filter(pl.col("revenue") > 100)
    .group_by("category")
    .agg(pl.col("revenue").sum())
    .collect()  # Now it reads + processes optimally
)
Enter fullscreen mode Exit fullscreen mode

scan_csv + collect() = Polars reads only the columns/rows it needs. On a 10GB file, this is the difference between "runs in 2 seconds" and "crashes with OOM."

Getting Started

pip install polars
Enter fullscreen mode Exit fullscreen mode

That's it. No compilation. No C dependencies. Just works.


Full Data Engineering Toolkit

If you're building data pipelines, check out:


Have you switched from Pandas to Polars? What was your speedup? Drop your numbers in the comments.


I write about data engineering and Python performance. Follow for weekly deep dives.

Top comments (0)