
ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Polars 1.0 lazy evaluation vs DuckDB 1.0 SQL on Parquet: which queries offload to the engine?


When working with large Parquet datasets, two tools dominate modern analytics stacks: Polars 1.0 with its lazy evaluation engine, and DuckDB 1.0, the embedded OLAP database that runs SQL directly on Parquet files. A key question for engineers is: which queries offload processing to the respective engine, and how does that impact performance?

What Is Query Offload?

Query offload refers to pushing computation (filtering, projection, aggregation) down to the storage or engine layer, rather than loading full datasets into application memory. For Parquet, this means the engine reads only relevant row groups, columns, and statistics to execute queries without scanning entire files.

Polars 1.0 Lazy Evaluation: How Offload Works

Polars 1.0’s lazy API builds a logical query plan first, then optimizes it before execution. When reading Parquet via pl.scan_parquet(), Polars:

  • Reads Parquet file metadata (schema, row group stats, column chunk offsets) without loading data
  • Prunes row groups that don’t match filter predicates using min/max statistics
  • Selects only required columns (projection pushdown) to avoid reading unnecessary data
  • Offloads aggregations, joins, and filters to its multi-threaded Rust engine when possible

Queries that offload fully in Polars include: column selection, filters that can prune row groups, simple aggregations (sum, count, mean) on scanned Parquet, and joins between lazy Parquet scans. Queries that require full data materialization (e.g., user-defined Python functions or other operations that cross the Rust/Python boundary) break offload and load data into memory.

DuckDB 1.0 SQL on Parquet: Offload Mechanics

DuckDB 1.0 treats Parquet files as first-class tables via its Parquet reader. When running SQL queries like SELECT col1, SUM(col2) FROM 'data.parquet' WHERE col3 > 100 GROUP BY col1, DuckDB:

  • Uses Parquet metadata to prune row groups and columns before scanning
  • Offloads all standard SQL operations (filters, aggregations, joins, window functions) to its vectorized execution engine
  • Supports predicate pushdown, projection pushdown, and aggregation pushdown to Parquet reads
  • Can offload joins between multiple Parquet files to the engine without materializing intermediate results

Almost all valid SQL queries on Parquet offload to DuckDB’s engine, including complex window functions, CTEs, and subqueries. The only exceptions are queries that call external UDFs or require returning full result sets to the client for post-processing.

Key Differences in Offload Coverage

| Query Type | Polars 1.0 Lazy Offload | DuckDB 1.0 SQL Offload |
| --- | --- | --- |
| Column selection (projection) | Full offload | Full offload |
| Row group pruning via filters | Full offload | Full offload |
| Simple aggregations (sum, count) | Full offload | Full offload |
| Complex window functions | Partial offload (may materialize) | Full offload |
| Joins between Parquet files | Full offload (lazy scans only) | Full offload |
| Python UDFs in query logic | No offload (loads data) | No offload (unless DuckDB UDF) |

When to Choose Which?

Use Polars 1.0 lazy evaluation if you’re already working in Python, need tight integration with Pandas/NumPy ecosystems, or have custom logic that fits Polars’ expression API. It offloads most standard analytics queries efficiently.

Use DuckDB 1.0 SQL if you prefer declarative SQL, need to support ad-hoc queries from multiple tools, or run complex window functions and CTEs that DuckDB offloads fully. Its SQL interface lowers the barrier for non-Python users.

Conclusion

Both Polars 1.0 and DuckDB 1.0 offload most Parquet queries to their engines, but DuckDB covers a wider range of SQL-native operations, while Polars excels in Python-centric workflows with lazy optimization. Test both with your specific workload to measure offload efficiency and performance.
