Ian Cowley

I built a native C# DataFrame engine to rival Python Polars (it's actually faster on some things)

If you work in data science or heavy data engineering, you already know about Polars. It's the Rust-backed powerhouse that took the Python ecosystem by storm, leaving Pandas in the dust.

But if you're a .NET developer, the data manipulation story has always been a bit... frustrating. We have Microsoft.Data.Analysis, but it lacks the expressive lazy API and raw speed we crave. We often end up exporting data to Python just to process it, only to bring it back to C#.

I got tired of waiting for a native .NET solution. So, I decided to build one from scratch.

Meet Glacier.Polaris: https://github.com/ian-cowley/Glacier.Polaris

It is a high-performance, strongly-typed DataFrame library for C# (.NET 10). It features SIMD-accelerated compute kernels, a lazy execution engine, native nullability (Kleene logic), and it currently passes 135/135 golden-file parity tests against Python Polars.

And after weeks of fighting the .NET JIT compiler and CPU caches, it is actually beating Polars in several key benchmarks.

Here is how I pushed C# to its physical limits to pull this off.


1. Zero-Allocation and SIMD String Filtering

In standard C#, string operations are heavy. If you filter a DataFrame with df.Filter(Expr.Col("Status") == "Completed"), checking materialized .NET string objects one by one will instantly ruin your performance due to pointer-chasing and heap allocations.

To beat Polars (which uses Arrow's contiguous memory format), I couldn't use C# strings.

Instead, Glacier.Polaris stores strings as flat UTF-8 byte arrays. When you execute an equality filter, the engine loads the target string's bytes into a Vector256<byte> register. As it scans the column, each Vector256.Equals call (a single AVX2 byte-compare instruction on supporting hardware) checks 32 bytes against the target at once instead of one character at a time.

The Result: String exact-match filtering in Glacier runs in 3.62 ms for 1 million rows, beating Polars (~4.2ms) with zero string allocations.
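Here is a minimal sketch of that comparison step, assuming an Arrow-style layout (one flat UTF-8 buffer plus an offsets array). The names Utf8EqualsSketch, Pack, and FilterEquals are hypothetical and are not Glacier.Polaris APIs; the Vector256 calls are standard System.Runtime.Intrinsics methods. Targets longer than 32 bytes, and rows too close to the end of the buffer for a full 32-byte load, fall back to a plain span comparison in this sketch.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Text;

public static class Utf8EqualsSketch
{
    // Hypothetical helper: concatenates all values into one UTF-8 buffer.
    // Row i occupies data[offsets[i] .. offsets[i + 1]).
    public static (byte[] Data, int[] Offsets) Pack(string[] values)
    {
        var offsets = new int[values.Length + 1];
        for (int i = 0; i < values.Length; i++)
            offsets[i + 1] = offsets[i] + Encoding.UTF8.GetByteCount(values[i]);

        var data = new byte[offsets[values.Length]];
        for (int i = 0; i < values.Length; i++)
            Encoding.UTF8.GetBytes(values[i], 0, values[i].Length, data, offsets[i]);
        return (data, offsets);
    }

    // Returns a boolean mask with true for every row equal to `target`.
    public static bool[] FilterEquals(byte[] data, int[] offsets, string target)
    {
        byte[] needle = Encoding.UTF8.GetBytes(target);
        int rows = offsets.Length - 1;
        int width = Vector256<byte>.Count; // 32 bytes per vector
        var mask = new bool[rows];

        bool useSimd = Vector256.IsHardwareAccelerated && needle.Length <= width;
        Vector256<byte> needleVec = default;
        uint lengthMask = 0;
        if (useSimd)
        {
            // Zero-pad the target to 32 bytes and load it once.
            var padded = new byte[width];
            needle.CopyTo(padded, 0);
            needleVec = Vector256.Create<byte>(padded);
            // One bit per byte of the target; shifting by 32 would wrap, so special-case it.
            lengthMask = needle.Length == width ? uint.MaxValue : (1u << needle.Length) - 1;
        }

        for (int i = 0; i < rows; i++)
        {
            int start = offsets[i];
            int length = offsets[i + 1] - start;
            if (length != needle.Length) continue; // different length, cannot match

            if (useSimd && start + width <= data.Length)
            {
                // Compare 32 bytes at once, then keep only the bits that belong to this row.
                var row = Vector256.Create<byte>(new ReadOnlySpan<byte>(data, start, width));
                uint equalBits = Vector256.Equals(row, needleVec).ExtractMostSignificantBits();
                mask[i] = (equalBits & lengthMask) == lengthMask;
            }
            else
            {
                // Scalar-looking fallback; SequenceEqual is itself vectorized by the runtime.
                mask[i] = data.AsSpan(start, length).SequenceEqual(needle);
            }
        }
        return mask;
    }
}
```

No .NET string is created per row: the only allocations are the target's byte array, its padded copy, and the result mask.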

2. Breaking the 4ms Barrier: The Float64 Sorting War

The hardest fight I had was with ArgSort on Float64 data.

Initially, I wrote a highly optimized, single-threaded Radix sort. I managed to drop the sorting time for 1 million floats to 13.11 ms, a massive 5.4x speedup over the standard .NET Array.Sort.

But Polars was doing it in 4.21 ms.

At 13ms, my C# code had effectively maxed out what a single CPU core can do. Eight radix passes over the keys and indices move roughly 200MB of data. At 13.11 ms that is already about 15 GB/s, right in the 15-20 GB/s range where a single core tops out, and matching Polars' 4.21 ms would take about 47 GB/s of memory bandwidth.
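To make the pass structure concrete, here is a minimal single-threaded sketch of an LSD radix ArgSort over doubles: eight 8-bit passes that move 64-bit keys and 32-bit indices between two buffers. The bit-flip key transform is a common way to make IEEE 754 doubles sortable as unsigned integers; whether Glacier.Polaris uses this exact transform is an assumption, and RadixArgSortSketch is a hypothetical name.

```csharp
using System;

public static class RadixArgSortSketch
{
    // Maps a double to a ulong whose unsigned order matches the numeric order:
    // negative values have all bits flipped, non-negative values only the sign bit.
    // Side effects of this mapping: -0.0 sorts before +0.0, positive NaN sorts last.
    static ulong ToSortableKey(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        ulong mask = (ulong)(bits >> 63) | 0x8000_0000_0000_0000UL;
        return (ulong)bits ^ mask;
    }

    // Returns the indices that would sort `values` in ascending order (stable).
    public static int[] ArgSort(double[] values)
    {
        int n = values.Length;
        var keys = new ulong[n];
        var indices = new int[n];
        for (int i = 0; i < n; i++)
        {
            keys[i] = ToSortableKey(values[i]);
            indices[i] = i;
        }

        var keysTmp = new ulong[n];
        var indicesTmp = new int[n];
        var counts = new int[256];

        for (int shift = 0; shift < 64; shift += 8) // 8 passes, one byte each
        {
            // Histogram of the current byte.
            Array.Clear(counts, 0, counts.Length);
            for (int i = 0; i < n; i++)
                counts[(int)((keys[i] >> shift) & 0xFF)]++;

            // Exclusive prefix sum turns counts into starting positions.
            int running = 0;
            for (int b = 0; b < 256; b++)
            {
                int c = counts[b];
                counts[b] = running;
                running += c;
            }

            // Stable scatter of keys and indices into the other buffer.
            for (int i = 0; i < n; i++)
            {
                int dest = counts[(int)((keys[i] >> shift) & 0xFF)]++;
                keysTmp[dest] = keys[i];
                indicesTmp[dest] = indices[i];
            }

            (keys, keysTmp) = (keysTmp, keys);
            (indices, indicesTmp) = (indicesTmp, indices);
        }
        return indices;
    }
}
```

The scatter step alone reads and writes 12 bytes per row (an 8-byte key plus a 4-byte index), so 1M rows over eight passes comes to roughly 192MB of traffic, in line with the ~200MB figure above.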

I needed to parallelize it. But .NET's Parallel.For has too much overhead; spinning up the ThreadPool state machine takes 1-2ms alone, which is a death sentence when your target is 4ms.

The Fix: The Parallel Block Tournament Merge
Instead of using standard .NET parallel loops, I built a custom generic parallel block merge engine:

  1. The engine slices the 1M array into isolated chunks and hands them to raw Task objects.
  2. Each core runs a single-threaded Radix sort on a chunk sized to fit in its L2 cache, so the radix passes avoid round trips to system RAM and Translation Lookaside Buffer (TLB) thrashing.
  3. The engine merges the sorted chunks using a stable, parallel pairwise tournament merge.
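The three steps above can be sketched as follows. This is a simplified illustration, not the library's engine: it sorts values rather than computing ArgSort indices, swaps Array.Sort in for the per-chunk radix sort, and uses Task.Run for each chunk and each merge. BlockMergeSortSketch and its chunkSize parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockMergeSortSketch
{
    // Sorts a copy of `input`: independent chunk sorts, then rounds of pairwise merges.
    public static double[] Sort(double[] input, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int n = input.Length;
        double[] src = (double[])input.Clone();
        double[] dst = new double[n];

        // Steps 1 and 2: slice into chunks and sort each one on its own task.
        // (Array.Sort stands in for the single-threaded radix sort here.)
        double[] work = src;
        var sortTasks = new List<Task>();
        for (int start = 0; start < n; start += chunkSize)
        {
            int s = start;
            int length = Math.Min(chunkSize, n - start);
            sortTasks.Add(Task.Run(() => Array.Sort(work, s, length)));
        }
        Task.WaitAll(sortTasks.ToArray());

        // Step 3: tournament rounds. Each round merges neighbouring sorted runs
        // in parallel, doubling the run width until one run covers the array.
        for (int width = chunkSize; width < n; width *= 2)
        {
            double[] from = src, to = dst;
            var mergeTasks = new List<Task>();
            for (int start = 0; start < n; start += 2 * width)
            {
                int lo = start;
                int mid = Math.Min(start + width, n);
                int hi = Math.Min(start + 2 * width, n);
                mergeTasks.Add(Task.Run(() => Merge(from, to, lo, mid, hi)));
            }
            Task.WaitAll(mergeTasks.ToArray());
            (src, dst) = (dst, src);
        }
        return src;
    }

    // Stable two-way merge of from[lo..mid) and from[mid..hi) into to[lo..hi).
    static void Merge(double[] from, double[] to, int lo, int mid, int hi)
    {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            to[k++] = from[j] < from[i] ? from[j++] : from[i++]; // ties take the left run
        while (i < mid) to[k++] = from[i++];
        while (j < hi) to[k++] = from[j++];
    }
}
```

Every task in a round writes to a disjoint range of the destination buffer, so no locking is needed between the Task.WaitAll barriers.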

The Result: Float64 sorting dropped to 12.05 ms for 1M rows, and scaled to 10 million rows in just 84.71 ms.

3. The Benchmarks (C# vs Polars)

I ran these benchmarks on the same machine, comparing Glacier.Polaris (.NET 10 Release build) against Polars 1.40.1.

(Note: Times are in milliseconds. Lower is better).

| Operation (1M Rows) | Glacier.Polaris (C#) | Python Polars | Winner |
|---|---|---|---|
| DataFrame Creation | 0.02 ms | 5.33 ms | 🟢 C# (~266x) |
| Sum (Int32) | 0.14 ms | 0.45 ms | 🟢 C# (3.2x) |
| Standard Deviation | 0.33 ms | 0.55 ms | 🟢 C# (1.7x) |
| GroupBy Sum (Int32) | 1.56 ms | 5.20 ms | 🟢 C# (3.3x) |
| Inner Join (Small Right) | 2.29 ms | 4.61 ms | 🟢 C# (2.0x) |
| Rolling StdDev | 3.15 ms | 12.92 ms | 🟢 C# (4.1x) |

By utilizing single-pass Welford algorithms for variance, contiguous memory, and a custom Fibonacci-hashing hash map for joins, C# absolutely flies.
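For readers who have not met it, here is a minimal sketch of Welford's single-pass update for standard deviation. It is a scalar illustration of the algorithm named above, not the library's kernel; WelfordSketch is a hypothetical name, and the sample (n - 1) denominator is an assumption chosen to match Polars' default.

```csharp
using System;

public static class WelfordSketch
{
    // Single pass over the data: keeps a running mean and a running sum of
    // squared deviations (m2), so the column is read only once.
    public static double SampleStdDev(ReadOnlySpan<double> values)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0;

        foreach (double x in values)
        {
            count++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / count;     // update the mean
            m2 += delta * (x - mean);  // uses both the old and the new mean
        }

        // Sample standard deviation is undefined for fewer than two values.
        return count > 1 ? Math.Sqrt(m2 / (count - 1)) : double.NaN;
    }
}
```

Compared with the naive "sum of squares minus square of sums" formula, this update avoids subtracting two large, nearly equal numbers, which is why it is the usual choice for one-pass variance.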

Try it out

Glacier.Polaris covers ~98% of the Python Polars core surface area, including LazyFrames, query optimization (predicate/projection pushdowns), and full temporal operations.

If you are building high-performance data pipelines, backtesting financial algorithms, or doing ML preprocessing in .NET, I'd love for you to try it out.

GitHub: https://github.com/ian-cowley/Glacier.Polaris

NuGet:

dotnet add package Glacier.Polaris

Star the repo, try to break the lazy execution engine, and let me know what features you want to see next! Let's bring world-class data engineering to .NET.

