Stop Paying the "Python Object Tax": 10M Rows in 0.08s with C and Parallel mmap

#python #programming #c #performance

I was benchmarking some data ingestion pipelines on my Nitro 16 (Ryzen 7) and honestly got pretty frustrated with how much overhead Python adds to basic I/O. Even with optimized Pandas code, processing 10M rows was hitting a wall because of how Python wraps every single data point in a high-level object.

I decided to go "to the metal" to see what the hardware is actually capable of. I built Axiom Turbo-IO, a C bridge that uses two specific systems-level optimizations:

1. Memory Mapping (mmap)

Instead of standard file I/O (which copies data from the kernel's page cache into user-space buffers), I mapped the entire file directly into the process's virtual address space. This avoids the "copying tax" and lets the OS handle paging.

2. Parallel Pthreads

I split the file into chunks and processed them across all 8 CPU cores simultaneously. Because the work runs in native threads outside the Python Global Interpreter Lock (GIL), the chunks are processed truly in parallel.

The "Grit": Boundary Hardening

The hardest part was ensuring data integrity. When you split a file into 8 chunks by byte offset, you almost always cut a line in half. I had to write a custom "Skip and Overlap" algorithm: every thread except the first skips ahead to the start of its first full line, and every thread reads past the end of its chunk to finish its last partial line. No double-counting, no lost data.

📊 The Benchmark (10 Million Rows)

| Engine          | Execution Time | RAM Usage | Speedup (as reported) |
|-----------------|----------------|-----------|-----------------------|
| Standard Python | ~0.873s        | ~1.5 GB   | Baseline              |
| Axiom Turbo-IO  | 0.083s         | ~8 KB     | 19.08x faster         |

Why I’m Open-Sourcing This

I believe a small C/C++ bridge can save a massive amount of cloud compute cost in a production environment. If you're running massive logs through a high-RAM AWS instance, you might be overpaying for memory you don't actually need.

GitHub Repository: https://github.com/naresh-cn2/Axiom-Turbo-IO

Let's Talk Performance

How are you guys handling 100GB+ datasets? Are you sticking with Polars/DuckDB, or are you writing custom bridges for hyper-specific tasks?

P.S. If your pipeline is currently crawling or hitting "Out of Memory" errors, I'm doing 3 free 10-minute performance audits this week. DM me or open an issue on GitHub if you want a second pair of eyes on your ingestion logic.
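
Appendix: a sketch of the boundary logic

For anyone who wants to see how mmap, pthreads, and the "Skip and Overlap" boundary rule described above can fit together, here is a minimal sketch in C. It is not the Axiom Turbo-IO source. The names `chunk_t`, `line_start_at_or_after`, `count_lines_parallel`, `count_lines_in_file`, and `MAX_THREADS` are hypothetical, and counting lines stands in for whatever per-row work a real pipeline does. The assumed rule is that a chunk owns every line whose first byte falls inside its byte range, so each line is handled by exactly one thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_THREADS 64            /* hypothetical cap for this sketch */

typedef struct {
    const char *data;             /* start of the whole buffer or mapping */
    size_t size;                  /* total size in bytes */
    size_t begin, end;            /* nominal chunk range [begin, end) */
    size_t lines;                 /* result: lines owned by this chunk */
} chunk_t;

/* Offset of the first line start at or after pos. A line starts at
 * offset 0 or immediately after a '\n'. Returns size if there is none. */
static size_t line_start_at_or_after(const char *data, size_t size, size_t pos) {
    if (pos == 0) return 0;
    if (pos >= size) return size;
    const char *nl = memchr(data + pos - 1, '\n', size - (pos - 1));
    return nl ? (size_t)(nl - data) + 1 : size;
}

static void *count_lines_worker(void *arg) {
    chunk_t *c = arg;
    /* Skip: ignore the partial line at the front (the previous chunk owns it). */
    size_t start = line_start_at_or_after(c->data, c->size, c->begin);
    /* Overlap: read past the nominal end to finish the last owned line. */
    size_t stop = line_start_at_or_after(c->data, c->size, c->end);
    assert(start <= stop);
    size_t n = 0;
    const char *p = c->data + start, *limit = c->data + stop;
    while (p < limit) {
        const char *nl = memchr(p, '\n', (size_t)(limit - p));
        n++;                      /* one line, terminated or not */
        if (!nl) break;           /* final line without a trailing newline */
        p = nl + 1;
    }
    c->lines = n;
    return NULL;
}

/* Count lines in an in-memory buffer using up to nthreads native threads. */
size_t count_lines_parallel(const char *data, size_t size, int nthreads) {
    pthread_t tid[MAX_THREADS];
    chunk_t chunk[MAX_THREADS];
    int started[MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    size_t step = size / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        size_t end = (i == nthreads - 1) ? size : step * (size_t)(i + 1);
        chunk[i] = (chunk_t){ data, size, step * (size_t)i, end, 0 };
        started[i] = pthread_create(&tid[i], NULL, count_lines_worker, &chunk[i]) == 0;
        if (!started[i]) count_lines_worker(&chunk[i]);   /* fall back to inline */
    }
    size_t total = 0;
    for (int i = 0; i < nthreads; i++) {
        if (started[i]) pthread_join(tid[i], NULL);
        total += chunk[i].lines;
    }
    return total;
}

/* Map a file read-only and count its lines. Returns -1 on error. */
long long count_lines_in_file(const char *path, int nthreads) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap rejects length 0 */
    size_t size = (size_t)st.st_size;
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping remains valid after close */
    if (map == MAP_FAILED) return -1;
    long long n = (long long)count_lines_parallel(map, size, nthreads);
    munmap(map, size);
    return n;
}
```

A caller would use something like `count_lines_in_file("rows.csv", 8)`, where the file name is only an example. Because ownership is decided by where a line starts, an empty or tiny chunk simply owns zero lines, which keeps the no-double-counting guarantee even when there are more threads than lines.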

DEV Community
