1. Beyond the Sprint: The Obsession with 11 Seconds
Just a week ago, I closed a chapter of this journey with a strange mix of pride and frustration. I had achieved something that, on paper, seemed impossible for my standard laptop in a local development environment: processing 50 million records in just over 11 seconds. The numbers on my terminal were green and glowing. I had matched, and even surpassed by margins of half a second, established giants like Polars in that specific test. It was an indisputable technical victory, the kind of result you want to frame and post on every social media channel.
But in the solitude of the compiler, once the noise of digital applause fades, you can tell when something isn't quite right. Deep down, I knew this was a "sprint victory." I had optimized pardoX to run the 100-meter dash, tightening every screw, every memory allocation, and every execution thread for that specific scenario of 50 million rows. I had created a pure speed runner, explosive and fast, but terribly specialized. And real-world data engineering—the kind I face in the trenches every day, not in my testing lab—is rarely a short, clean race. The reality is a brutal, messy marathon, full of unforeseen obstacles.
The fundamental problem with my obsession with optimizing for a "sprint" is that I made the engine fragile. In my relentless quest to break the 12-second barrier and win the battle of benchmarks on medium datasets, I started noticing subtle cracks in pardoX's armor. If I looked closely at the telemetry, beyond the final time, the story was concerning: memory consumption showed aggressive spikes, like an athlete desperately gasping for air. To achieve that speed, I was pushing my hardware to the limit, consuming resources voraciously. On my 16GB machine, this was manageable. But what would happen if I tried running this on a smaller cloud instance? Or if I had my browser open with twenty tabs? The engine ran the risk of collapsing not from a lack of speed, but from resource exhaustion.
Here I faced my true dilemma as a software architect. I had two very clear paths. The first was the easy path: stay in my comfort zone, keep shaving milliseconds in the "Mid Data" range, celebrate my victory against the big players, and release a tool that was "the fastest for 50 million." It's an attractive, sellable, and safe value proposition. But I felt it betrayed my own original vision for pardoX.
The second path was the painful one: risk breaking what already worked. It meant accepting that my current architecture, while fast, was not resilient enough for the vision of "Universality" I had promised. If I wanted pardoX to be truly agnostic—a tool capable of living in any environment and processing any volume without fear—I couldn't rely on perfect conditions. I needed stability under extreme pressure. I needed to stop thinking about how to run faster and start thinking about how to run forever without getting tired.
The decision was drastic but necessary: stop micro-optimizing for the pure speed of the small dataset. I stopped looking at the 11-second stopwatch and started looking at system stability monitors. I realized that to scale towards hundreds of millions or billions of rows, I had to stop treating RAM as an infinite resource I could "borrow" to gain speed, and start managing it for what it truly is on most of our corporate laptops: a scarce and precious treasure.
Redesigning the flow architecture meant changing my own philosophy about the engine. I moved from a model that tried to "swallow" data as fast as possible, to a model of "controlled breathing." I had to teach the bear to pace itself, to understand that it's useless to reach the halfway point in record time if you're going to pass out before the finish line. It was a process of personal technical maturation: abandoning my vanity of immediate milliseconds in exchange for the robustness of industrial engineering.
It wasn't just about being fast; anyone can be fast once. The real challenge, and what defines a professional tool versus an academic toy, is the ability to be relentless. I wanted to build an engine that could look at a 600-million-row file—a monster that would make my own laptop tremble—and process it with the same calm and stability with which it processes a small file. That was the moment pardoX ceased to be my speed experiment and began to become critical infrastructure.
2. Raising the Stakes: The Leap into the "Heavyweight" Category
Comfort is a silent trap. After stabilizing pardoX in the 50-million-row range, I felt that dangerous satisfaction of "job done." The engine was flying, memory was under control, and benchmarks were consistently green. I could have stopped there. I could have packaged that version, put a nice bow on it, and released it to the world as "the ultimate solution for your medium CSVs." It would have been a reasonable success. But real engineering isn't about reasonableness; it's about pushing limits until something breaks, then rebuilding it stronger.
I decided it was time to leave the safety of the kiddie pool. If pardoX truly aspired to be a universal engine, it couldn't be scared of volumes that would make a server sweat. So I raised the stakes. I generated two new stress scenarios, specifically designed to break my own architecture. The first: 150 million rows. A considerable leap, tripling the usual load, perfect for seeing if memory cracks turned into fractures. But that wasn't enough. I needed a final monster, a "Boss" level that separated toys from industrial tools. Thus, the 640-million-row dataset was born. We are talking about a volume of data that, in raw format, far exceeds the physical memory of my laptop. It was a declaration of war against my own hardware.
The goal of this experiment wasn't simply to see if pardoX could finish the job. Eventually, any poorly optimized script can process 600 million rows if you give it three days and enough disk swap space. No, my goal was to verify if our fundamental "Zero-Copy" theory and dynamic resource allocation—that "brain" we had programmed to detect and respect hardware—held up when the hydraulic pressure of the data multiplied tenfold. Would the engine remain agile? Or would it become slow and clumsy under its own weight, as happens to so many systems when they scale?
My philosophy for this stage was clear: it's not about if you can process it, but how your machine feels while doing it. There is an abysmal difference between a tool that hijacks your computer, freezing the mouse and making the fans sound like a jet turbine about to take off, and a tool that works in silence, with the cold efficiency of a professional. I wanted pardoX to be the latter. I wanted to be able to process 640 million rows and still listen to music on Spotify without interruptions. I wanted to prove that high performance doesn't have to be synonymous with user suffering.
So I prepped the ring. In one corner, DuckDB, the robust tank of local SQL. In the other, Polars, the Rust speedster that had dominated my nightmares and dreams. And in the center, pardoX, with its new "massive endurance" architecture. I took a deep breath, closed all unnecessary browser tabs (a sacred ritual for any engineer before a benchmark), and launched the first test: the 150 million.
What I saw in the terminal made me smile. It wasn't just that it finished; it was how it finished. 42.5 seconds. A sustained throughput of nearly 90 MB/s. But most importantly: RAM usage held a flat, steady line for the entire run. There were no panic spikes, no swap usage.
To put this in perspective, let's compare it with the giants.
Polars, the gold standard, crossed the finish line in 56.5 seconds. Still incredibly fast, don't get me wrong, but pardoX had managed to beat it by 14 seconds in this stretch.
And DuckDB... well, DuckDB is solid, but its 98 seconds reminded us that the overhead of a full SQL engine comes at a price when you just want to move data fast. pardoX wasn't just competing; it was leading. We had managed to get an i5 laptop to process data at a speed of 3.5 million rows per second, sustained for nearly a minute. The "Zero-Copy" theory wasn't just academic; it was a real, tangible competitive advantage in the physical world.
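As a quick sanity check, the headline figures above are internally consistent with each other:

```latex
\frac{150 \times 10^{6}\ \text{rows}}{42.5\ \text{s}} \approx 3.5 \times 10^{6}\ \text{rows/s},
\qquad
56.5\ \text{s} - 42.5\ \text{s} = 14\ \text{s}
```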
3. The Evidence: When Engineering Beats Brute Force
I have always believed that real engineering is not proven with promises in a README, but with extreme evidence in the terminal. In my career, I've learned that data is stubborn; it doesn't care what programming language is trendy, nor what elegant architecture you drew on the whiteboard. Data only cares about one thing: do you have the capacity to process it or not? So, with that mindset, I prepared the final stage. It wasn't a game or a synthetic simulation. It was 320 real CSV files, each loaded with 2 million records. 640 million rows in total. A volume of data that typically requires budget approval for a Spark cluster or an expensive EC2 instance. But I was going to face it right here, on the bare metal of my local laptop.
DuckDB was the first to enter the ring. It's a tool I deeply respect for its SQL robustness and analytical capability, but the reality of massive ingestion was harsh.
The clock stopped at 535 seconds. Almost 9 minutes watching a progress bar. In a daily workflow, 9 minutes is an eternity; it's enough time to go for coffee, come back, check emails, answer a Slack message, and completely lose the "flow" of what you were doing. It wrote at a speed of ~26 MB/s. It's solid, it didn't crash, but it felt heavy, far from saturating the disk's capacity.
Then came Polars, the current gold standard and the rival to beat. Polars is fast, incredibly fast, and watching it work is always a lesson in humility for any developer.
It held strong and finished the task in 269 seconds. An impressive time for that monstrous amount of data, maintaining a write speed of ~57 MB/s. Most engineers would be satisfied here. It is excellent performance. But my obsession with pardoX wasn't about being "good enough," it was about finding the physical limit of the hardware.
And then, pardoX took control. I closed my eyes for a second, took a deep breath, and ran the script.
257 seconds. 4 minutes and 17 seconds. When I saw that number, I knew we had crossed a threshold. We hadn't just survived the 640 million monster; we had tamed it. We were more than twice as fast as DuckDB and managed to beat Polars at its own endurance game by a margin of 12 seconds.
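For the skeptics, the arithmetic behind those claims uses only the numbers already on the table:

```latex
320 \times 2 \times 10^{6} = 640 \times 10^{6}\ \text{rows},
\qquad
\frac{535\ \text{s}}{257\ \text{s}} \approx 2.1,
\qquad
269\ \text{s} - 257\ \text{s} = 12\ \text{s}
```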
But beyond winning the race by a few seconds, what truly filled me with professional satisfaction was the consistency shown in the telemetry. If you analyze the numbers coldly, you'll see something revealing. While other engines tend to degrade sharply as data volume increases—drowning in their own memory management—pardoX maintained a stoic pace. We processed at a speed of ~62 MB/s and maintained a pace of ~2.4 million rows per second, sustained, for over four minutes.
That stability is pure gold. It means the engine isn't drowning; it's breathing. It means the "Zero-Copy" architecture and dynamic thread management scale linearly and don't collapse under the pressure of 320 simultaneous files. This evidence tells me something clear: when you stop relying on hardware brute force (asking for more RAM) and start relying on precision engineering (managing what you have better), the limits of the possible expand. I didn't need a cluster. I didn't need 128GB of RAM. I only needed a design that respected every byte and every CPU cycle. And the results are there, bright and green in my terminal, proving that David can beat Goliath if he has the right sling.
4. The Internal Alchemy: Dynamic Tuning and "Zero Friction"
Often in software development, we confuse complexity with quality. We tend to believe that to solve a massive problem, we need baroque architectures, distributed microservices, and complex genetic algorithms. But my experience in the trenches with pardoX has taught me the opposite: extreme speed is not born from complexity; it is born from order. To get a modest laptop to process 640 million rows without collapsing, I didn't need to invent new physics; I needed to become an obsessive conductor.
The massive leap in performance you saw in the previous chapter wasn't luck. It was the result of rewriting the very heart of the engine, a transformation I call "The Internal Alchemy." Until recently, pardoX operated with somewhat naive logic: if it had 8 cores, it tried to use them all to the max, throwing read and write threads into the fight like a free-for-all brawl. The result was a civil war inside the CPU. Read threads competed for the same clock cycles as compression threads. They stepped on each other's toes, blocked each other, and generated what we technically call "contention." The CPU spent more time deciding who to serve than processing actual data.
To fix this, I had to endow pardoX with consciousness. I implemented a new logical module that acts as the system's "Brain." Now, before processing a single byte, the engine wakes up and "reads" the environment. It assumes nothing. It interrogates the operating system: "How much real RAM do I have available? How many physical and logical cores exist?" With this information, it makes surgical decisions before the race even begins.
I designed a strict allocation table, an internal map of truth. Instead of using risky floating-point math formulas that sometimes rounded down and left cores idle, I created deterministic logic. If the "Brain" detects 8 cores, it knows exactly what to do: it assigns a specific group of threads exclusively for reading and another isolated group for compression and writing. It's not a suggestion; it's martial law. By isolating these resource pools, we ensure that reading never chokes writing and vice versa. The data flow became a perfect assembly line where every worker has their space and rhythm, eliminating the internal friction that previously held us back.
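To make that "allocation table" concrete, here is a minimal sketch in Rust. It is not pardoX's actual source: the pool splits are illustrative assumptions, core detection uses only the standard library, and RAM detection is stubbed out because it would need a platform-specific call or an external crate.

```rust
use std::thread;

/// How the worker threads are split between the two isolated pools.
#[derive(Debug)]
struct PoolPlan {
    read_threads: usize,  // dedicated to parsing/decoding input
    write_threads: usize, // dedicated to compression and output
}

/// Deterministic allocation table: no floating-point math, no rounding surprises.
/// The exact splits here are illustrative, not pardoX's real values.
fn plan_pools(logical_cores: usize) -> PoolPlan {
    match logical_cores {
        0..=2 => PoolPlan { read_threads: 1, write_threads: 1 },
        3..=4 => PoolPlan { read_threads: 2, write_threads: 2 },
        5..=8 => PoolPlan { read_threads: 3, write_threads: logical_cores - 3 },
        _     => PoolPlan { read_threads: 4, write_threads: logical_cores - 4 },
    }
}

fn main() {
    // Interrogate the OS before touching a single byte of data.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    // Available-RAM detection is intentionally omitted here: it requires a
    // platform API or a crate, which is outside the scope of this sketch.
    let plan = plan_pools(cores);
    println!("{cores} logical cores detected -> {plan:?}");
}
```

The point of the match table is exactly what the paragraph above describes: the decision is made once, deterministically, before any data flows, so the two pools never fight over the same cores mid-run.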
But organizing the CPU wasn't enough. We had another mortal enemy in massive loads: the Operating System. When you try to write a single 20GB file in one go, the file system starts to suffer. Write buffers fill up, disk cache saturates, and suddenly, the whole system freezes to "flush" memory to disk. Those are the moments when your mouse stops responding and the music cuts out. It's the operating system drowning.
To prevent this, I implemented smart fragmented writing logic. Instead of forcing the system to swallow a giant data monolith, pardoX now manages the output in manageable blocks automatically. But the key isn't just splitting; it's how we do it. I designed a "Zero Friction" mechanism. The engine monitors the row flow in real-time. When it detects that a segment has reached an optimal size, it closes that channel instantly, forcing a controlled flush to disk, and opens the next one milliseconds later.
This constant rotation has a magical effect: it allows the operating system to "breathe." By closing segments periodically, we release system resources and allow the disk cache to clear naturally, without causing pressure spikes. It's the difference between trying to run a marathon while holding your breath and running it with rhythmic, controlled breathing. The bear no longer runs until it passes out; it runs with a stable heart rate.
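Here is a minimal, self-contained sketch of that rotation idea, written in Rust for illustration only (it is not pardoX's real writer): a row threshold triggers a controlled flush, the current segment is closed, and the next file is opened immediately. The threshold and file naming are made-up values.

```rust
use std::fs::File;
use std::io::{BufWriter, Result, Write};

/// Illustrative threshold: a real engine would derive this from telemetry.
const ROWS_PER_SEGMENT: usize = 1_000_000;

/// Writes rows into numbered segments, rotating to a new file
/// whenever the current one reaches ROWS_PER_SEGMENT.
struct SegmentedWriter {
    prefix: String,
    segment_index: usize,
    rows_in_segment: usize,
    current: BufWriter<File>,
}

impl SegmentedWriter {
    fn new(prefix: &str) -> Result<Self> {
        let current = BufWriter::new(File::create(format!("{prefix}_0000.csv"))?);
        Ok(Self {
            prefix: prefix.to_string(),
            segment_index: 0,
            rows_in_segment: 0,
            current,
        })
    }

    fn write_row(&mut self, row: &str) -> Result<()> {
        writeln!(self.current, "{row}")?;
        self.rows_in_segment += 1;

        if self.rows_in_segment >= ROWS_PER_SEGMENT {
            // Controlled flush: close this segment so the OS can drain its
            // caches, then open the next one immediately.
            self.current.flush()?;
            self.segment_index += 1;
            self.rows_in_segment = 0;
            self.current = BufWriter::new(File::create(format!(
                "{}_{:04}.csv",
                self.prefix, self.segment_index
            ))?);
        }
        Ok(())
    }

    fn finish(mut self) -> Result<()> {
        self.current.flush()
    }
}

fn main() -> Result<()> {
    let mut writer = SegmentedWriter::new("output_segment")?;
    for i in 0..3_000_000usize {
        writer.write_row(&format!("{i},demo"))?;
    }
    writer.finish()
}
```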
This combination of hardware awareness and intelligent resource management is the true alchemy. There is no black magic, just a deep understanding of how the metal works under our fingers. We have moved from brute force to surgical precision, and that is why today we can look a massive dataset in the eye and say: "Bring it on, I'm not afraid of you."
5. The Future of the Bear: From Converter to Integral ETL Engine
So far, I’ve celebrated speed. I’ve toasted to the benchmarks and felt good watching pardoX devour 640 million rows without breaking a sweat. But if I’m honest with myself, and with you, ingestion speed is just the first step. Converting a CSV to Parquet in record time is incredibly useful; it saves disk space, accelerates my subsequent queries, and cleans up the mess legacy systems often leave behind. But as a Data Engineer, I don’t earn a living simply by moving boxes from one side of the warehouse to the other. My real work begins when I open those boxes.
The natural evolution of pardoX cannot stop at being the “fastest converter in the west.” If I stopped there, I would have built a glorified utility, a souped-up script I use once and forget. My vision is different. The frustration that birthed this project didn’t come just from the slowness of reading files, but from the impossibility of working with them locally. What good is loading 50 million rows in 10 seconds if, the moment I try to JOIN it with another table, my RAM explodes and the kernel kills the process?
This is where I am aiming the heavy artillery now. I am working deep in the engine’s guts to endow it with real analytical capabilities. I don’t want pardoX to be an intermediate step; I want it to be the processing core. I am advancing critical work on the Join engine. And I’m not talking about simple lookups I could do in Excel. I’m talking about joining massive datasets from heterogeneous sources—imagine crossing a CSV sales dump with a JSON product catalog and a historical record from a legacy system—all in memory, in real-time, and without the overhead that typically kills pandas or makes Spark overkill for my single machine.
The technical challenge here is fascinating. Joining data is, computationally, much more expensive than simply reading it. It requires maintaining state, building hash tables in memory, and managing “spill” (when intermediate data doesn’t fit in memory and must temporarily go to disk) in a way that doesn’t destroy performance. I am applying the same “Zero-Copy” and “Hardware Awareness” philosophy to these operations. I want to be able to filter, group, sort, and join data with the same fluidity with which I currently convert it.
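For readers who haven't implemented one, this is the textbook build/probe hash join in toy Rust form. It is not pardoX's engine code, and it deliberately omits the spill-to-disk path mentioned above.

```rust
use std::collections::HashMap;

/// Toy hash join: build a hash table on the smaller side, then probe it
/// with the larger side. Real engines add spill-to-disk when the build
/// side does not fit in memory; that part is omitted here.
fn hash_join<'a>(
    products: &'a [(u32, &'a str)], // (product_id, product_name)
    sales: &'a [(u32, f64)],        // (product_id, amount)
) -> Vec<(&'a str, f64)> {
    // Build phase: index the smaller relation by join key.
    let mut build: HashMap<u32, &str> = HashMap::with_capacity(products.len());
    for (id, name) in products {
        build.insert(*id, *name);
    }

    // Probe phase: stream the larger relation and emit matches.
    let mut out = Vec::new();
    for (id, amount) in sales {
        if let Some(name) = build.get(id) {
            out.push((*name, *amount));
        }
    }
    out
}

fn main() {
    let products = [(1, "keyboard"), (2, "mouse"), (3, "monitor")];
    let sales = [(1, 49.9), (3, 199.0), (3, 185.5), (7, 10.0)]; // id 7 has no match
    for (name, amount) in hash_join(&products, &sales) {
        println!("{name}: {amount}");
    }
}
```

The interesting engineering is everything this toy ignores: choosing the build side, partitioning the input, and spilling gracefully when the hash table outgrows RAM.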
I imagine being able to execute complex logical operations—“give me the average sales by region, but only for products that haven’t had returns in the last 30 days, crossing with the mainframe inventory table”—and having the answer arrive in seconds, on my laptop, without needing to upload anything to the cloud or configure a cluster. That is my goal. I am building the primitives for aggregations and window operations that are CPU-efficient, leveraging every available cycle.
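Purely as an illustration of the kind of primitive that query implies (a toy Rust sketch over in-memory tuples, not pardoX's API), a filtered group-by average looks like this:

```rust
use std::collections::HashMap;

/// Toy aggregation: average sales per region, excluding products flagged
/// as having recent returns. A real engine would stream this instead of
/// materializing everything in memory.
fn avg_sales_by_region<'a>(
    rows: &'a [(&'a str, u32, f64)], // (region, product_id, amount)
    has_recent_return: &dyn Fn(u32) -> bool,
) -> HashMap<&'a str, f64> {
    let mut sums: HashMap<&str, (f64, u32)> = HashMap::new();
    for (region, product_id, amount) in rows {
        if has_recent_return(*product_id) {
            continue; // filter step: skip products with recent returns
        }
        let entry = sums.entry(*region).or_insert((0.0, 0));
        entry.0 += *amount;
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(region, (total, count))| (region, total / count as f64))
        .collect()
}

fn main() {
    let rows = [
        ("north", 1, 100.0),
        ("north", 2, 300.0),
        ("south", 1, 50.0),
        ("south", 3, 70.0),
    ];
    // Pretend product 3 had a return in the last 30 days.
    let returned = |id: u32| id == 3;
    for (region, avg) in avg_sales_by_region(&rows, &returned) {
        println!("{region}: avg sale {avg:.2}");
    }
}
```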
The final vision is to turn pardoX into an integral ETL engine. Not just “step 1” of my pipeline, but the engine that powers the entire transformation process. I want it to be the Swiss Army knife I pull out when the problem is too big for conventional desktop tools, but too small or urgent for the bureaucracy of corporate Big Data infrastructure. I am building the missing link between the local script and industrial-scale data engineering. And if the ingestion results are any indicator, what’s coming with analytical processing is going to change how I think about what is possible to do “locally.”
6. Your Turn: Building the Tool You Actually Need
If you've read this far, you've probably seen yourself reflected in some part of this story. Everything I've built with pardoX, every line of optimized code, and every battle against the compiler, wasn't born from an academic desire to reinvent the wheel. It was born from a very personal and tangible frustration. It was born from sitting in an office at 7 PM, staring at a progress bar that refused to move on my corporate i5 laptop, knowing that RAM was at 99% and that if I moved the mouse, the whole system would crash. It was born from the helplessness of knowing the data was there, but my tools were too heavy or too slow to reach it.
But I know my experience isn't unique. I know that out there, thousands of engineers, analysts, and data scientists are fighting their own silent battles. Maybe your pain isn't ingestion speed. Maybe your nightmare is that Java installation that breaks environment variables every time you try to run a simple process. Or perhaps it's the absurd complexity of configuring a local Spark cluster just to process a file that is "too big for Excel" but "too small for the cloud." Or simply, the constant fear of that MemoryError message in Pandas when the client sends you the monthly report.
That's why this chapter isn't about what I've done, but about what we can do together. pardoX isn't designed to be a technical curiosity in my GitHub repo; it's designed to be the tool you use on Monday morning to solve that problem that keeps you awake. But to achieve that, I need to get out of my own head and get into yours.
I want to open a direct line of communication with you. I don't want assumptions; I want real, dirty, complicated use cases. What do you hate most about your current tool? Is it the syntax? Is it the installation? Is it the way it handles dates or special characters? What is that feature you've always wished existed but no popular library seems to prioritize?
I am working tirelessly to package all this power into version 0.1 beta. It won't be perfect, but it will be fast, lightweight, and above all, honest. I want to make sure that when the bear wakes up and reaches your hands, it's not just capable of running fast, but of solving the problems that truly hurt you. So I invite you to write to me, to comment, to share your data horror stories. Let's build the tool we deserve, not the one the industry imposes on us.
7. Final Reflection and Farewell
Looking back at those nights debugging memory leaks and those moments of euphoria when the terminal hit a new record, I realize that pardoX has become something more than just a binary to me. It is a statement of principles. It is the stubborn refusal to accept that “slow and heavy” must be the standard in our industry. Sometimes we forget that behind every system, every report, and every query, there is a person waiting for an answer. Optimizing is not just a technical matter or vanity; it is a profound form of respect for other people’s time and, above all, for our own. When we manage to turn a 9-minute task into a 4-minute one, we are not just saving electricity; we are reclaiming life. That is the true victory of efficient engineering.
Given the time of year, this is likely my last technical report until January. So, from the bottom of my heart, I wish you a Merry Christmas and a prosperous 2026, full of blessings, health, and of course, lots of clean code, without bugs or failures in production.
The Forgotten Sector
My fight is for that “forgotten sector.” They are the engineers maintaining 20-year-old banking systems. They are the PHP developers supporting an entire country’s e-commerce. They are the analysts with no cloud budget whose “Data Lake” is a folder full of CSVs on a corporate laptop. They also deserve speed. They also deserve modern tools. pardoX is my love letter to that sector.
On Noise and Opinions
On this path, I have learned to filter out the noise. The internet is full of opinions on which tool is “the best.” But honestly, I try not to get distracted by theoretical debates or benchmark wars. I focus on building. If you come to tell me that Rust is better than C++ or vice versa, I probably won’t answer. But if you come with an idea, with a strange use case, with a bug you found processing data from a pharmacy in a remote village... then we are on the same team.
📬 Contact Me: Tell Me Your Horror Story
As I mentioned earlier, I need to get out of my head and into your reality. Send me your use cases, your frustrations, and those data “horror stories” that no one else understands. I am here to read them.
- Direct Email: iam@albertocardenas.com (I read all emails that provide value or propose solutions).
- LinkedIn: linkedin.com/in/albertocardenasd (Let’s connect. Mention you read the “pardoX” series for a quick accept).
- X (Official PardoX): x.com/pardox_io (News and releases).
- X (Personal): x.com/albertocardenas (My day-to-day in the trenches).
- BlueSky: bsky.app/profile/pardoxio.bsky.social
Thank you for reading this far and for joining me on this journey. See you in the compiler in 2026.
Alberto Cárdenas.






