<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rahmad Afandi</title>
    <description>The latest articles on DEV Community by Rahmad Afandi (@rahmadafandi).</description>
    <link>https://dev.to/rahmadafandi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825470%2F91bcb61b-cb8d-4bf2-9fc8-6ba35c2af3b4.jpg</url>
      <title>DEV Community: Rahmad Afandi</title>
      <link>https://dev.to/rahmadafandi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rahmadafandi"/>
    <language>en</language>
    <item>
      <title>How I Made Python Excel Exports 9x Faster with Rust</title>
      <dc:creator>Rahmad Afandi</dc:creator>
      <pubDate>Sun, 15 Mar 2026 15:27:46 +0000</pubDate>
      <link>https://dev.to/rahmadafandi/how-i-made-python-excel-exports-9x-faster-with-rust-36lh</link>
      <guid>https://dev.to/rahmadafandi/how-i-made-python-excel-exports-9x-faster-with-rust-36lh</guid>
      <description>&lt;h1&gt;
  
  
  How I Made Python Excel Exports 9x Faster with Rust
&lt;/h1&gt;

&lt;p&gt;At my day job, I work with systems that handle a lot of data. One recurring pain point: &lt;strong&gt;Excel exports&lt;/strong&gt;. Every time users triggered a large report — hundreds of thousands of rows — the server would spike. CPU pegged, response times tanked, other requests queued up behind one export job.&lt;/p&gt;

&lt;p&gt;I tried everything. &lt;code&gt;xlsxwriter&lt;/code&gt; with &lt;code&gt;constant_memory&lt;/code&gt; mode. Background workers. Chunked exports. But the fundamental problem remained: Python is slow at writing millions of cells to Excel. A 1M-row export takes &lt;strong&gt;~52 seconds&lt;/strong&gt;. That's 52 seconds of CPU time blocking the server.&lt;/p&gt;

&lt;p&gt;I started researching alternatives in my spare time. That's when I discovered &lt;a href="https://github.com/PyO3/pyo3" rel="noopener noreferrer"&gt;PyO3&lt;/a&gt; — a bridge between Rust and Python — and &lt;a href="https://github.com/PyO3/maturin" rel="noopener noreferrer"&gt;maturin&lt;/a&gt;, which makes building Rust-Python packages dead simple. And I found &lt;a href="https://github.com/jmcnamara/rust_xlsxwriter" rel="noopener noreferrer"&gt;rust_xlsxwriter&lt;/a&gt; — a Rust port of xlsxwriter by the same original author.&lt;/p&gt;

&lt;p&gt;The pieces clicked. What if I could keep the Python API but move the heavy lifting to Rust?&lt;/p&gt;

&lt;p&gt;My first prototype was rough, but it already showed &lt;strong&gt;~6x speedup&lt;/strong&gt; over xlsxwriter. That was enough. The server spikes went away, exports finished before users could complain, and I moved on to other work.&lt;/p&gt;

&lt;p&gt;The project sat at 6x for a long time. It solved my problem, so there was no urgency to push further. But every now and then, in my spare time, I'd come back to it — tweak something, try an idea, benchmark again.&lt;/p&gt;

&lt;p&gt;Recently, I paired up with &lt;strong&gt;Claude&lt;/strong&gt; (Anthropic's AI) for an intensive optimization session. Over a single long conversation, we profiled every layer, found bottlenecks I'd missed, and pushed the performance from 6x all the way to &lt;strong&gt;9x faster&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the story of that journey — from the initial problem to the final 9x.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Python's &lt;code&gt;xlsxwriter&lt;/code&gt; is great. Clean API, well-maintained, does everything you need. But it's pure Python, and for large datasets, that's a problem — especially on a server where one slow export can affect everything else:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;xlsxwriter&lt;/span&gt;

&lt;span class="n"&gt;wb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xlsxwriter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Workbook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_worksheet&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# This loop is the bottleneck
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;million_records&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
        &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Pure Python per-cell overhead
&lt;/span&gt;
&lt;span class="n"&gt;wb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# ⏱ ~52 seconds for 1M rows
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every &lt;code&gt;ws.write()&lt;/code&gt; call goes through Python's interpreter. Type checking, method dispatch, object allocation — &lt;strong&gt;multiplied by millions of cells&lt;/strong&gt;. The actual Excel XML generation is fast, but the Python overhead per cell kills performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea: What if the hot loop was in Rust?
&lt;/h2&gt;

&lt;p&gt;The plan was simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Accept Python data (dicts, DataFrames, generators)&lt;/li&gt;
&lt;li&gt;Cross the FFI boundary &lt;strong&gt;once&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Do all the heavy lifting in Rust&lt;/li&gt;
&lt;li&gt;Return the Excel file&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The First Result
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rustpy_xlsxwriter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;write_worksheet&lt;/span&gt;

&lt;span class="nf"&gt;write_worksheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;million_records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ⏱ ~8 seconds for 1M rows (first version — ~6x faster)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No fancy API, just a function call. But already &lt;strong&gt;6x faster&lt;/strong&gt; — enough to solve the server problem.&lt;/p&gt;

&lt;p&gt;Later, I added the &lt;code&gt;FastExcel&lt;/code&gt; builder class for a nicer developer experience. But that came after the core was proven.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Optimization Journey (6x → 9x)
&lt;/h2&gt;

&lt;p&gt;Getting from 6x to 9x wasn't a single change. It was a series of optimizations, each building on the last. Most of these came out of that intensive session with Claude.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Obvious Win: Move the Loop to Rust
&lt;/h3&gt;

&lt;p&gt;The first version simply moved the per-cell loop from Python to Rust. Instead of Python calling &lt;code&gt;ws.write()&lt;/code&gt; a million times, Python passes the entire list once, and Rust iterates it internally.&lt;/p&gt;

&lt;p&gt;This alone gave &lt;strong&gt;~5x speedup&lt;/strong&gt; — just from avoiding Python interpreter overhead per cell. This was my initial version.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cache Everything
&lt;/h3&gt;

&lt;p&gt;The original code created a new &lt;code&gt;Format&lt;/code&gt; object for every cell that needed formatting. For 1M rows with float formatting, that's 1M allocations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: new Format per cell ❌&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.set_num_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.00"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;worksheet&lt;/span&gt;&lt;span class="nf"&gt;.write_number_with_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// After: create once, reuse ♻️&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;float_fmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.set_num_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.00"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// once&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;worksheet&lt;/span&gt;&lt;span class="nf"&gt;.write_number_with_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;float_fmt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// reuse&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same for datetime formats, bold formats — create once, reuse across millions of cells.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Skip the Type Cascade
&lt;/h3&gt;

&lt;p&gt;Python is dynamically typed, so for each cell we need to figure out: is it a string? int? float? bool? datetime? None?&lt;/p&gt;

&lt;p&gt;The naive approach tries each type in sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="py"&gt;.cast&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PyString&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="py"&gt;.cast&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PyFloat&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// 1 failed cast&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="py"&gt;.cast&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PyBool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c1"&gt;// 2 failed casts&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="py"&gt;.cast&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PyInt&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;     &lt;span class="c1"&gt;// 3 failed casts&lt;/span&gt;
&lt;span class="c1"&gt;// ... and so on&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a column that's always integers, every cell wastes 3 failed casts before finding the right one. Multiply by millions of cells.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix: First-row type caching.&lt;/strong&gt; Detect the type from row 1, cache it per column, and try that type first for all subsequent rows. Cache miss? Fall back to the full cascade.&lt;/p&gt;

&lt;p&gt;This eliminated &lt;strong&gt;~39 million unnecessary type checks&lt;/strong&gt; on my 1M × 13-column test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Dict Values, Not Keys
&lt;/h3&gt;

&lt;p&gt;A subtle one. When iterating a Python dict, you can either:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Slow: hash lookup per key ❌&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row_dict&lt;/span&gt;&lt;span class="nf"&gt;.get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// hash lookup&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Fast: iterate values directly ✅&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row_dict&lt;/span&gt;&lt;span class="nf"&gt;.values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// sequential iteration&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python dicts maintain insertion order, so &lt;code&gt;values()&lt;/code&gt; gives us the same column order as &lt;code&gt;keys()&lt;/code&gt;. This skips &lt;strong&gt;N hash lookups per row&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Arrow Zero-Copy for DataFrames
&lt;/h3&gt;

&lt;p&gt;This was the biggest architectural change. Pandas and Polars DataFrames store data in columnar Arrow format internally. Instead of converting each cell to a Python object and passing it to Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DataFrame → Python objects (slow) → Rust → Excel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I read the Arrow memory buffers directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DataFrame → Arrow C Data Interface → Rust reads memory → Excel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Zero Python object allocation. Zero per-cell FFI calls.&lt;/strong&gt; The data never leaves native memory until it's written to Excel.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Compiler Optimizations
&lt;/h3&gt;

&lt;p&gt;Free performance with zero code changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[profile.release]&lt;/span&gt;
&lt;span class="py"&gt;lto&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"thin"&lt;/span&gt;        &lt;span class="c"&gt;# Link-Time Optimization: cross-crate inlining&lt;/span&gt;
&lt;span class="py"&gt;codegen-units&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="c"&gt;# Better optimization (slower compile)&lt;/span&gt;
&lt;span class="py"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"symbols"&lt;/span&gt;   &lt;span class="c"&gt;# Smaller binary, better cache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gave ~10-15% on top of everything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Final Numbers
&lt;/h2&gt;

&lt;p&gt;I run &lt;code&gt;python benchmark.py&lt;/code&gt; to measure everything:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;500K rows&lt;/th&gt;
&lt;th&gt;1M rows&lt;/th&gt;
&lt;th&gt;vs Python&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Records&lt;/strong&gt; (list of dicts)&lt;/td&gt;
&lt;td&gt;2.99s&lt;/td&gt;
&lt;td&gt;5.94s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~9x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pandas DataFrame&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.21s&lt;/td&gt;
&lt;td&gt;2.41s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~8x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Polars DataFrame&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.20s&lt;/td&gt;
&lt;td&gt;2.42s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~7x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.16s&lt;/td&gt;
&lt;td&gt;0.32s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~5x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;DataFrames are faster than Records because of the Arrow zero-copy path — no Python objects involved at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the API Looks Like
&lt;/h2&gt;

&lt;p&gt;I wanted it to feel Pythonic, not like a Rust library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rustpy_xlsxwriter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastExcel&lt;/span&gt;

&lt;span class="c1"&gt;# Simple
&lt;/span&gt;&lt;span class="nc"&gt;FastExcel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Full-featured
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;FastExcel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;float_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;datetime_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dd/mm/yyyy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bold_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Employees&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;employee_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Pandas or Polars
&lt;/span&gt;    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Revenue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;revenue_records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# list of dicts
&lt;/span&gt;    &lt;span class="c1"&gt;# auto-saves on exit
&lt;/span&gt;
&lt;span class="c1"&gt;# CSV/TSV — same API, just change extension
&lt;/span&gt;&lt;span class="nc"&gt;FastExcel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pandas &amp;amp; Polars DataFrames (Arrow zero-copy)&lt;/li&gt;
&lt;li&gt;Generators for memory-efficient streaming&lt;/li&gt;
&lt;li&gt;Context manager with auto-save&lt;/li&gt;
&lt;li&gt;Freeze panes, password protection&lt;/li&gt;
&lt;li&gt;Float/datetime formatting, bold headers&lt;/li&gt;
&lt;li&gt;CSV/TSV output (~5x faster than Python csv)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;io.BytesIO&lt;/code&gt; for web frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Profile before optimizing.&lt;/strong&gt; My first instinct was "rewrite everything in Rust." But the actual bottleneck was the per-cell Python overhead, not the Excel generation. Understanding this shaped every optimization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The FFI boundary is the enemy.&lt;/strong&gt; Every Python↔Rust call has overhead. The goal is to cross it as few times as possible. Arrow zero-copy is the ultimate expression of this — cross the boundary zero times per cell.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Diminishing returns are real.&lt;/strong&gt; Going from 1x to 5x was one optimization (move the loop). Going from 5x to 7x took three more. Going from 7x to 9x took six more. Know when to stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. PyO3 is incredible.&lt;/strong&gt; Seriously. Writing a Python extension in Rust with PyO3 is surprisingly ergonomic. Type conversions, error handling, GIL management — it handles everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Tests matter more than benchmarks.&lt;/strong&gt; I have 102 tests that verify actual cell content via openpyxl read-back. Without these, every optimization would have been a gamble.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;rustpy-xlsxwriter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rustpy_xlsxwriter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastExcel&lt;/span&gt;

&lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="nc"&gt;FastExcel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/rahmadafandi/rustpy-xlsxwriter" rel="noopener noreferrer"&gt;rahmadafandi/rustpy-xlsxwriter&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Rust, PyO3, and rust_xlsxwriter. MIT licensed. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>rust</category>
      <category>performance</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
