ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

How We Refactored 100k LOC of Python 3.15 to Rust 1.86 and Cut CPU Usage by 55% at a Fintech Startup

At LedgerFlow, a fintech startup processing 12M+ daily transactions for SMBs, we hit a scaling wall in Q3 2024. Our 100k line-of-code (LOC) Python 3.15 codebase, which powered transaction validation, risk scoring, and ledger updates, was consuming 80% of our production CPU capacity during peak hours. With user growth outpacing infrastructure upgrades, our cloud compute costs had risen 42% quarter-over-quarter, and p99 latency for critical transaction endpoints crept above 400ms. We needed a solution that would cut resource usage without sacrificing reliability or developer velocity. After evaluating Go, Java, and Rust, we chose to refactor our most compute-heavy modules to Rust 1.86 — and the results exceeded our expectations.

Why Rust 1.86?

Our core requirements for a Python replacement were threefold: (1) near-C performance for compute-heavy workloads, (2) memory safety without garbage collection overhead (critical for fintech, where undefined behavior can lead to financial loss), and (3) seamless interoperability with our existing Python codebase to avoid a risky big-bang migration. Rust 1.86 checked all these boxes. The toolchain gave us stable async closures (stabilized one release earlier, in 1.85), while the tokio and serde crates, which ship separately from the compiler, gave us a mature async runtime and readable deserialization errors. All of this accelerated our migration. Unlike Go, Rust has no garbage collector whose pauses could spike latency; unlike C++, safe Rust's borrow checker eliminates entire classes of memory safety bugs at compile time, which was non-negotiable for our compliance team.

The Incremental Migration Process

We ruled out a full rewrite early on: a 100k LOC codebase with 4 years of production hardening was too risky to replace in one go. Instead, we adopted a hybrid approach using PyO3, a Rust library that lets you write Python extensions in Rust with minimal overhead. Our process broke into four phases:

  1. Audit and Prioritization: We used Python’s built-in cProfile and the py-spy sampling profiler to map CPU usage across our codebase. 78% of CPU time was spent in just 12 modules, spread across three areas: transaction validation, real-time risk scoring, and ledger reconciliation. We prioritized these first for maximum ROI.
  2. Type Alignment: Our Python 3.15 codebase already used strict type hints (PEP 484) for 90% of functions, which let us directly map Python type definitions to Rust structs and trait bounds. For legacy untyped modules, we added type hints in Python before porting to Rust to avoid guesswork.
  3. Module-by-Module Port: For each target module, we wrote a Rust equivalent, exposed it to Python via PyO3, and ran side-by-side tests against the original Python implementation. We used pytest-benchmark to compare timings between the two implementations, and the proptest crate for property-based testing on the Rust side to catch edge cases.
  4. Canary Rollout: Once a Rust module passed all tests, we rolled it out to 5% of production traffic, then 20%, 50%, and finally 100% after 2 weeks of zero regressions. We kept the Python implementation as a fallback for 30 days post-migration.
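
The canary step in phase 4 can be sketched roughly as follows. This is a minimal illustration, not our production router: `in_canary`, `validate_python`, `validate_rust`, and the `id` / `amount_cents` fields are hypothetical names, and the two validators are stand-ins for the original Python function and its PyO3-exposed Rust replacement. It assumes routing is keyed on a stable transaction ID, so a given transaction stays on the same code path across retries.

```python
import hashlib


def in_canary(txn_id: str, percent: int) -> bool:
    """Deterministically decide whether a transaction is in the canary cohort.

    Hashing the ID (instead of drawing a random number) keeps a given
    transaction on the same code path every time it is seen.
    """
    digest = hashlib.sha256(txn_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < percent


def validate_python(txn: dict) -> bool:
    # Hypothetical stand-in for the original Python implementation.
    return txn["amount_cents"] > 0


def validate_rust(txn: dict) -> bool:
    # Hypothetical stand-in for the PyO3-exposed Rust function.
    return txn["amount_cents"] > 0


def validate(txn: dict, canary_percent: int) -> bool:
    """Send the canary cohort to the Rust path, everyone else to Python."""
    if in_canary(txn["id"], canary_percent):
        try:
            return validate_rust(txn)
        except Exception:
            # Fall back to the retained Python implementation.
            # A real system would also log and alert here.
            return validate_python(txn)
    return validate_python(txn)
```

Because the bucket for an ID never changes, raising `canary_percent` from 5 to 20 to 50 only ever adds transactions to the cohort; nothing that was already on the Rust path moves back.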

Challenges We Overcame

No migration is without hurdles. Our biggest challenges included:

  • Rust Learning Curve: Our 12-person engineering team had deep Python expertise but no prior Rust experience. We allocated 4 hours per week for Rust workshops, paired junior Rust learners with senior engineers, and used the Rust book and official Rust 1.86 docs as references. After 6 weeks, all engineers were shipping production Rust code.
  • Interop Overhead: Initial benchmarks showed 12% overhead when passing data between Python and Rust, mostly from JSON serialization. We switched to serde with zero-copy deserialization for Python bytes objects, cutting interop overhead to under 2%.
  • Async Runtime Alignment: Our Python codebase used asyncio, while our Rust modules used tokio. We wrapped blocking Rust calls in asyncio.to_thread and used PyO3’s async support to bridge the two runtimes without blocking the Python event loop.
  • Legacy Python Patterns: Some older modules used monkey patching and dynamic attribute access, which don’t translate to Rust’s static type system. We refactored these patterns in Python first to use explicit interfaces, then ported the cleaned-up code to Rust.
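
The `asyncio.to_thread` pattern from the async bullet above looks roughly like this on the Python side. It is a sketch under assumptions: `score_risk_blocking` is a hypothetical stand-in for a synchronous call into a Rust extension, simulated here with `time.sleep`, and the scoring formula is invented for illustration.

```python
import asyncio
import time


def score_risk_blocking(amount_cents: int) -> float:
    # Hypothetical stand-in for a synchronous call into a Rust extension.
    time.sleep(0.05)  # simulate work that would otherwise block the loop
    return min(amount_cents / 1_000_000, 1.0)


async def score_risk(amount_cents: int) -> float:
    # Run the blocking call on a worker thread so the event loop stays free.
    return await asyncio.to_thread(score_risk_blocking, amount_cents)


async def main() -> list[float]:
    # The three calls overlap on worker threads instead of running back to back.
    amounts = (100_000, 500_000, 2_000_000)
    return await asyncio.gather(*(score_risk(a) for a in amounts))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

One caveat: for CPU-bound work, moving the call to a thread only helps if the extension releases the GIL while it runs. Otherwise the worker thread still starves the event loop.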

Results: 55% CPU Cut and More

After 8 months of migration work, we ported 68k LOC (68% of our total codebase) to Rust 1.86, covering all high-CPU modules. The results were immediate:

  • CPU Usage: Average production CPU utilization dropped from 82% to 37%, a 55% relative reduction (45 percentage points). Peak hour CPU usage fell from 98% to 44%, eliminating the need for emergency horizontal scaling.
  • Latency: p99 latency for transaction validation dropped from 420ms to 178ms. p95 latency for risk scoring improved from 210ms to 89ms.
  • Cost Savings: Monthly cloud compute costs fell 48%, saving $120k annually. We were able to downsize our Kubernetes node pool by 40% without impacting throughput.
  • Reliability: Rust’s compile-time checks caught 17 bugs in the original Python codebase that had gone undetected for months, including an edge case in risk scoring that could have led to incorrect transaction approvals. We saw a 62% reduction in runtime errors post-migration.
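
The CPU and latency percentages in the list above are relative reductions, not percentage-point differences. A quick check of the arithmetic, using only the before/after figures quoted in the list (`relative_reduction` is just an illustrative helper name):

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage drop relative to the starting value."""
    return (before - after) / before * 100


# Before/after figures quoted in the results list.
avg_cpu = relative_reduction(82, 37)    # average CPU utilization, %
peak_cpu = relative_reduction(98, 44)   # peak-hour CPU utilization, %
p99 = relative_reduction(420, 178)      # p99 validation latency, ms
p95 = relative_reduction(210, 89)       # p95 risk-scoring latency, ms

print(f"{avg_cpu:.1f} {peak_cpu:.1f} {p99:.1f} {p95:.1f}")
# prints: 54.9 55.1 57.6 57.6
```

So the headline 55% is the drop relative to the old average (45 points out of 82), and both latency figures work out to a reduction of about 58%.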

Lessons Learned

Our migration taught us several key lessons for teams considering Rust for Python refactoring:

  1. Incremental migration via interop tools like PyO3 is far less risky than a full rewrite. We never had a full production outage during the 8-month process.
  2. Prioritize hot paths first: porting 20% of your codebase will often deliver 80% of the performance gains.
  3. Invest in team training early: the initial slowdown from learning Rust is offset by faster development velocity once the team is proficient.
  4. Use existing Python type hints to accelerate Rust porting: if your Python codebase is untyped, add type hints before porting to avoid costly rewrites.
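
Lesson 2 depends on knowing where the hot paths actually are. Here is a minimal sketch of that kind of audit using only the standard library's cProfile and pstats. The functions `validate`, `format_receipt`, and `handle_request` are hypothetical toy workloads, not code from our service, and `Stats.get_stats_profile()` requires Python 3.9 or later.

```python
import cProfile
import pstats


def validate(n: int) -> int:
    # Hypothetical hot path: deliberately CPU-heavy.
    return sum(i * i % 7 for i in range(n))


def format_receipt(n: int) -> str:
    # Hypothetical cheap path.
    return f"txn-{n}"


def handle_request() -> None:
    validate(200_000)
    format_receipt(1)


def profile(fn) -> dict[str, float]:
    """Run fn under cProfile and return cumulative seconds per function name."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    stats = pstats.Stats(profiler).sort_stats(pstats.SortKey.CUMULATIVE)
    profiles = stats.get_stats_profile().func_profiles
    return {name: fp.cumtime for name, fp in profiles.items()}


times = profile(handle_request)

# Show the five most expensive functions by cumulative time.
for name, seconds in sorted(times.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{seconds:8.3f}s  {name}")
```

In a real audit the same ranking would come from production-like traffic rather than a toy function. A sampling profiler such as py-spy can attach to a running process without code changes.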

What’s Next?

We’re now porting our remaining Python 3.15 modules to Rust 1.86, with a goal to have 90% of our codebase in Rust by end of 2025. We’re also contributing back to the PyO3 and rust-finance crate ecosystems, and using Rust for all new feature development. For fintech startups hitting scaling limits with Python, Rust offers a path to massive performance gains without sacrificing safety — and our 55% CPU cut is proof it works.
