A Founder's Perspective on Language Choice When Performance Meets Reliability
Python is the default language for AI. The ecosystem is unmatched: PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers. Every tutorial, every course, every blog post assumes you're working in Python. Fighting that current takes a compelling reason.
So why did I rebuild the core infrastructure for Tagnovate in Rust? The short answer: production requirements that Python couldn't meet. The longer answer is a story about tradeoffs, learning curves, and what it actually takes to serve AI at scale.
The Python Prototype
Our first version was pure Python. The stack looked familiar to anyone in the AI space:
- FastAPI for the web layer
- LangChain for RAG orchestration
- sentence-transformers for embeddings
- PostgreSQL + pgvector for vector storage
Development velocity was excellent. We went from concept to working demo in three weeks.
Then we deployed to production with our first real client. The cracks appeared immediately.
Production Pain Points
| Issue | Impact |
|---|---|
| Response times swinging from 500ms to 3s | Unpredictable user experience |
| Unbounded memory growth | Required manual restarts |
| Race conditions in caching | Data inconsistency |
| GC pauses during peak load | Timeout errors |
For a demo, these issues are minor. For a system handling guest requests at Hilton Garden Inn, they're unacceptable. Hospitality runs on reliability: nobody wants their room service inquiry to time out.
The Rust Rewrite Decision
Rewriting working code is usually a mistake. The second-system effect claims more startups than competition. But we weren't proposing a full rewrite—we were targeting the performance-critical path.
The Hybrid Architecture
┌─────────────────────────────────────────────────────────┐
│ PRODUCTION SYSTEM │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ RUST (Hot Path) │ │
│ │ • Embedding generation │ │
│ │ • Vector similarity search │ │
│ │ • Response orchestration │ │
│ │ • Request handling │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ API Boundary │
│ │ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ PYTHON (Cold Path) │ │
│ │ • Model training │ │
│ │ • Data preprocessing │ │
│ │ • Admin tasks │ │
│ │ • Experimentation │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
The languages communicate through a well-defined API boundary. This approach minimised risk while capturing the performance benefits. We could migrate incrementally, validating each component before moving to the next.
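To make that boundary concrete, here is a minimal sketch of what a hot-path endpoint can look like with Axum and Serde. The route, handler, and type names are illustrative assumptions rather than our production code; the Python side calls an endpoint like this over plain HTTP.

// Assumed Cargo.toml dependencies: axum = "0.7",
// tokio = { version = "1", features = ["full"] },
// serde = { version = "1", features = ["derive"] }
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct QueryRequest {
    query: String,
}

#[derive(Serialize)]
struct QueryResponse {
    answer: String,
}

// Hot-path handler: everything latency-sensitive stays on the Rust side.
async fn handle_query(Json(req): Json<QueryRequest>) -> Json<QueryResponse> {
    // Embedding generation and vector search would run here.
    Json(QueryResponse {
        answer: format!("Results for: {}", req.query),
    })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/query", post(handle_query));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}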
What Rust Actually Provides
1. Predictable Latency
No garbage collection pauses. Our P99 latency dropped from 2.1 seconds to 180ms. More importantly, the variance collapsed. Users experience consistent performance, which builds trust.
Python P99: 2100ms ████████████████████████████████████████░░
Rust P99: 180ms ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
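For readers less familiar with percentile metrics, here is a generic sketch of how figures like P50 and P99 are derived from raw latency samples. This is not our telemetry code, just the underlying idea.

use std::time::Duration;

// Nearest-rank percentile over a set of latency samples.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    samples.sort();
    let idx = ((samples.len() as f64 - 1.0) * p / 100.0).round() as usize;
    samples[idx]
}

fn main() {
    // Synthetic samples purely for illustration.
    let mut samples: Vec<Duration> =
        (0..1000).map(|i| Duration::from_millis(50 + (i % 400))).collect();
    println!("P50: {:?}", percentile(&mut samples, 50.0));
    println!("P99: {:?}", percentile(&mut samples, 99.0));
}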
2. Memory Efficiency
Rust's ownership model means memory is freed precisely when it's no longer needed. Our memory footprint dropped 60%, allowing us to handle more concurrent connections on the same hardware.
// Memory is freed when `embeddings` goes out of scope
fn process_query(query: &str) -> String {
    let embeddings = generate_embeddings(query);
    let results = search_vectors(&embeddings);
    format_response(results)
    // embeddings dropped here, memory freed immediately
}
3. Fearless Concurrency
Rust's compiler catches data races at compile time. The bugs that plagued our Python caching layer became impossible to write in Rust.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// This won't compile - Rust prevents the data race
fn broken_cache() {
    let cache = HashMap::new();
    thread::spawn(|| {
        cache.insert("key", "value"); // ERROR: cannot borrow `cache` as mutable,
                                      // and the closure may outlive it
    });
}

// This compiles - proper synchronisation enforced
fn safe_cache() {
    let cache = Arc::new(RwLock::new(HashMap::new()));
    let cache_clone = Arc::clone(&cache);
    thread::spawn(move || {
        cache_clone.write().unwrap().insert("key", "value"); // OK
    });
}
If the code compiles, it's free of data races.
4. Deployment Simplicity
A Rust binary has no runtime dependencies. No virtualenv issues, no pip conflicts, no Python version mismatches between development and production.
# Python Dockerfile
FROM python:3.11
COPY requirements.txt .
RUN pip install -r requirements.txt # Hope nothing breaks
COPY . .
CMD ["python", "main.py"]
# Rust Dockerfile
FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/server /server
# Just works
CMD ["/server"]
The binary we build locally is the binary that runs in production.
The Learning Curve Reality
I won't pretend the transition was easy. Rust's borrow checker is unforgiving:
// Day 1: Why won't this compile?!
fn process(mut data: Vec<String>) -> String {
    let first = &data[0];
    data.clear(); // ERROR: cannot borrow `data` as mutable
                  // because it is also borrowed as immutable
    first.clone()
}

// Day 30: Oh, that would have been a use-after-free bug
fn process(data: Vec<String>) -> String {
    let first = data[0].clone(); // Clone the value out first
    drop(data); // Then drop the vector - no borrows outstanding
    first
}
The first month was frustrating. I was slower than I'd ever been.
But the compiler's strictness is a form of teaching. Every error message points to a potential bug that would have surfaced at runtime in Python. By the third month, I was writing code that worked correctly on the first run more often than not.
The Ecosystem Has Matured
The gaps that existed two years ago are closing rapidly:
| Need | Rust Solution |
|---|---|
| ML inference | Candle (Hugging Face) |
| Async networking | Tokio |
| JSON handling | Serde |
| HTTP server | Axum |
| Database | SQLx, Diesel |
| Vector operations | ndarray, nalgebra |
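As a small taste of that ecosystem, here is a minimal cosine-similarity sketch using ndarray, the kind of primitive that sits underneath vector search. The crate version and values are illustrative; in production this would be batched, and the heavy lifting often pushed down to pgvector.

// Assumed Cargo.toml dependency: ndarray = "0.15"
use ndarray::Array1;

// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &Array1<f32>, b: &Array1<f32>) -> f32 {
    let dot = a.dot(b);
    let norm_a = a.mapv(|x| x * x).sum().sqrt();
    let norm_b = b.mapv(|x| x * x).sum().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let query = Array1::from(vec![0.1_f32, 0.9, 0.3]);
    let doc = Array1::from(vec![0.2_f32, 0.8, 0.4]);
    println!("similarity: {:.4}", cosine_similarity(&query, &doc));
}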
Real-World Performance Comparison
After the migration, here's what our metrics looked like:
| Metric | Python | Rust | Change |
|---|---|---|---|
| P50 latency | 850ms | 65ms | 13x faster |
| P99 latency | 2.1s | 180ms | 11x faster |
| Memory usage | 2.4GB | 960MB | 60% less |
| Concurrent users | ~100 | ~500 | 5x more |
| Deployment size | 1.2GB | 45MB | 96% smaller |
| Cold start | 8s | 200ms | 40x faster |
For Tagnovate and MenuGo, this translates to:
- Guests get instant responses
- We serve more clients on less infrastructure
- Deployments are reliable and fast
- 3am pages became rare
When to Choose Rust (and When Not To)
Choose Rust When:
✅ Performance is a product requirement, not just nice-to-have
For Tagnovate, sub-second responses define the user experience. Guests don't wait.
✅ Reliability is non-negotiable
If your system going down means angry hotel guests, the compiler's strictness is an asset.
✅ You're building infrastructure
Code that will run for years benefits from Rust's correctness guarantees.
✅ You're willing to invest in learning
The first few months are genuinely hard. Budget for the learning curve.
Stick with Python When:
❌ You need to ship next week
Stick with what you know. Velocity matters early.
❌ You're prototyping
Python's flexibility accelerates experimentation.
❌ Performance is acceptable
If your batch job runs overnight and completes on time, why optimise?
❌ Your team doesn't have buy-in
Forcing Rust on reluctant developers creates friction.
The Practical Migration Path
If you're considering Rust for AI workloads, here's what worked for us:
Phase 1: Identify the Hot Path
Profile your system. Find the 20% of code consuming 80% of resources. For us, it was embedding generation and vector search.
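Whichever language your hot path is currently in, even crude wall-clock timing can validate where the time goes before you commit to a rewrite. A minimal sketch of the idea in Rust, with placeholder helpers standing in for your own functions (a real profiler like cargo-flamegraph or perf comes next):

use std::time::Instant;

fn main() {
    let query = "late checkout request";

    let start = Instant::now();
    let embeddings = generate_embeddings(query);
    println!("embedding: {:?}", start.elapsed());

    let start = Instant::now();
    let _results = search_vectors(&embeddings);
    println!("search: {:?}", start.elapsed());
}

// Placeholder stubs - substitute your actual hot-path functions.
fn generate_embeddings(_query: &str) -> Vec<f32> {
    vec![0.0; 384]
}

fn search_vectors(_embeddings: &[f32]) -> Vec<usize> {
    vec![]
}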
Phase 2: Define Clear Boundaries
Design an API contract between Rust and Python. Keep it simple—JSON over HTTP or gRPC.
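A minimal sketch of what such a contract can look like with Serde. The field names here are hypothetical; the point is that both sides agree on a small, explicit schema rather than sharing internal types.

use serde::{Deserialize, Serialize};

// Illustrative JSON contract between the Python and Rust sides.
#[derive(Serialize, Deserialize, Debug)]
struct EmbedRequest {
    text: String,
    model_version: String,
}

#[derive(Serialize, Deserialize, Debug)]
struct EmbedResponse {
    embedding: Vec<f32>,
    dimensions: usize,
}

fn main() {
    let req = EmbedRequest {
        text: "Do you have gluten-free options?".into(),
        model_version: "v2".into(),
    };
    // Assumed Cargo.toml dependency: serde_json = "1"
    let json = serde_json::to_string(&req).unwrap();
    println!("{json}");
}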
Phase 3: Migrate Incrementally
One component at a time. Validate performance gains before moving to the next.
Phase 4: Keep Python for the Right Tasks
Model training, data preprocessing, and experimentation stay in Python. Use the right tool for the job.
Conclusion
The AI industry is maturing. The prototyping phase—where Python's flexibility trumps everything—is giving way to a production phase where reliability and performance matter.
Rust isn't right for every AI project. But for systems where latency affects user experience, where reliability is non-negotiable, and where you're willing to invest in the learning curve—Rust delivers.
At Tagnovate, we're processing 10,000+ daily AI interactions with 99.9% uptime. At MenuGo, restaurant guests get instant menu answers. The technology works because we chose the right tool for production.
The question isn't whether Rust can handle AI workloads. The question is whether you're ready to build systems that perform like the technology deserves.
Key Takeaways
- Python is great for prototypes; production has different requirements
- Hybrid architectures let you use the right tool for each job
- Rust's compiler catches bugs that would crash Python at runtime
- The learning curve is real but the payoff is substantial
- The ecosystem is ready—Candle, Tokio, and friends are production-grade
Building AI systems that need to perform? I'd love to compare notes. Connect on LinkedIn or check out how we apply these principles at Tagnovate and MenuGo.
About the Author
Mayuresh Shitole is the Founder and CTO of AmbiCube Pvt Ltd. He builds AI systems in Rust for the hospitality industry, serving clients from Hilton Garden Inn to local restaurants. Winner of the TigerData Agentic Postgres Challenge, he holds an MSc in Computer Engineering from the University of Essex. Follow his technical writing on DEV.to.