This week at API Days Paris, I watched something rare: a client and consultant presenting together about an AI project that actually shipped to production.
Cyrille Martraire (CTO at Arolla) and Thomas Nansot (Director of Engineering at a major European mobility platform) walked through their journey using AI to validate a critical API migration—one handling hundreds of millions of tickets annually.
What made it different? They shared the dead ends, the failed approaches, and how their final solution was nothing like the original plan.
The Problem
Thomas's company needed to migrate from a legacy API to a new architecture. The stakes were high—any regression could affect millions of transactions. Traditional testing would be prohibitively expensive and slow, especially since every contract change meant redoing the work.
They needed a way to guarantee non-regression between APIs with completely different structures.
The 6 Key Learnings
1. 🔨 Hack First, Polish Never
When Thomas reached out, Cyrille didn't ask for requirements docs. He immediately built a hacky prototype with fake systems to prove the concept could work.
The lesson: In AI projects, velocity of learning beats polish. You can't plan your way through uncertainty—you prototype your way through it.
```python
# Quick prototype approach: fake both APIs, let the AI judge the diff
legacy_response = fake_legacy_api()
new_response = fake_new_api()
ai_compares(legacy_response, new_response)
# Does it work? Kind of? Good enough to continue!
```
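To make "hacky prototype" concrete, here is a minimal sketch of what such a throwaway script could look like, assuming the `openai` Python client with an API key in the environment; the fake responses, prompt, and model name are invented for illustration and are not the code from the talk:

```python
# Throwaway experiment: can an LLM spot regressions between two
# differently-shaped API responses? (Illustrative sketch only.)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fake_legacy_api() -> dict:
    # Hard-coded stand-in for the legacy response shape
    return {"passengerEmail": "a@b.com", "price_cents": 4200, "segments": 2}

def fake_new_api() -> dict:
    # Hard-coded stand-in for the new response shape
    return {"passenger": {"email": "a@b.com"},
            "price": {"amount": 42.00, "currency": "EUR"},
            "segments": 2}

def ai_compares(legacy: dict, new: dict) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Do these two API responses describe the same booking? "
                       f"List any mismatches.\nLEGACY: {legacy}\nNEW: {new}",
        }],
    )
    return resp.choices[0].message.content

print(ai_compares(fake_legacy_api(), fake_new_api()))
```

The point isn't the code quality; it's that an afternoon of this answers the only question that matters at this stage: is the idea viable at all?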
2. ⚡ AI Generates Code > AI Runs Tests
Their first production attempt was elegant: an AI agent doing everything end-to-end.
It was also broken:
- Slow: 2+ minutes per test
- Expensive: ~$1 per test run
- Unreliable: Random failures
The breakthrough? Use AI to generate the test code, not run the tests.
```python
# ❌ Approach 1: Live AI (expensive, slow)
for test in tests:
    result = ai.compare(legacy_api(), new_api())  # LLM call every time: $$$

# ✅ Approach 2: Generated code (cheap, fast)
test_code = ai.generate_comparison_code()  # one LLM call, done once
for test in tests:
    result = test_code.run()  # $0, deterministic
```
Cost comparison:
- Live AI: $1 × 1000 tests = $1000
- Generated: $2 to generate + $0 × 1000 = $2
The pattern: AI works "offline" to create tools, then those tools do the actual work.
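For intuition, the generated artifact can be nothing more exotic than a plain, deterministic comparison function that maps fields between the two response shapes. A hypothetical example of what such output might look like (field names invented for illustration, not the project's actual code):

```python
# Hypothetical AI-*generated* comparison code: ordinary Python that runs
# for free, deterministically, on every test case.
def compare_responses(legacy: dict, new: dict) -> list[str]:
    """Return human-readable mismatches (empty list = no regression)."""
    mismatches = []
    if legacy["passengerEmail"] != new["passenger"]["email"]:
        mismatches.append("passenger email differs")
    if legacy["price_cents"] != round(new["price"]["amount"] * 100):
        mismatches.append("price differs")
    return mismatches

legacy = {"passengerEmail": "a@b.com", "price_cents": 4200}
new = {"passenger": {"email": "a@b.com"}, "price": {"amount": 42.0, "currency": "EUR"}}
assert compare_responses(legacy, new) == []
```

Because the artifact is plain code, it can be reviewed, version-controlled, and rerun thousands of times at zero inference cost.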
3. 🗄️ MCP: Query JSON Like a Database
The API schemas were massive. Cramming everything into the LLM context caused attention issues—even when it technically fit, quality degraded.
Solution: Model Context Protocol (MCP)
Instead of:

```python
prompt = f"Analyze this entire JSON schema: {huge_schema}"  # ~10 MB crammed into the prompt
```

Do this:

```python
mcp = JSONMCPServer(huge_schema)
email = mcp.query_path("passenger.email")
keys = mcp.list_keys("journey.segments")
```
They specifically recommended the "JSON-to-MCP" tool.
Why it matters: We're discovering design patterns for LLMs at scale. MCP is like moving from "here's a phone book" to "here's a search interface."
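Purely to illustrate the idea (the class and method names below are assumptions, not the API of the tool they recommended), such a query interface can be as simple as path-based traversal over a parsed document. The model asks small, targeted questions instead of reading everything:

```python
import json

class JSONQueryServer:
    """Toy MCP-style interface: expose queries over a large JSON document."""

    def __init__(self, raw_json: str):
        self.doc = json.loads(raw_json)

    def query_path(self, path: str):
        """Return the value at a dotted path, e.g. 'passenger.email'."""
        node = self.doc
        for key in path.split("."):
            node = node[key]
        return node

    def list_keys(self, path: str = "") -> list[str]:
        """Return the keys available under a dotted path, for exploration."""
        node = self.query_path(path) if path else self.doc
        return list(node.keys()) if isinstance(node, dict) else []

server = JSONQueryServer('{"passenger": {"email": "a@b.com"}, "journey": {"segments": []}}')
print(server.query_path("passenger.email"))  # a@b.com
print(server.list_keys("journey"))           # ['segments']
```

Each call returns a tiny, relevant slice of the document, so the model's context stays small and its attention stays sharp.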
4. 🎲 Accept Being Surprised
Thomas's quote hit hard:
"As a manager, I expected to have a clear vision of what would work. I had to admit the solution was completely different from what I imagined—and it was better."
What they tried:
- Full AI approach → Too slow & expensive
- Slice & compare → Too complex
- Generated code + MCP → Success!
The winning solution wasn't in the original plan. They needed short feedback cycles and willingness to pivot.
The mindset: If you bring too much certainty to AI projects, you limit yourself. Let the technology surprise you.
5. 💾 Offline AI > Online AI (Sometimes)
Key insight: "AI is sometimes better offline."
When to use each:
| Pattern | Use Case | Cost | Speed |
|---|---|---|---|
| Live AI | Dynamic decisions, personalization | High per use | Variable |
| Generated | Repetitive tasks, validation | One-time | Fast |
Examples of offline AI:
- ✅ AI generates test suites → run 1000x
- ✅ AI writes Terraform modules → apply repeatedly
- ✅ AI creates validation rules → check all data (see the sketch after this list)
- ✅ AI generates docs templates → reuse forever
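As a minimal sketch of that validation-rules bullet (names entirely hypothetical, and the "generated" rule shown already written out): the expensive LLM step happens once, offline, and what ships is boring, reviewable code that costs nothing per record.

```python
# validation_rules.py — imagine this file was produced once by an LLM,
# then code-reviewed and committed. From here on it is ordinary Python.

def validate_booking(record: dict) -> list[str]:
    """Return a list of rule violations for one booking record."""
    errors = []
    if not record.get("passenger", {}).get("email"):
        errors.append("missing passenger email")
    if record.get("price", {}).get("amount", 0) <= 0:
        errors.append("non-positive price")
    return errors

# Applying the generated rule to the full dataset requires no inference at all:
bookings = [
    {"passenger": {"email": "a@b.com"}, "price": {"amount": 42.0}},
    {"passenger": {}, "price": {"amount": 0}},
]
for i, booking in enumerate(bookings):
    print(i, validate_booking(booking))
```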
6. 🎓 Knowledge Transfer > Expert Does Everything
After proving the technical concept, Cyrille's role shifted from "maker" to "coach."
The evolution:
- External expert builds solution
- Proves it works, gets buy-in
- Expert teaches internal team hands-on
- Team runs it independently
- Learnings apply to other projects
Impact: Even AI-skeptical engineers got excited about these techniques for their own work.
Real value = Solving the problem + Building internal capability
Practical Takeaways
If you're considering AI for production:
✅ Do:
- Start with hacky prototypes
- Consider generated artifacts over live decisions
- Use MCP-style patterns for large data structures
- Plan for short feedback cycles
- Build internal capability, not just solutions
❌ Don't:
- Wait for perfect requirements
- Assume "full AI" is always the answer
- Fight context window limits—work around them
- Plan everything upfront
- Keep expertise external
The Messy Middle Is the Point
What I appreciated most was their honesty. Too many AI talks show polished end results and skip the dead ends.
But the dead ends are the story. That's where the learning happens.
They didn't have a perfect plan. They had a hypothesis, willingness to iterate, and courage to be surprised.
That's probably the most valuable lesson of all.
About the speakers:
Cyrille Martraire - CTO at Arolla, author of "Living Documentation," and deeply embedded in the software craft community
Thomas Nansot - Director of Engineering managing ~150 engineers on a mobility distribution platform serving millions across Europe
If you attended API Days Paris or have experience with AI in production, I'd love to hear your takeaways in the comments!
Architecture Deep Dive: Curious about the infrastructure patterns that enable safe migration at this scale? I explored the API gateway, shadow testing, and rollback architecture in a companion article: link.
Discussion Questions
- Have you tried the "AI generates code" pattern in your projects? How did it compare to live AI?
- What's your biggest challenge with LLM context windows?
- How do you balance exploration vs. planning in AI projects?