This week at API Days Paris, I watched something rare: a client and consultant presenting together about an AI project that actually shipped to production.
Cyrille Martraire (CTO at Arolla) and Thomas Nansot (Director of Engineering at a major European mobility platform) walked through their journey using AI to validate a critical API migration—one handling hundreds of millions of tickets annually.
What made it different? They shared the dead ends, the failed approaches, and how their final solution was nothing like the original plan.
The Problem
Thomas's company needed to migrate from a legacy API to a new architecture. The stakes were high—any regression could affect millions of transactions. Traditional testing would be prohibitively expensive and slow, especially since every contract change meant redoing the work.
They needed a way to guarantee non-regression between APIs with completely different structures.
The 6 Key Learnings
1. 🔨 Hack First, Polish Never
When Thomas reached out, Cyrille didn't ask for requirements docs. He immediately built a hacky prototype with fake systems to prove the concept could work.
The lesson: In AI projects, velocity of learning beats polish. You can't plan your way through uncertainty—you prototype your way through it.
```python
# Quick prototype approach: fake both APIs, let the AI judge the diff
legacy_response = fake_legacy_api()
new_response = fake_new_api()
ai_compares(legacy_response, new_response)
# Does it work? Kind of? Good enough to continue!
```
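To make "hacky prototype" concrete, here is a minimal sketch of what such a throwaway script could look like, assuming the `openai` Python client with an API key in the environment; the fake responses, prompt, and model name are invented for illustration and are not the code from the talk:

```python
# Throwaway experiment: can an LLM spot regressions between two
# differently-shaped API responses? (Illustrative sketch only.)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fake_legacy_api() -> dict:
    # Hard-coded stand-in for the legacy response shape
    return {"passengerEmail": "a@b.com", "price_cents": 4200, "segments": 2}

def fake_new_api() -> dict:
    # Hard-coded stand-in for the new response shape
    return {"passenger": {"email": "a@b.com"},
            "price": {"amount": 42.00, "currency": "EUR"},
            "segments": 2}

def ai_compares(legacy: dict, new: dict) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Do these two API responses describe the same booking? "
                       f"List any mismatches.\nLEGACY: {legacy}\nNEW: {new}",
        }],
    )
    return resp.choices[0].message.content

print(ai_compares(fake_legacy_api(), fake_new_api()))
```

The point isn't the code quality; it's that an afternoon of this answers the only question that matters at this stage: is the idea viable at all?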
2. ⚡ AI Generates Code > AI Runs Tests
Their first production attempt was elegant: an AI agent doing everything end-to-end.
It was also broken:
- Slow: 2+ minutes per test
- Expensive: ~$1 per test run
- Unreliable: Random failures
The breakthrough? Use AI to generate the test code, not run the tests.
```python
# ❌ Approach 1: Live AI (expensive, slow)
for test in tests:
    result = ai.compare(legacy_api(), new_api())  # LLM call every time: $$$

# ✅ Approach 2: Generated code (cheap, fast)
test_code = ai.generate_comparison_code()  # one LLM call, done once
for test in tests:
    result = test_code.run()  # $0, deterministic
```
Cost comparison:
- Live AI: $1 × 1000 tests = $1000
- Generated: $2 to generate + $0 × 1000 = $2
The pattern: AI works "offline" to create tools, then those tools do the actual work.
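For intuition, the generated artifact can be nothing more exotic than a plain, deterministic comparison function that maps fields between the two response shapes. A hypothetical example of what such output might look like (field names invented for illustration, not the project's actual code):

```python
# Hypothetical AI-*generated* comparison code: ordinary Python that runs
# for free, deterministically, on every test case.
def compare_responses(legacy: dict, new: dict) -> list[str]:
    """Return human-readable mismatches (empty list = no regression)."""
    mismatches = []
    if legacy["passengerEmail"] != new["passenger"]["email"]:
        mismatches.append("passenger email differs")
    if legacy["price_cents"] != round(new["price"]["amount"] * 100):
        mismatches.append("price differs")
    return mismatches

legacy = {"passengerEmail": "a@b.com", "price_cents": 4200}
new = {"passenger": {"email": "a@b.com"}, "price": {"amount": 42.0, "currency": "EUR"}}
assert compare_responses(legacy, new) == []
```

Because the artifact is plain code, it can be reviewed, version-controlled, and rerun thousands of times at zero inference cost.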
3. 🗄️ MCP: Query JSON Like a Database
The API schemas were massive. Cramming everything into the LLM context caused attention issues—even when it technically fit, quality degraded.
Solution: Model Context Protocol (MCP)
Instead of:

```python
prompt = f"Analyze this entire JSON schema: {huge_schema}"  # ~10 MB crammed into the prompt
```

Do this:

```python
mcp = JSONMCPServer(huge_schema)
email = mcp.query_path("passenger.email")
keys = mcp.list_keys("journey.segments")
```
They specifically recommended the "JSON-to-MCP" tool.
Why it matters: We're discovering design patterns for LLMs at scale. MCP is like moving from "here's a phone book" to "here's a search interface."
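Purely to illustrate the idea (the class and method names below are assumptions, not the API of the tool they recommended), such a query interface can be as simple as path-based traversal over a parsed document. The model asks small, targeted questions instead of reading everything:

```python
import json

class JSONQueryServer:
    """Toy MCP-style interface: expose queries over a large JSON document."""

    def __init__(self, raw_json: str):
        self.doc = json.loads(raw_json)

    def query_path(self, path: str):
        """Return the value at a dotted path, e.g. 'passenger.email'."""
        node = self.doc
        for key in path.split("."):
            node = node[key]
        return node

    def list_keys(self, path: str = "") -> list[str]:
        """Return the keys available under a dotted path, for exploration."""
        node = self.query_path(path) if path else self.doc
        return list(node.keys()) if isinstance(node, dict) else []

server = JSONQueryServer('{"passenger": {"email": "a@b.com"}, "journey": {"segments": []}}')
print(server.query_path("passenger.email"))  # a@b.com
print(server.list_keys("journey"))           # ['segments']
```

Each call returns a tiny, relevant slice of the document, so the model's context stays small and its attention stays sharp.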
4. 🎲 Accept Being Surprised
Thomas's quote hit hard:
"As a manager, I expected to have a clear vision of what would work. I had to admit the solution was completely different from what I imagined—and it was better."
What they tried:
- Full AI approach → Too slow & expensive
- Slice & compare → Too complex
- Generated code + MCP → Success!
The winning solution wasn't in the original plan. They needed short feedback cycles and willingness to pivot.
The mindset: If you bring too much certainty to AI projects, you limit yourself. Let the technology surprise you.
5. 💾 Offline AI > Online AI (Sometimes)
Key insight: "AI is sometimes better offline."
When to use each:
| Pattern | Use Case | Cost | Speed |
|---|---|---|---|
| Live AI | Dynamic decisions, personalization | High per use | Variable |
| Generated | Repetitive tasks, validation | One-time | Fast |
Examples of offline AI:
- ✅ AI generates test suites → run 1000x
- ✅ AI writes Terraform modules → apply repeatedly
- ✅ AI creates validation rules → check all data (see the sketch after this list)
- ✅ AI generates docs templates → reuse forever
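As a minimal sketch of that validation-rules bullet (names entirely hypothetical, and the "generated" rule shown already written out): the expensive LLM step happens once, offline, and what ships is boring, reviewable code that costs nothing per record.

```python
# validation_rules.py — imagine this file was produced once by an LLM,
# then code-reviewed and committed. From here on it is ordinary Python.

def validate_booking(record: dict) -> list[str]:
    """Return a list of rule violations for one booking record."""
    errors = []
    if not record.get("passenger", {}).get("email"):
        errors.append("missing passenger email")
    if record.get("price", {}).get("amount", 0) <= 0:
        errors.append("non-positive price")
    return errors

# Applying the generated rule to the full dataset requires no inference at all:
bookings = [
    {"passenger": {"email": "a@b.com"}, "price": {"amount": 42.0}},
    {"passenger": {}, "price": {"amount": 0}},
]
for i, booking in enumerate(bookings):
    print(i, validate_booking(booking))
```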
6. 🎓 Knowledge Transfer > Expert Does Everything
After proving the technical concept, Cyrille's role shifted from "maker" to "coach."
The evolution:
- External expert builds solution
- Proves it works, gets buy-in
- Expert teaches internal team hands-on
- Team runs it independently
- Learnings apply to other projects
Impact: Even AI-skeptical engineers got excited about these techniques for their own work.
Real value = Solving the problem + Building internal capability
Practical Takeaways
If you're considering AI for production:
✅ Do:
- Start with hacky prototypes
- Consider generated artifacts over live decisions
- Use MCP-style patterns for large data structures
- Plan for short feedback cycles
- Build internal capability, not just solutions
❌ Don't:
- Wait for perfect requirements
- Assume "full AI" is always the answer
- Fight context window limits—work around them
- Plan everything upfront
- Keep expertise external
The Messy Middle Is the Point
What I appreciated most was their honesty. Too many AI talks show polished end results and skip the dead ends.
But the dead ends are the story. That's where the learning happens.
They didn't have a perfect plan. They had a hypothesis, willingness to iterate, and courage to be surprised.
That's probably the most valuable lesson of all.
About the speakers:
Cyrille Martraire - CTO at Arolla, author of "Living Documentation," and deeply embedded in the software craft community
Thomas Nansot - Director of Engineering managing ~150 engineers on a mobility distribution platform serving millions across Europe
If you attended API Days Paris or have experience with AI in production, I'd love to hear your takeaways in the comments!
Architecture Deep Dive: Curious about the infrastructure patterns that enable safe migration at this scale? I explored the API gateway, shadow testing, and rollback architecture in a companion article: link.
Discussion Questions
- Have you tried the "AI generates code" pattern in your projects? How did it compare to live AI?
- What's your biggest challenge with LLM context windows?
- How do you balance exploration vs. planning in AI projects?