DEV Community

carlosortet

I keep wondering: if AI already writes 22-46% of new code, what is MAI-Code-1 really training on?

A budgeting note that bites everyone the first time: with any reasoning model you pay for the 'thinking tokens' even though they never show up in the response. They are typically billed at the output rate, which can run up to ~6x the cost of input tokens. You also eat 40 to 90 seconds of latency on tasks a classic LLM answers instantly.

Reserve reasoning for deliberate work (architecture reviews, migrations, incident post-mortems). Use a fast model for autocomplete. More thinking is not always better.
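Here's a minimal sketch of that rule as a router. Everything in it is an assumption for illustration: the task labels, the tier names and the idea of a per-request latency budget are hypothetical, and the 90-second threshold just reuses the upper end of the latency range mentioned above.

```python
from dataclasses import dataclass

# Hypothetical task labels for "deliberate work"; adjust to your own taxonomy.
DELIBERATE_TASKS = {"architecture_review", "migration", "incident_postmortem"}

# Assumption: worst-case reasoning latency, taken from the 40-90 s range above.
REASONING_WORST_CASE_LATENCY_S = 90


@dataclass(frozen=True)
class Route:
    tier: str    # "reasoning" or "fast" (hypothetical tier names)
    reason: str  # short explanation, handy for logs


def pick_tier(task_kind: str, latency_budget_s: float) -> Route:
    """Route deliberate work to the reasoning tier only if the caller can wait."""
    if task_kind not in DELIBERATE_TASKS:
        return Route("fast", "not a deliberate task")
    if latency_budget_s < REASONING_WORST_CASE_LATENCY_S:
        return Route("fast", "deliberate task, but the caller cannot wait")
    return Route("reasoning", "deliberate task and the caller can wait")


print(pick_tier("autocomplete", 1).tier)   # fast
print(pick_tier("migration", 300).tier)    # reasoning
```

The point of returning a reason alongside the tier is that routing decisions end up in logs, which is where you notice a "fast" task quietly burning reasoning budget.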

For price context while you wait for MAI numbers: OpenAI o3 launched at $10/$40 per 1M in/out tokens (cut to $2/$8 in June 2025), and Gemini 2.5 Pro sits at $1.25/$10 for prompts up to 200K tokens.
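To make the thinking-token bill concrete, here's a rough calculator. It assumes thinking tokens are billed at the output rate and uses the $10/$40 o3 figures quoted above as illustrative inputs. The request sizes (2,000 tokens in, 500 visible tokens out, 8,000 thinking tokens) are made up for the example, not measurements.

```python
def request_cost_usd(input_tokens, thinking_tokens, answer_tokens,
                     in_price_per_m, out_price_per_m):
    """Cost of one call, assuming thinking tokens are billed at the output rate."""
    billed_output = thinking_tokens + answer_tokens
    total = input_tokens * in_price_per_m + billed_output * out_price_per_m
    return total / 1_000_000


# Hypothetical request priced at $10 in / $40 out per 1M tokens.
without_thinking = request_cost_usd(2_000, 0, 500, 10.0, 40.0)
with_thinking = request_cost_usd(2_000, 8_000, 500, 10.0, 40.0)

print(f"no thinking:   ${without_thinking:.2f}")  # no thinking:   $0.04
print(f"with thinking: ${with_thinking:.2f}")     # with thinking: $0.36
```

Same prompt, same visible answer, nine times the bill under these assumptions. The response you read is the small part of what you paid for.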

Reasoning model vs classic LLM, the 30-second version

If you've shipped with both you can skip this. If not: a classic LLM answers on instinct, predicting the next token in one shot. Fast, cheap, and it stumbles on multi-step problems. A reasoning model thinks first: it generates an internal chain (the thinking tokens), tries paths, checks itself, then answers. It learns this through reinforcement learning, rewarded when the chain lands on the right answer.

You know that friend who answers before you finish the question? Sometimes right, often not. Kahneman's "Thinking, Fast and Slow" nailed it years ago: System 1 is fast, System 2 is deliberate. A reasoning model is System 2, billed by the token.

Copilot was never a model

This is the part most people get wrong, and it changes how you evaluate this launch. Copilot is a layer, not a model. Different engines can run underneath, selectable from a picker, same as Perplexity. Until now the default engine was OpenAI's GPT.

We learned this the hard way doing brand-visibility audits for clients. Auditing "Copilot" was really auditing ChatGPT, because that was the model under the hood. Same engine, different shell. With MAI, Copilot gets reasoning and a voice of its own, so that equivalence breaks.

The honest part, from the lab

I'll be straight, because that's the only part of these posts worth reading.

Copilot "out of the box" frustrated us for two years. The Office integrations underdelivered. You'd open Copilot in Excel and it would tell you it couldn't touch the data, while a Claude with a Playwright plugin handled the whole thing without breaking a sweat. At some point our internal read was blunt: we'd be better off dropping Office entirely and letting the models generate the deck in HTML.

What we actually want is a system where the AI orchestrates the workflow and the human brings experience and judgment. We still don't have a clean market solution for that, one that combines collaborative project knowledge with full AI integration. So, like a lot of teams, we run a funky homemade stack: Obsidian + SharePoint + GitHub wired to Codex, agents and Claude Code. It works. Honestly, it is not a long-term answer.

So a model trained on GitHub data and built for GitHub lands right on a pain we know. But it also raises a question I genuinely can't answer, and I'd love your take in the comments.

Here's the uncomfortable math. By 2025 GitHub said Copilot was generating up to 46% of code in files where it's enabled (61% for Java), at an acceptance rate around 27-30%. DX's Q4 2025 report put ~22% of merged code as AI-authored. Google has said ~25-30% of its new code is AI-assisted. These figures measure different things, and nobody measures the total cleanly (there's no reliable detector for AI-written code after the fact), but the AI-written share of new code already sits somewhere in the 20-46% range, and it's climbing.

So when Microsoft says MAI-Code-1 is "trained on GitHub code", a meaningful slice of that code was almost certainly written by Copilot, Codex or Claude in the first place. Which sharpens the question: is "trained on GitHub" really that different from "trained on model outputs"? That's the setup for model collapse. A CMU study and an analysis of 800+ popular GitHub projects already flag code quality degrading after AI adoption. If new models learn from increasingly AI-written code, they risk amplifying their own mistakes.
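A back-of-the-envelope sketch of how that slice grows. Only the 22% and 46% bounds come from the figures above. The corpus size, growth rate and time span are invented units for illustration, and the model naively assumes a constant AI share of new code and no deletion of old code.

```python
def ai_share_of_corpus(legacy_units, yearly_new_units, ai_share_of_new, years):
    """Fraction of the whole corpus that is AI-written after `years` of growth.

    Assumes legacy code is fully human-written and nothing is ever deleted.
    """
    new_units = yearly_new_units * years
    ai_units = new_units * ai_share_of_new
    return ai_units / (legacy_units + new_units)


# Hypothetical corpus: 100 units of pre-AI code, growing by 20 units a year.
low = ai_share_of_corpus(100, 20, 0.22, 3)   # lower bound quoted above
high = ai_share_of_corpus(100, 20, 0.46, 3)  # upper bound quoted above

print(f"after 3 years: {low:.1%} to {high:.1%} of the corpus")
```

The whole-corpus share lags the new-code share because legacy code dilutes it, but any training mix that favors recent repositories pulls the number back toward the new-code figure.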

I don't have the number for MAI-Code-1 specifically (Microsoft hasn't disclosed the AI-generated share of its training set). But "clean, licensed data" and "free of AI contamination" are not the same claim. What's your read?

Why this matters beyond the benchmark

The strategy is the actual story. Microsoft put 13 billion dollars into OpenAI and up to 5 billion into Anthropic, and it sells both on Azure. On April 27, 2026 the OpenAI exclusivity fell (the trigger was OpenAI's up-to-50-billion deal with Amazon). So Microsoft is now investor, distributor, infra provider and direct competitor, all at once, and it would rather own the engine than rent it.

For us, the practical reads:

  • Adoption won't be decided by leaderboards. It'll be decided by identity, logging, data boundaries, retention, SLAs, predictable cost and IP indemnity. Microsoft is very good at selling that management plane.
  • Clean IP lineage is a real feature, not marketing. "No distillation" is a procurement checkbox for regulated teams (the DeepSeek "trained on whose outputs?" mess in 2025 made that concrete).
  • Lock-in gets subtler. A model tuned on your flows, wired to your identity, billed through Azure is powerful precisely because it's hard to move. Test it, meter it, demand portability.

The skeptic's asterisk

Stay level-headed. The benchmarks are Microsoft's own and no external lab has reproduced them. The comparison is selective (it beats Sonnet 4.6 broadly, but ties Opus 4.6 on only one code metric). Public figures wobble, including that 256K vs 128K context. New, real, in-house models? Yes. A capability revolution? Probably not. Nor do I care much. The value is independence, cost and control.

That, and one quieter outcome: Microsoft's AI future starts to look a lot less borrowed and far more promising. I know, we are always afraid of Microsoft having too much control over our workflow. But if we are not concerned about Google taking control of the web or TikTok taking control of our kids' minds, is this so worrying? Once we accepted Microsoft taking ownership of GitHub, we knew something like this was coming.


Written by Carlos Ortet for 498A, the AI R&D division behind Zoopa. Originally published, with glossary and FAQ, on the Zoopa blog.

Sources: Introducing MAI-Thinking-1 (Microsoft AI) · Introducing MAI-Code-1-Flash (Microsoft AI) · MAI-Code-1-Flash on the GitHub Changelog

On AI-written code share: GitHub Copilot statistics · "AI is writing 46% of all code" · Model collapse explained (IT Pro) · CMU: AI is still making code worse
www.carlosortet.com