I Stopped Letting AI Copy My Code. Here's What I Built Instead.

Every time I asked an LLM to refactor a large file, the same thing happened. It would rewrite the code, hallucinate imports, rename variables it shouldn't have touched, and silently drop utility functions from the bottom of the file.

The output always looked right at first glance. Then I'd run it and hit runtime errors in places I never asked it to change. That's the worst kind of bug: the kind that passes code review because the AI's version reads better than the original.

So I stopped letting AI copy my code. I built Refactory instead.

The idea is simple: let the AI do what it's actually good at, and keep it away from what it's bad at.

AI is great at reading a 5,000-line monolith and telling you where the module boundaries should be. It can see the dependency graph, identify cohesive clusters of functions, and suggest a clean decomposition. That's genuine understanding.

But ask it to actually move the code into those modules? That's where it falls apart. It hallucinates import paths. It renames things for clarity. It truncates functions that don't fit in the context window. It invents syntax that looks plausible but doesn't parse.

So Refactory splits the job. The LLM (Claude, GPT, Gemini, whatever you like) analyzes your code and produces a decomposition plan. Then a deterministic extraction engine handles the straightforward 80%: the routine moves, the obvious extractions, the stuff that's a waste of the AI's time and tokens. The LLM still handles the genuinely tricky parts where judgment matters. But the bulk of the code movement is mechanical, copied character for character from the original source. No hallucinations on the boring stuff.

Minimize tokens, maximize syntax validity. Don't burn AI on work a preprocessor can do deterministically.
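To make the "mechanical" part concrete, here is a minimal Python sketch of the idea. It is not Refactory's actual code: the function name extract_verbatim and the shape of the plan (just a set of top-level names to move) are assumptions for illustration. It uses the standard ast module only to find where each definition starts and ends, then copies those line ranges out of the original text untouched.

```python
import ast


def extract_verbatim(source: str, names: set[str]) -> tuple[str, str]:
    """Split `source` into (extracted, remainder) by copying whole lines.

    Hypothetical helper, not Refactory's API. `names` stands in for the
    LLM's plan: the top-level functions/classes that should move.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    moved: set[int] = set()   # 0-based indexes of lines to move
    found: set[str] = set()

    for node in tree.body:
        is_def = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_def and node.name in names:
            # Decorators sit above the `def` line, so start from the first one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            moved.update(range(start - 1, node.end_lineno))
            found.add(node.name)

    missing = names - found
    if missing:
        raise KeyError(f"not found at top level: {sorted(missing)}")

    extracted = "".join(l for i, l in enumerate(lines) if i in moved)
    remainder = "".join(l for i, l in enumerate(lines) if i not in moved)

    # Both halves must still parse; the text itself is never rewritten.
    ast.parse(extracted)
    ast.parse(remainder)
    return extracted, remainder
```

Nothing in there generates code. It only slices the original text, so a renamed variable or a truncated function can't happen in this step. What the sketch leaves out is everything around the move, such as working out which imports the new module needs and rewiring references in the old one.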

I've run it against 15 monoliths totaling about 32,000 lines, and all 15 pass mechanical extraction. The full pipeline scores 0.89 on a 5,200-line flagship test. The mechanical extraction costs no tokens, so you only spend them on planning and the edge cases that actually need intelligence.

The JS and Python extractors are free and open source under AGPL. The engine itself is deterministic, so there's no moat in copying code correctly. The value is in the language-specific parsers and the planning prompts. More languages are coming as commercial packs.

If you want to try it: github.com/codedrop-codes/refactory

I'd genuinely love to hear from anyone dealing with large legacy codebases. What's your worst AI refactoring story?