DEV Community

Abhishek Sharma

Your prompts have a vendor lock-in problem and it's hiding in plain text

I've been writing prompts for Claude for a while now. XML tags everywhere. <instructions>, <thinking>, nested structure. It works great.

Then I tried using the same prompts with GPT-4. They didn't just underperform — they fell apart. Restructured everything to Markdown, added bold headers, swapped the XML nesting for flat sections. Worked great on GPT-4. Tried that on Claude. Degraded.

I figured this was just me being bad at prompting. It's not.

Sclar et al. at ICLR 2024 measured this systematically. Removing colons from a prompt template — literally just the colon characters — swings LLaMA-2-13B from 82.6% to 4.3%. That's a 78 percentage point spread from punctuation. He et al. found the best format for one model family overlaps less than 20% with the best format for another. This isn't a quirk. It's a pattern.
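To make the perturbation concrete, here's a sketch of what "removing colons" means for a prompt template. The templates below are my own illustration, not the exact ones from the paper:

```python
# Illustrative templates only -- not the exact ones from Sclar et al.
# The point: two semantically identical prompts that differ only in
# separator characters.
passage, question = "The sky is blue.", "What color is the sky?"

with_colons = f"Passage: {passage}\nQuestion: {question}\nAnswer:"
no_colons = f"Passage {passage}\nQuestion {question}\nAnswer"

# The two strings differ only in punctuation...
print(with_colons.replace(":", "") == no_colons)  # True
# ...yet this is the scale of change behind the reported
# 82.6% -> 4.3% accuracy swing on LLaMA-2-13B.
```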

The industry has a bunch of names for it. "Prompt sensitivity." "Prompt brittleness." "Model drifting." All describing symptoms. None connecting it to the thing software engineers have been dealing with since 1974: coupling.

The short version: your prompt is coupled to your model the same way a function can be coupled to another module's private internals. Works fine until you swap the dependency. Then everything breaks and you don't know why because the dependency was invisible — it's just formatting.

I went and looked at how real tools handle this today. It's ugly.

Aider has 313 model-specific configurations. Not kidding: a 2,718-line YAML file. The fun part: most models get the system prompt "You NEVER leave comments describing code without implementing it!" while Claude-3.7-sonnet gets the exact opposite: "Do what they ask, but no more." Same tool, same goal, contradictory instructions because each model's expectations differ.
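What per-model configuration like this looks like in practice, as a minimal sketch. The dict structure below is my illustration, not Aider's actual YAML schema; the two instruction strings are the contradictory ones quoted above:

```python
# Illustrative per-model prompt table (my structure, not Aider's schema).
# The two instruction strings are the contradictory ones quoted above.
MODEL_PROMPTS = {
    "default": "You NEVER leave comments describing code without implementing it!",
    "claude-3-7-sonnet": "Do what they ask, but no more.",
}

def system_prompt(model: str) -> str:
    # Fall back to the generic prompt for models without an override.
    return MODEL_PROMPTS.get(model, MODEL_PROMPTS["default"])

print(system_prompt("gpt-4") == system_prompt("claude-3-7-sonnet"))  # False
```

Every entry in a table like this is coupling made explicit: the tool is hard-coding each model's formatting and behavioral expectations.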

Claude Code only works with Claude by default. People have built proxy layers, Node.js monkey-patches, and Ollama compatibility shims to get it working with other models. Some LLM vendors have started supporting Anthropic's API schema just to capture Claude Code's user base. That's a market responding to coupling.

Cursor straight up tells you to "switch to a different model and try again" when prompts underperform. That's the official guidance.

So I checked if anyone builds tooling for this. Went through DSPy, Guidance, Outlines, LMQL, PromptLayer, Braintrust, Humanloop, Maxim AI, MLflow, Prompty, Promptomatix. Eleven tools.

DSPy optimizes what your prompt says. Guidance constrains what the model outputs. PromptLayer versions your prompts. All useful. None of them touch how the prompt is structurally formatted for a specific model. That's just... not a category anyone's building in.

So I built one. promptc is a transparent HTTP proxy. It sits between your app and the model API, rewrites prompt structure based on the target model's preferences. XML to Markdown, section reordering, delimiter swaps. Optional second pass using a local Ollama model for deeper stuff like converting "Let's think step by step" to <thinking> blocks.
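To sketch the structural-rewrite idea (a toy illustration of mine, not promptc's actual code): converting Claude-style XML sections into the flat Markdown headers other models tend to prefer:

```python
import re

def xml_sections_to_markdown(prompt: str) -> str:
    """Rewrite <tag>...</tag> sections as '## Tag' Markdown sections.

    A toy version of one direction of the rewrite; a real proxy would
    also handle nesting, section reordering, and delimiter choices.
    """
    def repl(m: re.Match) -> str:
        tag, body = m.group(1), m.group(2).strip()
        return f"## {tag.capitalize()}\n{body}"

    return re.sub(r"<(\w+)>(.*?)</\1>", repl, prompt, flags=re.DOTALL)

claude_style = "<instructions>Summarize the passage.</instructions>"
print(xml_sections_to_markdown(claude_style))
# ## Instructions
# Summarize the passage.
```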

Zero code changes to your setup. Set ANTHROPIC_BASE_URL=http://localhost:4000 and go.

I also wrote a paper formalizing the whole thing — borrowed the coupling taxonomy from Larry Constantine's 1974 structured design work (the coupling/cohesion stuff from every CS degree). Content coupling, common coupling, data coupling — they all map. The paper has the full analysis, the tool survey, and case studies on Aider/Claude Code/Cursor.

Is this the most important problem in AI? No. But if you're running multi-model deployments — and with pricing ranging 200x from Gemini Flash-Lite at $0.075/M to Claude Opus 4 at $15/M, most teams are — then every prompt you've tuned for one model is a debt you'll pay when you switch. And right now, nobody's building the tooling to handle that debt.
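The 200x figure is just the ratio of the two per-million-token prices quoted above:

```python
# Per-million-token prices quoted above (USD).
flash_lite = 0.075  # Gemini Flash-Lite
opus_4 = 15.00      # Claude Opus 4

print(round(opus_4 / flash_lite))  # 200
```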

Paper: https://github.com/shakecodeslikecray/promptc/blob/master/paper/sharma2026_prompt_coupling.pdf
Code: github.com/shakecodeslikecray/promptc

First paper. Feedback welcome. Especially the "you're wrong because..." kind.
