If you're paying for a cheap Claude or GPT API key from a reseller / proxy service, there's a real chance you're not getting the model you're paying for — and asking the model "what are you?" won't tell you, because the model itself doesn't reliably know what it is, and the answer can be scripted by the reseller's system prompt anyway. The only verification method that actually works is comparing the endpoint's behavioral output against a fingerprint baseline built from the real official APIs. I built a free tool for this (APIMaster Model Tester) — and looking at the aggregate data from everyone who's actually used it, 43.8% of all detections came back flagged as a different model than what was advertised.
This isn't a sample I cherry-picked. It's every check real users ran against their own endpoints over the last ~7 weeks.
The market problem
A paper from CISPA Helmholtz Center for Information Security, "Real Money, Fake Models: Deceptive Model Claims in Shadow APIs" (arXiv:2603.01919), audited 17 shadow APIs, services that had already been cited by 187 academic papers, and found that 45.83% failed identity verification under fingerprint testing.
I wanted to know if that number held up against the actual reseller market people are buying from today. So I pulled the aggregate stats across every detection real users have run through the tool (2026-05-06 to 2026-06-23):
| Metric | Value |
|---|---|
| Total fingerprint detections | 1,963 |
| Unique reseller endpoints tested | 398 |
| Flagged as a different model than claimed | 860 |
| Detection-level fake rate | 43.8% |
That is within about two percentage points of the CISPA paper's 45.83%. This isn't an edge case; it's close to half.
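For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the headline rate from the table. The variable names are mine and purely illustrative; the three counts are the ones reported above.

```python
# Counts taken from the table above; variable names are illustrative.
detections = 1_963   # total fingerprint detections
flagged = 860        # detections flagged as a different model than claimed
endpoints = 398      # unique reseller endpoints tested

fake_rate = flagged / detections
checks_per_endpoint = detections / endpoints

print(f"Detection-level fake rate: {fake_rate:.1%}")
print(f"Average detections per endpoint: {checks_per_endpoint:.1f}")
```

Each endpoint was checked several times on average, so this is a rate per detection, not per endpoint. The share of endpoints that are fake could be higher or lower than this figure, and the table does not report it.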
Why "just ask the model" doesn't work
The obvious move, before you know better, is to ask the model directly:
- Who are you?
- Which company made you?
- What's your exact model name/version?
- What's your knowledge cutoff?
This fails for four separate reasons:
1. The reseller can script the answer. A one-line system prompt injected before your request reaches the model is enough to force "I'm Claude, made by Anthropic" regardless of what's actually running. No fancy spoofing required.
2. The model doesn't actually know its own version. Models have no reliable introspective access to their own deployment metadata. Example — I asked claude-opus-4-8 "what model do you use?" and got:
I'm Claude, made by Anthropic. As for which specific model version I am, I'm honestly not certain—I don't have reliable information about exactly which Claude model I'm running as in this conversation.
The request body literally specified model: claude-opus-4-8. The model still couldn't confirm it.
3. Models hallucinate identity, even when the model behind the endpoint is the genuine one.
4. Training data overlap leaks other vendors' branding. I tested the same claude-opus-4-8 endpoint again, this time asking "what model are you?" in Chinese. The API metadata still said model: anthropic/claude-4.8-opus-20260528, provider: Anthropic, but the model's actual answer was:
"I'm Tongyi Qianwen (Qwen), a large language model developed by Alibaba Cloud."
Same endpoint, same claimed model field, contradictory reply. Running that exact Chinese prompt 100 times against the same endpoint produced this self-reported identity distribution:

| Self-reported identity | % of 100 trials |
|---|---|
| Qwen | 49% |
| Claude | 35% |
| DeepSeek | 15% |
| Zhipu | 1% |
The model named itself correctly only 35 out of 100 times. Asking the model what it is isn't a measurement; it's noise.
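The tally behind that table can be sketched in a few lines. This is a hedged illustration, not the tool's code: `self_report_summary` is a hypothetical helper, and it assumes each of the 100 replies has already been reduced to a vendor label (the step of classifying free-text answers is not shown). The answer list is rebuilt from the percentages in the table.

```python
from collections import Counter

def self_report_summary(answers: list[str], claimed: str) -> tuple[dict[str, float], float]:
    """Return each self-reported identity's share, plus the share matching `claimed`."""
    counts = Counter(answers)
    total = len(answers)
    # most_common() orders identities from most to least frequent.
    shares = {name: n / total for name, n in counts.most_common()}
    return shares, shares.get(claimed, 0.0)

# Reconstructed from the table above: 100 trials of the same Chinese prompt.
answers = ["Qwen"] * 49 + ["Claude"] * 35 + ["DeepSeek"] * 15 + ["Zhipu"] * 1
shares, claimed_share = self_report_summary(answers, claimed="Claude")

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print(f"Matches the claimed model: {claimed_share:.0%}")
```

A majority vote over these answers would pick Qwen, which is exactly why repeating the question does not turn self-report into evidence.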
What actually works: behavioral fingerprinting
LLMs exhibit consistent stylistic and structural patterns — word choice, sentence openers, how they handle edge-case prompts, knowledge-boundary behavior — independent of what they're told to claim about themselves. None of that requires trusting the model's self-report:
- Build a baseline. Sample official APIs directly (no proxies in the loop) across a wide probe set, repeatedly, to capture how each real model actually behaves.
- Probe the candidate endpoint. Send the same structured probes to the endpoint you want to verify.
- Compare fingerprints, not claims. Score the candidate's response patterns against every baseline model. Highest similarity = most likely real model, independent of any model/provider field in the response.
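The three steps above can be sketched as a toy example. To be clear about the assumptions: this is not the tool's actual feature set or scoring. The "fingerprint" here is just a count of each response's opening word, the comparison is plain cosine similarity, and the model names and responses are made up for illustration. A real fingerprint would use many more dimensions.

```python
import math
from collections import Counter

def fingerprint(responses: list[str]) -> Counter:
    """Toy fingerprint: how often each opening word appears across responses."""
    openers = Counter()
    for text in responses:
        words = text.split()
        if words:
            openers[words[0].lower().strip(".,!?:;")] += 1
    return openers

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(candidate: Counter, baselines: dict[str, Counter]) -> list[tuple[str, float]]:
    """Score the candidate against every baseline, best match first."""
    scores = [(name, cosine_similarity(candidate, fp)) for name, fp in baselines.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical baselines, as if sampled from official APIs.
baselines = {
    "model-a": fingerprint(["Certainly! Here you go.", "Certainly, let's start.", "Sure thing."]),
    "model-b": fingerprint(["Okay, first step.", "Okay. Here is the plan.", "Well, it depends."]),
}
# Hypothetical responses from the endpoint under test.
candidate = fingerprint(["Okay, let's see.", "Okay then.", "Sure."])

ranking = rank_candidates(candidate, baselines)
print(ranking[0])  # Top-1 candidate and its similarity
```

Note that nothing in this flow reads the model or provider field, or asks the endpoint what it is. The verdict comes only from how the responses are written.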
| Verification method | Can the reseller fake it? |
|---|---|
| Ask the model "who are you" | Trivial: one system prompt |
| Trust the model/provider response field | Trivial: the reseller fills it in themselves |
| Repeat the self-report question N times, check consistency | Harder, but it uses no baseline, so a consistent answer still proves nothing |
| Behavioral fingerprint match against an official baseline | Hard: the faker doesn't know which dimensions are scored |
A confidence score above 70% on the Top-1 match is what we consider a reliable read; anything below that gets flagged inconclusive rather than forced into a verdict.
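As a rough sketch of that decision rule: the 70% threshold comes from the paragraph above, but everything else here is an assumption for illustration. That includes the `decide` function, the status labels, the idea that scores arrive as values between 0 and 1, and the choice to treat a score of exactly 70% as inconclusive.

```python
from typing import NamedTuple

CONFIDENCE_THRESHOLD = 0.70  # "above 70%" counts as a reliable read

class Verdict(NamedTuple):
    status: str        # "match", "mismatch", or "inconclusive" (labels are illustrative)
    top_model: str
    confidence: float

def decide(claimed_model: str, scores: dict[str, float]) -> Verdict:
    """Pick the Top-1 baseline and refuse to give a verdict at low confidence."""
    top_model, confidence = max(scores.items(), key=lambda kv: kv[1])
    # Assumption: exactly 0.70 is not "above 70%", so it stays inconclusive.
    if confidence <= CONFIDENCE_THRESHOLD:
        return Verdict("inconclusive", top_model, confidence)
    status = "match" if top_model == claimed_model else "mismatch"
    return Verdict(status, top_model, confidence)

print(decide("claude", {"claude": 0.91, "qwen": 0.40}))
print(decide("claude", {"claude": 0.30, "qwen": 0.85}))
print(decide("claude", {"claude": 0.55, "qwen": 0.60}))
```

The point of the inconclusive branch is that a weak Top-1 match should never be reported as either a pass or a fail.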
Try it on your own key
APIMaster Model Tester is free, no signup required — paste your endpoint + key, pick the model you're supposed to be getting, and it'll give you a Top-1 candidate + confidence score in under a minute. Currently covers the Claude/GPT/DeepSeek/Qwen/MiniMax/Kimi families, Anthropic Messages + OpenAI-compatible + Gemini streaming protocols.
If you want the methodology write-up with more test screenshots, that's here. I'm also putting together a full breakdown of which categories of resellers (price tier, region, claimed model) have the highest fake rates — that'll be a follow-up post once the sample size is bigger.
I'm Lisa — product person turned vibe-coder. APIMaster was my first AI-assisted build (10 days, mostly Claude Code), now at 1k+ users. Happy to answer questions about the methodology in the comments, or if you've run into a reseller swapping models on you.


