<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Luo Lisa</title>
    <description>The latest articles on DEV Community by Luo Lisa (@luo_lisa_7ddaec4447d774e5).</description>
    <link>https://dev.to/luo_lisa_7ddaec4447d774e5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3998312%2Fadc0cc3b-4212-4a81-a0c2-0f9eeeba96b7.png</url>
      <title>DEV Community: Luo Lisa</title>
      <link>https://dev.to/luo_lisa_7ddaec4447d774e5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/luo_lisa_7ddaec4447d774e5"/>
    <language>en</language>
    <item>
      <title>Our Users Ran 1,963 Real-World "Is This Model Real?" Checks. 43.8% Came Back Fake.</title>
      <dc:creator>Luo Lisa</dc:creator>
      <pubDate>Tue, 23 Jun 2026 08:35:35 +0000</pubDate>
      <link>https://dev.to/luo_lisa_7ddaec4447d774e5/our-users-ran-1963-real-world-is-this-model-real-checks-438-came-back-fake-3af</link>
      <guid>https://dev.to/luo_lisa_7ddaec4447d774e5/our-users-ran-1963-real-world-is-this-model-real-checks-438-came-back-fake-3af</guid>
      <description>&lt;p&gt;If you're paying for a cheap Claude or GPT API key from a reseller / proxy service, there's a real chance you're not getting the model you're paying for — and asking the model "what are you?" won't tell you, because &lt;strong&gt;the model itself doesn't reliably know what it is&lt;/strong&gt;, and the answer can be scripted by the reseller's system prompt anyway. The only verification method that actually works is comparing the endpoint's behavioral output against a fingerprint baseline built from the real official APIs. I built a free tool for this (&lt;a href="https://apimaster.ai/ai-api-model-tester?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=verify-api" rel="noopener noreferrer"&gt;APIMaster Model Tester&lt;/a&gt;) — and looking at the aggregate data from everyone who's actually used it, &lt;strong&gt;43.8% of all detections came back flagged as a different model than what was advertised.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a sample I cherry-picked. It's every check real users ran against their own endpoints over the last ~7 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The market problem
&lt;/h2&gt;

&lt;p&gt;A paper from CISPA Helmholtz Center for Information Security, &lt;a href="https://arxiv.org/abs/2603.01919" rel="noopener noreferrer"&gt;"Real Money, Fake Models: Deceptive Model Claims in Shadow APIs"&lt;/a&gt; (arXiv:2603.01919), audited 17 shadow APIs — already cited by 187 academic papers — and found &lt;strong&gt;45.83%&lt;/strong&gt; failed identity verification under fingerprint testing.&lt;/p&gt;

&lt;p&gt;I wanted to know if that number held up against the actual reseller market people are buying from today. So I pulled the aggregate stats across every detection real users have run through the tool (2026-05-06 to 2026-06-23):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total fingerprint detections&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,963&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unique reseller endpoints tested&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;398&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flagged as a different model than claimed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;860&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Detection-level fake rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;43.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same order of magnitude as the CISPA paper. This isn't an edge case — it's close to half.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F14bkt0022zdx6n4gfk13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F14bkt0022zdx6n4gfk13.png" alt="APIMaster detection result showing Suspicious flag with 77% confidence" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "just ask the model" doesn't work
&lt;/h2&gt;

&lt;p&gt;The obvious move, before you know better, is to ask the model directly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Who are you?&lt;/li&gt;
&lt;li&gt;Which company made you?&lt;/li&gt;
&lt;li&gt;What's your exact model name/version?&lt;/li&gt;
&lt;li&gt;What's your knowledge cutoff?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This fails for four separate reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The reseller can script the answer.&lt;/strong&gt; A one-line system prompt injected before your request reaches the model is enough to force "I'm Claude, made by Anthropic" regardless of what's actually running. No fancy spoofing required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The model doesn't actually know its own version.&lt;/strong&gt; Models have no reliable introspective access to their own deployment metadata. Example — I asked &lt;code&gt;claude-opus-4-8&lt;/code&gt; "what model do you use?" and got:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I'm Claude, made by Anthropic. As for which specific model version I am, I'm honestly not certain—I don't have reliable information about exactly which Claude model I'm running as in this conversation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The request body literally specified &lt;code&gt;model: claude-opus-4-8&lt;/code&gt;. The model still couldn't confirm it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjusvdqumg33hl5o2v281.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjusvdqumg33hl5o2v281.png" alt="Claude model answers " width="800" height="644"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Models hallucinate identity, even when genuinely real.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Training data overlap leaks other vendors' branding.&lt;/strong&gt; Testing the same &lt;code&gt;claude-opus-4-8&lt;/code&gt; endpoint again, asking "what model are you?" in Chinese this time, the API metadata still said &lt;code&gt;model: anthropic/claude-4.8-opus-20260528&lt;/code&gt;, &lt;code&gt;provider: Anthropic&lt;/code&gt; — but the model's actual answer was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm Tongyi Qianwen (Qwen), a large language model developed by Alibaba Cloud."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fm7w0krfxvmufy64ghruf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fm7w0krfxvmufy64ghruf.jpeg" alt="Same endpoint with model field still showing Claude, but the model claims to be Qwen" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same endpoint, same claimed model field, contradicting reply. Running that exact Chinese prompt 100 times against the same endpoint produced this self-reported identity distribution:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjomgpsxp9evrlcfof091.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjomgpsxp9evrlcfof091.jpg" alt="Distribution of self-reported identity over 100 repeated trials of the same prompt" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Self-reported identity&lt;/th&gt;
&lt;th&gt;% of 100 trials&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen&lt;/td&gt;
&lt;td&gt;49%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zhipu&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model named itself correctly only 35 out of 100 times. &lt;strong&gt;Asking the model what it is, is not a measurement — it's noise.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works: behavioral fingerprinting
&lt;/h2&gt;

&lt;p&gt;LLMs exhibit consistent stylistic and structural patterns — word choice, sentence openers, how they handle edge-case prompts, knowledge-boundary behavior — independent of what they're told to claim about themselves. None of that requires trusting the model's self-report:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build a baseline.&lt;/strong&gt; Sample official APIs directly (no proxies in the loop) across a wide probe set, repeatedly, to capture how each real model actually behaves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probe the candidate endpoint.&lt;/strong&gt; Send the same structured probes to the endpoint you want to verify.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare fingerprints, not claims.&lt;/strong&gt; Score the candidate's response patterns against every baseline model. Highest similarity = most likely real model, independent of any &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;provider&lt;/code&gt; field in the response.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Verification method&lt;/th&gt;
&lt;th&gt;Can the reseller fake it?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ask the model "who are you"&lt;/td&gt;
&lt;td&gt;Trivial — one system prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trust the &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;provider&lt;/code&gt; response field&lt;/td&gt;
&lt;td&gt;Trivial — reseller fills it in themselves&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repeat the self-report question N times, check consistency&lt;/td&gt;
&lt;td&gt;Harder, but doesn't need a baseline — and still proves nothing if it's consistent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Behavioral fingerprint match against an official baseline&lt;/td&gt;
&lt;td&gt;Hard — the faker doesn't know which dimensions are scored&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A confidence score above 70% on the Top-1 match is what we consider a reliable read; anything below that gets flagged inconclusive rather than forced into a verdict.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it on your own key
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://apimaster.ai/ai-api-model-tester?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=verify-api" rel="noopener noreferrer"&gt;APIMaster Model Tester&lt;/a&gt; is free, no signup required — paste your endpoint + key, pick the model you're supposed to be getting, and it'll give you a Top-1 candidate + confidence score in under a minute. Currently covers the Claude/GPT/DeepSeek/Qwen/MiniMax/Kimi families, Anthropic Messages + OpenAI-compatible + Gemini streaming protocols.&lt;/p&gt;

&lt;p&gt;If you want the methodology write-up with more test screenshots, that's &lt;a href="https://apimaster.ai/blog/how-to-verify-claude-openai-api-real" rel="noopener noreferrer"&gt;here&lt;/a&gt;. I'm also putting together a full breakdown of which categories of resellers (price tier, region, claimed model) have the highest fake rates — that'll be a follow-up post once the sample size is bigger.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Lisa — product person turned vibe-coder. APIMaster was my first AI-assisted build (10 days, mostly Claude Code), now at 1k+ users. Happy to answer questions about the methodology in the comments, or if you've run into a reseller swapping models on you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>api</category>
    </item>
  </channel>
</rss>
