<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thomas </title>
    <description>The latest articles on DEV Community by Thomas  (@thoams_aidetection).</description>
    <link>https://dev.to/thoams_aidetection</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901081%2F6bd58b9a-a62f-4897-a001-3a6ff6c7bc8d.png</url>
      <title>DEV Community: Thomas </title>
      <link>https://dev.to/thoams_aidetection</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thoams_aidetection"/>
    <language>en</language>
    <item>
      <title>Realtime deepfake software is a SaaS product now</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Thu, 07 May 2026 22:21:00 +0000</pubDate>
      <link>https://dev.to/thoams_aidetection/realtime-deepfake-software-is-a-saas-product-now-13no</link>
      <guid>https://dev.to/thoams_aidetection/realtime-deepfake-software-is-a-saas-product-now-13no</guid>
      <description>&lt;p&gt;I've been half-following the deepfake-in-the-wild beat for a while. Most of it has been static-image stuff: fake profile photos, AI-generated headshots on LinkedIn, that kind of thing. I run suspicious images through &lt;a href="https://www.aiornot.com" rel="noopener noreferrer"&gt;AI or Not&lt;/a&gt; when something looks off, flag it, move on.&lt;/p&gt;

&lt;p&gt;But the &lt;a href="https://www.404media.co/hello-boss-inside-the-chinese-realtime-deepfake-software-powering-scams-around-the-world/" rel="noopener noreferrer"&gt;404 Media investigation into "HELLO BOSS" software&lt;/a&gt; shifted my sense of where the floor actually is. This isn't someone uploading a faked image. This is a live video call where the person on screen is not the person on screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the pipeline actually looks like
&lt;/h2&gt;

&lt;p&gt;The software they describe isn't magic: it's a real-time face-swap layer that sits between the camera input and whatever video call software the scammer is using. The rough architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Scammer's webcam]
        ↓
[Face detection + landmark extraction]
        ↓
[Target face model (pre-trained on victim's photos)]
        ↓
[Rendered output frame]
        ↓
[Virtual camera driver — OBS, v4l2loopback, etc.]
        ↓
[Zoom / WhatsApp / Teams / any WebRTC app]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The virtual camera driver is the key piece most people miss. Tools like OBS Virtual Camera on Windows/Mac or &lt;code&gt;v4l2loopback&lt;/code&gt; on Linux let you present any video source as a system webcam. The calling app has no idea it's not getting real hardware input.&lt;/p&gt;

&lt;p&gt;The face-swap model itself runs inference on every frame (typically 24–30 fps), which used to require a beefy GPU. Consumer-grade hardware can handle it now, and cloud GPU instances are cheap enough that you can rent the compute if you don't own it.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Hello boss" isn't a technical term; it's a script
&lt;/h2&gt;

&lt;p&gt;The name comes from one of the primary use cases: impersonating a company executive on a video call to authorize a wire transfer. Subordinate gets a call from what looks like their CEO on screen, hears a voice that's been cloned or at least pitch-shifted, and gets told to move money somewhere.&lt;/p&gt;

&lt;p&gt;The "hello boss" phrasing is the greeting on the other end: the scammer picking up a call from someone who thinks they're reaching their boss, rather than the scammer impersonating the boss on an outbound call. Either way, the social engineering depends entirely on the live video being convincing enough to short-circuit skepticism.&lt;/p&gt;

&lt;p&gt;This same stack powers romance scams and "pig butchering" investment fraud, where the point is sustained trust over weeks, not a one-time wire transfer. A static fake photo stops working the moment someone asks for a video call. A real-time face swap keeps the fiction going.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this breaks video KYC assumptions
&lt;/h2&gt;

&lt;p&gt;A lot of identity verification flows have converged on "liveness check + face match" as the standard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. User holds up ID document → OCR extracts name, DOB, document number
2. User records short selfie video → liveness detection (blink, turn head)
3. Face on selfie matches face on document → pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pipeline assumes the face in the selfie video is the person's real face. Real-time face swap defeats step 3 entirely if the attacker pre-trains their model on photos of the person whose identity they're stealing. It also defeats liveness checks: the swap handles arbitrary head movements and expressions in real time, so asking someone to blink or smile doesn't help.&lt;/p&gt;

&lt;p&gt;Some vendors are adding texture analysis, illumination consistency checks, and temporal coherence scoring to catch the artifacts that face swap models still produce at frame boundaries and occlusion edges. But that arms race is already underway and the defenders are not obviously winning.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually do differently if I were building this today
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust video alone.&lt;/strong&gt; Pair any video verification step with a second channel (SMS OTP, authenticator app, or a document scan from a different session) so that compromising the video doesn't compromise the whole flow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log the raw video stream for async review.&lt;/strong&gt; Real-time detectors aren't reliable enough to be gatekeepers. Use them as signals, not hard blocks, and let a human or a more thorough model review borderline cases after the fact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add device fingerprinting.&lt;/strong&gt; Face swap pipelines route through a virtual camera driver. The camera device name exposed by browser WebRTC APIs (&lt;code&gt;MediaDeviceInfo.label&lt;/code&gt;) will often be "OBS Virtual Camera" or similar. That's not a perfect signal, but it's a cheap one worth logging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test your liveness checks against an actual face swap.&lt;/strong&gt; There are open-source models you can run locally. If a swapped face passes your liveness check, you need to know now, not when a fraud team calls you.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Assume the video is synthetic and design accordingly.&lt;/strong&gt; Treat video verification as a corroborating signal, not a root-of-trust.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
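
&lt;p&gt;The device-fingerprinting idea in item 3 can be sketched on the server side. This is a minimal Python sketch, assuming the browser client has already collected the &lt;code&gt;MediaDeviceInfo.label&lt;/code&gt; strings and sent them along with the session. The function name and the hint list are illustrative assumptions, not an authoritative catalogue of virtual camera names.&lt;/p&gt;

```python
# Substrings that suggest a virtual camera driver. Hypothetical starter list:
# extend it from labels you actually observe in your own logs.
VIRTUAL_CAMERA_HINTS = ("obs virtual camera", "v4l2loopback", "virtual")


def flag_virtual_cameras(labels):
    """Return the reported camera labels that look like virtual devices."""
    flagged = []
    for label in labels:
        lowered = label.lower()
        if any(hint in lowered for hint in VIRTUAL_CAMERA_HINTS):
            flagged.append(label)
    return flagged


reported = ["FaceTime HD Camera", "OBS Virtual Camera"]
# Log the result as one signal among several, never as a hard block.
print(flag_virtual_cameras(reported))
```

&lt;p&gt;Browsers return empty labels until the user grants camera permission, and a determined attacker can rename the device, so an empty result proves nothing. A hit is still worth recording next to the session.&lt;/p&gt;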

&lt;h2&gt;
  
  
  Checking clips yourself
&lt;/h2&gt;

&lt;p&gt;When I see video circulating that seems suspicious (a celebrity endorsing something, an executive making a statement), AI or Not is what I pull up first. It handles video files, not just images, so you can actually run the clip rather than screenshotting a frame and hoping the compression didn't wash out the artifacts. It's not a forensic lab, but it's fast and the confidence scores are useful for triage.&lt;/p&gt;

&lt;p&gt;The problem the 404 Media story describes is harder to catch after the fact because there usually isn't a recording: it happened on a live call. But for anything that &lt;em&gt;did&lt;/em&gt; get recorded, that kind of tooling matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SaaS part is the part that scales
&lt;/h2&gt;

&lt;p&gt;What makes this story different from "deepfakes exist, film at 11" is the distribution model. The software described in the 404 Media piece is sold through Telegram channels at subscription prices. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low barrier to entry. You don't need to train a model or write code.&lt;/li&gt;
&lt;li&gt;Sellers have support channels, update cadences, and refund policies.&lt;/li&gt;
&lt;li&gt;Supply scales with demand, not with technical skill.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same trajectory malware took. Ransomware-as-a-service normalized the idea that you could be a criminal without being a programmer. Deepfake-as-a-service is doing the same thing for identity fraud.&lt;/p&gt;

&lt;p&gt;The underlying models will keep improving. The virtual camera trick is already table stakes. At some point, real-time voice cloning (already fairly mature) and real-time video swap running together on consumer hardware will be seamless enough that "get on a video call" is no longer a meaningful trust signal.&lt;/p&gt;

&lt;p&gt;That's not a prediction; it's a present-tense engineering problem.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>testing</category>
    </item>
    <item>
      <title>Deepfakes are coming for your KYC flow</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Tue, 05 May 2026 17:47:06 +0000</pubDate>
      <link>https://dev.to/thoams_aidetection/deepfakes-are-coming-for-your-kyc-flow-36e0</link>
      <guid>https://dev.to/thoams_aidetection/deepfakes-are-coming-for-your-kyc-flow-36e0</guid>
      <description>&lt;h1&gt;
  
  
  Deepfakes are coming for your KYC flow
&lt;/h1&gt;

&lt;p&gt;I've spent a lot of time lately running suspicious images through &lt;a href="https://www.aiornot.com" rel="noopener noreferrer"&gt;AI or Not&lt;/a&gt;: profile photos, government ID scans, the kind of stuff that shows up in fraud reports. What I've noticed over the past few months is that the gap between "obviously fake" and "passes a casual check" has basically closed.&lt;/p&gt;

&lt;p&gt;The Atlantic piece that dropped last week made that concrete in a way I hadn't seen written up this clearly before: &lt;a href="https://www.theatlantic.com/technology/2026/05/chatgpt-images-deepfakes-fraud/687023/" rel="noopener noreferrer"&gt;ChatGPT's image generator is being used to produce fake identity documents&lt;/a&gt;. Not clumsy Photoshop jobs. Documents that are clearing automated KYC pipelines.&lt;/p&gt;

&lt;p&gt;If you've shipped user onboarding with identity verification (and a huge slice of devs reading this have), this is directly your problem.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Generative AI just made document fraud cheap and scalable. Automated KYC flows that rely on image pattern-matching alone are increasingly inadequate. Here's what the attack surface looks like and what engineers should be thinking about.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually changed
&lt;/h2&gt;

&lt;p&gt;For years, fake ID fraud was rate-limited by skill. You needed someone who knew Photoshop, knew the security features on a specific state's license, and could produce something that wouldn't get flagged by pixel-level analysis.&lt;/p&gt;

&lt;p&gt;GPT-4o's image model changed the cost curve. The model produces realistic textures, plausible holograms, and correctly formatted fields. You don't need skills anymore; you need a prompt and maybe a few iterations.&lt;/p&gt;

&lt;p&gt;The fraud pattern that's emerging looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Generate a base ID image via text-to-image model
2. Composite a deepfake face (separate tool, increasingly free)
3. Submit via mobile camera capture, which adds noise that masks artifacts
4. Liveness check: spoof with a face-swap running on a phone screen
5. Document check: pass, because the image quality clears the automated threshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4 is what gets me. Liveness detection was supposed to be the backstop. "Is there a real human face in front of the camera right now?" The answer used to be reliably yes or no. Now you can run a real-time face swap locally on a consumer GPU, hold it up to the camera, and a surprising number of liveness checks will pass it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The KYC stack most of us are actually using
&lt;/h2&gt;

&lt;p&gt;If you've wired up identity verification in the last few years, your stack probably looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User uploads ID
  → Document classification (front/back, ID type)
  → OCR extraction (name, DOB, ID number)
  → Template match / security feature check
  → Face match (ID photo vs. selfie)
  → Liveness check (passive or active)
  → Risk score returned → approve / flag / deny
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every one of those steps except maybe OCR extraction is now under active attack. Template matching is fooled by generated images that have the right visual structure. Face matching is fooled by deepfakes. Liveness is fooled by face-swap tools.&lt;/p&gt;

&lt;p&gt;The vendors selling these components know this. But "we're working on it" and "our model is continuously updated" don't help you when someone is opening accounts in your app today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually change in my stack
&lt;/h2&gt;

&lt;p&gt;This is the numbered section I'd want to find if I were googling this problem at midnight.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stop treating document image quality as a proxy for authenticity.&lt;/strong&gt; A generated image can be sharper and more consistent than a real scan. Quality score going up is now mildly suspicious, not reassuring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add metadata checks on the upload itself.&lt;/strong&gt; Real phone camera images have EXIF data, sensor noise profiles, compression artifacts from the camera pipeline. A generated image saved as JPEG and uploaded will have different statistical fingerprints. This isn't foolproof, but it raises the cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-reference extracted data, don't just validate format.&lt;/strong&gt; An ID number that passes a checksum algorithm but doesn't correspond to a real issued document (where that data is available) is a signal. So is a name/DOB combo that doesn't appear anywhere in any public record.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add friction that's hard to automate at scale.&lt;/strong&gt; Passive liveness is dead for high-stakes flows. Active liveness (follow this dot, turn your head) is still meaningfully harder to spoof at scale, even if it's not impossible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build a manual review queue and actually use it.&lt;/strong&gt; Automated systems are being beaten. A human looking at a document for 30 seconds catches things the model misses. For any account type that accesses real money, that review step is cheap compared to fraud losses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log everything for post-hoc analysis.&lt;/strong&gt; When a fraud pattern emerges, you want to be able to go back through your approved applications and find accounts that match. You can't do that if you didn't retain the raw evidence.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
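
&lt;p&gt;The metadata check in item 2 can start very small. Below is a minimal Python sketch, using only the standard library, that walks the JPEG header segments and reports whether an Exif block is present at all. The function name &lt;code&gt;jpeg_has_exif&lt;/code&gt; is hypothetical, and the check assumes a well-formed baseline JPEG; it says nothing about sensor noise or compression fingerprints.&lt;/p&gt;

```python
import struct


def jpeg_has_exif(data):
    """Return True if the JPEG bytes contain an APP1 segment holding Exif data."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    try:
        while True:
            # Each header segment is a 2-byte marker plus a 2-byte big-endian length.
            marker, length = struct.unpack_from("!HH", data, pos)
            if marker == 0xFFE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
                return True
            if marker == 0xFFDA or marker // 256 != 0xFF:
                return False  # reached image data, or the stream is malformed
            pos += 2 + length  # the length field counts itself but not the marker
    except struct.error:
        return False  # ran off the end without finding Exif


# Tiny synthetic file: SOI, one APP1/Exif segment, EOI.
sample = b"\xff\xd8\xff\xe1" + struct.pack("!H", 8) + b"Exif\x00\x00" + b"\xff\xd9"
print(jpeg_has_exif(sample))
```

&lt;p&gt;Treat a missing Exif block as a weak signal only: plenty of legitimate apps strip metadata, and an attacker can copy it from a real photo. It earns its place as one cheap input to the risk score, not as a gate.&lt;/p&gt;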

&lt;h2&gt;
  
  
  The platform problem nobody wants to talk about
&lt;/h2&gt;

&lt;p&gt;OpenAI added safeguards to prevent ChatGPT from generating IDs on request. The Atlantic piece goes into how those safeguards are being bypassed: specific prompt patterns, regional model variants, fine-tuned open-source alternatives.&lt;/p&gt;

&lt;p&gt;This is the part that genuinely annoys me: the safety mitigations are playing whack-a-mole with prompts, while the underlying capability, a model that can produce photorealistic documents, is out in the world and getting more accessible, not less.&lt;/p&gt;

&lt;p&gt;I ran a few of the example images from the fraud reports I've seen through AI or Not and the detection confidence on generated IDs is meaningfully higher than on a real scan, which is useful — but that only helps if you're running detection in your pipeline at all. Most KYC vendors aren't explicitly checking "is this image AI-generated" as a step. They're checking "does this look like a valid document." Those are different questions now.&lt;/p&gt;

&lt;p&gt;For anyone building tooling in this space, AI or Not supports image and video detection via API, which means you could add a synthetic-media check as one layer in a multi-signal stack. Not a silver bullet, but it's a signal that wasn't in most fraud stacks two years ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable conclusion
&lt;/h2&gt;

&lt;p&gt;The fraud economics have flipped. For most of the history of KYC, the cost to create a convincing fake was higher than the expected value of the fraud, except for sophisticated criminal operations. That's no longer true for a wide range of account types.&lt;/p&gt;

&lt;p&gt;If you've built identity verification and you haven't audited it against generated documents and real-time face-swap tools in the last six months, you're probably running on assumptions that are out of date.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>webdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The liar's dividend has a second payout, and devs helped build it</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Thu, 30 Apr 2026 20:25:36 +0000</pubDate>
      <link>https://dev.to/thoams_aidetection/the-liars-dividend-has-a-second-payout-and-devs-helped-build-it-22ji</link>
      <guid>https://dev.to/thoams_aidetection/the-liars-dividend-has-a-second-payout-and-devs-helped-build-it-22ji</guid>
      <description>&lt;h1&gt;
  
  
  The liar's dividend has a second payout, and devs helped build it
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The "liar's dividend" isn't just about faking things. It's about claiming real things are fake. Detection infrastructure, the very thing we built to fight deepfakes, is now being used as cover. This is a systems design problem as much as a machine learning one.&lt;/p&gt;




&lt;p&gt;I've been sitting with a Forbes piece on digital forensics and deepfakes for a few days, and the part that stuck wasn't the forensics. It was a phrase: "the liar's dividend's second payout."&lt;/p&gt;

&lt;p&gt;The first payout, if you haven't heard the term, comes from &lt;a href="https://scholarship.law.bu.edu/faculty_scholarship/640/" rel="noopener noreferrer"&gt;Chesney and Citron's 2019 paper&lt;/a&gt; on deepfakes and democracy. The idea is simple and brutal: once people know synthetic media exists, a bad actor can claim &lt;em&gt;any&lt;/em&gt; real, damaging media is fake. You don't need to make a convincing deepfake. You just need enough public doubt to muddy the water.&lt;/p&gt;

&lt;p&gt;The second payout is what we built next. And I mean "we" literally — developers, ML engineers, product teams. We built detection tools. Classification APIs. Real-time flagging pipelines. And in doing so, we handed the liars a new prop.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the escape hatch works
&lt;/h2&gt;

&lt;p&gt;Consider the logic a bad actor now has available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF  (incriminating_media EXISTS)
AND (public_awareness_of_deepfakes == HIGH)
AND (detection_tools PRODUCE != 100% certainty)
THEN
   claim "this is AI-generated"
   point to ambiguous classifier output as "proof"
   wait for news cycle to move on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't hypothetical. In 2023, a &lt;a href="https://apnews.com/article/slovakia-election-deepfake-audio-bbc-2023" rel="noopener noreferrer"&gt;Slovakian election audio clip&lt;/a&gt; of a candidate allegedly discussing election fraud circulated two days before polls opened. The candidate's party called it AI-generated. Analysts were split. The election happened before anyone reached consensus.&lt;/p&gt;

&lt;p&gt;That's the second payout: the detection ecosystem itself becomes the alibi. A shrug from a classifier is now a press release.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually see when I run stuff through detection
&lt;/h2&gt;

&lt;p&gt;I use &lt;a href="https://www.aiornot.com" rel="noopener noreferrer"&gt;AI or Not&lt;/a&gt; when something looks off to me — it handles images, video, and audio, which covers most of what circulates on social platforms. The output is a confidence score, not a verdict. That matters.&lt;/p&gt;

&lt;p&gt;A 73% "likely AI" rating on a clip is meaningful signal. It is not a court finding. The problem is that a 73% rating is also something a bad actor can screenshot and frame as "even the detectors aren't sure."&lt;/p&gt;

&lt;p&gt;This isn't a flaw in AI or Not specifically. It's a fundamental property of probabilistic classification. Every detection system that produces a confidence score below 100% will have that score weaponized by someone. We built the weapon while trying to build the shield.&lt;/p&gt;




&lt;h2&gt;
  
  
  The four things I'd do differently (as a builder)
&lt;/h2&gt;

&lt;p&gt;If I were shipping something in this space today, here's where I'd change my assumptions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design for legal weight, not just accuracy.&lt;/strong&gt; A 92% confidence score means nothing in a courtroom without a chain of custody, a known model version, and a documented methodology. If your output might ever be used as evidence, treat it that way from day one — not as an afterthought.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log model provenance explicitly.&lt;/strong&gt; Which version of the detector flagged this? What training data was it exposed to? These questions matter the moment someone disputes a finding in public. Most APIs I've worked with don't surface this at all.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in uncertainty communication by default.&lt;/strong&gt; Instead of a single score, surface a distribution. "This result falls in a range where the model produces false positives 18% of the time under these image conditions." Harder to misquote.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Think about the adversarial UI, not just the adversarial input.&lt;/strong&gt; We spend a lot of time thinking about adversarial examples that fool detectors. We spend almost no time thinking about how bad actors will &lt;em&gt;present&lt;/em&gt; detector output to audiences who don't understand what it means.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
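
&lt;p&gt;Items 1 and 2 come down to keeping a record that can be reproduced or disputed later. Here is a minimal Python sketch of such a record, assuming you have the raw media bytes and some version string for the detector. The class, the field names, and the example values are all hypothetical; real chain-of-custody requirements go well beyond a hash and a timestamp.&lt;/p&gt;

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DetectionRecord:
    media_sha256: str   # hash of the exact bytes that were scored
    detector_name: str  # which tool produced the score
    model_version: str  # whatever version string your vendor exposes, if any
    score: float        # raw confidence, not a verdict
    recorded_at: str    # UTC timestamp in ISO 8601 format


def make_record(media_bytes, detector_name, model_version, score):
    """Bundle a detector score with what is needed to reproduce or dispute it."""
    return DetectionRecord(
        media_sha256=hashlib.sha256(media_bytes).hexdigest(),
        detector_name=detector_name,
        model_version=model_version,
        score=score,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


record = make_record(b"raw media bytes", "example-detector", "2026-04", 0.73)
print(json.dumps(asdict(record), indent=2))
```

&lt;p&gt;Storing the hash rather than trusting a filename means anyone re-running the check can confirm they are scoring the same bytes. If the vendor does not expose a model version, record that fact explicitly instead of leaving the field blank.&lt;/p&gt;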




&lt;h2&gt;
  
  
  The forensics paradox
&lt;/h2&gt;

&lt;p&gt;Here's the thing about digital forensics being the "only sure answer" to deepfakes: it requires a trusted institution to perform it, a trusted chain of custody for the media, and a public that believes the institution. All three of those are eroding simultaneously.&lt;/p&gt;

&lt;p&gt;A forensic finding from a university lab means less when half your audience thinks universities are politically captured. A chain of custody argument lands differently when the platform hosting the media is actively in a political fight.&lt;/p&gt;

&lt;p&gt;I'm not saying detection tools are useless — I keep using AI or Not because the signal is real and it's gotten my antenna up on things I would have scrolled past. But I've started thinking of detection as one input into a much larger trust problem, not as a solution to it.&lt;/p&gt;

&lt;p&gt;The liar's dividend was always about epistemics, not technology. We built better detectors and handed the epistemics problem a new set of props.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually changes the calculus
&lt;/h2&gt;

&lt;p&gt;A few things that seem underbuilt relative to the detection side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provenance standards.&lt;/strong&gt; The &lt;a href="https://c2pa.org/" rel="noopener noreferrer"&gt;C2PA spec&lt;/a&gt; attaches cryptographic provenance to media at capture time. If the camera signs the frame and the signature breaks on edit, that's a different kind of evidence than a classifier score. It's not widespread yet, but it's the right direction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal frameworks for false claims of AI generation.&lt;/strong&gt; Right now there's almost no cost to wrongly claiming something is a deepfake. A few jurisdictions are looking at this; none have moved fast enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial red-teaming of the human layer.&lt;/strong&gt; We red-team models constantly. We almost never red-team how users and journalists will misread or be manipulated by model output.&lt;/li&gt;
&lt;/ul&gt;
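
&lt;p&gt;To make the "signature breaks on edit" property concrete, here is a toy Python sketch. It is not C2PA: the real spec uses certificate-based signatures inside a signed manifest, while this stand-in uses a shared-secret HMAC from the standard library purely to show the behaviour. The function names and the device key are hypothetical.&lt;/p&gt;

```python
import hashlib
import hmac


def sign_frame(frame_bytes, device_key):
    """Produce a tag over the frame bytes (HMAC stand-in for a real signature)."""
    return hmac.new(device_key, frame_bytes, hashlib.sha256).hexdigest()


def verify_frame(frame_bytes, signature, device_key):
    """Return True only if the frame bytes are unchanged since signing."""
    expected = sign_frame(frame_bytes, device_key)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-device-key"
original = b"frame captured by the camera"
tag = sign_frame(original, key)

print(verify_frame(original, tag, key))            # untouched frame
print(verify_frame(original + b" edit", tag, key))  # any edit invalidates the tag
```

&lt;p&gt;The contrast with a classifier is the point: verification here is a yes-or-no check on the bytes, not a confidence score that someone can screenshot and reframe.&lt;/p&gt;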

</description>
      <category>ai</category>
      <category>llm</category>
      <category>security</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>LLM Drift: Why Your AI Detection Pipeline is Quietly Decaying (Kimi K2 Benchmark)</title>
      <dc:creator>Thomas </dc:creator>
      <pubDate>Mon, 27 Apr 2026 21:09:02 +0000</pubDate>
      <link>https://dev.to/thoams_aidetection/llm-drift-why-your-ai-detection-pipeline-is-quietly-decaying-kimi-k2-benchmark-3gml</link>
      <guid>https://dev.to/thoams_aidetection/llm-drift-why-your-ai-detection-pipeline-is-quietly-decaying-kimi-k2-benchmark-3gml</guid>
      <description>&lt;p&gt;A short field report on what current AI detectors actually do when you point them at frontier reasoning model output, and what I changed in my own detection workflow.&lt;/p&gt;

&lt;p&gt;I integrate AI detection into a few small side projects—content moderation pre-filters, writing quality flags, etc. The more I relied on detection, the more concerned I became that I was trusting numbers based on stale benchmarks.&lt;/p&gt;

&lt;p&gt;This week, a benchmark study confirmed my worst fears. It tested two popular detectors against 47 essays generated by Kimi K2 in "thinking mode," which mimics modern, high-variance LLM output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffejw7xpx6alsly100auf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffejw7xpx6alsly100auf.png" alt=" " width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ZeroGPT missed 62% of the AI content. For context, the same study notes that ZeroGPT classifies the 1776 U.S. Declaration of Independence as 99% AI-generated. If a detector flags famously human text as AI, the false-positive ceiling is high enough to invalidate its positives on actual AI text.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Legacy Detection Fails Modern LLMs
&lt;/h2&gt;

&lt;p&gt;If you've shipped &lt;a href="https://www.aiornot.com"&gt;AI detection&lt;/a&gt;, you probably integrated it once, picked a confidence threshold, and considered the job done. This is the failure mode the benchmark exposes: detector accuracy is not stable across model generations.&lt;/p&gt;

&lt;p&gt;Most public detectors were built around three assumptions about older LLM output:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Low perplexity: Text is predictable and falls below a certain perplexity score → Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uniform structure (Low Burstiness): Sentences have low variance in length and structure → Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable features: Use of function-word patterns and standard transition phrases → Flag as AI.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Reasoning models like Kimi K2, Gemini 2.5 Pro, and GPT-5 break all three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output is contextually adaptive, meaning perplexity varies wildly within a single response.&lt;/li&gt;
&lt;li&gt;Sentence variance increases during exploratory "thinking" passages.&lt;/li&gt;
&lt;li&gt;Token distributions are deliberately broadened to mimic human reasoning rhythms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your detector hasn't been retrained on current reasoning model output, it’s classifying against a distribution that no longer exists in production. The 38% accuracy is the result of this structural drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Actionable Fixes: Hardening the Detection Pipeline
&lt;/h2&gt;

&lt;p&gt;After re-checking my own setup, here are the four concrete changes I made:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Confidence Threshold Raised to 0.85&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 0.62 mean confidence on a fully AI-positive test set indicates that individual high-looking scores can still be coin flips. For anything that triggers an action (like a submission rejection or account flag), I now require multi-signal corroboration or human review if the score is below 0.85.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build a Held-Out Test Set from Current Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’m now generating my own validation samples from current frontier models (Kimi K2, Claude Sonnet 4.6, GPT-5, Gemini 2.5 Pro) and running them through my detection layer monthly.&lt;/p&gt;

&lt;p&gt;The set also includes "human-positive" texts (like the Declaration) to constantly monitor the false-positive rate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pseudo-code for the monitoring set I now keep around
HELD_OUT = {
    "ai_positive": [
        # 50 samples each from current frontier models
        kimi_k2_samples,
        claude_sonnet_4_6_samples,
        gpt_5_samples,
        gemini_2_5_pro_samples,
    ],
    "human_positive": [
        # public-domain texts written before 2020
        declaration_of_independence,
        federalist_papers_excerpts,
        public_domain_essays,
    ],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. Treat Detection as a Probabilistic Component&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even 97% accuracy means roughly 3 in every 100 items are misclassified, which adds up quickly at scale. For anything where the cost of an error is real, detection must be a signal, not a verdict.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Verify Modality Fit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I use AI or Not for image and audio checks in my projects because it covers multiple modalities. The Kimi K2 benchmark gave me a current-model accuracy number for the text side, which closed a gap I couldn't easily verify on my own.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimum-Viable Detector-Monitoring Pattern
&lt;/h2&gt;

&lt;p&gt;If you are running detection in a production pipeline, this is the basic ML hygiene that keeps the integration from silently failing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOOP (monthly):
    for detector in production_pipeline:
        accuracy_ai      = run(detector, HELD_OUT.ai_positive)
        accuracy_human   = run(detector, HELD_OUT.human_positive)
        mean_confidence  = avg_confidence(detector, HELD_OUT.ai_positive)

        if accuracy_ai     &amp;lt; baseline.ai - 0.05:    alert("AI detection regressed")
        if accuracy_human  &amp;lt; baseline.human - 0.05: alert("FP rate increased")
        if mean_confidence &amp;lt; baseline.conf - 0.10:  alert("Detector going uncertain")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Most teams I've seen integrate detection once and never check it again. This pattern is essential because accuracy decays per model generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;97% vs 38% on Kimi K2 essays shows a structural, not a tuning, gap.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Detector accuracy decays per model generation. Re-benchmark at least quarterly; I run mine monthly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test false-positive rate against famously human text (the Declaration of Independence is a free check).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Raise your confidence threshold; one number is not a verdict.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build a held-out test set from current models and monitor it on cadence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running detection in production and you can't name the generation of model you benchmarked against, you have an invisible calibration gap. The benchmark was the wake-up call; the monitoring pattern is what makes the fix permanent.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
