How OpenAI and Persona Built an Identity Surveillance Machine for the US Government
I was in the middle of verifying my Discord account last month when something felt off. The ID verification flow looked... familiar. Too familiar. That same clunky liveness check. Those same document upload patterns. I'd seen this exact code before—on government contractor portals and border control apps.
Turns out my instincts were right.
Discord just cut ties with Persona, their identity verification provider, after researchers discovered the same codebase powering their "anti-fraud" system was also handling surveillance-grade identity verification for US government agencies. Same SDK. Same infrastructure. Same data architecture.
This isn't about Discord being evil. This is about the invisible plumbing modern AI systems use to verify who you are—and who else might be looking at that data.
The Code Doesn't Lie
Discord's ID verification launched in 2023 as an optional "security" feature. Users who wanted the "verified" badge could upload government IDs and snap selfies for liveness detection. Behind the scenes, Persona handled the heavy lifting—document validation, face matching, database cross-references.
Here's what Discord didn't advertise: Persona's client-side code was practically identical to the code used by US Customs and Border Protection, the TSA's CLEAR program, and several unnamed intelligence agency contractors. Same JavaScript bundle structure. Same API endpoints. Same "confidence scoring" algorithms.
Security researcher vmfunc ran the analysis that broke this story open. They compared the Persona SDK loaded on Discord's verification page against known government contracts and found shared infrastructure, shared AI models, and—most concerning—shared data processing pipelines.
The same systems that verify your driver's license for a Discord badge? Those are the same systems verifying travelers at border checkpoints.
Let me be clear about what this means technically. When you upload your ID to Persona, here's the actual flow:
1. Document capture → SDK validates image quality and extracts text using OCR
2. Liveness detection → AI model analyzes video/selfie for "real human" indicators
3. Data normalization → extracted data gets structured into standardized formats
4. Database cross-reference → check against watchlists, fraud databases, "known identities"
5. Risk scoring → ML model outputs confidence score and flags
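Here's a minimal sketch of that five-step flow. Every function and field name below is illustrative — nothing is taken from Persona's actual SDK — but the shape of the pipeline matches the steps above:

```javascript
// Illustrative ID-verification pipeline; thresholds and names are made up
const WATCHLIST = new Set(["doc-123"]); // stand-in for external databases

function verifyIdentity(upload) {
  // 1. Document capture: quality gate plus OCR-style extraction
  if (upload.imageQuality < 0.7) return { status: "retry", reason: "low quality" };
  const extracted = { name: upload.ocrName, docNumber: upload.ocrDocNumber };

  // 2. Liveness detection: a score from a (here, stubbed) ML model
  const liveness = upload.livenessScore;

  // 3. Data normalization: standardize formats before cross-referencing
  const normalized = {
    name: extracted.name.trim().toUpperCase(),
    docNumber: extracted.docNumber.replace(/\s+/g, ""),
  };

  // 4. Database cross-reference: the opaque step discussed below
  const flagged = WATCHLIST.has(normalized.docNumber);

  // 5. Risk scoring: collapse the signals into one confidence number
  const confidence = liveness * (flagged ? 0.2 : 1.0);
  return { status: confidence > 0.8 ? "verified" : "review", confidence, flagged };
}
```

Notice that the cross-reference step is a one-line lookup here. In a real system it's a network call to databases you can't enumerate — which is exactly the problem.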
Step 4 is where things get interesting. Persona's documentation mentions "government and commercial databases" as verification sources. Which databases? Under what legal authority? With what data retention policies?
The answers are buried in contracts you'll never see.
Why Discord Panicked
When vmfunc's analysis dropped on February 18th, Discord's response was surprisingly fast. Within 72 hours, they announced they were "sunsetting" the ID verification feature completely. Not replacing the vendor. Not adding transparency. Just ending it.
That tells you something.
If this were a simple third-party arrangement with clear data boundaries, Discord would have clarified. Instead, they shut it down entirely. Companies don't torch working features over "optics" unless the underlying reality is genuinely problematic.
My read? Someone at Discord's legal team looked at the data processing agreements, cross-referenced them with Persona's government contracts, and realized they couldn't guarantee user data stayed out of surveillance databases. When you can't promise users their passport data won't end up in a fusion center somewhere, the only safe choice is to not collect it.
The Yahoo News investigation added another layer: Persona is backed by Peter Thiel's Founders Fund. Thiel's Palantir Technologies has built the data infrastructure for ICE, military intelligence, and domestic surveillance programs for two decades. These aren't conspiracy dots to connect—they're public financial filings and government contract awards.
How Biometric Templates Actually Work
Here's where I need to get technical, because the surveillance implications aren't obvious unless you understand how modern identity verification actually works.
Traditional ID verification was manual. A human looked at your document, compared it to your face, maybe called a database. Slow, expensive, hard to scale.
The new model—what Persona and competitors like Veriff and Onfido build—is fully automated and terrifyingly efficient.
The key innovation is biometric template extraction. When you upload that selfie, the AI doesn't just check if you're a real person. It generates a mathematical representation of your face—a "template"—that can be compared against other templates at massive scale.
// Simplified illustration of what an SDK like this does under the hood.
// Helper functions here are stand-ins, not Persona's actual API.
const captureBiometric = async (videoStream) => {
  const frame = extractBestFrame(videoStream);
  const landmarks = detectFacialLandmarks(frame);

  // This is the critical part - the template that gets stored
  const biometricTemplate = [
    landmarks.eyeDistance,     // normalized eye spacing
    landmarks.noseBridgeAngle, // facial geometry
    landmarks.jawWidthRatio,   // proportions
    // ... 100+ other measurements
  ];

  // Template gets hashed and transmitted
  return hashTemplate(biometricTemplate);
};
That template is supposedly "anonymized." But it's not.
Researchers have repeatedly demonstrated that biometric templates can be reverse-engineered to reconstruct faces with surprising accuracy. Your "hashed" biometric data is effectively you, compressed into a mathematical signature that can be searched, matched, and tracked.
And here's the kicker: these templates don't just get used for the verification you're consenting to. They get batched, analyzed, and fed into training pipelines. Your face becomes part of the model that improves facial recognition for everyone—including the government agencies using the same infrastructure.
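Why does a vector of measurements matter so much? Because vectors are searchable. A toy sketch of template matching — real systems use learned embeddings with hundreds of dimensions and tuned thresholds, but the mechanics are the same:

```javascript
// Illustrative template matching; 3-dimensional vectors and the 0.1
// threshold are made up for the example
function distance(a, b) {
  // Euclidean distance between two fixed-length template vectors
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

function findMatches(probe, database, threshold = 0.1) {
  // One probe can be compared against millions of stored templates.
  // This is what makes templates searchable rather than "anonymized".
  return database.filter((entry) => distance(probe, entry.template) < threshold);
}

const db = [
  { id: "user-a", template: [0.42, 0.31, 0.77] },
  { id: "user-b", template: [0.90, 0.12, 0.05] },
];
const matches = findMatches([0.43, 0.30, 0.76], db); // close to user-a's template
```

A fresh selfie taken anywhere — a border checkpoint, a protest, another app's verification flow — produces a probe vector, and the match falls out of a filter.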
The Fine Print That Matters
I've read a lot of privacy policies. They're usually vague in specific ways that matter.
Persona's policy states they "may share data with partners" for fraud prevention and "legal compliance." Let's translate that:
- "Fraud prevention" includes feeding data into shared industry databases. Upload your ID to verify your Discord account, and your information may end up in databases used by banks, crypto exchanges, and yes, government agencies.
- "Legal compliance" is a blank check. National security letters, secret subpoenas, informal data sharing agreements—none of which you'll ever know about.
- "Partners" is undefined. Could be the company running the verification. Could be the AI model provider. Could be the cloud infrastructure host. Could be the government contractor managing the database.
The architecture matters here. When Discord used Persona, your data went:
You → Discord servers → Persona API → ??? → Verification result
Those question marks represent data centers, subcontractors, database providers, and analytics platforms. Each hop is a potential leak, a potential sale, a potential legal exposure. Discord couldn't tell you where your data went because they genuinely didn't know—the system was intentionally opaque.
This Is Bigger Than Discord
This isn't just about Discord and Persona. It's about a structural shift in how identity gets verified online.
Five years ago, if a platform wanted to verify your identity, they had limited options. Manual review. Phone verification. Maybe credit bureau checks if they were serious. Each approach had clear boundaries and known limitations.
Today, AI-powered identity verification is cheap, fast, and borderline ubiquitous. Every crypto exchange needs KYC. Every marketplace needs seller verification. Every platform under regulatory pressure needs to prove their users are real humans with verified identities.
The result is a handful of vendors—Persona, Veriff, Onfido, Jumio—processing millions of identity verifications daily. They compete on speed and accuracy, not on privacy protections or government contract disclosures. And because the technology is commoditized, the actual differentiator becomes the data: who has the biggest biometric database, the most comprehensive fraud signals, the best government relationships.
This is how surveillance infrastructure gets built out in the open. Not through secret programs (though those exist), but through "fraud prevention" and "risk management" and "industry standard practices." Every ID verification you complete adds data to the pile. Every biometric template makes the matching systems more accurate.
Every verification flow normalizes the idea that platforms should demand government IDs for basic participation.
What Developers Should Actually Do
If you're building a platform that needs identity verification, you have actual options that don't feed surveillance infrastructure. They're not as convenient, but they're real:
Use privacy-preserving verification. Privacy Pass and similar zero-knowledge protocols let you prove you're human without proving which human. They're not perfect, but they don't create permanent biometric records.
Implement tiered verification. Not every user needs government ID verification. Phone verification catches most fraud. Credit card verification catches more. Reserve document uploads for high-risk activities, not basic participation.
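A tiered policy can be as simple as a lookup keyed on the riskiness of the action. The tiers and thresholds below are illustrative, not a recommendation for any specific product:

```javascript
// Illustrative tiered-verification policy: reserve document upload for
// genuinely high-risk actions, never for basic participation
const TIERS = {
  basic: "phone",        // chatting, browsing, posting
  payments: "card",      // routine transactions
  highValue: "document", // large withdrawals, seller payouts
};

function requiredVerification(action) {
  if (action.kind === "withdrawal" && action.amount > 10000) return TIERS.highValue;
  if (action.kind === "payment") return TIERS.payments;
  return TIERS.basic; // default: never ask for an ID you don't need
}
```

The point of encoding this as policy is that "when do we demand IDs" becomes a reviewable decision instead of a vendor default.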
Demand transparency. If you're contracting with an identity vendor, ask specific questions: Where does data go? What databases get queried? What's the retention policy? Who are the "partners"? If they won't answer in writing, don't sign.
Plan for deletion. Biometric data should never be retained longer than necessary. Build actual deletion workflows, not just "we'll delete it eventually" handwaving. And test them—verify data actually gets removed from all systems.
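A deletion workflow worth the name issues the deletes and then checks. A minimal sketch, assuming each downstream system exposes delete and existence checks (store names here are hypothetical):

```javascript
// Illustrative deletion workflow: delete everywhere, then verify removal.
// A delete that can't be confirmed didn't happen.
async function deleteBiometricData(userId, stores) {
  // Issue deletes to every system that may hold a copy
  await Promise.all(stores.map((s) => s.delete(userId)));

  // Then verify against each store and fail loudly on leftovers
  const leftovers = [];
  for (const s of stores) {
    if (await s.exists(userId)) leftovers.push(s.name);
  }
  if (leftovers.length > 0) {
    throw new Error(`deletion incomplete in: ${leftovers.join(", ")}`);
  }
  return true;
}
```

The verification pass is the part teams skip — and it's the part that catches the analytics copy or vendor cache nobody remembered.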
Consider not collecting it. This sounds radical, but it's often the right answer. What problem are you actually solving with identity verification? Can you solve it another way? Discord's decision to drop verification rather than fix it suggests the value proposition never made sense.
Final Thoughts
I don't think Discord executives sat in a room plotting to help build surveillance infrastructure. They needed a verification vendor, Persona had the best feature set, someone signed a contract without understanding the full data architecture.
It happens constantly.
But that's exactly the problem. The surveillance state doesn't need conspiracy. It needs convenience and market dynamics and engineers who don't ask hard questions about data flows. It needs "standard practices" that become invisible infrastructure.
It needs everyone to assume that if something is widely used, it must be fine.
Persona isn't going away. They'll keep landing contracts, keep processing identities, keep building the databases that make automated surveillance possible. The question is whether platforms keep buying what they're selling—and whether users keep uploading their documents without asking where that data actually goes.
Discord's decision to cut ties is a data point. It suggests that when the technical details get exposed, even companies with weak privacy track records can recognize a problem. The infrastructure is built. The databases exist.
But the choices we make about whether to participate—that's still up for grabs.
Quick Actions:
- If you've verified your identity on Discord, you can't undo it, but you can request data deletion through their privacy portal
- Check what verification vendors other platforms use—inspect network requests when uploading documents
- For new platforms, ask specifically about data sharing before uploading ID documents
- Consider using alternative credentials (phone verification, cryptographic proofs) when available
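For the network-inspection step, DevTools' Network tab works, but you can also wrap fetch to log every outgoing request URL. A sketch (the wrapper design is mine, for illustration):

```javascript
// Illustrative: wrap a fetch implementation so outgoing request URLs get
// logged before the request is passed through unchanged
function logRequests(fetchImpl, log = console.log) {
  return function (...args) {
    const url = typeof args[0] === "string" ? args[0] : args[0].url;
    log("outgoing request:", url);
    return fetchImpl.apply(this, args);
  };
}

// In a browser console, before uploading a document:
// window.fetch = logRequests(window.fetch);
```

Any verification vendor the page talks to will show up in the logged URLs — that's how you learn who actually receives your ID.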