dwikbtn
Is Your Data Training Someone’s AI? I Built a Browser Extension to Find Out.

I’ve been thinking a lot about AI lately. Not the cool stuff, the other side of it.

Every time I scroll the internet, I keep wondering: “Is this website using my data to train AI?”

And the more I looked into it, the more uncomfortable it felt. A lot of companies are either vague about their AI policies, or they hide everything deep in their Terms of Service. Some are very clear about training on user content. Some pretend to be clear. Some don’t mention anything at all.

Either way, the normal user has no idea.

So I decided to build something that would tell you instantly whether a site you’re visiting might use your data to train AI.

And that’s how WTOM (WhoTrainedOnMe) started.

The Problem: Zero Transparency

Most AI models today were trained using massive scraped datasets.
Your posts, your art, your photos, your comments — anything public.

But here’s the issue:

  • Most platforms bury their AI training info
  • Opt-out options (if they exist) are hidden
  • Some don’t allow opt-out at all
  • And many use vague phrases like “improving our services”

It feels like the internet is being quietly vacuumed into AI systems, and nobody told us.

As someone who works with creators and builds products, this didn’t sit right with me. People deserve to know what's happening to their data.

What I Wanted to Build

I wanted something extremely simple:

When you visit a website, your browser should whisper:
“Hey, this site might use your data to train AI. Want to opt out?”

No drama.
No fear-mongering.
Just transparency.

If the platform offers an official opt-out, great. If not, you should still have a way to push back, like sending a protest email.

And it all had to feel natural. Minimal UI. Zero setup. Just works.

Researching AI Training Across the Web

This part was… a journey.

I read a lot of TOS documents. Too many, honestly.
Some were clear.
Some felt intentionally unclear.
Some were 20+ pages of lawyer-speak.

I ended up categorizing platforms into:

  • Explicit — clearly training AI on user data
  • Vague — “may use your content for service improvement”
  • Unclear — no mention or confusing
  • Denied — explicitly say they don’t train AI with your data
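
These four categories could be modeled as a simple union type. Here's a hypothetical sketch (the field names and shape are illustrative, not WTOM's actual schema):

```typescript
// Hypothetical model of the four transparency categories.
// Field names are illustrative, not WTOM's actual schema.
type TrainingStatus = "explicit" | "vague" | "unclear" | "denied";

interface PlatformEntry {
  domain: string;         // e.g. "example.com"
  status: TrainingStatus; // how the platform's TOS reads
  tosUrl?: string;        // link to their Terms of Service
  optOutUrl?: string;     // official opt-out page, if one exists
}

const entry: PlatformEntry = {
  domain: "example.com",
  status: "vague", // "may use your content for service improvement"
  tosUrl: "https://example.com/terms",
};
```

A closed union like this forces every platform into exactly one bucket, which keeps the labels consistent even when the TOS language isn't.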

It was surprising how inconsistent everything is.

Even more surprising:
Some sites used in major training datasets never mentioned AI at all.

Building WTOM (The Fun Part)

The extension itself is built with:

  • WXT
  • React
  • Supabase

Plus a structured config system, so each platform has its own rules.

WTOM works like this:

  • You visit a website
  • WTOM checks its domain against the database
  • If it’s flagged, a small widget appears on the page
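
That flow could be sketched roughly like this. This is a hypothetical in-memory version; WTOM's real lookup goes through its Supabase database:

```typescript
// Hypothetical domain lookup. WTOM's real implementation queries
// Supabase; this in-memory map just illustrates the flow.
const flaggedDomains = new Map<string, { status: string }>([
  ["example.com", { status: "explicit" }],
]);

function checkCurrentSite(hostname: string): { status: string } | null {
  // Strip a leading "www." so "www.example.com" matches "example.com".
  const domain = hostname.replace(/^www\./, "");
  return flaggedDomains.get(domain) ?? null;
}

const result = checkCurrentSite("www.example.com");
if (result) {
  // In the extension, this is where the widget gets injected into the page.
  console.log(`Flagged: ${result.status}`);
}
```

The key point is that nothing happens for unflagged sites: the check returns `null` and no widget ever appears.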

The widget shows:

  • Whether the site trains AI
  • How transparent they are
  • The link to their TOS
  • Whether you can opt out
  • What you can do next

You don’t need to open any settings. It just pops up when needed.

Opt-Out and Protest

This part matters.

Some platforms have official opt-out: toggles, forms, or email requests.
WTOM connects you directly to those. But many don’t offer anything.
So WTOM also lets you send a protest email — something like:

“I don’t consent to my data being used to train AI.”

You can even choose your tone:

  • Formal
  • Gentle
  • Bold

Your name and email auto-fill locally (nothing is stored).
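
One lightweight way to do this kind of thing is a `mailto:` link built client-side, which keeps everything local. Here's a hypothetical sketch; the templates and function names are illustrative, not WTOM's actual wording:

```typescript
// Hypothetical protest-email builder. Templates and names are
// illustrative, not WTOM's actual implementation or wording.
type Tone = "formal" | "gentle" | "bold";

const templates: Record<Tone, string> = {
  formal: "To whom it may concern: I do not consent to my data being used to train AI models.",
  gentle: "Hi, I'd prefer that my content not be used for AI training. Thanks for listening.",
  bold: "I don't consent to my data being used to train AI. Please exclude my content.",
};

function buildProtestMailto(to: string, tone: Tone, name: string): string {
  const subject = "AI training opt-out request";
  const body = `${templates[tone]}\n\n${name}`;
  // encodeURIComponent escapes spaces and newlines for the mailto URL.
  return `mailto:${to}?subject=${encodeURIComponent(subject)}&body=${encodeURIComponent(body)}`;
}
```

Because the link is assembled in the browser and opened by the user's own mail client, the name and email never need to leave the machine.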

It’s not perfect, but it gives you a voice where there wasn’t one.

Things That Surprised Me While Building

  • How many platforms don’t mention AI at all
  • How many hide their opt-out behind 3–5 clicks
  • How inconsistent the language is
  • How many creators feel powerless
  • How many AI training datasets contain images/art pulled from everywhere

One of the wildest things was seeing how many domains show up repeatedly in scraped datasets — Pinterest, DeviantArt, blogs, Flickr, Tumblr, etc.
These are places where artists post work.
No wonder creators feel like they’re being used without consent.

Why This Matters

AI isn’t going away, and I’m not against AI at all.
What I’m against is the lack of transparency.

Users should:

  • Know what’s happening
  • Know who’s training on their data
  • Have a way to say no
  • Have a simple tool that makes it easy

WTOM isn’t going to fix the whole system, but it’s a step toward giving people real control.

What’s Next for WTOM

A few things I’m planning next:

  • Adding more domains
  • Improving accuracy of training + transparency labels
  • A public domain reference page on the website
  • Better onboarding
  • More educational content
  • Maybe a Transparency Score system later

This is a long-term project, but I’m excited about where it’s headed.

Try WTOM

If you’re curious whether sites you browse might be using your data to train AI, try WTOM:

Chrome/Chromium versions: Check it here
Firefox version: Check it here
If you want to support the project, I’ll leave a BuyMeACoffee link soon.

Final Thoughts

Building WTOM made me realize how little visibility we have into how our digital life is used.
Everything we create online — art, text, comments, photos — can quietly become training data.

WTOM won’t stop AI, but it gives you information, and information gives you choices.

If this project helps even a few people feel more in control, that’s already a win.
