DEV Community

Evan S


Why I Refused to Build a Web Scraper for My AI Facial Matching App


Building an AI facial matching app right now presents a massive ethical dilemma for developers.

If you want to build a "find your doppelganger" tool, the easiest, fastest architectural route is obvious: spin up a Python script, hook into some open-source reverse-image APIs, and scrape the open web for massive datasets. It’s cheap, it’s fast, and it’s what almost everyone else is doing.

But it is also a massive privacy nightmare.

Every day, thousands of automated bots are crawling the web, quietly cataloging biometric data from background faces in tourist photos, public forums, and event galleries. They log facial structures into unregulated databases without user consent.

As a developer, I realized I couldn't ethically contribute to that ecosystem. So, when I started building DopplGrid, I set a hard constraint: Zero open-web scraping. Here is how I approached the architecture instead.

The Closed-Network Solution

I decided to build the antithesis of an open-web scraper: a 100% closed, opt-in biometric network.

Instead of searching the internet, the app acts as a private photo radar. It only scans and matches photos uploaded within its own secure Firebase ecosystem. If you haven't explicitly opted in and been mapped in the database, the engine cannot see you.
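To make the opt-in rule concrete, here is a minimal sketch of the gate. It is not DopplGrid's actual code: the `Profile` record, the `opted_in` flag, and the `searchable_profiles` helper are hypothetical names, and a plain in-memory list stands in for the Firebase store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical user record; not DopplGrid's real schema."""
    user_id: str
    opted_in: bool


def searchable_profiles(profiles):
    """Return only the profiles whose owners explicitly opted in."""
    return [p for p in profiles if p.opted_in]


# In-memory stand-in for the database (assumption for illustration).
profiles = [
    Profile("alice", opted_in=True),
    Profile("bob", opted_in=False),  # has an account but never opted in
    Profile("carol", opted_in=True),
]

visible = [p.user_id for p in searchable_profiles(profiles)]
print(visible)  # ['alice', 'carol']
```

The point of the design is that the filter runs before any matching does, so a profile without consent never reaches the matching engine at all.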

Mapping Geometry, Not Pixels

Many quick-and-dirty scrapers rely on basic pixel matching, which is inaccurate because lighting, pose, and cropping all change the pixels, and that leads to false matches. To make the closed network actually valuable, I had to ensure the matching was mathematically precise.

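The DopplGrid engine maps 128 unique points of a user's facial geometry and stores that map in a secure personal vault. The idea is conceptually similar to Face ID, which also matches on a mathematical representation of the face rather than on raw pixels.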
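A small sketch shows why geometry is more robust than raw pixels. It assumes a simplified signature built from pairwise distance ratios between landmarks, which is an illustrative technique and not necessarily what the DopplGrid engine computes. The four landmark coordinates are made up; a real map would use the full 128 points.

```python
import math


def geometry_signature(points):
    """Turn (x, y) landmarks into position- and scale-independent ratios.

    Every pairwise distance is divided by the largest one, so moving or
    resizing the face in the frame leaves the signature unchanged.
    """
    dists = [
        math.dist(a, b)
        for i, a in enumerate(points)
        for b in points[i + 1:]
    ]
    longest = max(dists)
    return [d / longest for d in dists]


# Four made-up landmarks (two eye corners, nose tip, chin).
face = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (50.0, 90.0)]

# The same face photographed closer (2x scale) and off-centre (shifted).
moved = [(2 * x + 15.0, 2 * y - 5.0) for x, y in face]

sig_a = geometry_signature(face)
sig_b = geometry_signature(moved)
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(sig_a, sig_b))
print(same)  # True
```

The raw coordinates of `face` and `moved` differ at every point, so a pixel-level comparison would treat them as different images, while the ratio-based signature is identical for both.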

This allows for two completely private use cases:

  1. Global Matching: Users can scan the opted-in network to securely find their closest biometric doppelganger.
  2. Similarity Testing: Two users can scan their faces and the algorithm will calculate a mathematical similarity percentage between them (great for settling family debates about who the baby looks like).
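Both use cases reduce to comparing two stored face maps. The sketch below assumes each map is a non-zero numeric vector and uses cosine similarity rescaled to 0-100. The metric, the scaling, the function names, and the three-number vectors are all assumptions for illustration, not the scoring DopplGrid actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_percent(a, b):
    """Rescale cosine similarity to a 0-100 score (an assumed scale)."""
    return round(50.0 * (1.0 + cosine_similarity(a, b)), 1)


def closest_match(query, vault):
    """Return (user_id, percent) for the most similar stored face map."""
    best = max(vault, key=lambda uid: cosine_similarity(query, vault[uid]))
    return best, similarity_percent(query, vault[best])


# Hypothetical vault holding only opted-in users; real maps would be longer.
vault = {
    "alice": [0.9, 0.1, 0.3],
    "carol": [0.1, 0.8, 0.5],
}
query = [0.8, 0.2, 0.3]

match_id, score = closest_match(query, vault)  # use case 1: global matching
pair_score = similarity_percent(vault["alice"], vault["carol"])  # use case 2
print(match_id)  # alice
```

A linear scan like this is fine for a sketch. A larger network would likely need an approximate nearest-neighbour index, but that is outside what the post describes.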

The Developer Responsibility

We can't rewind the clock on the internet, but we can change how we build biometric tools moving forward. We need to stop relying on platforms that scrape data indiscriminately and start building platforms rooted in absolute consent.

The reality is that users' facial data is already floating around the internet. Our job should be to give users tools to take active ownership of it, rather than harvesting it from them.

I just pushed the React build live. If you are interested in privacy-first architecture, ethical AI, or just want to test out the matching engine, I’d love for the dev community to tear it apart and give me feedback on the UI.

Check out the secure vault at DopplGrid.
