
CaraComp

Posted on • Originally published at go.caracomp.com

Your Job Application Just Sold 3 Pieces of You

The Hidden Identity Pipeline in Job Boards

As developers, we often think of job platforms as glorified CRUD applications: you upload a PDF, the backend parses it into a SQL database, and a recruiter queries it. But a recent study analyzed by Biometric Update reveals a far more complex, and ethically questionable, data pipeline. Eight out of nine major job platforms are now monetizing digital identities, turning what should be a simple matching service into a massive biometric and behavioral data engine.

For those of us working in computer vision and facial recognition, this news highlights a critical shift in how training data is acquired. It’s no longer just about scraping public web images; it’s about harvesting high-fidelity biometric signals during "liveness checks" and identity verification steps.

The Biometric Feature Extraction Problem

When a platform requires a "selfie" or a video liveness check to verify a candidate, it isn't just checking an ID; it is capturing a biometric signature. From a technical perspective, this means locating facial landmarks to align the face and then encoding it as a vector embedding.

The industry claim is often that this data is "anonymized." However, as anyone working with Euclidean distance analysis knows, the vector itself is the identity: two embeddings of the same face sit close together in the embedding space, so a fresh photo of a person can be matched back to their stored vector. If you have a high-dimensional representation of a face, stripping the "Name" column from your database doesn't make the data anonymous; the geometry remains a unique, persistent identifier.
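To make that concrete, here is a minimal Python sketch. It uses made-up 4-dimensional vectors as stand-ins for real face embeddings, which are usually much larger, and the names `anonymized_rows`, `probe`, and `reidentify` are illustrative assumptions rather than any platform's actual schema. It shows how a row with no name attached can still be linked to a new capture of the same face by a nearest-neighbor search.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical "anonymized" table: the name column is gone, embeddings remain.
anonymized_rows = {
    "row_001": [0.12, 0.88, 0.35, 0.41],
    "row_002": [0.91, 0.10, 0.77, 0.05],
    "row_003": [0.45, 0.52, 0.09, 0.93],
}

def reidentify(probe, rows):
    """Return (row_id, distance) for the stored embedding closest to probe."""
    return min(
        ((row_id, euclidean(probe, emb)) for row_id, emb in rows.items()),
        key=lambda pair: pair[1],
    )

# A later embedding of the same person differs slightly (lighting, pose).
probe = [0.13, 0.86, 0.36, 0.40]
row_id, dist = reidentify(probe, anonymized_rows)
print(row_id, round(dist, 3))
```

With these toy values the probe lands on `row_001` at a distance far smaller than its distance to the other rows, which is the whole problem: anyone holding a second photo of you can find your "anonymous" row.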

This is where the dev community needs to pay attention. Most platforms are licensing these datasets to train AI models. Your "verification" video might end up as ground-truth training data for a third-party GAN or a facial recognition model you never consented to train.

Comparison vs. Surveillance: A Technical Distinction

At CaraComp, we see this data sprawl as a major red flag for the professional investigation community. There is a fundamental difference between facial recognition (scanning crowds for surveillance) and facial comparison (performing side-by-side analysis of specific case photos).

The job platforms selling data are feeding the surveillance side of the house. In contrast, tools built for private investigators and OSINT professionals focus on Euclidean distance analysis to compare two specific images provided by the investigator. We believe this is the more ethical path for the industry: technology that serves the investigator's specific case rather than selling the user's biometric footprint to the highest bidder.
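As a rough illustration of the comparison side, the sketch below scores exactly two embeddings supplied by the caller and searches nothing else. It is not CaraComp's implementation: the function name `compare_faces`, the toy 4-dimensional vectors, and the `0.6` threshold are all assumptions for the example, and a real threshold has to be calibrated for the embedding model in use.

```python
import math

# Assumed cutoff for illustration only; real thresholds depend on the
# embedding model and must be calibrated on labeled data.
MATCH_THRESHOLD = 0.6

def compare_faces(embedding_a, embedding_b, threshold=MATCH_THRESHOLD):
    """One-to-one comparison: score two supplied embeddings, nothing more."""
    if len(embedding_a) != len(embedding_b):
        raise ValueError("embeddings must have the same dimensionality")
    distance = math.dist(embedding_a, embedding_b)  # Euclidean distance
    return {"distance": distance, "likely_same_person": distance <= threshold}

# Toy vectors standing in for embeddings of investigator-provided photos.
case_photo = [0.20, 0.70, 0.10, 0.50]
reference_photo = [0.25, 0.65, 0.15, 0.45]
unrelated_photo = [0.90, 0.05, 0.80, 0.10]

same = compare_faces(case_photo, reference_photo)
different = compare_faces(case_photo, unrelated_photo)
print(same)
print(different)
```

The design point is the function signature: it takes two inputs and returns one score, with no gallery or database argument that could turn a comparison into a search.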

The Infrastructure of Behavioral Identity

Beyond biometrics, these platforms are building behavioral profiles. Every API call—from a search query to a "time-on-page" metric for a job description—is being aggregated. For a developer, this is a masterclass in telemetry, but for a user, it’s a privacy nightmare.

These platforms use this telemetry to create "behavioral identity profiles." When combined with biometric data, you have a 360-degree digital twin of a candidate that can be sold to insurance companies or background-check vendors.
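A minimal sketch of how such aggregation could work is shown below. The event shape (`user_id`, `type`, `query`, `job_id`, `seconds`) and the `BehavioralProfile` class are hypothetical, chosen only to show how little code it takes to fold routine telemetry into a per-person profile.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class BehavioralProfile:
    """Hypothetical per-user profile built purely from telemetry events."""
    searches: list = field(default_factory=list)
    seconds_on_page: dict = field(default_factory=lambda: defaultdict(float))

def build_profiles(events):
    """Fold a stream of telemetry events into one profile per user."""
    profiles = defaultdict(BehavioralProfile)
    for event in events:
        profile = profiles[event["user_id"]]
        if event["type"] == "search":
            profile.searches.append(event["query"])
        elif event["type"] == "time_on_page":
            profile.seconds_on_page[event["job_id"]] += event["seconds"]
    return profiles

# Made-up events of the kind a job board frontend might emit.
events = [
    {"user_id": "u1", "type": "search", "query": "remote rust developer"},
    {"user_id": "u1", "type": "time_on_page", "job_id": "job_42", "seconds": 95.0},
    {"user_id": "u1", "type": "time_on_page", "job_id": "job_42", "seconds": 30.0},
    {"user_id": "u2", "type": "search", "query": "night shift nurse"},
]

profiles = build_profiles(events)
print(profiles["u1"].searches, dict(profiles["u1"].seconds_on_page))
```

Join a profile like this to a face embedding on `user_id` and you have the combined record described above.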

The Investigator’s Edge

For solo investigators and small PI firms, the high cost of enterprise-grade tools (often $1,800+/year) has historically pushed people toward less secure, consumer-grade search tools. But as data privacy becomes a central issue, the need for professional, affordable tools that don't sell your data is paramount.

We built CaraComp to bridge this gap. For $29/mo, investigators get the same high-tier Euclidean distance analysis used by federal agencies, without the enterprise price tag or the invasive data-sharing practices of "free" or social-media-based tools. We provide court-ready reports and batch processing, ensuring that the technology stays in the hands of the professional, not the data broker.

If you’ve ever spent hours manually comparing photos across a case, you know how critical accuracy is. Don't rely on platforms that treat your identity—or your subjects' identities—as a product.

Try CaraComp free → caracomp.com

Discussion for the devs: When building identity verification into your apps, what's your stack for ensuring biometric data isn't leaked into your general analytics pipeline?
