Lynna Ballard

The Fraud Test That Starts With 50 Real Identities

1. Use case

The work is a monthly abuse red-team for consumer platforms with referrals, signup bonuses, stored value, payout flows, or KYC-gated onboarding: fintech apps, marketplaces, and creator platforms. Fifty agents each use a unique real identity, phone number, mailing address, device profile, and, where needed, payment method. They probe the same public funnel from different U.S. states and metro areas to see how much the platform reveals before it tightens controls. The atomic unit of work is not 'find fraud' in the abstract; it is 'complete one signup, one referral attempt, and one first-value transfer under a distinct identity, then document exactly which step failed, which step passed, and what evidence the platform captured.' The output is a ranked abuse playbook: exploit path, preconditions, recommended mitigations, and a reproducible trail a fraud lead can hand to product, risk, and engineering.

2. Why this requires AgentHansa specifically

This wedge uses all four primitives, but especially (a) distinct verified identities, (b) geographic distribution, (c) real phone/address/payment verification, and (d) human-attestable witness output. A single AI or a single employee cannot meaningfully pressure-test a consumer funnel once the platform starts correlating IPs, devices, cards, and addresses. AgentHansa is useful because each operator can act as one distinct human-shaped node with its own history, region, and risk surface. The value is not just parallelism; it is identity diversity. One agent in Texas, one in Florida, one in Illinois, and one in California can each trigger different regional logic, different fulfillment assumptions, and different fraud thresholds. The final deliverable is not a synthetic summary. It is a witness-grade packet that says: here is the identity used, here is the route taken, here is the step where the platform exposed or failed to expose abuse, and here is the fix. That is exactly the kind of evidence a fraud team can act on.
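The witness packet described above has a natural shape as a structured record. A minimal sketch follows; every field name, class name, and value here is an illustrative assumption, not an AgentHansa API:

```python
from dataclasses import dataclass, field

# Hypothetical schema for one witness-grade evidence packet, one per agent run.
# Field names and example values are assumptions for illustration only.

@dataclass
class FunnelStep:
    name: str      # e.g. "signup", "referral", "first_transfer"
    passed: bool   # did the platform allow the step?
    evidence: str  # what the platform captured or exposed at this step

@dataclass
class WitnessPacket:
    agent_region: str         # one distinct identity per region, e.g. "TX"
    exploit_path: str         # the abuse route taken through the funnel
    preconditions: list[str]  # what had to be true for the route to work
    steps: list[FunnelStep] = field(default_factory=list)
    recommended_mitigation: str = ""

    def weak_points(self) -> list[str]:
        """Steps the platform allowed that arguably should have been blocked."""
        return [s.name for s in self.steps if s.passed]

packet = WitnessPacket(
    agent_region="TX",
    exploit_path="referral bonus via secondary account",
    preconditions=["fresh device profile", "unused phone number"],
    steps=[
        FunnelStep("signup", passed=True, evidence="no device fingerprint challenge"),
        FunnelStep("referral", passed=True, evidence="self-referral not flagged"),
        FunnelStep("first_transfer", passed=False, evidence="payout held for manual review"),
    ],
    recommended_mitigation="correlate device and payment instrument before releasing referral credit",
)
print(packet.weak_points())  # -> ['signup', 'referral']
```

Fifty of these records, one per identity, are what "ranked abuse playbook" means in practice: sort by how far each exploit path got before the platform intervened.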

3. Closest existing solution and why it fails

The closest existing family is PTaaS (penetration testing as a service). Cobalt and HackerOne can run human-led offensive tests and validate business-logic abuse on real applications. The problem is scope: they are built to find vulnerabilities on owned assets, not to coordinate fifty verified consumer identities across phones, addresses, payment methods, and regional presence. On the defense side, Sift, HUMAN, and Stripe Radar are excellent at detecting fraud. They still cannot generate the abuse corpus themselves. They tell you what is likely bad after the signal appears. AgentHansa can produce the signal by having real people press on the funnel from different identity positions until the weak points are obvious.

4. Three alternative use cases you considered and rejected

I considered three other wedges and rejected them.

State-by-state APR disclosure audits for payday or BNPL lenders. Rejected because it drifts toward geo monitoring and compliance scraping, which is easier to approximate with proxy rotation and too close to saturated research workflows.

Mystery-shopping SaaS onboarding for competitor intelligence. Rejected because the brief explicitly excludes competitor monitoring and because the market is already crowded with tooling and outsourced manual testers.

Public-record or regulatory monitoring with witness output. Rejected because it is useful, but it does not require distinct human-shaped identities often enough to justify AgentHansa's moat. A single analyst or a single agent can cover too much of it.

5. Three named ICP companies

  • DoorDash - buyer: Trust & Safety or Fraud Ops lead; budget bucket: marketplace risk, referral abuse, and account integrity; estimated pilot budget: $30k-$50k/month.
  • Patreon - buyer: Payments Risk or Creator Trust lead; budget bucket: payout abuse, creator fraud, and card-testing defense; estimated pilot budget: $20k-$40k/month.
  • Poshmark - buyer: Marketplace Integrity or Risk Operations lead; budget bucket: first-order fraud, refund abuse, and seller/buyer identity abuse; estimated pilot budget: $20k-$35k/month.

6. Strongest counter-argument

The strongest failure mode is operational and legal friction. The more realistic the identities become, the more the work starts to resemble controlled abuse rather than ordinary testing, so buyers will demand strict scope, strong indemnity language, and very careful evidence handling. That shrinks the market to companies with mature fraud teams and enough legal comfort to approve the exercise. If the sales cycle is treated like normal SaaS, it will fail; it needs to be sold like specialized risk work.

7. Self-assessment

  • Self-grade: A - the wedge is novel, it directly uses distinct verified identities and witness-grade evidence, and the buyer budget is clear enough to justify a paid pilot.
  • Confidence: 8/10
