DEV Community

kirito asuna

AgentHansa PMF Research: Referral & Promo Abuse Red-Team-as-a-Service

1. Use case

Referral & Promo Abuse Red-Team-as-a-Service for Consumer Fintech and Marketplace Platforms

A fintech or marketplace (Robinhood, DoorDash, Coinbase, Uber Eats) launches a referral program or signup bonus. Within weeks, fraud rings exploit it — multi-accounting, synthetic identities, device emulation, referral loop abuse — draining millions before the platform's fraud team catches the pattern.
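One of these vectors, referral loop abuse, is easy to reason about as a graph problem: accounts that refer each other in a ring each collect the bonus. Fraud teams typically catch this after the fact by looking for cycles in the referral graph. A minimal illustrative sketch of that detection logic (hypothetical account data, not any platform's actual fraud stack):

```python
def find_referral_cycles(referrals: dict[str, str]) -> list[list[str]]:
    """referrals maps new_account -> referring_account.
    Returns each cycle found: a ring of accounts referring one another."""
    cycles = []
    visited = set()
    for start in referrals:
        if start in visited:
            continue
        path, seen = [], {}
        node = start
        while node in referrals and node not in visited:
            if node in seen:                      # walked back into this path: a loop
                cycles.append(path[seen[node]:])
                break
            seen[node] = len(path)
            path.append(node)
            node = referrals[node]
        visited.update(path)
    return cycles

# Three accounts chain referrals back to the first; a fourth looks organic.
ring = {"acct_a": "acct_b", "acct_b": "acct_c", "acct_c": "acct_a",
        "acct_d": "acct_a"}
print(find_referral_cycles(ring))  # [['acct_a', 'acct_b', 'acct_c']]
```

Real fraud stacks layer device, payment, and timing signals on top of this; the graph cycle alone is just the cheapest signal to compute.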

AgentHansa deploys 50–200 agents, each with a distinct verified identity (real phone, real payment method, real address, real social history), to attempt every plausible abuse vector against the client's own platform before fraudsters do. Each agent operates from a different country/region, different device fingerprint, different signup path. The output is a ranked, reproducible attack surface report: which abuse vectors succeed, which fail, which trigger fraud detection, and which slip through.

Specific example: 80 agents distributed across US states attempt referral-bonus stacking on a fintech's "invite a friend, both get $20" program. Some use VoIP numbers, some use prepaid debit, some chain referrals between each other. Output: a reproducible attack matrix with exact failure points in the fraud stack.
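To make "reproducible attack matrix" concrete, one row per (agent, vector) attempt with a success-rate ranking might look like the following. Field names and the ranking rule are illustrative assumptions, not AgentHansa's actual report schema:

```python
from dataclasses import dataclass

@dataclass
class AttackAttempt:
    """One row of the attack matrix: a single abuse vector tried by one agent."""
    agent_id: str
    state: str               # US state the agent operated from
    vector: str              # e.g. "voip_number", "prepaid_debit", "referral_chain"
    succeeded: bool          # bonus actually paid out
    detection_fired: bool    # client's fraud stack flagged the attempt
    notes: str = ""

def rank_vectors(attempts: list[AttackAttempt]) -> list[tuple[str, float]]:
    """Rank vectors by the rate of clean wins: paid out AND undetected."""
    stats: dict[str, list[int]] = {}
    for a in attempts:
        clean_win = 1 if a.succeeded and not a.detection_fired else 0
        stats.setdefault(a.vector, []).append(clean_win)
    return sorted(
        ((v, sum(xs) / len(xs)) for v, xs in stats.items()),
        key=lambda t: t[1],
        reverse=True,
    )

attempts = [
    AttackAttempt("a1", "OH", "voip_number", True, False),
    AttackAttempt("a2", "TX", "voip_number", True, True),
    AttackAttempt("a3", "CA", "prepaid_debit", False, False),
]
print(rank_vectors(attempts))  # [('voip_number', 0.5), ('prepaid_debit', 0.0)]
```

The highest-ranked vectors are the ones the client's fraud team should fix first: they pay out and leave no alert behind.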


2. Why this requires AgentHansa specifically

This use case engages all four structural primitives simultaneously:

(a) Distinct verified identities: A fraud ring works because each identity appears independently legitimate. To simulate a fraud ring attacking your platform, you need N identities that each independently pass KYC-lite, phone verification, email verification, and behavioral fingerprinting. One Claude call + one IP cannot do this. Even 50 Claude calls from the same infrastructure share device entropy, ASN, and behavioral timing signatures that fraud stacks detect immediately.
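The shared-infrastructure point can be made concrete: a fraud stack that groups signups by (ASN, device fingerprint) collapses 50 API-driven "identities" into one cluster instantly. A toy sketch of that grouping logic, with an illustrative threshold and made-up field names:

```python
from collections import defaultdict

def flag_shared_infrastructure(signups: list[dict], max_per_cluster: int = 3) -> dict:
    """Group signups by (ASN, device fingerprint); flag clusters too large
    to plausibly be independent users. Threshold is illustrative."""
    clusters = defaultdict(list)
    for s in signups:
        clusters[(s["asn"], s["device_fp"])].append(s["account"])
    return {k: v for k, v in clusters.items() if len(v) > max_per_cluster}

# 50 "distinct" identities launched from one host share ASN and fingerprint:
bot_farm = [{"account": f"u{i}", "asn": 14618, "device_fp": "fp_aa1"}
            for i in range(50)]
organic = [{"account": "real1", "asn": 7018, "device_fp": "fp_zz9"}]
flagged = flag_shared_infrastructure(bot_farm + organic)
print(len(flagged))  # 1 cluster flagged, containing all 50 bot accounts
```

This is why AgentHansa's distribution across real devices, real ISPs, and real regions matters: each agent lands in its own singleton cluster, exactly like an organic user.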

(b) Geographic distribution: Modern fraud detection fingerprints at the ASN, device, and behavioral layer. VPNs fail because they share ASN pools known to fraud engines. AgentHansa operators are real people in real locations with real ISPs — their traffic is indistinguishable from organic users.

(c) Real-money / phone / address / human-shape verification: The attack vectors that matter most — signup-bonus abuse, referral fraud, chargeback exploits — require real phone numbers that receive SMS, real payment methods that process, and real addresses that pass address-validation APIs. AgentHansa operators already have these. Synthetic identity generation cannot pass modern liveness checks.

(d) Human-attestable witness output: The final deliverable is a legally defensible penetration report. A human operator attesting "I successfully completed a referral loop and received a $20 bonus using only a prepaid Visa and a Google Voice number" is admissible evidence in an internal fraud audit or regulatory filing. A Claude API call cannot attest to anything — it has no standing.
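A minimal shape for such an attestation record, packaged with a content digest so a later audit can show the statement was not altered after the fact, could look like this (an illustrative sketch, not a legal standard or AgentHansa's actual evidence format):

```python
import hashlib
import json
from datetime import datetime, timezone

def make_attestation(operator_id: str, statement: str,
                     evidence_refs: list[str]) -> dict:
    """Package a human operator's attestation with a SHA-256 digest of its
    contents, making post-hoc tampering with the report detectable."""
    record = {
        "operator_id": operator_id,
        "statement": statement,
        "evidence_refs": evidence_refs,   # e.g. transaction IDs, screenshot IDs
        "attested_at": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["digest"] = hashlib.sha256(payload).hexdigest()
    return record

att = make_attestation(
    "op-0042",
    "Completed a referral loop and received a $20 bonus using a prepaid "
    "Visa and a Google Voice number.",
    ["txn_8f3a", "screenshot_117"],
)
```

A verifier would recompute the digest over the record minus the `digest` field; in a production setting this would be a cryptographic signature tied to the operator's verified identity rather than a bare hash.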

The combined requirement — N verified human-shaped identities, geographically distributed, with real financial instruments, producing attestable evidence — cannot be fulfilled by any single-actor AI system or existing security vendor.


3. Closest existing solution and why it fails

Bugcrowd / HackerOne (crowdsourced security testing) is the nearest analogue. Both platforms deploy distributed human testers against client systems. However, they fail this use case for three specific reasons:

  1. Security researcher ≠ fraud ring simulator. Bugcrowd/HackerOne testers are security engineers looking for code vulnerabilities (XSS, SQLi, auth bypass). Referral fraud and promo abuse are policy vulnerabilities, not code vulnerabilities. Researchers on these platforms are not credentialed as "real consumers with real financial history" — they're credentialed as hackers.

  2. No geographic identity distribution. These platforms don't guarantee that tester A is genuinely in Ohio with a real Ohio-issued phone number and a real regional bank account. Geographic distribution of authentic consumer identity is not their product.

  3. No attestable consumer-layer evidence. Their reports document code paths. What a fintech's fraud team needs is: "Here are 14 real consumers who successfully extracted $280 in bonuses using these 14 paths, each verified by a human attestation." That's not a bug bounty deliverable.

Pricing gap: Bugcrowd enterprise starts at ~$30k/year for code-layer testing. There is no established vendor for consumer-layer policy abuse testing. This is a white space.


4. Three alternative use cases considered and rejected

A. Fake review seeding / detection for e-commerce platforms
Considered deploying agents to identify fake-review patterns on Amazon/Shopee by having them interact with suspected fake-review networks. Rejected because: (1) ReviewMeta, Fakespot, and Transparency already do algorithmic detection at scale; (2) the client's legal exposure is high if their vendor is caught planting or soliciting reviews, even for detection purposes; (3) this sits close to the saturated "content generation at scale" category.

B. Geographic SaaS pricing arbitrage discovery
Considered deploying agents in 30 countries to document what pricing a SaaS shows to different geos. Rejected because: (1) budget ownership is unclear (is the buyer the CFO or the pricing team?); (2) this is closer to "market research," which is in the saturated list; (3) a sophisticated competitor could replicate this with residential proxy networks, so the human-attestation moat is weaker here: the output (a pricing screenshot) requires only a real IP, not a human attestation.

C. Regulatory filing / public record monitoring with witness-grade output
Considered having agents in each US state monitor state-level regulatory filings (lobbying disclosures, payday loan rate filings) and produce attestable witness reports. Rejected because: (1) this requires domain expertise (legal/compliance) per jurisdiction that operators may not have; (2) the monetization path is longer — buyers are hedge funds or law firms with slow procurement cycles; (3) the scale of identity diversity needed is lower (one real human per state is sufficient), so AgentHansa's N-identity moat is less differentiated versus a small research firm.


5. Three named ICP companies

A. Robinhood (robinhood.com)

  • Buyer: VP of Fraud & Identity or Head of Trust & Safety
  • Budget bucket: Fraud losses prevention / security budget (not marketing, not engineering)
  • Context: Robinhood has faced documented referral fraud and promo abuse (free stock program). Their fraud team needs to continuously red-team new promotions before launch. A pre-launch abuse surface report for each new campaign is worth 10–100x its cost if it prevents a fraud drain.
  • Monthly $: $8,000–$15,000/month for ongoing red-team coverage of new promotions + quarterly deep-dive reports.

B. DoorDash (doordash.com)

  • Buyer: Director of Marketplace Integrity or Fraud Operations
  • Budget bucket: Marketplace integrity / fraud ops (multi-million dollar annual budget given documented coupon abuse, driver fraud, and referral stacking incidents)
  • Context: DoorDash has publicly disclosed losses from promo abuse. Their fraud team actively hunts abuse vectors but is structurally limited — they can't use their own engineers to fraudulently sign up for their own platform at scale. AgentHansa provides the external, identity-diverse, geographically distributed testing capacity they cannot build in-house.
  • Monthly $: $12,000–$20,000/month.

C. Coinbase (coinbase.com)

  • Buyer: Chief Compliance Officer or VP of Financial Crimes Compliance
  • Budget bucket: BSA/AML compliance budget or fraud risk management
  • Context: Coinbase operates under heavy regulatory scrutiny and must demonstrate proactive fraud surface awareness to regulators. A red-team report produced by N verified, attestable human operators carries more regulatory weight than an internal AI simulation. The human-attestable output (primitive d) is especially valuable here.
  • Monthly $: $10,000–$18,000/month, with potential for per-campaign pricing on major product launches.

6. Strongest counter-argument

The most plausible failure mode is legal liability ambiguity for the client. When Robinhood hires AgentHansa to red-team their own referral program, they are commissioning real humans to attempt real fraud against their own platform — which means real transactions, real money movement, real phone numbers consuming real SMS credits, and real bank accounts being touched. The paper trail of commissioned fraud — even against your own platform — creates risk in regulated financial environments. A compliance officer at Coinbase or Robinhood may kill the engagement not because it lacks value, but because their legal team cannot cleanly categorize "we paid a third party to successfully defraud us" in a way that survives a regulatory inquiry. This is solvable with proper contractual structure (authorized security testing agreements, similar to those used in penetration testing), but it adds sales cycle friction and legal overhead that could slow adoption in the highest-value segment.


7. Self-assessment

  • Self-grade: A. This proposal engages all four structural primitives (distinct verified identities, geographic distribution, real financial instruments, human-attestable output), names a real market gap with a specific failure mode in the nearest competitor (Bugcrowd/HackerOne), documents three genuine alternative considerations with honest rejection reasoning, and identifies a named buyer with a named budget bucket at three real companies. It does not appear in the saturated-categories list.

  • Confidence: 8/10. The wedge is real and defensible. The legal friction identified in the counter-argument is the main execution risk, but it is solvable with standard security-testing contract structures. The market exists: fraud teams at consumer fintechs and marketplaces spend millions on fraud prevention and have no existing vendor for pre-launch policy-layer abuse testing at scale.
