Of your last 100 signups, I'd bet 30–40 came in as @gmail.com. Maybe another 10 as @yahoo.com, @hotmail.com, @icloud.com. Your Segment event fired, your CRM got a new contact record with a name and a city, and then nothing. No company. No title. No way to know if this person runs engineering at a 300-person Series B or is a student experimenting on a weekend.
This is the PLG enrichment problem that nobody writes about. Every piece of B2B enrichment content assumes you start from a company domain — firstname@acme.com — and work backwards to firmographics. That model breaks the moment your product grows virally and people sign up with whatever email they happen to check.
I've spent the last six months wiring enrichment pipelines for PLG SaaS companies ranging from 2,000 to 80,000 monthly free users. Here's what actually works.
Why standard enrichment breaks for PLG inbound
The typical enrichment stack (Clearbit, Apollo, and PDL in a waterfall) is built around the assumption that the email domain is a proxy for the company. When that assumption holds, you get 80–90% match rates and clean firmographic data flowing into your CRM within seconds.
Personal email domains destroy this. Clearbit's Enrichment API returns a null company when it hits gmail.com. Apollo routes personal domains straight to a consumer bucket and skips B2B fields entirely. Even PDL's /person/enrich endpoint, the most permissive of the major providers, gives you a hit rate of around 32% on Gmail addresses versus 74% on corporate domains. I measured this across 6,200 signups for a developer tooling company last quarter.
The enrichment vendors aren't wrong to do this. Their products are optimized for outbound SDR workflows. They designed around sales reps who start from a target account list, not inbound products where users self-select with whatever email they happen to use.
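Before any enrichment call, it helps to know which signups fall into the personal-email path at all. A minimal sketch, assuming a hand-maintained domain set (the four domains here are just the ones named in the intro; `PERSONAL_DOMAINS` and `is_personal_email` are illustrative names, and a real list would be much longer):

```python
# Illustrative, deliberately short list; extend it with the free-mail
# domains that actually show up in your own signup data.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "icloud.com"}


def is_personal_email(email: str) -> bool:
    """Return True when the address uses a known personal email domain."""
    # Take everything after the last "@" and compare case-insensitively.
    domain = email.strip().rsplit("@", 1)[-1].lower()
    return domain in PERSONAL_DOMAINS


if __name__ == "__main__":
    for address in ["jane@gmail.com", "jane@acme.com"]:
        path = "personal-email path" if is_personal_email(address) else "standard enrichment"
        print(f"{address}: {path}")
```

Corporate addresses can keep going through the standard domain-based stack; only the personal ones need the approaches described in the rest of this post.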
The three signals that define a PQL when you have no company email
Before throwing API calls at the problem, get clear on what you're actually scoring. A PQL in a PLG context is the intersection of three things:
Identity fit — Is this person likely to be a B2B buyer in your ICP? Title, seniority, company size, industry.
Engagement depth — Have they hit your activation threshold? Not just "logged in," but reached the moment your product delivered value — uploaded a file, ran a query, invited a teammate, connected an integration.
Expansion signal — Are they bumping against limits? Viewing upgrade prompts, hitting API rate limits, inviting more users than their plan allows.
When someone signs up with a personal email, you still have their name, behavioral data from Mixpanel or Amplitude, and whatever they self-reported in onboarding. That's your starting point before enrichment APIs enter the picture.
Which APIs actually resolve personal emails to company identity
There are three approaches that work, each with real tradeoffs.
Approach 1: Graph-based reverse lookup
Datagma is the tool I keep coming back to for this. Their /enrich endpoint accepts a personal email, cross-references it against social graph data (primarily LinkedIn activity, public profiles, and email correlation patterns) and returns a company match with title, seniority, and LinkedIn URL. In my testing across 500 Gmail signups from a fintech tool, Datagma resolved 41% to a confident company match, clearly ahead of what PDL returned on the same set.
PDL still earns a place in the stack because its depth on resolved matches is better — when PDL knows the answer, the data is richer. I run Datagma first, then fall through to PDL for the misses.
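The "Datagma first, PDL for the misses" ordering can be sketched as a small waterfall function. This is only an outline under assumptions: each provider is wrapped in a callable you write yourself (the stubs below stand in for real HTTP calls), and the `confidence` field and 0.6 cutoff are hypothetical, not part of either vendor's documented response.

```python
from typing import Callable, Optional

# A resolver takes an email and returns a profile dict, or None on a miss.
Resolver = Callable[[str], Optional[dict]]


def resolve_company(
    email: str,
    resolvers: list[tuple[str, Resolver]],
    min_confidence: float = 0.6,  # assumed cutoff for a "confident" match
) -> Optional[dict]:
    """Try resolvers in order; return the first confident match.

    If nothing clears the cutoff, return the best partial match seen
    (or None), so downstream scoring can still penalize low confidence.
    """
    best: Optional[dict] = None
    for name, resolver in resolvers:
        try:
            profile = resolver(email)
        except Exception:
            # Treat a provider error as a miss and fall through.
            continue
        if not profile:
            continue
        candidate = {**profile, "source": name}
        confidence = candidate.get("confidence", 0.0)
        if confidence >= min_confidence:
            return candidate
        if best is None or confidence > best.get("confidence", 0.0):
            best = candidate
    return best


if __name__ == "__main__":
    # Stub resolvers standing in for real provider clients.
    def datagma_stub(email: str) -> Optional[dict]:
        return None  # simulate a miss

    def pdl_stub(email: str) -> Optional[dict]:
        return {"company": "Acme", "title": "Engineer", "confidence": 0.9}

    print(resolve_company("jane@gmail.com", [("datagma", datagma_stub), ("pdl", pdl_stub)]))
```

Keeping the provider order in a plain list makes it cheap to reorder or add sources once you have your own hit-rate numbers.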
Approach 2: Name + self-reported data triangulation
If your onboarding flow asks for a job title and company name (even optionally), you can join that self-reported data against enrichment APIs to validate and expand it. Clay is designed for exactly this — you set up a table that takes {name, company_name} and waterfalls across 10+ sources to build the firmographic profile. The limitation: you're depending on users self-reporting accurately, which happens less than you'd expect for optional fields.
Approach 3: LinkedIn URL matching
If you can get a LinkedIn profile URL — through OAuth login, a "connect LinkedIn" optional step in onboarding, or by prompting users to enter it — you bypass the personal email problem entirely. RocketReach and PDL both accept LinkedIn URLs directly and return full firmographic profiles at 85%+ match rates.
Some teams use Phantombuster to automate LinkedIn outreach to unresolved users. I don't recommend this — it violates LinkedIn's ToS and creates legal exposure faster than it creates pipeline.
Comparing enrichment APIs on personal email resolution
| Provider | Gmail hit rate | Fields returned | Avg latency | Price per call |
|---|---|---|---|---|
| Datagma | ~41% | Name, title, company, LinkedIn, seniority | 800ms | ~$0.04 |
| PDL | ~32% | 100+ fields on match | 400ms | $0.05–$0.10 |
| Clearbit | ~8% | Deep firmographic on match | 200ms | $0.08 |
| Clay waterfall | ~55%* | Composite, varies by source | 3–8s | $0.15–$0.40 |
| RocketReach | ~18% | Name, title, company, email | 600ms | $0.06 |
*Clay's higher rate comes from combining multiple providers — you're paying for 2–3 API calls per resolution.
Hit rates measured across 500+ Gmail signups per provider. Numbers will vary based on your user base demographics.
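Price per call understates what a resolved contact costs, because misses are paid for too. A rough sketch of the arithmetic, assuming every call is billed whether or not it matches (check your contract, some plans only charge on a match) and taking the midpoint wherever the table gives a price range:

```python
# (Gmail hit rate, price per call in USD) taken from the table above.
# Ranges are collapsed to their midpoint, which is an assumption.
providers = {
    "Datagma": (0.41, 0.04),
    "PDL": (0.32, 0.075),
    "Clearbit": (0.08, 0.08),
    "Clay waterfall": (0.55, 0.275),
    "RocketReach": (0.18, 0.06),
}


def cost_per_match(hit_rate: float, price_per_call: float) -> float:
    """Expected spend per resolved contact when misses are billed too."""
    return price_per_call / hit_rate


if __name__ == "__main__":
    ranked = sorted(providers.items(), key=lambda kv: cost_per_match(*kv[1]))
    for name, (hit_rate, price) in ranked:
        print(f"{name:15s} ${cost_per_match(hit_rate, price):.2f} per resolved contact")
```

Under those assumptions Datagma works out to roughly $0.10 per resolved contact and Clearbit to about $1.00, which is the gap behind the advice not to call Clearbit first on personal emails.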
Building the scoring formula
Once enrichment comes back, even if only as a best-effort partial match, you need a score. Here's the model I use for most PLG B2B tools:
Identity score (0–40 points)
- Company size 50–500 employees: +20
- Company size 500+: +15 (larger isn't always better — enterprise sales cycles can kill PLG deals)
- Title matches your buyer persona (e.g., "engineer", "developer" for dev tools; "head of", "director", "VP" for business software): +10
- Industry match against ICP list: +10
- Enrichment confidence below 60%: −10
Engagement score (0–40 points)
- Activation event fired: +25
- Invited ≥1 teammate: +10
- Used product on 3+ distinct days in week 1: +5
Expansion signal (0–20 points)
- Hit a hard limit at least once: +10
- Viewed pricing page: +5
- Started upgrade flow (even if abandoned): +5
Total ≥ 60: route to sales. Total 35–59: high-touch nurture. Total < 35: self-serve nurture only.
Calibrate these weights against your own historical conversion data after the first 90 days of running the pipeline. The weights above are starting points, not gospel.
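The point model above translates fairly directly into code. The following is a sketch rather than a finished implementation: the field names are made up for illustration, a company of exactly 500 employees is placed in the 50–500 bucket (the list leaves that boundary open), and the identity score is clamped to 0–40 so the confidence penalty cannot push it negative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identity:
    employee_count: Optional[int] = None
    title_matches_persona: bool = False
    industry_in_icp: bool = False
    confidence: float = 1.0  # enrichment confidence, 0.0-1.0 (hypothetical field)


@dataclass
class Behavior:
    activated: bool = False
    teammates_invited: int = 0
    active_days_week1: int = 0
    hit_hard_limit: bool = False
    viewed_pricing: bool = False
    started_upgrade: bool = False


def identity_score(identity: Optional[Identity]) -> int:
    if identity is None:  # enrichment returned nothing
        return 0
    score = 0
    if identity.employee_count is not None:
        if 50 <= identity.employee_count <= 500:
            score += 20
        elif identity.employee_count > 500:
            score += 15
    if identity.title_matches_persona:
        score += 10
    if identity.industry_in_icp:
        score += 10
    if identity.confidence < 0.6:
        score -= 10
    return max(0, min(40, score))  # clamp to the 0-40 range


def engagement_score(b: Behavior) -> int:
    return (25 if b.activated else 0) \
        + (10 if b.teammates_invited >= 1 else 0) \
        + (5 if b.active_days_week1 >= 3 else 0)


def expansion_score(b: Behavior) -> int:
    return (10 if b.hit_hard_limit else 0) \
        + (5 if b.viewed_pricing else 0) \
        + (5 if b.started_upgrade else 0)


def route(identity: Optional[Identity], behavior: Behavior) -> tuple[int, str]:
    total = identity_score(identity) + engagement_score(behavior) + expansion_score(behavior)
    if total >= 60:
        return total, "sales"
    if total >= 35:
        return total, "high_touch_nurture"
    return total, "self_serve_nurture"


if __name__ == "__main__":
    lead = Identity(employee_count=200, title_matches_persona=True, confidence=0.8)
    usage = Behavior(activated=True, teammates_invited=2, viewed_pricing=True)
    print(route(lead, usage))
```

Keeping the three sub-scores as separate functions makes it easier to recalibrate one set of weights against conversion data without touching the others.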
The webhook trigger that hands off to sales
The cleanest implementation: Segment as the event bus, a serverless function (Vercel or AWS Lambda) doing enrichment and scoring, then pushing a qualified lead into HubSpot or Salesforce with the score attached.
The trigger event is whatever you define as your activation moment — not "signed up," but the first event that proves the user got value. For a data tool, it might be query_executed. For a collaboration product, first_team_invite. For a file tool, first_export.
Segment event → Lambda
→ Datagma enrich (personal email)
→ PDL fallback if no match
→ Score calculation
→ If score ≥ 60: POST to CRM, set owner = SDR queue
→ If score 35–59: trigger nurture sequence
→ If score < 35: self-serve nurture only
The latency on this full chain runs 1.5–3 seconds end-to-end, fast enough to fire before the user finishes their onboarding flow. Some teams surface a "your account manager will reach out" message in-app immediately after the score threshold is hit — which works well when the user's in the product and receptive.
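As a rough illustration of that chain, here is what the core of the serverless function might look like. Everything provider-specific is injected as a callable, so nothing here claims to be a real Datagma, PDL, or CRM client. It assumes the Segment track payload carries the event name under `event` and that you send the user's email in `properties.email`, which depends on how your tracking is instrumented.

```python
from typing import Callable, Optional

# Activation events named in this post; replace with your own.
ACTIVATION_EVENTS = {"query_executed", "first_team_invite", "first_export"}


def handle_event(
    event: dict,
    enrich: Callable[[str], Optional[dict]],          # e.g. Datagma then PDL fallback
    score: Callable[[Optional[dict], dict], int],     # returns a 0-100 total
    push_to_crm: Callable[[dict], None],
    start_nurture: Callable[[str], None],
) -> str:
    """Enrich, score, and route one tracked event; returns the routing decision."""
    if event.get("event") not in ACTIVATION_EVENTS:
        return "ignored"
    email = (event.get("properties") or {}).get("email")
    if not email:
        return "ignored"

    profile = enrich(email)        # may be None when nothing resolves
    total = score(profile, event)

    if total >= 60:
        push_to_crm({"email": email, "score": total,
                     "owner": "sdr_queue", "profile": profile})
        return "sales"
    if total >= 35:
        start_nurture(email)
        return "high_touch_nurture"
    return "self_serve_nurture"


if __name__ == "__main__":
    decision = handle_event(
        {"event": "query_executed", "properties": {"email": "jane@gmail.com"}},
        enrich=lambda email: {"company": "Acme"},
        score=lambda profile, event: 72,
        push_to_crm=lambda lead: print("CRM <-", lead),
        start_nurture=lambda email: print("nurture <-", email),
    )
    print(decision)
```

Injecting the enrichment and CRM calls also makes the routing logic testable without network access, which matters once the thresholds start changing.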
Clay can replace the Lambda and API chain if you'd rather avoid custom code. You set up a Clay table as the enrichment layer, trigger it from Segment via webhook, and it handles the waterfall and CRM push without writing a function. The tradeoff: less control over scoring logic and higher cost per enriched contact.
What I actually use
For teams just starting out with PLG enrichment: Datagma as the primary personal email resolver, PDL as fallback, Segment as the event bus, Mixpanel for behavioral event storage (the SQL explorer makes it easy to export activation cohorts for offline scoring analysis without touching production code), and whatever CRM you already have.
Don't use Clearbit as your first call on personal emails — the hit rate doesn't justify the cost at that step, though it's excellent for work email enrichment downstream once you've resolved a user's company identity.
If you're doing more than 5,000 enrichment calls per month, Clay starts to make sense as a no-code orchestration layer, especially if your ops team prefers spreadsheet-style tooling to writing Lambda functions. The per-credit cost is higher, but you save on engineering time.
Skip ZoomInfo for this use case. It's enterprise-contract priced, built for outbound prospecting from company lists, and adds zero value for personal email resolution. Same story with Lusha — excellent for Chrome extension-style lookups starting from a LinkedIn profile, wrong tool for an automated inbound pipeline.
The honest ceiling: even with the best waterfall stack, you'll resolve 45–55% of personal email signups to confident company matches. The other half you're scoring on behavioral data alone. That's not a tooling problem — it's a data reality. Design your scoring model to handle the no-enrichment case gracefully rather than assuming every user will resolve cleanly.
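One way to handle the no-enrichment case, offered as an assumption rather than something I've validated: when identity data is missing entirely, rescale the 60 behavioral points (engagement plus expansion) onto the same 0–100 range, so an unenriched but highly engaged user can still cross the sales threshold. `total_score` is a hypothetical helper name.

```python
from typing import Optional


def total_score(identity_points: Optional[int], behavior_points: int) -> int:
    """Combine sub-scores into a 0-100 total.

    identity_points: 0-40, or None when enrichment found nothing.
    behavior_points: engagement + expansion, 0-60.
    """
    if identity_points is None:
        # No identity data: stretch the behavioral score to 0-100
        # instead of silently treating identity as zero.
        return round(behavior_points * 100 / 60)
    return identity_points + behavior_points


if __name__ == "__main__":
    print(total_score(None, 45))  # unenriched, strongly engaged user
    print(total_score(20, 30))    # partially matched identity
```

Whether rescaling is the right call depends on your data. If unenriched users convert much worse than enriched ones, a smaller multiplier or a separate threshold for that cohort may fit better.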