What's the Best Way to Enrich Lead Data in 2026? A Technical Look at Waterfall Enrichment
If you have ever pulled a list from Apollo, ZoomInfo, or Lusha, loaded it into your sequencer, and watched 20% to 30% of it hard-bounce in the first hour, this post is for you.
The problem is architectural: you are querying a single-source database for B2B contact data in 2026, and B2B contact data decays at roughly 28% per year. By the time your "verified" record reaches your outbox, the person has changed jobs, the domain has stopped accepting that local-part, or the catch-all server has flipped its policy.
The fix is a different architecture entirely. It is called waterfall enrichment, and it is what every serious data engineering team in B2B is moving to.
What waterfall enrichment actually is
A waterfall is a cascade of data providers queried in sequence, with stop-on-hit logic. You send a single request (say, first_name + last_name + domain) and the system routes it through provider 1. If provider 1 returns a verified result above the confidence threshold, the cascade halts and you get the answer. If provider 1 misses, the system routes to provider 2. And so on, through tiers of providers ranked by cost, speed, and accuracy.
Pseudocode:
```python
def waterfall_enrich(contact, providers, confidence_threshold=0.95):
    for provider in providers:  # sorted by tier: T1 → T2 → T3
        result = provider.lookup(contact)
        if result.found and result.confidence >= confidence_threshold:
            return {
                "data": result,
                "provider_used": provider.name,
                "cost": provider.cost_per_hit,
            }
    return {"data": None, "provider_used": None, "cost": 0}
```
That is the entire model. The interesting engineering is everything around it: tier ordering, confidence scoring, parallel vs sequential calls, caching, fallback policies on rate limits, refund logic on misses.
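To make two of those surrounding pieces concrete, here is a minimal sketch of a cache in front of the cascade plus a skip-on-rate-limit fallback. `StubProvider`, `Result`, and `RateLimited` are illustrative stand-ins I am inventing for this sketch, not any real SDK:

```python
import time
from dataclasses import dataclass

@dataclass
class Result:
    found: bool
    confidence: float = 0.0
    email: str = ""

class RateLimited(Exception):
    pass

class StubProvider:
    def __init__(self, name, result=None, throttled=False):
        self.name, self.result, self.throttled = name, result, throttled
    def lookup(self, contact):
        if self.throttled:
            raise RateLimited(self.name)
        return self.result or Result(found=False)

def cached_waterfall(contact, providers, cache, threshold=0.95, ttl=86400):
    key = (contact["first_name"], contact["last_name"], contact["domain"])
    entry = cache.get(key)
    if entry and time.time() - entry["ts"] < ttl:
        return entry["value"]                    # cache hit: zero upstream cost
    for provider in providers:                   # sorted cheapest tier first
        try:
            result = provider.lookup(contact)
        except RateLimited:
            continue                             # route around a throttled provider
        if result.found and result.confidence >= threshold:
            value = {"data": result, "provider_used": provider.name}
            cache[key] = {"ts": time.time(), "value": value}
            return value
    return {"data": None, "provider_used": None}
```

The cache means a repeat lookup within the TTL never touches a provider, and the `except RateLimited: continue` branch is the fallback policy: a throttled Tier 1 provider degrades into a slightly more expensive hit instead of a failed request.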
Why single-source enrichment fails in 2026
To understand why waterfall wins, look at what users actually report when they run single-source tools in production.
Apollo
Apollo markets 91% email accuracy and a 275M+ contact database. Real-world bounce rates tell a different story.
- One detailed test on r/coldemail with 500 to 1000 leads, verified externally through NeverBounce, showed bounce rates of 32% to 38%.
- A 2026 review across 10,000+ users found that teams commonly see 15% to 25% bounce rates on exported lists.
- Applying Apollo's own "Verified Emails" filter drops the database from 275M contacts to 96M, meaning 65% of their contacts have unverified emails by their own classification.
- US contact accuracy runs around 88%. International accuracy drops to roughly 60%.
The structural issue: Apollo relies on its own database for enrichment. There is no cascade. If their database does not have a verified record, you get a "verified" record that was last validated months ago and now bounces.
ZoomInfo
ZoomInfo advertises 95% accuracy. G2's own aggregate contact data accuracy score for ZoomInfo is 7.7 out of 10, with users reporting practical accuracy ranging from 55% to 85% depending on segment.
The top five complaints across nearly 9000 G2 reviews of ZoomInfo are all variations of the same problem: data goes stale. "Outdated Contacts" appears in 215 separate reviews. "Inaccurate Data" appears in 232. One Reddit user reported a 50.7% bounce rate on their first ZoomInfo-sourced campaign.
The architectural cause is the refresh cadence. ZoomInfo updates records on a continuous-but-uneven schedule: high-profile enterprise contacts get updated frequently, mid-market and SMB records can go months without a refresh. You burn a credit, pull a contact, find the person left three months ago, and ZoomInfo does not refund credits for outdated records.
Lusha
Lusha claims 81% accuracy. That number is reasonable for their core geography (North America and UK). The problem is everywhere else.
- Users prospecting in continental Europe, APAC, or emerging markets report lower data availability and accuracy.
- One Capterra reviewer documented finding a phone number that had been wrong for 15 years: the prospect had left the company a decade and a half ago and the number had been reassigned to someone else.
- The credit system is the most cited frustration. Phone reveals cost 5 credits each, monthly allocations exhaust quickly, and if the data is wrong, credits are still consumed with no automatic refund.
The architectural pattern is the same as Apollo and ZoomInfo: one database, one source of truth, one accuracy ceiling. When that source misses or stales, the user pays anyway.
The math behind why waterfall wins
Single-source enrichment caps out at the accuracy of its single source. The math is simple:
P(found) = P(provider_has_record) × P(record_is_current)
For a single source with 60% coverage and 70% freshness:
P(found) = 0.60 × 0.70 = 0.42
42% of your list lands. The rest bounces or returns nothing useful.
Now run the same request through a waterfall of 5 independent providers (assuming reasonable independence, which holds for providers with different data acquisition methods):
P(at_least_one_finds) = 1 - ∏(1 - P_i)
                      = 1 - (1 - 0.42)^5
                      = 1 - 0.066
                      = 0.934 ≈ 93.4%
Stack 10+ providers across tiers and you cross 99%. The reason no single vendor can match this is that no single vendor has access to every data acquisition method (contributor networks, SMTP probing, LinkedIn scraping, telco partnerships, ISP-level signals).
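The same arithmetic in a few lines of Python, using exactly the coverage and freshness figures above and the same independence assumption the text makes:

```python
from math import prod

def p_at_least_one(p_each: float, n: int) -> float:
    """Probability that at least one of n independent providers resolves the record."""
    return 1 - prod(1 - p_each for _ in range(n))

p_single = 0.60 * 0.70              # coverage x freshness = 0.42
five = p_at_least_one(p_single, 5)  # ≈ 0.934
ten = p_at_least_one(p_single, 10)  # past the 99% mark
```

In practice providers are not perfectly independent (several scrape the same public sources), so treat 93.4% as an upper bound for five providers; the direction of the argument still holds.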
How smart cascade saves you time and money
The two objections to waterfall are obvious: querying 10+ providers per contact sounds expensive, and querying them in sequence sounds slow. Neither holds when the cascade is implemented correctly.
Cost: stop-on-hit kills the wasted spend
A naive implementation would query all 20 providers in parallel and charge you for all 20. Smart waterfall stops at the first hit above the confidence threshold. In practice, around 60% to 70% of requests resolve at Tier 1 (the cheapest, fastest providers).
Only 20% to 25% need Tier 2 providers. Less than 10% reach Tier 3 fallback providers. The blended cost per hit ends up lower than any single premium vendor because you only pay for the cheap providers most of the time and only escalate to expensive providers when you have to.
There is one more layer most teams miss. If every provider in the cascade misses, the platform absorbs the upstream costs and refunds the credit. This is what zero-waste billing means. Compare this to Lusha and ZoomInfo, where outdated records still consume credits.
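Putting numbers on that: a blended-cost sketch using the tier resolution rates quoted above. The per-hit prices here are made-up illustrative figures, not any vendor's actual pricing:

```python
# Tier split from the text: ~65% resolve at T1, ~22% at T2, ~8% at T3.
# cost_per_hit values are hypothetical, for illustration only.
tiers = [
    {"name": "T1", "resolve_rate": 0.65, "cost_per_hit": 0.01},
    {"name": "T2", "resolve_rate": 0.22, "cost_per_hit": 0.05},
    {"name": "T3", "resolve_rate": 0.08, "cost_per_hit": 0.15},
]

# Expected spend per contact: each tier's price weighted by how often the
# cascade actually resolves there. Full misses (the remaining ~5%) are
# refunded under zero-waste billing, so they contribute zero.
blended = sum(t["resolve_rate"] * t["cost_per_hit"] for t in tiers)  # ≈ $0.0295
miss_rate = 1 - sum(t["resolve_rate"] for t in tiers)                # ≈ 0.05
```

Under these assumed prices the blended cost lands around three cents per verified contact, even though the Tier 3 fallback alone would cost five times that, because the expensive path is rarely taken.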
Speed: tier ordering exploits provider performance asymmetry
Different providers have different latency. Tier 1 providers are chosen partly because they respond in 200ms to 500ms. The waterfall hits them first not just for cost but for speed. Most requests resolve in under 1 second because they never leave Tier 1.
For requests that need to traverse the full cascade, the average response time across 20 providers ends up under 2 seconds because each tier failure is fast (a provider that does not have the record returns a 404 in under 300ms). The slow path is rare and bounded.
For bulk operations, you parallelize across contacts (not across providers per contact). A bulk enrich endpoint hitting 25 profiles per second is what you get when you horizontally scale the cascade across worker pools.
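A sketch of that bulk path: parallelism spans contacts, never providers for a single contact. `enrich_one` here is a stand-in for the sequential `waterfall_enrich` cascade above:

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_one(contact):
    # placeholder for the sequential stop-on-hit cascade; the real call is
    # I/O-bound, which is why a thread pool is the right shape here
    return {
        "domain": contact["domain"],
        "email": f"{contact['first_name'].lower()}@{contact['domain']}",
    }

def bulk_enrich(contacts, max_workers=25):
    # ~25 workers at ~1s per cascade approximates the 25-profiles-per-second figure
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_one, contacts))  # preserves input order
```

Each worker owns one contact's full cascade, so provider-level stop-on-hit logic stays simple and sequential while throughput scales with the worker count.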
A concrete API example
Here is what calling a waterfall enrichment API looks like in practice. This is the LeadSonar API, but the shape is similar across any waterfall platform.
```bash
curl -X POST https://api.leadsonar.com/api/v1/enrich \
  -H "X-API-Key: ls_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contacts": [
      {
        "first_name": "Jane",
        "last_name": "Doe",
        "domain": "acme.com",
        "linkedin_url": "https://linkedin.com/in/janedoe"
      }
    ],
    "fields": ["email", "phone"]
  }'
```
Response:
```json
{
  "id": "enr_01HXYZ...",
  "status": "completed",
  "progress": {
    "total": 1,
    "completed": 1,
    "emails_found": 1,
    "phones_found": 1
  },
  "contacts": [
    {
      "email": "jane.doe@acme.com",
      "email_confidence": 0.998,
      "phone": "+14155550123",
      "phone_type": "mobile",
      "providers_used": ["LeadMagic", "ContactOut"]
    }
  ],
  "meta": {
    "total_cost_usd": 0.025,
    "duration_ms": 1240
  }
}
```
Note providers_used. The cascade resolved at Tier 1 with two providers (one for email, one for phone). Cost was $0.025 for both data points. Total wall time was 1.24 seconds. If the cascade had needed to escalate, you would see Tier 2 or Tier 3 providers in that array and a higher (but still bounded) cost.
For developers, the OpenAPI 3.0 spec is at https://api.leadsonar.com/api/v1/openapi.json if you want to generate clients.
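If you would rather call it from Python, here is a minimal standard-library sketch of the same request. The URL, headers, and body shape mirror the curl example above; this is not an official client, and the API key is a placeholder you supply:

```python
import json
import urllib.request

API_URL = "https://api.leadsonar.com/api/v1/enrich"

def build_request(contacts, fields, api_key):
    # assemble the same POST the curl example sends
    body = json.dumps({"contacts": contacts, "fields": fields}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def enrich(contacts, fields, api_key, timeout=30):
    with urllib.request.urlopen(build_request(contacts, fields, api_key), timeout=timeout) as resp:
        return json.load(resp)  # the response object shown above
```

Splitting request construction from transport makes the payload easy to inspect and test without touching the network.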
What waterfall solves that single-source cannot
Mapping the user complaints from Apollo, ZoomInfo, and Lusha to architectural fixes:
| Pain | Root cause | Waterfall fix |
|---|---|---|
| 15% to 30% bounce rate on "verified" emails | Single-source verification, stale records | Cross-checking across providers + live SMTP verification at request time |
| Wrong phone number that was correct 15 years ago | No refresh signal, no refund | Multiple providers vote on phone validity, refund on miss |
| US accuracy 88%, international accuracy 60% | One database with one geographic bias | Different providers excel in different regions, cascade picks the right one per request |
| Credits consumed for outdated records | Pay-per-attempt billing | Pay-per-verified-hit billing |
| Mid-market and SMB records months out of date | Uneven refresh cadence | Real-time verification at request time, not at index time |
| Rate limits choke automation | Single backend throttling | Parallelize across worker pools, route around per-provider limits |
The pattern is consistent: single-source architectures hit ceilings that waterfall architectures route around.
Try it on your own list
If you want to see the difference on your actual data instead of taking my word for it, LeadSonar runs the waterfall across 20+ providers with stop-on-hit logic and refund-on-miss billing. The free trial gives you 1000 leads for 7 days with no credit card.
Pull a list you have already enriched somewhere else, run it through the cascade, and compare the bounce rate. The math holds up in production.
Start at leadsonar.io.
Further reading
- Apollo data accuracy analysis with G2 review data: https://prospeo.io/s/apolloio-reviews
- ZoomInfo accuracy and bounce rate breakdown: https://prospeo.io/s/is-zoominfo-accurate
- Lusha review with credit system analysis: https://syncgtm.com/blog/lusha-review
- LeadSonar API documentation: https://app.leadsonar.io/docs.html