What's the Best Way to Enrich Lead Data in 2026? A Technical Look at Waterfall Enrichment
If you have ever pulled a list from Apollo, ZoomInfo, or Lusha, loaded it into your sequencer, and watched 20% to 30% of it hard-bounce in the first hour, this post is for you.
The problem is architectural: you are querying a single-source database for B2B contact data in 2026, and B2B contact data decays at roughly 28% per year. By the time your "verified" record reaches your outbox, the person has changed jobs, the domain has stopped accepting that local-part, or the catch-all server has flipped its policy.
The fix is a different architecture entirely. It is called waterfall enrichment, and it is what every serious data engineering team in B2B is moving to.
What waterfall enrichment actually is
A waterfall is a cascade of data providers queried in sequence, with stop-on-hit logic. You send a single request (say, first_name + last_name + domain) and the system routes it through provider 1. If provider 1 returns a verified result above the confidence threshold, the cascade halts and you get the answer. If provider 1 misses, the system routes to provider 2. And so on, through tiers of providers ranked by cost, speed, and accuracy.
Pseudocode:
```python
def waterfall_enrich(contact, providers, confidence_threshold=0.95):
    for provider in providers:  # sorted by tier: T1 → T2 → T3
        result = provider.lookup(contact)
        if result.found and result.confidence >= confidence_threshold:
            return {
                "data": result,
                "provider_used": provider.name,
                "cost": provider.cost_per_hit,
            }
    return {"data": None, "provider_used": None, "cost": 0}
```
That is the entire model. The interesting engineering is everything around it: tier ordering, confidence scoring, parallel vs sequential calls, caching, fallback policies on rate limits, refund logic on misses.
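To make two of those surrounding pieces concrete, here is a minimal sketch of a cache in front of the cascade plus a skip-on-rate-limit fallback. `StubProvider`, `Result`, and `RateLimited` are illustrative stand-ins I am inventing for this sketch, not any real SDK:

```python
import time
from dataclasses import dataclass

@dataclass
class Result:
    found: bool
    confidence: float = 0.0
    email: str = ""

class RateLimited(Exception):
    pass

class StubProvider:
    def __init__(self, name, result=None, throttled=False):
        self.name, self.result, self.throttled = name, result, throttled
    def lookup(self, contact):
        if self.throttled:
            raise RateLimited(self.name)
        return self.result or Result(found=False)

def cached_waterfall(contact, providers, cache, threshold=0.95, ttl=86400):
    key = (contact["first_name"], contact["last_name"], contact["domain"])
    entry = cache.get(key)
    if entry and time.time() - entry["ts"] < ttl:
        return entry["value"]                    # cache hit: zero upstream cost
    for provider in providers:                   # sorted cheapest tier first
        try:
            result = provider.lookup(contact)
        except RateLimited:
            continue                             # route around a throttled provider
        if result.found and result.confidence >= threshold:
            value = {"data": result, "provider_used": provider.name}
            cache[key] = {"ts": time.time(), "value": value}
            return value
    return {"data": None, "provider_used": None}
```

The cache means a repeat lookup within the TTL never touches a provider, and the `except RateLimited: continue` branch is the fallback policy: a throttled Tier 1 provider degrades into a slightly more expensive hit instead of a failed request.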
Why single-source enrichment fails in 2026
To understand why waterfall wins, look at what users actually report when they run single-source tools in production.
Apollo
Apollo markets 91% email accuracy and a 275M+ contact database. Real-world bounce rates tell a different story.
- One detailed test on r/coldemail with 500 to 1000 leads, verified externally through NeverBounce, showed bounce rates of 32% to 38%.
- A 2026 review across 10,000+ users found that teams commonly see 15% to 25% bounce rates on exported lists.
- Applying Apollo's own "Verified Emails" filter drops the database from 275M contacts to 96M, meaning 65% of their contacts have unverified emails by their own classification.
- US contact accuracy runs around 88%. International accuracy drops to roughly 60%.
The structural issue: Apollo relies on its own database for enrichment. There is no cascade. If their database does not have a verified record, you get a "verified" record that was last validated months ago and now bounces.
ZoomInfo
ZoomInfo advertises 95% accuracy. G2's own aggregate contact data accuracy score for ZoomInfo is 7.7 out of 10, with users reporting practical accuracy ranging from 55% to 85% depending on segment.
The top five complaints across nearly 9000 G2 reviews of ZoomInfo are all variations of the same problem: data goes stale. "Outdated Contacts" appears in 215 separate reviews. "Inaccurate Data" appears in 232. One Reddit user reported a 50.7% bounce rate on their first ZoomInfo-sourced campaign.
The architectural cause is the refresh cadence. ZoomInfo updates records on a continuous-but-uneven schedule: high-profile enterprise contacts get updated frequently, mid-market and SMB records can go months without a refresh. You burn a credit, pull a contact, find the person left three months ago, and ZoomInfo does not refund credits for outdated records.
Lusha
Lusha claims 81% accuracy. That number is reasonable for their core geography (North America and UK). The problem is everywhere else.
- Users prospecting in continental Europe, APAC, or emerging markets report lower data availability and accuracy.
- One Capterra reviewer documented finding a phone number that had been wrong for 15 years: the prospect had left the company a decade and a half ago and the number had been reassigned to someone else.
- The credit system is the most cited frustration. Phone reveals cost 5 credits each, monthly allocations exhaust quickly, and if the data is wrong, credits are still consumed with no automatic refund.
The architectural pattern is the same as Apollo and ZoomInfo: one database, one source of truth, one accuracy ceiling. When that source misses or stales, the user pays anyway.
The math behind why waterfall wins
Single-source enrichment caps out at the accuracy of its single source. The math is simple:
P(found) = P(provider_has_record) × P(record_is_current)
For a single source with 60% coverage and 70% freshness:
P(found) = 0.60 × 0.70 = 0.42
42% of your list lands. The rest bounces or returns nothing useful.
Now run the same request through a waterfall of 5 independent providers (assuming reasonable independence, which holds for providers with different data acquisition methods):
P(at_least_one_finds) = 1 - ∏(1 - P_i)
                      = 1 - (1 - 0.42)^5
                      = 1 - 0.066
                      = 0.934 ≈ 93.4%
Stack 10+ providers across tiers and you cross 99%. The reason no single vendor can match this is that no single vendor has access to every data acquisition method (contributor networks, SMTP probing, LinkedIn scraping, telco partnerships, ISP-level signals).
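The same arithmetic in a few lines of Python, using exactly the coverage and freshness figures above and the same independence assumption the text makes:

```python
from math import prod

def p_at_least_one(p_each: float, n: int) -> float:
    """Probability that at least one of n independent providers resolves the record."""
    return 1 - prod(1 - p_each for _ in range(n))

p_single = 0.60 * 0.70              # coverage x freshness = 0.42
five = p_at_least_one(p_single, 5)  # ≈ 0.934
ten = p_at_least_one(p_single, 10)  # past the 99% mark
```

In practice providers are not perfectly independent (several scrape the same public sources), so treat 93.4% as an upper bound for five providers; the direction of the argument still holds.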
How smart cascade saves you time and money
The two objections to waterfall are obvious: querying 10+ providers per contact sounds expensive, and querying them in sequence sounds slow. Neither holds when the cascade is implemented correctly.
Cost: stop-on-hit kills the wasted spend
A naive implementation would query all 20 providers in parallel and charge you for all 20. Smart waterfall stops at the first hit above the confidence threshold. In practice, around 60% to 70% of requests resolve at Tier 1 (the cheapest, fastest providers).
Only 20% to 25% need Tier 2 providers. Less than 10% reach Tier 3 fallback providers. The blended cost per hit ends up lower than any single premium vendor because you only pay for the cheap providers most of the time and only escalate to expensive providers when you have to.
There is one more layer most teams miss. If every provider in the cascade misses, the platform absorbs the upstream costs and refunds the credit. This is what zero-waste billing means. Compare this to Lusha and ZoomInfo, where outdated records still consume credits.
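Putting numbers on that: a blended-cost sketch using the tier resolution rates quoted above. The per-hit prices here are made-up illustrative figures, not any vendor's actual pricing:

```python
# Tier split from the text: ~65% resolve at T1, ~22% at T2, ~8% at T3.
# cost_per_hit values are hypothetical, for illustration only.
tiers = [
    {"name": "T1", "resolve_rate": 0.65, "cost_per_hit": 0.01},
    {"name": "T2", "resolve_rate": 0.22, "cost_per_hit": 0.05},
    {"name": "T3", "resolve_rate": 0.08, "cost_per_hit": 0.15},
]

# Expected spend per contact: each tier's price weighted by how often the
# cascade actually resolves there. Full misses (the remaining ~5%) are
# refunded under zero-waste billing, so they contribute zero.
blended = sum(t["resolve_rate"] * t["cost_per_hit"] for t in tiers)  # ≈ $0.0295
miss_rate = 1 - sum(t["resolve_rate"] for t in tiers)                # ≈ 0.05
```

Under these assumed prices the blended cost lands around three cents per verified contact, even though the Tier 3 fallback alone would cost five times that, because the expensive path is rarely taken.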
Speed: tier ordering exploits provider performance asymmetry
Different providers have different latency. Tier 1 providers are chosen partly because they respond in 200ms to 500ms. The waterfall hits them first not just for cost but for speed. Most requests resolve in under 1 second because they never leave Tier 1.
For requests that need to traverse the full cascade, the average response time across 20 providers ends up under 2 seconds because each tier failure is fast (a provider that does not have the record returns a 404 in under 300ms). The slow path is rare and bounded.
For bulk operations, you parallelize across contacts (not across providers per contact). A bulk enrich endpoint hitting 25 profiles per second is what you get when you horizontally scale the cascade across worker pools.
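A sketch of that bulk path: parallelism spans contacts, never providers for a single contact. `enrich_one` here is a stand-in for the sequential `waterfall_enrich` cascade above:

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_one(contact):
    # placeholder for the sequential stop-on-hit cascade; the real call is
    # I/O-bound, which is why a thread pool is the right shape here
    return {
        "domain": contact["domain"],
        "email": f"{contact['first_name'].lower()}@{contact['domain']}",
    }

def bulk_enrich(contacts, max_workers=25):
    # ~25 workers at ~1s per cascade approximates the 25-profiles-per-second figure
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_one, contacts))  # preserves input order
```

Each worker owns one contact's full cascade, so provider-level stop-on-hit logic stays simple and sequential while throughput scales with the worker count.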
A concrete API example
Here is what calling a waterfall enrichment API looks like in practice. This is the LeadSonar API, but the shape is similar across any waterfall platform.
```bash
curl -X POST https://api.leadsonar.com/api/v1/enrich \
  -H "X-API-Key: ls_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "contacts": [
      {
        "first_name": "Jane",
        "last_name": "Doe",
        "domain": "acme.com",
        "linkedin_url": "https://linkedin.com/in/janedoe"
      }
    ],
    "fields": ["email", "phone"]
  }'
```
Response:
```json
{
  "id": "enr_01HXYZ...",
  "status": "completed",
  "progress": {
    "total": 1,
    "completed": 1,
    "emails_found": 1,
    "phones_found": 1
  },
  "contacts": [
    {
      "email": "jane.doe@acme.com",
      "email_confidence": 0.998,
      "phone": "+14155550123",
      "phone_type": "mobile",
      "providers_used": ["LeadMagic", "ContactOut"]
    }
  ],
  "meta": {
    "total_cost_usd": 0.025,
    "duration_ms": 1240
  }
}
```
Note providers_used. The cascade resolved at Tier 1 with two providers (one for email, one for phone). Cost was $0.025 for both data points. Total wall time was 1.24 seconds. If the cascade had needed to escalate, you would see Tier 2 or Tier 3 providers in that array and a higher (but still bounded) cost.
For developers, the OpenAPI 3.0 spec is at https://api.leadsonar.com/api/v1/openapi.json if you want to generate clients.
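If you would rather call it from Python, here is a minimal standard-library sketch of the same request. The URL, headers, and body shape mirror the curl example above; this is not an official client, and the API key is a placeholder you supply:

```python
import json
import urllib.request

API_URL = "https://api.leadsonar.com/api/v1/enrich"

def build_request(contacts, fields, api_key):
    # assemble the same POST the curl example sends
    body = json.dumps({"contacts": contacts, "fields": fields}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def enrich(contacts, fields, api_key, timeout=30):
    with urllib.request.urlopen(build_request(contacts, fields, api_key), timeout=timeout) as resp:
        return json.load(resp)  # the response object shown above
```

Splitting request construction from transport makes the payload easy to inspect and test without touching the network.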
What waterfall solves that single-source cannot
Mapping the user complaints from Apollo, ZoomInfo, and Lusha to architectural fixes:
| Pain | Root cause | Waterfall fix |
|---|---|---|
| 15% to 30% bounce rate on "verified" emails | Single-source verification, stale records | Cross-checking across providers + live SMTP verification at request time |
| Wrong phone number that was correct 15 years ago | No refresh signal, no refund | Multiple providers vote on phone validity, refund on miss |
| US accuracy 88%, international accuracy 60% | One database with one geographic bias | Different providers excel in different regions, cascade picks the right one per request |
| Credits consumed for outdated records | Pay-per-attempt billing | Pay-per-verified-hit billing |
| Mid-market and SMB records months out of date | Uneven refresh cadence | Real-time verification at request time, not at index time |
| Rate limits choke automation | Single backend throttling | Parallelize across worker pools, route around per-provider limits |
The pattern is consistent: single-source architectures hit ceilings that waterfall architectures route around.
Try it on your own list
If you want to see the difference on your actual data instead of taking my word for it, LeadSonar runs the waterfall across 20+ providers with stop-on-hit logic and refund-on-miss billing. The free trial gives you 1000 leads for 7 days with no credit card.
Pull a list you have already enriched somewhere else, run it through the cascade, and compare the bounce rate. The math holds up in production.
Start at leadsonar.io.
Further reading
- Apollo data accuracy analysis with G2 review data: https://prospeo.io/s/apolloio-reviews
- ZoomInfo accuracy and bounce rate breakdown: https://prospeo.io/s/is-zoominfo-accurate
- Lusha review with credit system analysis: https://syncgtm.com/blog/lusha-review
- LeadSonar API documentation: https://app.leadsonar.io/docs.html