Siyana Hristova

Originally published at similarity-api.com

How to Reconcile Salesforce Leads Against Contacts at Scale

Duplicate identity records are almost inevitable in modern Salesforce environments.

Leads enter the CRM from web forms, enrichment tools, outbound prospecting platforms, partner integrations, event uploads, product sign-ups, and manual entry. Even in well-governed systems, slight variations in names, emails, company formatting, and job titles accumulate over time.

At scale, teams eventually need to answer practical operational questions:

  • Which of our newly imported leads already exist as contacts?
  • Who should own this inbound lead if the account already exists?
  • How do we clean identity data before migrations or reporting resets?

Lead-to-contact reconciliation workflows exist to answer these questions.


Why teams run lead-to-contact reconciliation

This workflow is typically driven by operational needs:

  • Reporting accuracy — duplicate identities fragment attribution and pipeline analytics
  • Routing correctness — inbound leads often need to inherit ownership from existing accounts
  • Import risk reduction — bulk uploads can create thousands of duplicates without pre-checks
  • Automation enablement — surfacing candidate matches enables auto-assignment and conversion rules

Over time, reconciliation becomes a recurring RevOps capability rather than a one-off cleanup exercise.


What reconciliation workflows look like in practice

Pre-import identity checks

  • Export existing contacts
  • Compare new leads against the contact base
  • Review high-confidence matches
  • Merge or update records before import

Scheduled identity cleanup jobs

  • Compare recently created leads to contacts
  • Write similarity scores or match IDs to custom fields
  • Create review queues for RevOps teams

Automation-driven identity resolution

  • Apex triggers call external reconciliation endpoints before lead insert
  • Salesforce Flows surface candidate matches for SDR review
  • Nightly jobs reassign leads to existing account owners

At this stage, similarity matching becomes part of operational CRM infrastructure.


Exact vs similarity matching in CRM reconciliation

Traditional deduplication relies on exact matching — typically strict email equality or rule-based logic.

Exact matching works well when identity signals are clean and standardized.

In real go-to-market environments, identity data drifts:

  • People use multiple email addresses
  • Company names appear in different formats
  • Titles and suffixes vary
  • Records are created across disconnected systems

Similarity-based matching addresses this ambiguity by asking:

Are these records likely to represent the same real-world person?

Exact matching remains a useful first filter.
Similarity matching expands coverage to edge cases that strict rules cannot resolve at scale.
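
The two-stage idea can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the `name` and `email` keys, the `match_lead` helper, and the 0.85 threshold are all assumptions, and the standard library's `difflib.SequenceMatcher` stands in for whatever similarity measure a team actually uses.

```python
from difflib import SequenceMatcher


def match_lead(lead, contacts, threshold=0.85):
    """Return (contact, method, score) for the best match, or None.

    `lead` and each contact are dicts with hypothetical "name" and "email" keys.
    """
    lead_email = lead["email"].strip().lower()

    # Pass 1: exact match on the normalized email address.
    for contact in contacts:
        if contact["email"].strip().lower() == lead_email:
            return contact, "exact", 1.0

    # Pass 2: no exact hit, so fall back to name similarity.
    best, best_score = None, 0.0
    for contact in contacts:
        score = SequenceMatcher(
            None, lead["name"].lower(), contact["name"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = contact, score

    if best is not None and best_score >= threshold:
        return best, "similarity", best_score
    return None
```

A lead whose email differs only in casing is caught by the first pass, while a "Mark Lee" lead with a different address can still surface a "Marc Lee" contact through the second pass.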


How reconciliation pipelines typically work

Conceptually, identity matching pipelines involve:

  1. Pre-processing — normalize casing, punctuation, token order, and company suffixes
  2. Similarity scoring — compare identity strings
  3. Filtering — retain matches above a defined confidence threshold
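
The three steps above can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions: the suffix list, the `normalize` and `reconcile` names, and the 0.8 threshold are hypothetical, and `difflib.SequenceMatcher` is only one of many possible scoring functions.

```python
import re
from difflib import SequenceMatcher

# Illustrative suffix list (an assumption); a real list would be longer.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "gmbh"}


def normalize(text):
    """Step 1: lowercase, strip punctuation, drop company suffixes, sort tokens."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in COMPANY_SUFFIXES]
    return " ".join(sorted(tokens))


def reconcile(leads, contacts, threshold=0.8):
    """Steps 2 and 3: score every lead/contact pair, keep those above threshold."""
    norm_contacts = [normalize(c) for c in contacts]
    matches = []
    for i, lead in enumerate(leads):
        norm_lead = normalize(lead)
        for j, norm_contact in enumerate(norm_contacts):
            score = SequenceMatcher(None, norm_lead, norm_contact).ratio()
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches
```

Because tokens are sorted after normalization, "Jane Doe, Acme Inc." and "ACME - Doe Jane" reduce to the same string and score 1.0. Note that the nested loop compares every lead with every contact.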

This approach works on small datasets, where comparing every lead against every contact is cheap.
It becomes harder when:

  • CRM datasets reach hundreds of thousands of records
  • Identity drift occurs continuously through imports and enrichment
  • Reconciliation must run automatically or on a frequent schedule

At that point, teams often move from ad-hoc scripts toward scalable matching infrastructure.
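
To make the scaling problem concrete: comparing 100,000 leads against 500,000 contacts pair by pair means 50 billion comparisons. One common way to cut that down, which is not part of the workflow described in this article, is blocking: only compare records that share a cheap key. The sketch below assumes records are dicts with an `email` key and uses the email domain as the blocking key; both choices, and the function names, are hypothetical.

```python
from collections import defaultdict


def email_domain(record):
    """Blocking key: the lowercased domain of the record's email address."""
    return record["email"].rsplit("@", 1)[-1].strip().lower()


def candidate_pairs(leads, contacts, key=email_domain):
    """Yield (lead_index, contact_index) only for records sharing a blocking key."""
    blocks = defaultdict(list)
    for j, contact in enumerate(contacts):
        blocks[key(contact)].append(j)
    for i, lead in enumerate(leads):
        for j in blocks.get(key(lead), []):
            yield i, j
```

The trade-off is recall: a person who signs up with a personal address will never be compared against their work-email contact record, so a blocking key has to be chosen with the dataset's drift patterns in mind.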


Replacing the pipeline with a single reconciliation call

Instead of designing and maintaining a full matching pipeline, teams can use a reconciliation API.

Example request:

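import os

import requests

# API key read from the environment (the variable name is an assumption).
API_KEY = os.environ["SIMILARITY_API_KEY"]

# One identity string per record; placeholder values for illustration.
lead_match_strings = ["Jane Doe jane@acme.com Acme Inc"]
contact_match_strings = ["Jane Doe jane@acme.com Acme Inc."]

payload = {
    "data_a": lead_match_strings,
    "data_b": contact_match_strings,
    "config": {
        "similarity_threshold": 0.82,  # minimum score for a match
        "top_n": 3,                    # candidates per lead
        # Normalization options
        "to_lowercase": True,
        "remove_punctuation": True,
        "use_token_sort": True,
        "output_format": "flat_table"
    }
}

response = requests.post(
    "https://api.similarity-api.com/reconcile",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
res = response.json()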

A key design decision is defining the identity string — commonly a combination of:

  • First name
  • Last name
  • Email
  • Company / account name
  • Job title
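
As a rough sketch, an identity string can be built by joining whichever of these fields are present. The field names and the `build_match_string` helper below are hypothetical; real Salesforce exports use their own column names.

```python
def build_match_string(
    record,
    fields=("first_name", "last_name", "email", "company", "title"),
):
    """Join the non-empty identity fields of a record into one string."""
    parts = (str(record.get(field) or "").strip() for field in fields)
    return " ".join(part for part in parts if part)


# Example: a lead with no job title on file.
lead = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane@acme.com",
    "company": "Acme Inc",
    "title": None,
}
lead_match_string = build_match_string(lead)
```

Which fields to include is a judgment call: adding job title can separate two people with the same name at one company, but it also lowers scores when a title has simply changed.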

Example reconciliation output

When using a flat table output format, matches are returned at row level:

lead_index | lead_identity                   | contact_index | contact_identity | score | matched
0          | Jane Doe jane@acme.com Acme Inc | 1542          | Jane Doe         |       |
0          | Jane Doe jane@acme.com Acme Inc | 9811          | Janet Doe        |       |
1          | Mark Lee mark@north.io North IO | 2207          | Marc Lee         |       |

These candidate matches can then power:

  • Lead conversion workflows
  • Ownership reassignment
  • Deduplication review queues
  • Automated CRM hygiene jobs
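
One way to feed these workflows is to keep each lead's best candidate and split the results into an automatic bucket and a review queue. The sketch below assumes the rows have already been parsed into dicts keyed by the column names shown above; the actual response shape of the API is not confirmed here, and the `triage` helper and the 0.95 cut-off are hypothetical.

```python
def triage(rows, auto_threshold=0.95):
    """Keep each lead's top-scoring candidate, then split by confidence.

    Returns (auto, review): rows at or above `auto_threshold`, and the rest.
    """
    best = {}
    for row in rows:
        current = best.get(row["lead_index"])
        if current is None or row["score"] > current["score"]:
            best[row["lead_index"]] = row

    auto, review = [], []
    for row in best.values():
        (auto if row["score"] >= auto_threshold else review).append(row)
    return auto, review
```

Rows in the automatic bucket could drive conversion or ownership reassignment directly, while the review bucket becomes the deduplication queue for a RevOps team.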

Final thoughts

Lead-to-contact reconciliation is not just a data cleanup task.
In high-volume Salesforce environments, it becomes a foundational operational capability.

Teams that implement scalable identity matching gain:

  • More reliable pipeline attribution
  • Cleaner account ownership signals
  • Safer bulk imports
  • Stronger automation across RevOps workflows

As CRM datasets grow, reconciliation workflows evolve from manual checks into continuous identity infrastructure.

Try deduplicating 100k lead rows for free at https://similarity-api.com/try-it
