TL;DR
Before you choose tools, write the spec. Convert a fuzzy goal into 15 concrete decisions: triggers, source order, field definitions, mappings, idempotency, routing, retries, error semantics, audit and SLA. A clear spec decides success more than the platform you pick.
The real problem
Teams say, “We want to automate sales lead enrichment.”
Execution needs more than a wish. You must break it into triggers, data sources, field standards, mappings, routing, retry and backoff, error semantics, compliance, audit, and replay. Without an explicit spec, no platform can give you stable results.
Common pitfalls
- Comparing tools on speed or price while skipping requirement definition.
- Treating “company size” or “revenue” as plain English rather than computable standards.
- Ignoring edge cases like private pages, rate limits, or missing fields.
- No audit or replay, so failures cannot be reproduced.
A spec you can execute
Use this as an internal template. It fits any stack.
Trigger
Fire when a new Salesforce Lead is created and Company Size is empty.
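As a minimal sketch, the trigger is a single predicate applied to the "Lead created" event (the field name `CompanySize` is illustrative; use your org's actual field API name):

```python
def should_enrich(lead: dict) -> bool:
    # Called on a "Lead created" event; only proceed when Company Size is missing.
    # "CompanySize" is a placeholder for the real Salesforce field API name.
    return not lead.get("CompanySize")
```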
Sources and order
- LinkedIn: read employee count. If private or not accessible, record LI_PRIVATE.
- Company website (About page): extract headcount phrases.
- Crunchbase: financing stage and investors.
- BuiltWith API: detect tech stack.
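One way to make that order executable is a small config list, tried top to bottom; the names mirror the list above, and the structure itself is just a sketch:

```python
# Ordered enrichment sources; tried top to bottom, earlier entries win on conflicts.
SOURCES = [
    {"name": "linkedin",   "fields": ["employee_count"]},              # private profile -> LI_PRIVATE
    {"name": "website",    "fields": ["employee_count"]},              # About-page headcount phrases
    {"name": "crunchbase", "fields": ["funding_stage", "investors"]},
    {"name": "builtwith",  "fields": ["tech_stack"]},
]
```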
Fields and mapping
- Company Size: normalize to ranges, e.g. "50–200 employees" → 51–200.
- Funding Stage: map text to an enum, e.g. "Series B" → Growth Stage.
- Tech Stack: de-duplicate and write as a comma-separated list.
Write rules (idempotent UPSERT)
- Use `LeadId + Domain` as the idempotency key.
- If an existing result from the same batch has a higher score, overwrite. Otherwise keep the current value.
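A sketch of that write rule, with an in-memory store standing in for the real CRM API:

```python
# Illustrative UPSERT: keying on (LeadId, Domain) makes re-runs safe, and a record
# is only overwritten by a higher-scoring result from the same batch.
store: dict[tuple[str, str], dict] = {}

def upsert_enrichment(lead_id: str, domain: str, result: dict) -> None:
    key = (lead_id, domain)
    existing = store.get(key)
    if existing is None:
        store[key] = result
        return
    same_batch = existing.get("batch_id") == result.get("batch_id")
    if same_batch and result.get("score", 0) > existing.get("score", 0):
        store[key] = result  # higher-confidence result from this batch wins
    # otherwise keep the current value (idempotent no-op on replays)
```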
Scoring and routing
- Score each source on availability, freshness, and accuracy.
- If the main path fails, shift to the fallback path and record the routing trace.
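A sketch of how that routing could look, recording each step in a trace; the `fetch` callable, the scoring weights, and the acceptance threshold are placeholders:

```python
from typing import Callable

def route(sources: list[dict], fetch: Callable[[str], dict | None]) -> tuple[dict | None, list[dict]]:
    """Walk sources in priority order; return the first acceptable result plus the routing trace.

    `fetch(name)` stands in for the real per-source client; it returns a result dict
    with availability/freshness/accuracy sub-scores, or None on failure.
    """
    trace = []
    for source in sources:
        result = fetch(source["name"])
        if result is None:
            trace.append({"source": source["name"], "status": "failed"})
            continue  # shift to the fallback path
        score = (0.4 * result["availability"]
                 + 0.3 * result["freshness"]
                 + 0.3 * result["accuracy"])
        trace.append({"source": source["name"], "status": "ok", "score": round(score, 2)})
        if score >= 0.6:  # threshold is illustrative
            return result, trace
    return None, trace  # all paths failed; caller falls back to a human ticket
```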
Retry and backoff
- Retry up to 3 times with exponential backoff: 1s → 3s → 9s.
- Do not retry on 4xx. Retry on 5xx and timeouts.
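A sketch of that retry policy; the error classes are stand-ins, so map them to whatever your HTTP client actually raises:

```python
import time

class ClientError(Exception):   # stands in for 4xx responses: do not retry
    pass

class ServerError(Exception):   # stands in for 5xx responses: retry
    pass

def call_with_retry(call, retries: int = 3, base_delay: float = 1.0, factor: float = 3.0):
    """Retry up to 3 times with exponential backoff: 1s -> 3s -> 9s."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return call()
        except ClientError:
            raise                      # 4xx: fail fast, no retry
        except (ServerError, TimeoutError):
            if attempt == retries:
                raise                  # retry budget exhausted
            time.sleep(delay)
            delay *= factor            # 1s, 3s, 9s
```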
Error semantics
Use unified, countable codes:
- `LI_PRIVATE`
- `PAGE_NOT_FOUND`
- `RATE_LIMITED`
- `FIELD_AMBIGUOUS`
Errors must be observable and easy to aggregate.
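To keep them countable, one option is a single enum plus a counter that monitoring can read; this is a sketch, with the enum values mirroring the codes above:

```python
from collections import Counter
from enum import Enum

class EnrichmentError(str, Enum):
    LI_PRIVATE = "LI_PRIVATE"
    PAGE_NOT_FOUND = "PAGE_NOT_FOUND"
    RATE_LIMITED = "RATE_LIMITED"
    FIELD_AMBIGUOUS = "FIELD_AMBIGUOUS"

error_counts: Counter[EnrichmentError] = Counter()

def record_error(code: EnrichmentError, trace_id: str) -> None:
    # One structured log line per error keeps codes observable and easy to aggregate.
    error_counts[code] += 1
    print(f"trace_id={trace_id} error={code.value}")
```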
Fallback to human
If all paths fail, open a ticket with full context: request params, error codes, routing trace, and a screenshot or HTML snippet.
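The ticket payload can be as simple as one dict carrying everything a person needs to reproduce the failure; the field names here are illustrative:

```python
def build_ticket(trace_id: str, params: dict, errors: list[str],
                 routing_trace: list[dict], html_snippet: str | None) -> dict:
    # Full context for the human queue: inputs, errors, routing, and evidence.
    return {
        "trace_id": trace_id,
        "request_params": params,
        "error_codes": errors,
        "routing_trace": routing_trace,
        "evidence": html_snippet,   # screenshot URL or raw HTML snippet
    }
```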
SLA and alerting
- P95 latency ≤ 6s.
- If error rate > 3% for 5 minutes, alert.
- Weekly report comparing the missing-core-field rate with the previous week.
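As a sketch, the alert condition is an error-rate check over a rolling 5-minute window; the thresholds come from the bullets above, and the window bookkeeping is deliberately simplified:

```python
import time
from collections import deque

WINDOW_SECONDS = 300         # 5 minutes
ERROR_RATE_THRESHOLD = 0.03  # 3%

events: deque[tuple[float, bool]] = deque()  # (timestamp, is_error)

def record(is_error: bool) -> bool:
    """Record one run outcome; return True when the alert should fire."""
    now = time.time()
    events.append((now, is_error))
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()
    errors = sum(1 for _, e in events if e)
    return len(events) > 0 and errors / len(events) > ERROR_RATE_THRESHOLD
```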
Audit and replay
- Emit a unique `TraceId` per run.
- Store inputs, parameters, source response summaries, and final writes to support one-click replay.
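A sketch of the run record that makes one-click replay possible; the store shape and helper names are hypothetical:

```python
import json
import uuid

def start_run(inputs: dict, params: dict) -> dict:
    # One record per run, keyed by a unique TraceId, capturing everything needed to replay.
    return {
        "trace_id": str(uuid.uuid4()),
        "inputs": inputs,
        "params": params,
        "source_responses": [],   # append a summary per source call
        "final_writes": [],       # append each CRM write
    }

def persist(run: dict, path: str) -> None:
    # Append-only audit log; replay = re-run the workflow with the stored record.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(run) + "\n")
```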
One-screen checklist (15 decisions)
- Triggers and conditions
- Primary source order
- Fallback sources
- Field standards
- Text-to-enum mappings
- Idempotency key and UPSERT rules
- Scoring dimensions and thresholds
- Routing strategy
- Rate limits and concurrency
- Retry and backoff
- Error code semantics
- Human fallback and ticket template
- Security and compliance (privacy, auth)
- Audit and replay
- SLA, monitoring, alerting
Why specs decide wins
- Tools influence execution speed; specs control correctness and repeatability.
- A precise spec lets any platform reproduce the same result.
- Turn the spec into a template. New scenarios reuse 80% and change 20%.
Method: Intent → Plan → Execute → Solidify
Intent modeling
- Break “lead enrichment” into observable fields and sub-goals.
- Define the single source of truth and acceptance criteria per field.
Planning and orchestration
- Make priorities and branch conditions explicit.
- Output a human-readable plan, not only code.
Execution with safe fallback
- Call tools through standard interfaces and keep context.
- Failed paths move into a human queue for rule tuning.
Solidify and evolve
- Each successful run becomes a parameterized workflow template.
- Replays and A/B tests refine mappings, routing, and thresholds.
Engineering notes
- Idempotency: define keys to avoid side effects on re-runs.
- Canary and rollback: ship new rules at low traffic first.
- TraceId end-to-end: correlate logs across steps.
- Least privilege: only the permissions you need, with access audit.
- Data contracts: field names, types, ranges, and version strategy.
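For the data-contract note, a typed record with an explicit version is often enough to make field names, types, and ranges checkable; this is a sketch whose fields mirror the mapping section above:

```python
from dataclasses import dataclass

CONTRACT_VERSION = "1.0"   # bump on any field, type, or range change

@dataclass
class EnrichedLead:
    lead_id: str
    domain: str
    company_size: str | None    # normalized range, e.g. "51-200"
    funding_stage: str | None   # enum value, e.g. "Growth Stage"
    tech_stack: str | None      # comma-separated, de-duplicated
    contract_version: str = CONTRACT_VERSION
```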
Anti-patterns
- Dragging boxes in a UI to compensate for an undefined spec.
- Leaving fuzzy fields to “AI will decide” without guardrails.
- A single “failed” bucket with no error semantics.
- No replay, so you cannot locate or reproduce issues.
FAQ
Q1. We already use a platform. Do we still need this?
Yes. Specs are platform-agnostic. They let you get the same result on any stack.
Q2. Will detailed definitions slow delivery?
Clear definitions cut rework. Velocity improves over the project’s lifetime.
Q3. What if data sources change often?
Make routing and scoring config-driven. Policy updates handle change.
Q4. How do we judge if automation is worth it?
Use a two-week payback rule of thumb: people time × frequency × error cost. If it clears that bar, automate first.
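With hypothetical numbers (every figure below is an assumption to plug your own values into, and error cost is folded into the hourly figure), the check is a one-line calculation:

```python
# Illustrative payback check; all numbers are made up for the example.
hours_saved_per_run = 0.5     # people time
runs_per_week = 40            # frequency
cost_per_hour = 60.0          # loaded hourly cost, including an allowance for error cost
build_effort_hours = 30.0     # estimated effort to automate

weekly_value = hours_saved_per_run * runs_per_week * cost_per_hour
payback_weeks = (build_effort_hours * cost_per_hour) / weekly_value
print(f"payback ≈ {payback_weeks:.1f} weeks")   # automate first if ≤ 2
```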
Closing
Automation becomes easy once the spec exists. Turn “we want lead enrichment” into 15 decisions, write them in one page, and the rest becomes execution. Turn that page into a template, and your team compounds speed and stability week after week.