Fraud asks: which signal means stop?
Recommendation asks: which signal means go?
The behavioral signal layer underneath both is identical.
Most companies build it twice — and pay for it twice.
What both systems actually need
Strip both systems to their fundamentals:
Fraud detection pipeline:
- Collect behavioral events (clicks, sessions, device, velocity, IP)
- Enrich with context (account history, graph relationships, geolocation)
- Build features (velocity signals, anomaly indicators, identity consistency)
- Score: is this transaction fraudulent?
Recommendation engine pipeline:
- Collect behavioral events (clicks, sessions, device, purchases, dwell time)
- Enrich with context (purchase history, category affinity, recency)
- Build features (preference signals, recency decay, cross-category behavior)
- Score: which item should this user see next?
Steps 1–3 are the same work. The event pipeline, enrichment layer, and feature store are architecturally identical. Only the scoring layer diverges.
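The shared steps 1–3 can be sketched in a few lines. Everything here (the `build_features` helper, the field names) is illustrative, not an existing library:

```python
from collections import defaultdict

def build_features(events, account_history):
    """Shared stages: collect events, enrich with account context, build features."""
    by_user = defaultdict(list)
    for e in events:                       # 1. collect behavioral events
        by_user[e["user_id"]].append(e)
    features = {}
    for user, evs in by_user.items():      # 2-3. enrich and build features
        features[user] = {
            "event_count": len(evs),
            "distinct_devices": len({e["device_id"] for e in evs}),
            "account_age_days": account_history.get(user, 0),
        }
    return features

events = [
    {"user_id": "u1", "device_id": "d1"},
    {"user_id": "u1", "device_id": "d2"},
]
feats = build_features(events, {"u1": 30})
# The same `feats` dict would feed both the fraud and the recommendation model.
```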
The unified architecture
┌─────────────────────────────────────┐
│          BEHAVIORAL EVENTS          │
│  clicks · sessions · device ·       │
│  velocity · purchases · affinity    │
└──────────────────┬──────────────────┘
                   │
┌──────────────────▼──────────────────┐
│           EVENT PIPELINE            │
│          (Kafka / Kinesis)          │
└──────────────────┬──────────────────┘
                   │
┌──────────────────▼──────────────────┐
│            FEATURE STORE            │
│  velocity | identity | recency |    │
│  affinity | anomaly | preference    │
└────────┬───────────────────┬────────┘
         │                   │
┌────────▼────────┐ ┌────────▼────────┐
│   FRAUD MODEL   │ │    REC MODEL    │
│  "STOP signal"  │ │   "GO signal"   │
│   risk_score    │ │  ranked_items   │
└─────────────────┘ └─────────────────┘
The split happens at the model layer. Everything upstream is shared infrastructure.
What happens when teams build separately
When fraud and recommendation run separate pipelines:
- Two event schemas — identity resolution fails at the join. The device key that fraud uses to detect account takeover is not the same key the recommendation team uses for personalization.
- Two feature stores — velocity signals that flag fraud never reach the recommendation team. High-velocity behavior is an anomaly signal for fraud and a high-intent signal for growth. Two teams, same signal, different conclusions, zero shared infrastructure.
- Two identity graphs — the clean device graph built by the fraud team is unused by recommendation. The fragmented profiles that hurt recommendation accuracy are the same fragmentation the fraud team already solved.
- Two data teams — double the cost, double the latency, double the maintenance.
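A toy illustration of the first failure mode, assuming one team keys on the raw device ID while the other keys on a hashed variant (both keys and all values here are invented):

```python
import hashlib

# Fraud team keys events on the raw device ID.
fraud_events = {"device-abc": {"velocity_1h": 14}}

# Growth team keys the same device on a hashed ID.
growth_key = hashlib.sha256(b"device-abc").hexdigest()
growth_events = {growth_key: {"affinity": "electronics"}}

# The join across the two schemas finds nothing: same device, zero overlap.
shared = fraud_events.keys() & growth_events.keys()
print(len(shared))  # 0

# With a single schema, both teams key on the same resolved identity,
# and one profile carries both signals.
resolved = "user-123"
unified = {resolved: {**fraud_events["device-abc"], **growth_events[growth_key]}}
```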
The practical payoff of a unified layer
Fraud signals improve recommendation quality.
A user exhibiting high-velocity behavior across multiple unresolved device IDs is either a bot or a high-intent buyer in the final decision window. The recommendation engine should know which — and it can, if it reads from the same feature store.
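A sketch of that disambiguation as a simple rule, with made-up feature names and thresholds; the point is only that velocity alone is ambiguous and the fraud team's identity score resolves it:

```python
def interpret_velocity(features):
    """Hypothetical rule: the same velocity reading means different things
    depending on the identity-consistency score computed for fraud."""
    if features["velocity_1h"] < 10:
        return "normal"
    # High velocity alone is ambiguous; identity consistency disambiguates.
    return "likely_bot" if features["identity_score"] < 0.3 else "high_intent_buyer"

print(interpret_velocity({"velocity_1h": 25, "identity_score": 0.9}))  # high_intent_buyer
print(interpret_velocity({"velocity_1h": 25, "identity_score": 0.1}))  # likely_bot
```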
Recommendation signals improve fraud detection.
A user whose purchase pattern suddenly diverges from their personalized ranking history is a behavioral anomaly. That divergence signal is already computed by the recommendation feature pipeline. The fraud model should consume it.
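One way that divergence could be computed, sketched as cosine distance over hypothetical per-category affinity vectors (the vectors and categories are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two affinity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

baseline_affinity = [0.9, 0.1, 0.0]   # user's long-term preference vector
recent_purchases  = [0.0, 0.1, 0.9]   # sudden shift in behavior

divergence = 1 - cosine(baseline_affinity, recent_purchases)
# A large divergence is a preference-drift signal for the rec model and an
# account-takeover signal for the fraud model, computed once in one pipeline.
```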
Identity resolution fixed once, applied everywhere.
The session stitching and device-graph resolution built for fraud produce the same identity graph the recommendation engine needs to resolve cross-device journeys. Build it once.
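A minimal union-find sketch of that shared device graph; the device names and the linking rule (same account seen on both devices) are invented for illustration:

```python
parent = {}

def find(x):
    """Find the cluster root for a device, with path compression."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Link two devices into one identity cluster."""
    parent[find(a)] = find(b)

# Observed co-occurrences: the same account logged in on both devices.
union("phone-1", "laptop-1")
union("laptop-1", "tablet-1")

print(find("phone-1") == find("tablet-1"))  # True: one cross-device identity
```

The same cluster key then serves fraud (account-takeover detection) and recommendation (cross-device personalization).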
Build order if you're starting from scratch
# 1. Single event schema — shared between fraud and growth
event = {
    "user_id": "...",      # resolved identity key
    "device_id": "...",    # raw device
    "session_id": "...",
    "event_type": "...",   # click | purchase | login | flag
    "timestamp": "...",
    "context": {},         # category, value, geolocation, etc.
}
# 2. Single feature store — consumed by both models
features = {
    "velocity_1h": ...,      # fraud: anomaly | rec: intent signal
    "velocity_24h": ...,
    "identity_score": ...,   # fraud: consistency | rec: coverage
    "recency": ...,          # fraud: account age | rec: freshness
    "affinity_vector": ...,  # fraud: behavioral baseline | rec: preference
}
# 3. Split at the model layer only
fraud_score = fraud_model.predict(features)
ranked_items = rec_model.rank(features, catalog)
The principle
Fraud and recommendation are not two different data problems. They are two different decisions made from the same behavioral signal.
Stop building the pipeline twice.
Have you seen teams successfully share infrastructure between fraud and recommendation, or is it always siloed? Curious what the actual blockers look like — organizational or technical.
