How to build a configurable scoring engine in n8n (and avoid 6 footguns I hit)

Same input, same score, every time. Zero LLM cost per execution.

I shipped a scoring engine that handles both alert triage (SecOps) and lead qualification (Sales) from one workflow template. It reads rules from a Google Sheet, classifies into priority tiers, routes to Slack, and writes every decision to an audit log. Two workflows total. About 16 hours of build time over 4 weeks.

Open-source repo: TALVIN29/nexus-scoring-engine

Below: the architecture, the design decisions worth stealing, and the n8n quirks I'd want someone to warn me about before I started.

The core idea

Most ops teams write per-domain scoring code: one ad-hoc Python script for alerts, a different rule set in HubSpot for leads, an Airtable formula for support tickets. The scoring logic is functionally identical (score against rules, classify, route), but each team reinvents the implementation per domain.

So the engine is domain-agnostic. The workflow doesn't know whether it's scoring an alert or a lead. It reads a config_pack parameter, filters all config rows in the Google Sheet by that value, and runs.

To switch domains, the buyer changes one Sheet column value. No JS edits, no workflow changes.
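Here's a minimal sketch of what that filter step can look like in an n8n Code node; the pack source (a webhook query parameter) and the exact column names are assumptions, not necessarily the repo's implementation:

```javascript
// Code node sketch: keep only config rows for the active pack.
// Assumes the pack arrives as a webhook query parameter and the
// Sheets read feeds this node one item per row.
const pack = $('Webhook').first().json.query.config_pack;

const rows = $input.all()
  .map(item => item.json)
  .filter(row => row.config_pack === pack && row.active === 'TRUE');

return rows.map(row => ({ json: row }));
```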

Architecture

Two workflows. One Google Sheet.

Workflow 1 — Core Engine (event-driven)

Webhook → Validate → Load Config → Score → Classify → Route & Notify → Write Item → Audit Log

Seven stages, fired on every incoming webhook event.
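As an example of the first hop, here's a hedged sketch of the Validate stage as a Code node; the required field names are illustrative, not the template's exact contract:

```javascript
// Code node sketch: reject malformed events before any Sheets read happens.
// Field names here are illustrative assumptions.
const event = $input.first().json;
const required = ['config_pack', 'source', 'payload'];

const missing = required.filter(f => event[f] === undefined);
if (missing.length > 0) {
  // Failing fast here keeps bad events out of the Items and audit tabs.
  throw new Error(`VALIDATION_FAILED: missing fields ${missing.join(', ')}`);
}

return [{ json: event }];
```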

Workflow 2 — SLA Monitor (scheduled)

Schedule (5 min) → Read Open Items → Check SLA → Escalate → Update Item → Audit

Cron-style. Catches drift on items that aren't acknowledged within their SLA window.
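A sketch of what the Check SLA step reduces to; the column names (status, created_at, sla_minutes) are assumptions based on the schema described below:

```javascript
// Code node sketch: flag open items whose acknowledgement window lapsed.
const now = Date.now();

const breached = $input.all()
  .map(item => item.json)
  .filter(row => row.status === 'open')
  .filter(row => {
    const deadline = new Date(row.created_at).getTime() + Number(row.sla_minutes) * 60_000;
    return now > deadline;
  });

return breached.map(row => ({ json: { ...row, sla_breached: true } }));
```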

Sheet schema

Six tabs in a single Google Sheets workbook:

| Tab | Role |
| --- | --- |
| Scoring_Rules | Rule definitions: field, operator, value, weight, active flag |
| Tier_Thresholds | Score ranges per tier per pack |
| Notification_Targets | Slack/email per tier per pack |
| SLA_Config | SLA minutes + max escalations per tier |
| Items | Runtime: every scored event appended here |
| Execution_Log | Audit: every state change appended here |

Every tab has a config_pack column. The engine filters every read by the pack. Add a new pack (support_tickets, order_review, content_moderation) by adding rows.
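To make the schema concrete, here's what a few Scoring_Rules rows could look like once the Sheets node reads them into the workflow (invented values; the operator names are illustrative):

```javascript
// Each sheet row arrives as one item; the config_pack column is what
// lets one workflow serve alerts, leads, or any future pack.
const exampleRules = [
  { rule_id: 101, config_pack: 'alert_triage', field: 'severity',     operator: 'equals',       value: 'critical',    weight: 15, active: 'TRUE' },
  { rule_id: 103, config_pack: 'alert_triage', field: 'asset_tier',   operator: 'equals',       value: 'crown_jewel', weight: 12, active: 'TRUE' },
  { rule_id: 201, config_pack: 'lead_qual',    field: 'company_size', operator: 'greater_than', value: 500,           weight: 8,  active: 'TRUE' },
];
```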

Why rule-based, not AI

The temptation to slap an LLM on a scoring problem is real. I considered it. Here's why I didn't:

  1. Determinism: Same input must produce the same score, otherwise the audit trail is useless. LLM scorers drift across runs, model versions, temperature settings.
  2. Cost: Pure arithmetic costs $0 per execution. LLM-based scoring at 100 events/day at $0.01/call = $30/month minimum. For a $49 product, that's the buyer's payback period gone.
  3. Auditability: A rule engine's decisions are inspectable. "Why was this lead scored 80?" → "Rules 101, 103, 105, 108 fired with weights 6+8+12+15." A vector embedding can't give you that.

LLMs are great where data is sparse and patterns are fuzzy. Scoring against known fields and known weights is the opposite of that.
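That inspectability is cheap to get. A minimal sketch of the scoring step, with invented operator names (the repo's operator set may differ):

```javascript
// Deterministic scoring: sum the weights of rules that fire, and keep the
// fired rule IDs so "why was this scored 80?" has a concrete answer.
const OPS = {
  equals:       (a, b) => String(a) === String(b),
  greater_than: (a, b) => Number(a) > Number(b),
  contains:     (a, b) => String(a).includes(String(b)),
};

function score(event, rules) {
  const fired = rules.filter(r => OPS[r.operator]?.(event[r.field], r.value));
  return {
    score: fired.reduce((sum, r) => sum + Number(r.weight), 0),
    fired_rules: fired.map(r => r.rule_id), // this is what goes to the audit log
  };
}
```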

Six footguns I hit (so you don't)

1. SplitInBatches loops don't aggregate downstream nodes the way you think

I built a test data generator that loops through 7 preset payloads, fires each at the engine, then summarizes pass/fail. First version used SplitInBatches with $('FormatNode').all() in the summary code node.

Result: summary saw only the LAST iteration's output. Every loop call overwrites the previous run's data.

Fix: switch to a linear pipeline using HTTP Request's built-in options.batching (batchSize: 1, batchInterval: 30000). n8n then iterates natively, one item per request, with the throttle built in, and a downstream Code node in Run Once for All Items mode sees all results at once.
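A sketch of what the summary node reduces to under that layout (the status field is an illustrative assumption):

```javascript
// Summary Code node ("Run Once for All Items") downstream of the batched
// HTTP Request node. With no SplitInBatches loop, $input.all() really does
// contain every response, not just the last iteration's.
const results = $input.all().map(item => item.json);

const passed = results.filter(r => r.status === 'pass').length; // 'status' is illustrative
return [{ json: { total: results.length, passed, failed: results.length - passed } }];
```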

2. $getWorkflowStaticData('global') mutations don't persist across iterations

My second attempt: accumulate results in $getWorkflowStaticData('global') inside the loop. Static data is supposed to share state across nodes.

Result: total_scenarios = 0. Mutations within a single execution don't reliably persist across loop iterations in current n8n versions.

Fix: same as above — drop the loop, use linear pipeline.

3. Google Sheets free-tier rate limit hits at ~4 sequential webhook calls

Each engine call does 3 Sheets reads (rules, thresholds, targets). With a 5-second throttle between webhooks, the 4th request returns 0 rows, which trips the engine's SHEETS_UNAVAILABLE guard.

Fix: 30-second throttle for batch testing. For production traffic, plan to cache config in n8n static data or move to Postgres.
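A sketch of what that cache could look like, using static data as a cross-execution cache (the TTL and data shape are assumptions; unlike footgun #2's within-loop mutations, caching across executions is the use static data is built for):

```javascript
// Code node sketch: serve config from workflow static data when fresh,
// otherwise signal a cache miss so the flow falls through to the three
// Sheets reads (a later node writes { config, cachedAt } back).
const cache = $getWorkflowStaticData('global');
const TTL_MS = 5 * 60_000; // assumption: 5-minute config staleness is acceptable

if (cache.config && Date.now() - (cache.cachedAt ?? 0) < TTL_MS) {
  return [{ json: { cacheHit: true, config: cache.config } }];
}
return [{ json: { cacheHit: false } }];
```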

4. Hardcoded sheet IDs leak into shipped templates

n8n's Google Sheets node embeds the document ID + cached URL with full gid per tab. If you don't strip these before distribution, every buyer downloads your live workbook ID.

Fix: replace the document ID with a placeholder (REPLACE-WITH-YOUR-GOOGLE-SHEET-ID) and scrub cachedResultUrl fields before shipping. I wrote a 30-line Python script for this.
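The repo's scrubber is Python; here's the same idea as a small Node script, assuming the current export format where Sheets pickers are resource-locator objects marked __rl:

```javascript
// Scrub a workflow export before distribution: placeholder the document ID,
// drop cached metadata that leaks your live workbook.
const fs = require('fs');

const path = process.argv[2];
const wf = JSON.parse(fs.readFileSync(path, 'utf8'));

for (const node of wf.nodes ?? []) {
  for (const param of Object.values(node.parameters ?? {})) {
    if (param && typeof param === 'object' && param.__rl) {
      delete param.cachedResultUrl;
      delete param.cachedResultName;
      if (param === node.parameters.documentId) {
        param.value = 'REPLACE-WITH-YOUR-GOOGLE-SHEET-ID';
      }
    }
  }
}

fs.writeFileSync(path, JSON.stringify(wf, null, 2));
```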

5. Renaming nodes in raw JSON breaks $('OldName') references in OTHER nodes

I stripped Phase-3 fallback nodes from the Lite version and renamed the kept ones. The n8n editor handles cross-references automatically. Editing JSON outside the editor does NOT.

Fix: grep for \$\('OldName'\) after any rename. Rewrite all occurrences.
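The same check is easy to automate over an exported workflow file (a sketch; it only catches the $('Name') expression style):

```javascript
// Flag $('NodeName') expressions that no longer match any node name,
// e.g. after renaming nodes in raw JSON.
const fs = require('fs');

const wf = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
const names = new Set(wf.nodes.map(n => n.name));
const text = JSON.stringify(wf);

for (const match of text.matchAll(/\$\('([^']+)'\)/g)) {
  if (!names.has(match[1])) {
    console.log(`stale reference: $('${match[1]}')`);
  }
}
```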

6. n8n requires sticky notes inside the workflow for community submission

Marketplace reviewers reject templates without inline documentation. Stickies must explain setup, flow, and customization right inside the editor canvas.

Fix: add 3 stickies per workflow at minimum — Setup (blue), Flow (green), Output (purple). I generate them programmatically because the n8n editor's sticky-creation UX is friction-heavy.
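A sketch of that generation step; the node type and parameter names match n8n's sticky-note node, but the numeric color values are assumptions to verify against your n8n version:

```javascript
// Append three sticky-note nodes to an exported workflow JSON.
const fs = require('fs');

const stickies = [
  { title: 'Setup',  content: '## Setup\n1. Connect Google Sheets + Slack credentials\n2. Set your Sheet ID', color: 4, x: -460 },
  { title: 'Flow',   content: '## Flow\nWebhook → Validate → Load Config → Score → Classify → Route',         color: 5, x: -60 },
  { title: 'Output', content: '## Output\nScored items land in Items; every decision in Execution_Log',       color: 6, x: 340 },
];

const wf = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
wf.nodes.push(...stickies.map((s, i) => ({
  id: `sticky-${s.title.toLowerCase()}`, // ids just need to be unique within the file
  name: `Sticky Note ${i + 1}`,
  type: 'n8n-nodes-base.stickyNote',
  typeVersion: 1,
  position: [s.x, -320],
  parameters: { content: s.content, color: s.color, width: 360, height: 220 },
})));
fs.writeFileSync(process.argv[2], JSON.stringify(wf, null, 2));
```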

Performance targets I committed to

  • Webhook-to-classification: < 5 seconds (p95)
  • Notification delivery: < 15 seconds after classification
  • Scoring accuracy: 100% deterministic (verified across 3 runs of 5 test scenarios)
  • Error rate: < 1% of executions
  • SLA breach detection: 100% within configured window
  • Audit completeness: 100% of items logged with all required fields

The engine hits all six on the test bench. End-to-end p95 sits around 9 seconds, mostly Sheets read latency. Caching is on the v1.1 roadmap.

Lessons

  • Rule-based scoring isn't dead just because LLMs exist. Determinism + auditability + zero per-call cost is a real moat for any domain where decisions get audited.
  • Domain-agnosticism via a single config column is one of the highest-leverage design patterns I've used. One workflow, three product variants.
  • Production-grade error handling was 40% of build time. Slack outages, Sheets timeouts, cascading parallel-read bugs. The happy path is 60% of the work; the failure modes are the other 40%.
  • Sticky notes mattered more than I expected. They're the difference between "buyer imports and figures it out" and "buyer imports and refunds 2 days later."

Get the code

Repo (free Lite versions + architecture diagram + sample payloads): github.com/TALVIN29/nexus-scoring-engine

Free Lite Alert Triage on n8n: under review (search "Production-Ready Alert Triage")

Free Lite Lead Qualification on n8n: under review (search "Production-Ready Lead Scoring")

Premium bundle ($49 — Core Engine + SLA Monitor + Test Data Generator + both config packs): talvinleegen.gumroad.com/l/nexus-scoring-engine

If you build something with this — or hit a different footgun I missed — I'd love to hear about it. Email's in the GitHub README.

— Talvin
