Building Sourcing Intel: An AI-Powered Supply Chain Intelligence Platform with On-Device Inference

Kalyan — Wed, 01 Apr 2026 00:47:18 +0000

The Problem That Kept Me Up at Night

If you've worked anywhere near retail procurement, you know the pain. A tariff announcement drops on a Tuesday morning, and suddenly your team is scrambling through spreadsheets, disconnected tariff tables, and three different news sites trying to figure out: Which of our 65 SKUs are exposed? What does switching to Vietnam actually cost? Is Vietnam even stable right now?

That analysis — which should take minutes — takes days. By then, the window to act has closed.

I built SourcingIntel to fix that. It's a real-time, multi-agent supply chain intelligence platform that brings together affected SKUs, landed costs, geopolitical risk scores, and ranked sourcing recommendations into a single workflow. And inference runs on-device, so no pricing data ever leaves your machine.

Why This Matters Now: The Middle East Crisis

Let me paint a picture that's painfully current. In 2026, the Iran conflict escalated — airstrikes, Strait of Hormuz shipping disruptions, and cascading sanctions. Within 48 hours:

Oil prices spiked — every SKU with petroleum-derived components (plastics, synthetic textiles, packaging) saw landed costs jump overnight
Strait of Hormuz transit threatened — ~30% of the world's seaborne oil passes through this chokepoint. Disruption means Persian Gulf ports become unreachable for tankers and cargo vessels, forcing reliance on limited pipeline alternatives and driving freight surcharges across all Asian shipping lanes
Secondary sanctions rippled outward — suppliers in Turkey, UAE, and India with Iranian business ties suddenly became compliance risks

For a procurement team managing 65 SKUs across 8 sourcing countries, the questions pile up instantly: Which SKUs have Middle East-origin raw materials in their supply chain? If we shift textile sourcing from the Middle East to Bangladesh, what's the landed cost delta? Is Bangladesh itself stable right now, or are we jumping from one risk into another?

This is exactly the scenario SourcingIntel is built for. Here's what happens when the Iran conflict unfolds:

GDELT + RSS feeds pick up the conflict articles within minutes. The RiskPoller classifies them by severity and maps them to Iran, Iraq, and neighboring countries.
Non-suppressible floor rules kick in — Iran's SRI floors at 85 (active conflict), and any OFAC-sanctioned country floors at 75. These can't be gamed by a quiet news day.
The Convergence Detector fires — oil price spike + conflict news + sanctions data converge on Iran simultaneously. The UI flags this as a multi-signal convergence event.
The Morning Brief auto-generates decisions ranked by urgency: "Evaluate alternative sourcing for SKU-012 (Polyester Blend) — current source Turkey, risk elevated due to sanctions proximity"
One-click triage — the procurement lead hits Approve, and the chat agent runs a full comparison: China vs Vietnam vs Bangladesh for that SKU, with live tariff rates, SRI scores, and annual savings calculations.

The entire cycle — from conflict detection to actionable recommendation — takes minutes, not days. That's the gap I wanted to close.

What SourcingIntel Does

Before I get into the technical walkthrough, here's the quick pitch:

Ask natural language questions like "Compare sourcing China vs Vietnam for electronics" and get a full analysis with cost charts, risk scores, and actionable recommendations
Watch a live risk map that updates every 5 minutes from GDELT, BBC, Reuters, and US State Department data
Simulate tariff scenarios — slide a tariff rate from 0–100% and instantly see the $M portfolio impact
Get a Morning Brief — an autonomous AI agent that wakes up daily, ranks decisions by urgency, and lets you Approve/Defer/Dismiss each one
All powered by on-device AI via Microsoft Foundry Local (phi-4-mini) — sensitive data stays local

For the complete technical design — every module, every interface, every request flow — see TECHNICAL-DESIGN.md.

Architecture: The 7-Layer Stack

I designed SourcingIntel as a layered system where each layer only talks to the one directly below it through TypeScript interfaces. No shortcuts, no leaky abstractions. Here's the high-level view:

```
┌──────────────────────────────────────────────────────┐
│  Layer 0: UI (React / Next.js 14 App Router)         │
│  18 components · dark/light theme · SSE live updates │
├──────────────────────────────────────────────────────┤
│  Layer 1: API Routes (Next.js Route Handlers)        │
│  14 endpoints · SSE risk stream · Vercel cron        │
├──────────────────────────────────────────────────────┤
│  Layer 2: Agent Orchestration (LangGraph StateGraph) │
│  Keyword-first intent routing · LLM fallback         │
├──────────────────────────────────────────────────────┤
│  Layer 3: 6 Specialist Agents                        │
│  Inventory · Tariff · GeoRisk · Dashboard · News ·   │
│  Market (each with MCP tool-calling)                 │
├──────────────────────────────────────────────────────┤
│  Layer 4: MCP Tool Layer (Three-Layer Architecture)  │
│  7 stdio servers · 3 InMemoryTransport clients ·     │
│  Runtime tool registry (oil, FX, BDI, sanctions)     │
├──────────────────────────────────────────────────────┤
│  Layer 5: Storage Adapters (Interface-gated)         │
│  LanceDB (vectors) · SQLite (tariffs/suppliers) ·    │
│  InMemoryRiskStore (SRI + alerts + SSE emitter)      │
├──────────────────────────────────────────────────────┤
│  Layer 6: AI Infrastructure                          │
│  Foundry Local phi-4-mini · Xenova embeddings ·      │
│  Azure AI Foundry (optional cloud fallback)          │
└──────────────────────────────────────────────────────┘
```

The key constraint I imposed: every cross-layer dependency goes through a TypeScript interface. Agents never import concrete adapter classes. There's exactly one file (src/lib/startup.ts) that wires everything together — the entire project's dependency injection root.
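To make the DI rule concrete, here is a minimal TypeScript sketch of interface-gated wiring. The `IInventoryStore` shape, `InMemoryInventoryStore`, and `createContainer` are simplified, hypothetical stand-ins rather than the repo's actual code; the point is that only the composition root names a concrete class.

```typescript
// Simplified, hypothetical shapes; the repo's IInventoryStore surely has more methods.
interface InventoryItem {
  sku: string;
  name: string;
  country: string;
  unitCost: number;
}

interface IInventoryStore {
  getByCountry(country: string): Promise<InventoryItem[]>;
}

// The agent sees only the interface, injected through its constructor.
class InventoryAgent {
  private readonly store: IInventoryStore;
  constructor(store: IInventoryStore) {
    this.store = store;
  }

  async countFrom(country: string): Promise<string> {
    const items = await this.store.getByCountry(country);
    return `${items.length} SKU(s) sourced from ${country}`;
  }
}

// Stand-in adapter; in the real project this slot holds a LanceDB- or MCP-backed class.
class InMemoryInventoryStore implements IInventoryStore {
  private readonly items: InventoryItem[];
  constructor(items: InventoryItem[]) {
    this.items = items;
  }

  async getByCountry(country: string): Promise<InventoryItem[]> {
    return this.items.filter((i) => i.country === country);
  }
}

// Composition root (the role src/lib/startup.ts plays): the only place concrete classes are named.
function createContainer(items: InventoryItem[]) {
  const inventoryStore: IInventoryStore = new InMemoryInventoryStore(items);
  return { inventoryAgent: new InventoryAgent(inventoryStore) };
}
```

Swapping the in-memory store for a LanceDB- or MCP-backed adapter would change only `createContainer`; `InventoryAgent` stays untouched.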

Step 1: Multi-Agent Orchestration with LangGraph

The heart of SourcingIntel is a LangGraph StateGraph that routes user queries to 6 specialist agents. But here's the thing — calling an LLM just to figure out which agent to use adds 300–2000ms of latency. For a query like "What's the tariff rate for China?", that's wasted time.

So I built a keyword-first intent classifier. The orchestrator checks regex patterns first:

```
Query arrives
  ├─ NEWS_PATTERNS match?     → route to NewsAgent (no LLM call)
  ├─ MARKET_PATTERNS match?   → route to MarketIntelAgent (no LLM call)
  ├─ has inventory + risk?    → route to full comparison pipeline (no LLM call)
  └─ None match               → call phi-4-mini to classify intent
```

Clear-cut queries skip the classification LLM call entirely. Ambiguous ones fall back to phi-4-mini. This keeps the system fast without sacrificing flexibility.
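A sketch of the routing tree above. The regexes, intent labels, and the injected `llmFallback` function are illustrative assumptions; the real `NEWS_PATTERNS` and `MARKET_PATTERNS` live in the orchestrator and are likely broader.

```typescript
// Hypothetical intent labels and patterns, for illustration only.
type Intent = "news" | "market" | "comparison" | "inventory" | "tariff" | "risk";

const NEWS_PATTERNS: RegExp[] = [/\bnews\b/i, /\bconflict\b/i, /\bheadlines?\b/i];
const MARKET_PATTERNS: RegExp[] = [/\boil\b/i, /\bwti\b/i, /\bbrent\b/i, /\bexchange rate\b/i, /\bshipping index\b/i];
const INVENTORY_HINT = /\b(sku|source|sourcing|electronics|textiles?)\b/i;
const RISK_HINT = /\b(compare|vs\.?|versus|risk|switch)\b/i;

interface Classification {
  intent: Intent;
  usedLlm: boolean;
}

async function classify(query: string, llmFallback: (q: string) => Promise<Intent>): Promise<Classification> {
  if (NEWS_PATTERNS.some((p) => p.test(query))) return { intent: "news", usedLlm: false };
  if (MARKET_PATTERNS.some((p) => p.test(query))) return { intent: "market", usedLlm: false };
  if (INVENTORY_HINT.test(query) && RISK_HINT.test(query)) return { intent: "comparison", usedLlm: false };
  // Only queries that match none of the cheap checks pay for an LLM round trip.
  return { intent: await llmFallback(query), usedLlm: true };
}
```

The ordering matters: the first matching pattern wins, so overlapping vocabulary has to be resolved by where each check sits in the chain.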

For complex queries like "Compare sourcing China vs Vietnam for electronics", the full pipeline kicks in:

classify → inventoryNode → tariffNode → riskNode → dashboardNode → merged response

Each node runs its specialist agent sequentially (phi-4-mini can't handle concurrent LLM calls on 16GB machines without risking OOM), and the DashboardAgent at the end merges everything into a unified response with cost comparison charts, risk scores, and ranked SourcingRecommendation[].
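The sequential constraint can be sketched in plain TypeScript. This is not LangGraph's API: `PipelineState`, `PipelineNode`, and `runSequential` are hypothetical names, and the toy nodes only simulate LLM latency so that the test can check that at most one call is ever in flight.

```typescript
// Hypothetical shared state passed from node to node (a simplified stand-in for a graph state).
interface PipelineState {
  query: string;
  inventory?: string;
  tariffs?: string;
  risk?: string;
  response?: string;
}

type PipelineNode = (state: PipelineState) => Promise<Partial<PipelineState>>;

// Awaits each node before starting the next, so only one LLM call is in flight at a time.
async function runSequential(nodes: PipelineNode[], initial: PipelineState): Promise<PipelineState> {
  let state: PipelineState = { ...initial };
  for (const node of nodes) {
    const update = await node(state);
    state = { ...state, ...update };
  }
  return state;
}

// Toy nodes that track the peak number of concurrent simulated "LLM calls".
let inFlight = 0;
let maxInFlight = 0;
function fakeLlmNode(key: keyof PipelineState, value: string): PipelineNode {
  return async () => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated inference latency
    inFlight--;
    return { [key]: value } as Partial<PipelineState>;
  };
}
```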

Step 2: The Six Specialist Agents

Each agent follows three strict rules:

Never import adapters directly — receive them via constructor interfaces
Always embed data in the user turn — not the system prompt (phi-4-mini grounds better this way)
Limit context to ~2KB — filter before passing to the LLM to avoid token limit cutoffs
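A minimal sketch of the second and third rules, assuming a byte budget of 2048 (the article says "~2KB") and data rows pre-rendered as strings. `fitRows` and `buildMessages` are illustrative names; the `[INVENTORY DATA]` label follows the block labels described in the Responsible AI section.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

const MAX_CONTEXT_BYTES = 2048; // assumed concrete value for the "~2KB" budget

// Keeps whole rows, in order, until the next one would exceed the UTF-8 byte budget.
function fitRows(rows: string[], maxBytes: number): string[] {
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let used = 0;
  for (const row of rows) {
    const size = encoder.encode(row + "\n").length;
    if (used + size > maxBytes) break;
    kept.push(row);
    used += size;
  }
  return kept;
}

// Data goes in the user turn next to the question; the system prompt stays data-free.
function buildMessages(systemPrompt: string, label: string, rows: string[], question: string): ChatMessage[] {
  const data = fitRows(rows, MAX_CONTEXT_BYTES).join("\n");
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: `[${label}]\n${data}\n\nQuestion: ${question}` },
  ];
}
```

Filtering should happen before this step (for example, keeping only the relevant country's rows), so the budget is spent on useful data rather than truncating it arbitrarily.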

InventoryAgent

Handles inventory lookups with smart routing — price queries get sorted results, country queries get exact matches, and only truly semantic queries hit the vector search. Uses getCountryConfig().getAllCountryNames() for country detection (the authoritative list from SQLite, not derived from inventory).
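Country detection against an authoritative list can be sketched as below. `detectCountries` is a hypothetical helper; in the real agent, the name list would come from `getCountryConfig().getAllCountryNames()`. Matching longer names first keeps "South Korea" from also registering as "Korea".

```typescript
// Escapes regex metacharacters so country names are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns every name from the authoritative list that appears as a whole word in the query.
function detectCountries(query: string, countryNames: string[]): string[] {
  const longestFirst = [...countryNames].sort((a, b) => b.length - a.length);
  const found: string[] = [];
  let remaining = query;
  for (const name of longestFirst) {
    const re = new RegExp(`\\b${escapeRegExp(name)}\\b`, "i");
    if (re.test(remaining)) {
      found.push(name);
      remaining = remaining.replace(re, " "); // blank it out so a shorter name can't re-match it
    }
  }
  return found;
}
```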

TariffAgent

Fetches all tariff rates from SQLite and filters them to the relevant country before sending them to the LLM (otherwise phi-4-mini chokes on 80 rows of tariff data). Supports structured SKU-switch analysis: when you say "analyze switching SKU-001 from China", it builds full SourcingRecommendation[] with annual savings.

GeoRiskAgent

Computes a Sourcing Risk Index (SRI) per country using weighted signals: newsRisk × 0.30 + tariffRisk × 0.25 + tradeDisruption × 0.20 + baselineRisk × 0.25. The system prompt explicitly forbids generic advice — the agent must cite specific signals.
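The SRI formula as a function. The weights are the ones given above and sum to 1.0, so the index stays on the same scale as its inputs; treating each signal as a 0–100 score is an assumption.

```typescript
// Each signal assumed to be normalized to 0–100.
interface SriSignals {
  newsRisk: number;
  tariffRisk: number;
  tradeDisruption: number;
  baselineRisk: number;
}

// Weights from the article: 0.30 + 0.25 + 0.20 + 0.25 = 1.0
const SRI_WEIGHTS: SriSignals = { newsRisk: 0.3, tariffRisk: 0.25, tradeDisruption: 0.2, baselineRisk: 0.25 };

function computeSri(s: SriSignals): number {
  return (
    s.newsRisk * SRI_WEIGHTS.newsRisk +
    s.tariffRisk * SRI_WEIGHTS.tariffRisk +
    s.tradeDisruption * SRI_WEIGHTS.tradeDisruption +
    s.baselineRisk * SRI_WEIGHTS.baselineRisk
  );
}
```

For example, signals of 80, 60, 50, and 40 give 24 + 15 + 10 + 10 = 59.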

DashboardAgent

The merge layer. For each of the top 10 SKUs, it compares tariff rates across countries, finds the cheapest alternative, and calculates annual savings. Outputs structured data that the UI renders as charts.
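A hedged sketch of the merge step. It assumes each SKU carries a unit cost and annual volume (hypothetical fields) and approximates annual savings as duty saved only, `unitCost × annualUnits × (currentRate − bestRate)`, ignoring freight and the other landed-cost terms the real DashboardAgent may include. "Top 10" is taken here to mean highest annual spend, which is also an assumption.

```typescript
interface Sku {
  id: string;
  country: string; // current sourcing country
  hsCode: string;
  unitCost: number; // USD per unit before duty (hypothetical field)
  annualUnits: number; // hypothetical field
}

// rates[hsCode][country] = ad valorem tariff rate as a fraction (0.25 = 25%)
type RateTable = Record<string, Record<string, number>>;

interface SourcingRecommendation {
  skuId: string;
  from: string;
  to: string;
  currentRate: number;
  bestRate: number;
  annualSavings: number;
}

function recommend(skus: Sku[], rates: RateTable, topN = 10): SourcingRecommendation[] {
  const spend = (s: Sku) => s.unitCost * s.annualUnits;
  const top = [...skus].sort((a, b) => spend(b) - spend(a)).slice(0, topN);
  const recs: SourcingRecommendation[] = [];
  for (const sku of top) {
    const byCountry = rates[sku.hsCode] ?? {};
    const currentRate = byCountry[sku.country];
    if (currentRate === undefined) continue; // no tariff data for the current source
    // Cheapest country for this HS code; stays on the current source if nothing beats it.
    let best = { country: sku.country, rate: currentRate };
    for (const [country, rate] of Object.entries(byCountry)) {
      if (rate < best.rate) best = { country, rate };
    }
    recs.push({
      skuId: sku.id,
      from: sku.country,
      to: best.country,
      currentRate,
      bestRate: best.rate,
      annualSavings: spend(sku) * (currentRate - best.rate),
    });
  }
  return recs.sort((a, b) => b.annualSavings - a.annualSavings);
}
```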

NewsAggregatorAgent

Synthesizes GDELT conflict articles + BBC/Reuters RSS feeds. Groups alerts by country, surfaces the top 5 highest-severity ones, and formats convergence cards when multiple risk domains overlap.
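The grouping and top-5 selection might look like this; the `RiskAlert` shape and the 0–100 severity scale are assumptions for illustration.

```typescript
interface RiskAlert {
  country: string;
  severity: number; // assumed 0–100, higher = more severe
  title: string;
}

// Buckets alerts per country, preserving arrival order within each bucket.
function groupByCountry(alerts: RiskAlert[]): Map<string, RiskAlert[]> {
  const groups = new Map<string, RiskAlert[]>();
  for (const alert of alerts) {
    const list = groups.get(alert.country) ?? [];
    list.push(alert);
    groups.set(alert.country, list);
  }
  return groups;
}

// Highest-severity alerts across all countries, capped at n.
function topAlerts(alerts: RiskAlert[], n = 5): RiskAlert[] {
  return [...alerts].sort((a, b) => b.severity - a.severity).slice(0, n);
}
```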

MarketIntelAgent

The MCP tool-calling agent. All data comes from live financial APIs through the MCP runtime bridge — oil prices (Stooq), FX rates (ECB), Baltic Dry Index, sanctions screening. No static data.

Step 3: Three-Layer MCP Architecture

This is probably the most architecturally interesting part. I implemented MCP (Model Context Protocol) at three distinct levels, ensuring every agent-tool interaction traverses the MCP protocol:

Layer 1: External stdio MCP servers (7 servers, 35+ tools)

These are standalone servers using @modelcontextprotocol/sdk that can plug into Claude Desktop or any MCP-compatible client:

```bash
npx tsx src/mcp/riskMcpServer.ts        # SRI scores, alerts, heatmap
npx tsx src/mcp/tariffMcpServer.ts      # compare_countries, calculate_savings
npx tsx src/mcp/inventoryMcpServer.ts   # search_sku, list_by_country
npx tsx src/mcp/commodityMcpServer.ts   # live oil, BDI, cotton, copper
npx tsx src/mcp/sanctionsMcpServer.ts   # OFAC screening
npx tsx src/mcp/fxMcpServer.ts          # live exchange rates
npx tsx src/mcp/gdeltMcpServer.ts       # real-time conflict news
```

Layer 2: Internal MCP clients (InMemoryTransport)

At startup, I wire three MCP servers to in-process Client instances via InMemoryTransport. The agents call McpTariffAdapter, McpInventoryAdapter, and McpRiskAdapter — which implement the same ITariffStore, IInventoryStore, IRiskStore interfaces — but route every call through Client.callTool() over the MCP protocol boundary.

```
Agent → McpXxxAdapter.method() → Client.callTool() → InMemoryTransport
  → MCP server handler → raw store (SQLite / LanceDB / InMemoryRiskStore)
```
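The adapter pattern can be illustrated without the SDK. In this sketch, `ToolCaller` is a hypothetical minimal interface mirroring the shape of an MCP client's `callTool` (a tool name plus arguments in, a list of text content parts out); in the real wiring, an SDK `Client` connected over `InMemoryTransport` fills that role. The tool name `get_rates` and the `createLocalCaller` handler table are also hypothetical.

```typescript
// Minimal result shape mirroring an MCP tool result: a list of content parts.
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean;
}

// Hypothetical stand-in for the SDK client's callTool().
interface ToolCaller {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<ToolResult>;
}

interface TariffRate {
  country: string;
  hsCode: string;
  rate: number;
}

interface ITariffStore {
  getRates(country: string): Promise<TariffRate[]>;
}

// Implements the store interface agents expect, but every call crosses the tool boundary.
class McpTariffAdapter implements ITariffStore {
  private readonly client: ToolCaller;
  constructor(client: ToolCaller) {
    this.client = client;
  }

  async getRates(country: string): Promise<TariffRate[]> {
    const result = await this.client.callTool({ name: "get_rates", arguments: { country } });
    if (result.isError) throw new Error(`get_rates failed for ${country}`);
    const text = result.content.find((c) => c.type === "text")?.text ?? "[]";
    return JSON.parse(text) as TariffRate[];
  }
}

// In-process "server side": a handler table that serializes results as text content.
function createLocalCaller(handlers: Record<string, (args: Record<string, unknown>) => unknown>): ToolCaller {
  return {
    async callTool({ name, arguments: args }) {
      const handler = handlers[name];
      if (!handler) return { content: [{ type: "text", text: `unknown tool ${name}` }], isError: true };
      return { content: [{ type: "text", text: JSON.stringify(handler(args)) }] };
    },
  };
}
```

Because the adapter satisfies the same interface as the raw store, the composition root can swap one for the other without touching any agent.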

Layer 3: Runtime tool registry

mcpToolRegistry.ts provides pattern-matched tool selection + Promise.allSettled execution for live financial APIs. The MarketIntelAgent asks: "Which tools match this query?" and the registry returns the relevant tools based on trigger patterns.
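A minimal sketch of pattern-matched selection plus `Promise.allSettled`. The `RegisteredTool` shape is an assumption, not the actual structure in `mcpToolRegistry.ts`. Using `allSettled` is what lets one failing feed degrade gracefully instead of rejecting the whole batch.

```typescript
interface RegisteredTool {
  name: string;
  triggers: RegExp[]; // patterns that make this tool relevant to a query
  run: () => Promise<string>;
}

function selectTools(registry: RegisteredTool[], query: string): RegisteredTool[] {
  return registry.filter((t) => t.triggers.some((p) => p.test(query)));
}

// Runs all matching tools concurrently; failures are reported, not thrown.
async function runMatchingTools(
  registry: RegisteredTool[],
  query: string,
): Promise<{ ok: Record<string, string>; failed: string[] }> {
  const tools = selectTools(registry, query);
  const settled = await Promise.allSettled(tools.map((t) => t.run()));
  const ok: Record<string, string> = {};
  const failed: string[] = [];
  settled.forEach((res, i) => {
    if (res.status === "fulfilled") ok[tools[i].name] = res.value;
    else failed.push(tools[i].name);
  });
  return { ok, failed };
}
```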

Step 4: Real-Time Risk Pipeline (SSE)

The background risk engine runs a continuous loop:

Every 5 minutes, the RiskPoller fetches:
- GDELT conflict articles (single batch API call, matched to countries by name)
- BBC + Reuters RSS feeds
- US State Department travel advisories
- World Bank governance indicators (cached 24h)
ConflictClassifier scores each article by severity (keyword-based — no LLM, to preserve Foundry capacity for user queries)
RiskScorer computes the SRI with non-suppressible floor rules:
- Active conflict → minimum SRI 85
- US sanctions → minimum 75
- Do-not-travel → minimum 65
- Chronic instability → minimum 55

These floors are data-driven from the country_config SQLite table — they can't be overridden by absent news.
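Both the keyword scoring and the floor rules fit in a few lines. The keyword lexicon and weights, and the `CountryFlags` field names, are hypothetical; only the four floor values come from the rules above. Because floors are applied with `Math.max`, they can raise a score but never lower it.

```typescript
// Hypothetical keyword lexicon; the real ConflictClassifier's list isn't published here.
const SEVERITY_KEYWORDS: Array<[RegExp, number]> = [
  [/\b(airstrikes?|invasion|missiles?)\b/i, 90],
  [/\b(sanctions?|embargo)\b/i, 70],
  [/\b(protests?|strike|unrest)\b/i, 50],
];

// Highest matching weight wins; no LLM call involved.
function classifySeverity(headline: string): number {
  let severity = 0;
  for (const [pattern, weight] of SEVERITY_KEYWORDS) {
    if (pattern.test(headline)) severity = Math.max(severity, weight);
  }
  return severity;
}

// Flags as they might be read from the country_config table (field names are assumptions).
interface CountryFlags {
  activeConflict: boolean;
  usSanctions: boolean;
  doNotTravel: boolean;
  chronicInstability: boolean;
}

// A quiet news day lowers rawSri, but never below the applicable floor.
function applyFloors(rawSri: number, f: CountryFlags): number {
  const floors = [
    f.activeConflict ? 85 : 0,
    f.usSanctions ? 75 : 0,
    f.doNotTravel ? 65 : 0,
    f.chronicInstability ? 55 : 0,
  ];
  return Math.max(rawSri, ...floors);
}
```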

ConvergenceDetector flags when 2+ signal dimensions spike simultaneously for the same country
Updates stream to the UI via Server-Sent Events — the map, RiskRadarStrip, and SignalConvergenceStrip all subscribe and update in real time
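Convergence detection, sketched under one explicit assumption: a dimension "spikes" when it rises by at least `threshold` points since the previous poll. The dimension names and the default threshold of 15 are illustrative, not taken from the repo.

```typescript
type Dimension = "news" | "tariff" | "trade" | "market";

interface SignalSnapshot {
  country: string;
  values: Record<Dimension, number>;
}

interface ConvergenceEvent {
  country: string;
  dimensions: Dimension[];
}

// Flags countries where two or more dimensions jumped by >= threshold since the last poll.
function detectConvergence(prev: SignalSnapshot[], curr: SignalSnapshot[], threshold = 15): ConvergenceEvent[] {
  const before = new Map(prev.map((s) => [s.country, s.values] as const));
  const events: ConvergenceEvent[] = [];
  for (const snap of curr) {
    const old = before.get(snap.country);
    if (!old) continue; // no baseline yet for this country
    const spiking = (Object.keys(snap.values) as Dimension[]).filter((d) => snap.values[d] - old[d] >= threshold);
    if (spiking.length >= 2) events.push({ country: snap.country, dimensions: spiking });
  }
  return events;
}
```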

All external API calls go through a CircuitBreaker — after 3 consecutive failures, it trips open and returns cached data. The UI stays functional even when data sources are down.
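A minimal circuit breaker matching that description: three consecutive failures open the circuit, and callers get the last good value instead of an error. The cooldown and half-open retry are assumptions (the article doesn't say how the circuit closes again), and the clock is injectable so the behavior can be tested.

```typescript
class CircuitBreaker<T> {
  private failures = 0;
  private openedAt: number | null = null;
  private lastGood: T | undefined;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures = 3, cooldownMs = 60_000, now: () => number = Date.now) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Open until the cooldown elapses; after that, one trial call is let through (half-open).
  get isOpen(): boolean {
    return this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs;
  }

  async call(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen) return this.fallback(); // don't even touch the failing API
    try {
      const value = await fn();
      this.failures = 0;
      this.openedAt = null;
      this.lastGood = value;
      return value;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = this.now();
      return this.fallback(err);
    }
  }

  // Serves the cached value, or rethrows if nothing has ever succeeded.
  private fallback(err?: unknown): T {
    if (this.lastGood !== undefined) return this.lastGood;
    throw err ?? new Error("circuit open and no cached value");
  }
}
```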

Step 5: The Morning Brief — Agentic Decision-Making

This is where SourcingIntel goes beyond a dashboard into genuine agentic territory. The Morning Brief is a standalone agentic pipeline (not routed through the orchestrator) that runs at 7am UTC via Vercel cron or on-demand:

Derives monitored countries from actual inventory — no hardcoded country list
Fetches SRI scores + tariff comparisons in parallel for all sourcing countries
Ranks decisions by urgency: urgencyScore = risk × financial impact
Pre-builds the exact chat query for each decision (ready for one-click execution)
Diffs against the previous brief to surface what changed overnight
Emails an HTML brief via nodemailer if SMTP is configured
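Ranking and the overnight diff, sketched. `Decision` is a hypothetical shape, and multiplying SRI directly by a dollar figure is an assumption about how `urgencyScore = risk × financial impact` is scaled.

```typescript
interface Decision {
  id: string;
  country: string;
  sri: number; // 0–100
  financialImpact: number; // USD at stake (assumed definition)
  query: string; // pre-built chat query for one-click execution
}

interface RankedDecision extends Decision {
  urgencyScore: number;
}

// urgencyScore = risk × financial impact, highest first.
function rankDecisions(decisions: Decision[]): RankedDecision[] {
  return decisions
    .map((d) => ({ ...d, urgencyScore: d.sri * d.financialImpact }))
    .sort((a, b) => b.urgencyScore - a.urgencyScore);
}

// What is new since the previous brief, and what dropped off.
function diffBriefs(previous: Decision[], current: Decision[]): { added: string[]; removed: string[] } {
  const prevIds = new Set(previous.map((d) => d.id));
  const currIds = new Set(current.map((d) => d.id));
  return {
    added: current.filter((d) => !prevIds.has(d.id)).map((d) => d.id),
    removed: previous.filter((d) => !currIds.has(d.id)).map((d) => d.id),
  };
}
```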

The user gets a Decision Queue with three options per item:

Approve → auto-submits the pre-built query to the chat agent + emails the procurement team
Defer → moves to the bottom of the queue
Dismiss → removes the decision

There's also a Triage Runner that batch-executes the top 3 pending decisions sequentially through the chat agent with 1.8s spacing — so you can kick off a full analysis in one click.
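The queue actions and the Triage Runner can be sketched as a pure transition function plus one sequential loop. Names are hypothetical; the real Approve also submits the query and emails the team, which this sketch leaves to the caller. `spacingMs` defaults to the 1.8s spacing described above.

```typescript
type DecisionStatus = "pending" | "approved" | "deferred" | "dismissed";

interface QueueItem {
  id: string;
  query: string;
  status: DecisionStatus;
}

// Approve marks the item; Defer moves it to the bottom; Dismiss removes it.
function applyAction(queue: QueueItem[], id: string, action: "approve" | "defer" | "dismiss"): QueueItem[] {
  const item = queue.find((q) => q.id === id);
  if (!item) return queue;
  const rest = queue.filter((q) => q.id !== id);
  if (action === "dismiss") return rest;
  if (action === "defer") return [...rest, { ...item, status: "deferred" as const }];
  return queue.map((q): QueueItem => (q.id === id ? { ...q, status: "approved" } : q));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Submits the top `limit` pending items one at a time, pausing between submissions.
async function runTriage(
  queue: QueueItem[],
  submit: (query: string) => Promise<void>,
  limit = 3,
  spacingMs = 1800,
): Promise<string[]> {
  const pending = queue.filter((q) => q.status === "pending").slice(0, limit);
  const ran: string[] = [];
  for (const [i, item] of pending.entries()) {
    if (i > 0) await sleep(spacingMs);
    await submit(item.query);
    ran.push(item.id);
  }
  return ran;
}
```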

Step 6: Responsible AI Design

Privacy and safety were non-negotiable from day one:

Principle	How It Works
On-device inference	All LLM calls via Foundry Local (phi-4-mini). Pricing/inventory data never sent to any cloud. Embeddings also run locally via Xenova.
Sanctions guardrails	The What-If simulator returns HTTP 403 for OFAC-sanctioned countries. InventoryAgent emits compliance warnings. The MCP sanctions tool flags OFAC countries for all agents.
Non-suppressible risk floors	Active conflict → SRI ≥ 85. These floors are data-driven from SQLite — they can't be overridden by absent news data.
Grounded responses	Every agent embeds live data in the LLM's user turn as `[INVENTORY DATA — use ONLY items listed below]`. The model is instructed to reference only the data it was given.
Source attribution	All responses are grounded on explicit `[INVENTORY DATA]`, `[TARIFF DATA]`, `[LIVE MARKET DATA]` blocks with instructions to reference only provided data.
Circuit breaker	External API failures don't cascade — the circuit trips open and returns cached data.
Full tracing	Every LLM call is tagged with a UUID, agent name, and timing via a `withTracing()` decorator. Spans are queryable at `/api/trace/[id]`.
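A higher-order-function version of `withTracing()` might look like this. The `Span` fields and the in-memory `spans` map are assumptions; the article only says each call is tagged with a UUID, agent name, and timing, and that spans are served from `/api/trace/[id]`.

```typescript
import { randomUUID } from "node:crypto";

interface Span {
  id: string;
  agent: string;
  startedAt: number; // epoch ms
  durationMs: number;
  ok: boolean;
}

// In-memory span sink that a trace endpoint could read from.
const spans = new Map<string, Span>();

// Wraps an async agent call and records a UUID-tagged span, whether it succeeds or throws.
function withTracing<A extends unknown[], R>(agent: string, fn: (...args: A) => Promise<R>): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const id = randomUUID();
    const startedAt = Date.now();
    const t0 = performance.now();
    let ok = false;
    try {
      const result = await fn(...args);
      ok = true;
      return result;
    } finally {
      spans.set(id, { id, agent, startedAt, durationMs: performance.now() - t0, ok });
    }
  };
}
```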

Additional Guardrails

Prompt Registry — all system prompts centralized in src/prompts/prompts.json with co-located max_tokens budgets. No scattered prompt strings.
LRU cache — 256 entries max with stale-while-revalidate and in-flight request coalescing to prevent cache stampedes
Per-connection SSE cleanup — riskEmitter.off() on disconnect, never removeAllListeners
Model TTL — Foundry Local model pinned for 2 hours via SDK to prevent auto-unload during idle periods
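The LRU cache bullet can be sketched with a `Map`, whose insertion order gives LRU eviction almost for free. `SwrLruCache` is a hypothetical name and the 60-second TTL is an assumption; only the 256-entry cap comes from the list above.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// LRU via Map insertion order (oldest key first), with stale-while-revalidate
// and coalescing of concurrent loads for the same key.
class SwrLruCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly inFlight = new Map<string, Promise<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries = 256, ttlMs = 60_000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  async get(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit) {
      // Re-insert to mark the key as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      // Stale: serve the old value now and refresh in the background.
      if (Date.now() >= hit.expiresAt) this.refresh(key, load).catch(() => undefined);
      return hit.value;
    }
    return this.refresh(key, load);
  }

  private refresh(key: string, load: () => Promise<V>): Promise<V> {
    const pending = this.inFlight.get(key);
    if (pending) return pending; // a load for this key is already running: share it
    const p = load()
      .then((value) => {
        this.store(key, value);
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, p);
    return p;
  }

  private store(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string; // least recently used
      this.entries.delete(oldest);
    }
  }
}
```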

Step 7: The RAG Pipeline

SourcingIntel uses a retrieval-augmented generation pipeline for inventory search:

Ingestion (run once via npx tsx data/seedLanceDb.ts):

Parse 65 SKUs from inventory.csv
Build embedding text: "{name} {category} {country} {hsCode}"
Generate 384-dim vectors via Xenova all-MiniLM-L6-v2 (entirely on-device)
Store in LanceDB
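The ingestion template and a brute-force version of the retrieval step are sketched below. Real vectors would be the 384-dimensional output of all-MiniLM-L6-v2; the sketch takes vectors as plain arrays so it runs without the model, and `InventoryRow` is a hypothetical shape. A vector store such as LanceDB performs the equivalent search inside the database.

```typescript
interface InventoryRow {
  sku: string;
  name: string;
  category: string;
  country: string;
  hsCode: string;
}

// Same template as the ingestion step: "{name} {category} {country} {hsCode}".
function embeddingText(row: InventoryRow): string {
  return `${row.name} ${row.category} ${row.country} ${row.hsCode}`;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exhaustive top-K by cosine similarity (fine at 65 SKUs; large catalogs would use an index).
function topK(query: number[], items: Array<{ id: string; vector: number[] }>, k = 8): string[] {
  return items
    .map((it) => ({ id: it.id, score: cosine(query, it.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.id);
}
```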

Retrieval — but with an important twist. The InventoryAgent doesn't blindly use vector search for everything:

Price queries → getAll() + sort in JS (cosine similarity can't rank by price)
Country queries → getByCountry() exact match (no vectors needed)
Category queries → getAll() + JS filter
Semantic queries → vector search with top-8 results

This ensures the LLM always gets correctly-ranked data regardless of query type.
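The routing table above as a function. The cue regex and the category list are assumptions rather than the repo's code, and the precedence simply follows the list order; the real agent may treat mixed queries (a country plus a category, say) differently.

```typescript
type RetrievalStrategy = "price" | "country" | "category" | "semantic";

// Illustrative cues only.
const PRICE_CUE = /\b(cheapest|most expensive|price|priciest|cost(liest)?)\b/i;
const CATEGORIES = ["electronics", "textiles", "apparel", "furniture", "toys"];

function chooseStrategy(query: string, countryNames: string[]): { strategy: RetrievalStrategy; arg?: string } {
  if (PRICE_CUE.test(query)) return { strategy: "price" }; // getAll() + sort by price in JS
  const country = countryNames.find((c) => new RegExp(`\\b${c}\\b`, "i").test(query));
  if (country) return { strategy: "country", arg: country }; // getByCountry() exact match
  const category = CATEGORIES.find((c) => query.toLowerCase().includes(c));
  if (category) return { strategy: "category", arg: category }; // getAll() + JS filter
  return { strategy: "semantic" }; // vector search, top-8
}
```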

Step 8: Running It Yourself

Prerequisites

Node.js 20+
Foundry Local installed
Windows: Visual Studio Build Tools (for better-sqlite3)

Quick Start

```bash
# 1. Start Foundry Local
foundry service start
foundry model run phi-4-mini-instruct-openvino-gpu:2

# 2. Install and configure
cd SourcingIntel
pnpm install
cp .env.example .env.local
# Edit .env.local with your Foundry settings

# 3. Seed databases (optional; demo data included)
NODE_OPTIONS="--max-old-space-size=4096" npx tsx data/seed.ts
NODE_OPTIONS="--max-old-space-size=4096" npx tsx data/seedLanceDb.ts

# 4. Run
NODE_OPTIONS="--max-old-space-size=4096" pnpm dev
# Open http://localhost:3000
```

Demo Walkthrough

Try these queries in order:

#	Query / Action	What Happens
1	"Which electronics do I source from China?"	InventoryAgent → LanceDB semantic search
2	"What is the tariff rate for China electronics HS 8471.30?"	TariffAgent → SQLite lookup
3	"Compare sourcing China vs Vietnam for electronics"	Full pipeline: Inventory → Tariff → GeoRisk → Dashboard agents → merged response with chart
4	"What is the geopolitical risk for Vietnam right now?"	GeoRiskAgent → SRI score + signal breakdown
5	"What is the current WTI oil price and shipping index?"	MarketIntelAgent → MCP tools (Stooq + BDI)
6	"Summarize today's conflict news"	NewsAggregatorAgent → GDELT + RSS
7	Open What-If Simulator → drag to 40% China	$M portfolio impact simulation
8	Open Morning Brief (bell icon)	Autonomous agent → Decision Queue

Testing Without an LLM

All 69 tests run without a live Foundry Local instance:

```bash
NODE_OPTIONS="--max-old-space-size=4096" npx vitest run --config vitest.config.mjs
```

Test File	Tests	Coverage
`riskScorer.test.ts`	4	SRI floor rules, trend detection, weight correctness
`agents.test.ts`	6	Circuit breaker, InventoryAgent grounding, Orchestrator routing
`evals.test.ts`	44	Intent patterns, tariff accuracy (real SQLite), MCP sanctions, RAG grounding
`rag.test.ts`	14	InventoryRetriever contract, precision@K evaluation

Tests verify grounding data correctness, not LLM output. The working assumption is that if the right data reaches the LLM's context window, the model grounds its response in that data; the suite checks the first half of that chain deterministically.

Key Technical Decisions and Tradeoffs

Keyword-first classification — saves 300–2000ms on clear intents at the cost of maintaining regex patterns. Worth it for a demo where responsiveness matters.
Sequential LLM calls — phi-4-mini on 16GB RAM can't handle concurrent inference. I chose reliability over speed.
Dummy vector scans in LanceDB — LanceDB v0.4's SQL filter is unreliable for full table scans. Using a dummy vector + JS filter is hacky but stable for 65 SKUs.
InMemoryTransport for internal MCP — running MCP in-process via InMemoryTransport avoids spawning separate server processes while still routing all tool calls through the MCP protocol boundary.
No LLM in ConflictClassifier — keyword-based severity scoring preserves Foundry Local capacity for user-facing queries. The risk pipeline runs every 5 minutes; burning LLM tokens on news classification would starve the chat.

Beyond Supply Chain: Applying This Pattern to Other Domains

While I built SourcingIntel for procurement, the underlying architecture — multi-agent orchestration + MCP tool layer + real-time risk scoring + agentic decision queues — is domain-agnostic. Here are concrete applications I see:

Financial Portfolio Risk Management

Replace SKUs with equity positions. The GeoRiskAgent monitors geopolitical events affecting market sectors. The TariffAgent becomes a regulatory-change tracker (new SEC rules, EU MiFID updates). The Morning Brief surfaces portfolio rebalancing decisions: "Emerging market exposure in Fund A exceeds threshold — 3 positions flagged, estimated VaR impact $2.1M." The same Approve/Defer/Dismiss workflow applies — portfolio managers triage instead of procurement leads.

Healthcare Supply Chain & Drug Shortage Monitoring

Hospitals face the same multi-signal convergence problem. An active pharmaceutical ingredient (API) shortage, a manufacturing plant shutdown in India, and an FDA warning letter are each manageable alone, but together they can mean your ICU runs out of a critical drug in 72 hours. The convergence detector pattern maps directly here. Floor rules become: FDA recall → minimum severity 90; single-source drug → minimum 80.

Energy Grid Operations

Replace countries with grid regions, SKUs with generation assets. The risk pipeline monitors weather events, fuel price spikes, and equipment failure rates. The What-If simulator becomes: "If natural gas prices hit $8/MMBtu, what's the cost impact of switching Region 3 to wind+battery vs. keeping gas turbines online?" The same SRI scoring applies: a hurricane on the Gulf Coast floors the risk index for that supply region.

Agricultural Commodity Trading

Replace SKUs with crop positions across growing regions. The MCP tool layer already handles commodity prices (cotton, copper, oil) — extending to wheat, corn, and soybean futures is a configuration change. The GDELT pipeline catches droughts, export bans (India's rice export restrictions), and port strikes. The Morning Brief becomes: "Brazil soybean harvest 12% below forecast — evaluate switching Q3 soy contracts to Argentina. Urgency: High."

Insurance Underwriting

The risk scoring model translates directly. Replace country SRI with policyholder risk profiles. The convergence detector flags when multiple risk factors spike simultaneously for a portfolio segment. Floor rules become actuarial minimums. The decision queue surfaces renewal and pricing decisions.

The common thread across all of these: multi-signal convergence detection + agentic decision queues + non-suppressible floor rules. Any domain where professionals monitor multiple data streams and need to make time-sensitive decisions with incomplete information can benefit from this pattern.

What I'd Build Next

Multi-model support — swap phi-4-mini for larger models when running on GPU-equipped machines
Historical SRI analytics — persist risk scores to SQLite for trend analysis beyond the current session
Supplier relationship scoring — factor in quality, lead time, and payment terms alongside cost and risk
Collaborative decision queues — multi-user approval workflows with role-based access

- GitHub Repository: SourcingIntel

Built with Next.js 14, LangGraph, Microsoft Foundry Local, Model Context Protocol, LanceDB.

DEV Community: Kalyan