<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kalyan</title>
    <description>The latest articles on DEV Community by Kalyan (@kalyan_8b63839572c8a7db1b).</description>
    <link>https://dev.to/kalyan_8b63839572c8a7db1b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3854393%2F1daec21f-6e38-48fc-86da-15182f5e4ac4.png</url>
      <title>DEV Community: Kalyan</title>
      <link>https://dev.to/kalyan_8b63839572c8a7db1b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kalyan_8b63839572c8a7db1b"/>
    <language>en</language>
    <item>
      <title>Building Sourcing Intel: An AI-Powered Supply Chain Intelligence Platform with On-Device Inference</title>
      <dc:creator>Kalyan</dc:creator>
      <pubDate>Wed, 01 Apr 2026 00:47:18 +0000</pubDate>
      <link>https://dev.to/kalyan_8b63839572c8a7db1b/building-sourcing-intel-an-ai-powered-supply-chain-intelligence-platform-with-on-device-inference-4kdi</link>
      <guid>https://dev.to/kalyan_8b63839572c8a7db1b/building-sourcing-intel-an-ai-powered-supply-chain-intelligence-platform-with-on-device-inference-4kdi</guid>
      <description>&lt;h2&gt;
  
  
  The Problem That Kept Me Up at Night
&lt;/h2&gt;

&lt;p&gt;If you've worked anywhere near retail procurement, you know the pain. A tariff announcement drops on a Tuesday morning, and suddenly your team is scrambling through spreadsheets, disconnected tariff tables, and three different news sites trying to figure out: &lt;em&gt;Which of our 65 SKUs are exposed? What does switching to Vietnam actually cost? Is Vietnam even stable right now?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That analysis — which should take minutes — takes days. By then, the window to act has closed.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;SourcingIntel&lt;/strong&gt; to fix that. It's a real-time, multi-agent supply chain intelligence platform that brings together affected SKUs, landed costs, geopolitical risk scores, and ranked sourcing recommendations into a single workflow. And it does it all &lt;strong&gt;on-device&lt;/strong&gt; — no pricing data ever leaves your machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Now: The Middle East Crisis
&lt;/h2&gt;

&lt;p&gt;Let me paint a picture that's painfully current. In 2026, the Iran conflict escalated — airstrikes, Strait of Hormuz shipping disruptions, and cascading sanctions. Within 48 hours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Oil prices spiked&lt;/strong&gt; — every SKU with petroleum-derived components (plastics, synthetic textiles, packaging) saw landed costs jump overnight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strait of Hormuz transit threatened&lt;/strong&gt; — ~30% of the world's seaborne oil passes through this chokepoint. Disruption means Persian Gulf ports become unreachable for tankers and cargo vessels, forcing reliance on limited pipeline alternatives and driving freight surcharges across all Asian shipping lanes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary sanctions rippled outward&lt;/strong&gt; — suppliers in Turkey, UAE, and India with Iranian business ties suddenly became compliance risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a procurement team managing 65 SKUs across 8 sourcing countries, the questions pile up instantly: &lt;em&gt;Which SKUs have Middle East-origin raw materials in their supply chain? If we shift textile sourcing from the Middle East to Bangladesh, what's the landed cost delta? Is Bangladesh itself stable right now, or are we jumping from one risk into another?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is exactly the scenario SourcingIntel is built for. Here's what happens when the Iran conflict unfolds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GDELT + RSS feeds&lt;/strong&gt; pick up the conflict articles within minutes. The RiskPoller classifies them by severity and maps them to Iran, Iraq, and neighboring countries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-suppressible floor rules&lt;/strong&gt; kick in — Iran's SRI floors at 85 (active conflict), and any OFAC-sanctioned country floors at 75. These can't be gamed by a quiet news day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Convergence Detector&lt;/strong&gt; fires — oil price spike + conflict news + sanctions data converge on Iran simultaneously. The UI flags this as a multi-signal convergence event.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Morning Brief&lt;/strong&gt; auto-generates decisions ranked by urgency: &lt;em&gt;"Evaluate alternative sourcing for SKU-012 (Polyester Blend) — current source Turkey, risk elevated due to sanctions proximity"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-click triage&lt;/strong&gt; — the procurement lead hits Approve, and the chat agent runs a full comparison: China vs Vietnam vs Bangladesh for that SKU, with live tariff rates, SRI scores, and annual savings calculations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire cycle — from conflict detection to actionable recommendation — takes minutes, not days. That's the gap I wanted to close.&lt;/p&gt;




&lt;h2&gt;
  
  
  What SourcingIntel Does
&lt;/h2&gt;

&lt;p&gt;Before I get into the technical walkthrough, here's the quick pitch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ask natural language questions&lt;/strong&gt; like &lt;em&gt;"Compare sourcing China vs Vietnam for electronics"&lt;/em&gt; and get a full analysis with cost charts, risk scores, and actionable recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch a live risk map&lt;/strong&gt; that updates every 5 minutes from GDELT, BBC, Reuters, and US State Department data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simulate tariff scenarios&lt;/strong&gt; — slide a tariff rate from 0–100% and instantly see the $M portfolio impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get a Morning Brief&lt;/strong&gt; — an autonomous AI agent that wakes up daily, ranks decisions by urgency, and lets you Approve/Defer/Dismiss each one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All powered by on-device AI&lt;/strong&gt; via Microsoft Foundry Local (phi-4-mini) — sensitive data stays local&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For the complete technical design (every module, every interface, every request flow), see &lt;code&gt;TECHNICAL-DESIGN.md&lt;/code&gt; in the GitHub repository.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Architecture: The 7-Layer Stack
&lt;/h2&gt;

&lt;p&gt;I designed SourcingIntel as a layered system where each layer only talks to the one directly below it through TypeScript interfaces. No shortcuts, no leaky abstractions. Here's the high-level view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│  Layer 0: UI (React / Next.js 14 App Router)        │
│  18 components · dark/light theme · SSE live updates │
├─────────────────────────────────────────────────────┤
│  Layer 1: API Routes (Next.js Route Handlers)        │
│  14 endpoints · SSE risk stream · Vercel cron        │
├─────────────────────────────────────────────────────┤
│  Layer 2: Agent Orchestration (LangGraph StateGraph) │
│  Keyword-first intent routing · LLM fallback         │
├─────────────────────────────────────────────────────┤
│  Layer 3: 6 Specialist Agents                        │
│  Inventory · Tariff · GeoRisk · Dashboard · News ·   │
│  Market (each with MCP tool-calling)                 │
├─────────────────────────────────────────────────────┤
│  Layer 4: MCP Tool Layer (Three-Layer Architecture)  │
│  7 stdio servers · 3 InMemoryTransport clients ·     │
│  Runtime tool registry (oil, FX, BDI, sanctions)     │
├─────────────────────────────────────────────────────┤
│  Layer 5: Storage Adapters (Interface-gated)         │
│  LanceDB (vectors) · SQLite (tariffs/suppliers) ·    │
│  InMemoryRiskStore (SRI + alerts + SSE emitter)      │
├─────────────────────────────────────────────────────┤
│  Layer 6: AI Infrastructure                          │
│  Foundry Local phi-4-mini · Xenova embeddings ·      │
│  Azure AI Foundry (optional cloud fallback)          │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key constraint I imposed: &lt;strong&gt;every cross-layer dependency goes through a TypeScript interface.&lt;/strong&gt; Agents never import concrete adapter classes. There's exactly one file (&lt;code&gt;src/lib/startup.ts&lt;/code&gt;) that wires everything together — the entire project's dependency injection root.&lt;/p&gt;
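&lt;p&gt;A minimal sketch of that constraint, assuming a simplified &lt;code&gt;ITariffStore&lt;/code&gt; with a single method. The method name, the in-memory adapter, the &lt;code&gt;wireAgents&lt;/code&gt; helper, and the tariff value are illustrative assumptions, not the project's actual code:&lt;/p&gt;

```typescript
// Hypothetical, simplified interface; the real ITariffStore is not shown in this post.
interface ITariffStore {
  getRate(country: string, hsCode: string): number | undefined;
}

// Concrete adapter. Only the composition root is allowed to import this class.
class InMemoryTariffStore implements ITariffStore {
  private rates: { [key: string]: number };

  constructor(rates: { [key: string]: number }) {
    this.rates = rates;
  }

  getRate(country: string, hsCode: string): number | undefined {
    return this.rates[`${country}:${hsCode}`];
  }
}

// The agent depends on the interface only and receives it via its constructor.
class TariffAgent {
  private tariffs: ITariffStore;

  constructor(tariffs: ITariffStore) {
    this.tariffs = tariffs;
  }

  describeRate(country: string, hsCode: string): string {
    const rate = this.tariffs.getRate(country, hsCode);
    return rate === undefined
      ? `No tariff on file for ${country} / HS ${hsCode}`
      : `${country} / HS ${hsCode}: ${rate}%`;
  }
}

// Stand-in for src/lib/startup.ts: the single place where concrete classes get wired up.
function wireAgents() {
  const tariffStore = new InMemoryTariffStore({ "China:8471.30": 25 }); // made-up rate
  return { tariffAgent: new TariffAgent(tariffStore) };
}

const { tariffAgent } = wireAgents();
console.log(tariffAgent.describeRate("China", "8471.30"));
```

&lt;p&gt;Swapping the in-memory adapter for a SQLite-backed one would only touch &lt;code&gt;wireAgents&lt;/code&gt;; the agent class stays unchanged.&lt;/p&gt;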




&lt;h2&gt;
  
  
  Step 1: Multi-Agent Orchestration with LangGraph
&lt;/h2&gt;

&lt;p&gt;The heart of SourcingIntel is a &lt;strong&gt;LangGraph StateGraph&lt;/strong&gt; that routes user queries to 6 specialist agents. But here's the thing — calling an LLM just to figure out &lt;em&gt;which&lt;/em&gt; agent to use adds 300–2000ms of latency. For a query like &lt;em&gt;"What's the tariff rate for China?"&lt;/em&gt;, that's wasted time.&lt;/p&gt;

&lt;p&gt;So I built a &lt;strong&gt;keyword-first intent classifier&lt;/strong&gt;. The orchestrator checks regex patterns first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query arrives
  ├─ NEWS_PATTERNS match?     → route to NewsAgent (no LLM call)
  ├─ MARKET_PATTERNS match?   → route to MarketIntelAgent (no LLM call)  
  ├─ has inventory + risk?    → route to full comparison pipeline (no LLM call)
  └─ None match               → call phi-4-mini to classify intent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clear-cut queries skip the classification LLM call entirely. Ambiguous ones fall back to phi-4-mini. This keeps the system fast without sacrificing flexibility.&lt;/p&gt;
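&lt;p&gt;Here is a rough sketch of that routing order. The regex patterns, the &lt;code&gt;Intent&lt;/code&gt; names, and the injected fallback are assumptions for illustration; the project's real &lt;code&gt;NEWS_PATTERNS&lt;/code&gt; and &lt;code&gt;MARKET_PATTERNS&lt;/code&gt; are not reproduced here, and the real fallback would be an awaited call to phi-4-mini:&lt;/p&gt;

```typescript
type Intent = "news" | "market" | "comparison" | "tariff" | "unknown";

// Illustrative patterns only.
const NEWS_PATTERNS = [/\bnews\b/i, /\bconflict\b/i, /\bheadlines?\b/i];
const MARKET_PATTERNS = [/\boil\b/i, /\bfx\b/i, /\bexchange rate\b/i, /\bshipping index\b/i];
const INVENTORY_HINT = /\b(sku|sourcing|source|inventory)\b/i;
const RISK_HINT = /\b(compare|risk|versus|vs)\b/i;

function matchesAny(patterns: RegExp[], query: string): boolean {
  return patterns.some((pattern) => pattern.test(query));
}

// Returns an intent when a keyword rule fires, or "unknown" when the LLM should decide.
function classifyByKeyword(query: string): Intent {
  if (matchesAny(NEWS_PATTERNS, query)) return "news";
  if (matchesAny(MARKET_PATTERNS, query)) return "market";
  if (INVENTORY_HINT.test(query)) {
    if (RISK_HINT.test(query)) return "comparison";
  }
  return "unknown";
}

// The fallback is injected so the sketch runs without a model.
function classify(query: string, llmFallback: (q: string) => Intent): Intent {
  const keywordIntent = classifyByKeyword(query);
  return keywordIntent === "unknown" ? llmFallback(query) : keywordIntent;
}

console.log(classify("Compare sourcing China vs Vietnam for electronics", () => "unknown"));
```

&lt;p&gt;The order of the checks matters: the first matching rule wins, so the cheapest and most specific patterns should come first.&lt;/p&gt;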

&lt;p&gt;For complex queries like &lt;em&gt;"Compare sourcing China vs Vietnam for electronics"&lt;/em&gt;, the full pipeline kicks in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;classify → inventoryNode → tariffNode → riskNode → dashboardNode → merged response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each node runs its specialist agent sequentially (phi-4-mini can't handle concurrent LLM calls on 16GB machines without risking OOM), and the DashboardAgent at the end merges everything into a unified response with cost comparison charts, risk scores, and ranked &lt;code&gt;SourcingRecommendation[]&lt;/code&gt;.&lt;/p&gt;
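&lt;p&gt;The sequential constraint can be illustrated in plain TypeScript. This is not the LangGraph API; the node factory, the timer that stands in for an LLM call, and the in-flight counter are assumptions added to make the "one call at a time" property visible:&lt;/p&gt;

```typescript
type PipelineState = { query: string; trace: string[] };

let inFlight = 0;
let maxInFlight = 0;

// Each hypothetical node pretends to make one LLM call and records its name in the trace.
function makeNode(name: string) {
  return async (state: PipelineState) => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5)); // stand-in for an LLM call
    inFlight -= 1;
    return { query: state.query, trace: [...state.trace, name] };
  };
}

const pipeline = [
  makeNode("inventory"),
  makeNode("tariff"),
  makeNode("risk"),
  makeNode("dashboard"),
];

// Await each node before starting the next so only one call is ever in flight.
async function runSequentially(query: string) {
  let state: PipelineState = { query, trace: [] };
  for (const node of pipeline) {
    state = await node(state);
  }
  return state;
}

runSequentially("Compare sourcing China vs Vietnam for electronics").then((state) =>
  console.log(state.trace.join(" -> "), "| max in flight:", maxInFlight),
);
```

&lt;p&gt;Replacing the loop with &lt;code&gt;Promise.all&lt;/code&gt; would start all four calls at once, which is exactly what the memory constraint rules out.&lt;/p&gt;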




&lt;h2&gt;
  
  
  Step 2: The Six Specialist Agents
&lt;/h2&gt;

&lt;p&gt;Each agent follows three strict rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Never import adapters directly&lt;/strong&gt; — receive them via constructor interfaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always embed data in the user turn&lt;/strong&gt; — not the system prompt (phi-4-mini grounds better this way)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit context to ~2KB&lt;/strong&gt; — filter before passing to the LLM to avoid token limit cutoffs&lt;/li&gt;
&lt;/ol&gt;
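&lt;p&gt;Rules 2 and 3 can be sketched together. The function names, the row format, and the exact truncation strategy (keep whole rows until the byte budget is used up) are assumptions, not the project's implementation:&lt;/p&gt;

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

const MAX_CONTEXT_BYTES = 2048; // the ~2KB budget from rule 3

// Keep whole rows until the UTF-8 size budget is reached, then stop.
function fitToBudget(rows: string[], maxBytes: number): string[] {
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let used = 0;
  for (const row of rows) {
    const size = encoder.encode(row + "\n").length;
    if (used + size > maxBytes) break;
    kept.push(row);
    used += size;
  }
  return kept;
}

// Data goes into the user turn (rule 2); the system prompt carries instructions only.
function buildMessages(systemPrompt: string, question: string, rows: string[]): ChatMessage[] {
  const data = fitToBudget(rows, MAX_CONTEXT_BYTES).join("\n");
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: `[INVENTORY DATA]\n${data}\n\nQuestion: ${question}` },
  ];
}

const demo = buildMessages("Answer from the data only.", "Which SKUs come from China?", [
  "SKU-001,Laptop Sleeve,China",
  "SKU-002,USB Hub,Vietnam",
]);
console.log(demo[1].content);
```

&lt;p&gt;Filtering to the relevant rows before calling &lt;code&gt;buildMessages&lt;/code&gt; matters more than the truncation itself: the budget is a safety net, not a ranking step.&lt;/p&gt;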

&lt;h3&gt;
  
  
  InventoryAgent
&lt;/h3&gt;

&lt;p&gt;Handles inventory lookups with smart routing — price queries get sorted results, country queries get exact matches, and only truly semantic queries hit the vector search. Uses &lt;code&gt;getCountryConfig().getAllCountryNames()&lt;/code&gt; for country detection (the authoritative list from SQLite, not derived from inventory).&lt;/p&gt;

&lt;h3&gt;
  
  
  TariffAgent
&lt;/h3&gt;

&lt;p&gt;Fetches all tariff rates from SQLite, then filters to the relevant country before sending anything to the LLM (otherwise phi-4-mini chokes on 80 rows of tariff data). Supports structured SKU-switch analysis: when you say &lt;em&gt;"analyze switching SKU-001 from China"&lt;/em&gt;, it builds a full &lt;code&gt;SourcingRecommendation[]&lt;/code&gt; with annual savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  GeoRiskAgent
&lt;/h3&gt;

&lt;p&gt;Computes a &lt;strong&gt;Sourcing Risk Index (SRI)&lt;/strong&gt; per country using weighted signals: &lt;code&gt;newsRisk × 0.30 + tariffRisk × 0.25 + tradeDisruption × 0.20 + baselineRisk × 0.25&lt;/code&gt;. The system prompt explicitly forbids generic advice — the agent must cite specific signals.&lt;/p&gt;
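&lt;p&gt;The weighted sum is easy to check by hand. The sketch below uses the weights from the formula; the 0-100 input scale, the rounding to one decimal, and the sample signal values are assumptions:&lt;/p&gt;

```typescript
type RiskSignals = {
  newsRisk: number;
  tariffRisk: number;
  tradeDisruption: number;
  baselineRisk: number;
};

// Weights from the formula above; they sum to 1.0.
const SRI_WEIGHTS: RiskSignals = {
  newsRisk: 0.3,
  tariffRisk: 0.25,
  tradeDisruption: 0.2,
  baselineRisk: 0.25,
};

// Assumes each signal is already on a 0-100 scale (the scale is not stated explicitly).
function computeSri(signals: RiskSignals): number {
  const raw =
    signals.newsRisk * SRI_WEIGHTS.newsRisk +
    signals.tariffRisk * SRI_WEIGHTS.tariffRisk +
    signals.tradeDisruption * SRI_WEIGHTS.tradeDisruption +
    signals.baselineRisk * SRI_WEIGHTS.baselineRisk;
  return Math.round(raw * 10) / 10;
}

// 80*0.30 + 40*0.25 + 60*0.20 + 20*0.25 = 24 + 10 + 12 + 5 = 51
console.log(computeSri({ newsRisk: 80, tariffRisk: 40, tradeDisruption: 60, baselineRisk: 20 }));
```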

&lt;h3&gt;
  
  
  DashboardAgent
&lt;/h3&gt;

&lt;p&gt;The merge layer. For each of the top 10 SKUs, it compares tariff rates across countries, finds the cheapest alternative, and calculates annual savings. Outputs structured data that the UI renders as charts.&lt;/p&gt;
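&lt;p&gt;One way to picture the per-SKU comparison is shown below. The savings model (tariff-rate difference applied to annual spend), the field names, and the sample numbers are all assumptions; the post does not spell out the actual formula:&lt;/p&gt;

```typescript
type SkuPosition = {
  sku: string;
  currentCountry: string;
  unitCostUsd: number;
  annualUnits: number;
};

type TariffTable = { [country: string]: number }; // tariff rate in percent

// Assumed model: savings = (current rate - best alternative rate) x annual spend.
// Freight, lead time, and risk are ignored in this sketch.
function bestAlternative(position: SkuPosition, rates: TariffTable) {
  const currentRate = rates[position.currentCountry];
  const candidates = Object.entries(rates)
    .filter(([country]) => country !== position.currentCountry)
    .sort((a, b) => a[1] - b[1]);
  if (currentRate === undefined || candidates.length === 0) return undefined;
  const [country, rate] = candidates[0];
  const annualSpend = position.unitCostUsd * position.annualUnits;
  const annualSavingsUsd = Math.round(((currentRate - rate) / 100) * annualSpend);
  return { sku: position.sku, country, ratePct: rate, annualSavingsUsd };
}

const recommendation = bestAlternative(
  { sku: "SKU-001", currentCountry: "China", unitCostUsd: 10, annualUnits: 100000 },
  { China: 25, Vietnam: 10, Bangladesh: 15 }, // made-up rates
);
console.log(recommendation);
```

&lt;p&gt;A negative &lt;code&gt;annualSavingsUsd&lt;/code&gt; means the current country is already the cheapest on tariffs alone.&lt;/p&gt;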

&lt;h3&gt;
  
  
  NewsAggregatorAgent
&lt;/h3&gt;

&lt;p&gt;Synthesizes GDELT conflict articles + BBC/Reuters RSS feeds. Groups alerts by country, surfaces the top 5 highest-severity ones, and formats convergence cards when multiple risk domains overlap.&lt;/p&gt;

&lt;h3&gt;
  
  
  MarketIntelAgent
&lt;/h3&gt;

&lt;p&gt;The MCP tool-calling agent. All data comes from live financial APIs through the MCP runtime bridge — oil prices (Stooq), FX rates (ECB), Baltic Dry Index, sanctions screening. No static data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Three-Layer MCP Architecture
&lt;/h2&gt;

&lt;p&gt;This is probably the most architecturally interesting part. I implemented MCP (Model Context Protocol) at &lt;strong&gt;three distinct levels&lt;/strong&gt;, ensuring every agent-tool interaction traverses the MCP protocol:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: External stdio MCP servers (7 servers, 35+ tools)
&lt;/h3&gt;

&lt;p&gt;These are standalone servers using &lt;code&gt;@modelcontextprotocol/sdk&lt;/code&gt; that can plug into Claude Desktop or any MCP-compatible client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsx src/mcp/riskMcpServer.ts        &lt;span class="c"&gt;# SRI scores, alerts, heatmap&lt;/span&gt;
npx tsx src/mcp/tariffMcpServer.ts      &lt;span class="c"&gt;# compare_countries, calculate_savings&lt;/span&gt;
npx tsx src/mcp/inventoryMcpServer.ts   &lt;span class="c"&gt;# search_sku, list_by_country&lt;/span&gt;
npx tsx src/mcp/commodityMcpServer.ts   &lt;span class="c"&gt;# live oil, BDI, cotton, copper&lt;/span&gt;
npx tsx src/mcp/sanctionsMcpServer.ts   &lt;span class="c"&gt;# OFAC screening&lt;/span&gt;
npx tsx src/mcp/fxMcpServer.ts          &lt;span class="c"&gt;# live exchange rates&lt;/span&gt;
npx tsx src/mcp/gdeltMcpServer.ts       &lt;span class="c"&gt;# real-time conflict news&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 2: Internal MCP clients (InMemoryTransport)
&lt;/h3&gt;

&lt;p&gt;At startup, I wire three MCP servers to in-process &lt;code&gt;Client&lt;/code&gt; instances via &lt;code&gt;InMemoryTransport&lt;/code&gt;. The agents call &lt;code&gt;McpTariffAdapter&lt;/code&gt;, &lt;code&gt;McpInventoryAdapter&lt;/code&gt;, and &lt;code&gt;McpRiskAdapter&lt;/code&gt; — which implement the same &lt;code&gt;ITariffStore&lt;/code&gt;, &lt;code&gt;IInventoryStore&lt;/code&gt;, &lt;code&gt;IRiskStore&lt;/code&gt; interfaces — but route every call through &lt;code&gt;Client.callTool()&lt;/code&gt; over the MCP protocol boundary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent → McpXxxAdapter.method() → Client.callTool() → InMemoryTransport
  → MCP server handler → raw store (SQLite / LanceDB / InMemoryRiskStore)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
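&lt;p&gt;To show the shape of that call chain without pulling in the SDK, the sketch below uses a hypothetical &lt;code&gt;InProcessToolClient&lt;/code&gt; as a stand-in for an MCP &lt;code&gt;Client&lt;/code&gt; linked over &lt;code&gt;InMemoryTransport&lt;/code&gt;. The tool name &lt;code&gt;get_tariff_rate&lt;/code&gt;, the adapter method, and the rate are assumptions; only the text-content result shape follows MCP conventions:&lt;/p&gt;

```typescript
type ToolArgs = { [key: string]: string };
type ToolResult = { content: { type: "text"; text: string }[] };
type ToolHandler = (args: ToolArgs) => ToolResult;

// Hypothetical stand-in for an MCP Client connected to an in-process server.
class InProcessToolClient {
  private handlers: { [name: string]: ToolHandler };

  constructor(handlers: { [name: string]: ToolHandler }) {
    this.handlers = handlers;
  }

  async callTool(request: { name: string; arguments: ToolArgs }) {
    const handler = this.handlers[request.name];
    if (handler === undefined) throw new Error(`Unknown tool: ${request.name}`);
    // Round-trip through JSON to mimic crossing a protocol boundary.
    const args: ToolArgs = JSON.parse(JSON.stringify(request.arguments));
    return handler(args);
  }
}

// Raw store that only the server-side handler touches (made-up rate).
const rawRates: { [key: string]: number } = { "Vietnam:8471.30": 10 };

const client = new InProcessToolClient({
  get_tariff_rate: (args) => {
    const rate = rawRates[`${args.country}:${args.hsCode}`];
    return { content: [{ type: "text", text: JSON.stringify({ rate: rate ?? null }) }] };
  },
});

// The adapter looks like a plain store to the agent but turns every call into a tool call.
class McpTariffAdapter {
  private client: InProcessToolClient;

  constructor(client: InProcessToolClient) {
    this.client = client;
  }

  async getRate(country: string, hsCode: string) {
    const result = await this.client.callTool({
      name: "get_tariff_rate",
      arguments: { country, hsCode },
    });
    const parsed: { rate: number | null } = JSON.parse(result.content[0].text);
    return parsed.rate;
  }
}

new McpTariffAdapter(client).getRate("Vietnam", "8471.30").then((rate) => console.log(rate));
```

&lt;p&gt;The agent never sees &lt;code&gt;rawRates&lt;/code&gt;; everything it learns arrives as serialized tool output, which is the property the in-process MCP layer is meant to preserve.&lt;/p&gt;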



&lt;h3&gt;
  
  
  Layer 3: Runtime tool registry
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;mcpToolRegistry.ts&lt;/code&gt; provides pattern-matched tool selection + &lt;code&gt;Promise.allSettled&lt;/code&gt; execution for live financial APIs. The MarketIntelAgent asks: &lt;em&gt;"Which tools match this query?"&lt;/em&gt; and the registry returns the relevant tools based on trigger patterns.&lt;/p&gt;
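&lt;p&gt;A small sketch of pattern-matched selection plus &lt;code&gt;Promise.allSettled&lt;/code&gt; execution follows. The tool names, trigger patterns, canned return values, and the simulated outage flag are assumptions; the real registry calls live APIs:&lt;/p&gt;

```typescript
let bdiAvailable = false; // flip to true to simulate a healthy upstream

// Hypothetical registry entries; the fetchers return canned text instead of calling live APIs.
const toolRegistry = [
  {
    name: "oil_price",
    triggers: [/\boil\b/i, /\bwti\b/i, /\bbrent\b/i],
    run: async () => "WTI: (live value)",
  },
  {
    name: "fx_rates",
    triggers: [/\bfx\b/i, /\bexchange rate/i],
    run: async () => "EUR/USD: (live value)",
  },
  {
    name: "baltic_dry_index",
    triggers: [/\bbdi\b/i, /\bshipping index\b/i],
    run: async () => {
      if (!bdiAvailable) throw new Error("BDI source unavailable");
      return "BDI: (live value)";
    },
  },
];

function selectTools(query: string) {
  return toolRegistry.filter((tool) => tool.triggers.some((trigger) => trigger.test(query)));
}

// allSettled lets one failing data source degrade the answer instead of rejecting the whole batch.
async function runMatchingTools(query: string) {
  const selected = selectTools(query);
  const settled = await Promise.allSettled(selected.map((tool) => tool.run()));
  return settled.map((outcome, index) =>
    outcome.status === "fulfilled"
      ? { tool: selected[index].name, ok: true, detail: outcome.value }
      : { tool: selected[index].name, ok: false, detail: String(outcome.reason) },
  );
}

runMatchingTools("What is the current WTI oil price and shipping index?").then((results) =>
  console.log(results),
);
```

&lt;p&gt;With &lt;code&gt;Promise.all&lt;/code&gt;, the simulated BDI outage would discard the oil price as well; &lt;code&gt;allSettled&lt;/code&gt; keeps the partial result.&lt;/p&gt;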




&lt;h2&gt;
  
  
  Step 4: Real-Time Risk Pipeline (SSE)
&lt;/h2&gt;

&lt;p&gt;The background risk engine runs a continuous loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Every 5 minutes&lt;/strong&gt;, the &lt;code&gt;RiskPoller&lt;/code&gt; fetches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GDELT conflict articles (single batch API call, matched to countries by name)&lt;/li&gt;
&lt;li&gt;BBC + Reuters RSS feeds&lt;/li&gt;
&lt;li&gt;US State Department travel advisories&lt;/li&gt;
&lt;li&gt;World Bank governance indicators (cached 24h)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ConflictClassifier&lt;/strong&gt; scores each article by severity (keyword-based — no LLM, to preserve Foundry capacity for user queries)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;RiskScorer&lt;/strong&gt; computes the SRI with &lt;strong&gt;non-suppressible floor rules&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active conflict → minimum SRI 85&lt;/li&gt;
&lt;li&gt;US sanctions → minimum 75&lt;/li&gt;
&lt;li&gt;Do-not-travel → minimum 65&lt;/li&gt;
&lt;li&gt;Chronic instability → minimum 55&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These floors are data-driven from the &lt;code&gt;country_config&lt;/code&gt; SQLite table — they can't be overridden by absent news.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ConvergenceDetector&lt;/strong&gt; flags when 2+ signal dimensions spike simultaneously for the same country&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Updates stream to the UI via &lt;strong&gt;Server-Sent Events&lt;/strong&gt; — the map, RiskRadarStrip, and SignalConvergenceStrip all subscribe and update in real time&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
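&lt;p&gt;The floor rules and the convergence check from the pipeline above can be sketched in a few lines. The floor values come from the list; the flag names, the spike threshold, and the "count dimensions above a threshold" rule are assumptions about how convergence is decided:&lt;/p&gt;

```typescript
type CountryFlags = {
  activeConflict: boolean;
  usSanctions: boolean;
  doNotTravel: boolean;
  chronicInstability: boolean;
};

// A floor can raise a computed score but never lower it.
function applyFloors(computedSri: number, flags: CountryFlags): number {
  const floors = [
    flags.activeConflict ? 85 : 0,
    flags.usSanctions ? 75 : 0,
    flags.doNotTravel ? 65 : 0,
    flags.chronicInstability ? 55 : 0,
  ];
  return Math.max(computedSri, ...floors);
}

// Assumed rule: a dimension "spikes" when it reaches the threshold, and two or more
// spiking dimensions for the same country count as a convergence event.
function detectConvergence(signals: { [dimension: string]: number }, spikeThreshold: number) {
  const spiking = Object.keys(signals).filter((name) => signals[name] >= spikeThreshold);
  return { converged: spiking.length >= 2, spiking };
}

const quietNewsDay = applyFloors(20, {
  activeConflict: true,
  usSanctions: true,
  doNotTravel: false,
  chronicInstability: false,
});
console.log(quietNewsDay); // the active-conflict floor wins over the low computed score
console.log(detectConvergence({ oil: 80, news: 90, sanctions: 10 }, 70));
```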

&lt;p&gt;All external API calls go through a &lt;strong&gt;CircuitBreaker&lt;/strong&gt; — after 3 consecutive failures, it trips open and returns cached data. The UI stays functional even when data sources are down.&lt;/p&gt;
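&lt;p&gt;A stripped-down version of that breaker is sketched below. It is synchronous for brevity (real fetches would be awaited), the class and function names are assumptions, and it leaves out any reset or half-open behavior because the post does not describe one:&lt;/p&gt;

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private cached: string | undefined = undefined;
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  isOpen(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }

  recordSuccess(value: string): void {
    this.consecutiveFailures = 0; // any success resets the streak
    this.cached = value;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  lastGood(): string | undefined {
    return this.cached;
  }
}

// While the circuit is open the fetcher is not called at all; the caller gets cached data.
function callThroughBreaker(breaker: CircuitBreaker, fetcher: () => string): string | undefined {
  if (breaker.isOpen()) return breaker.lastGood();
  try {
    const value = fetcher();
    breaker.recordSuccess(value);
    return value;
  } catch {
    breaker.recordFailure();
    return breaker.lastGood();
  }
}

const demoBreaker = new CircuitBreaker(3);
console.log(callThroughBreaker(demoBreaker, () => "fresh GDELT batch"));
```

&lt;p&gt;Returning the last good value on every failure, not only once the circuit is open, is a design choice in this sketch: it keeps the UI populated during the first few failed polls as well.&lt;/p&gt;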




&lt;h2&gt;
  
  
  Step 5: The Morning Brief — Agentic Decision-Making
&lt;/h2&gt;

&lt;p&gt;This is where SourcingIntel goes beyond a dashboard into genuine agentic territory. The Morning Brief is a &lt;strong&gt;standalone agentic pipeline&lt;/strong&gt; (not routed through the orchestrator) that runs at 7am UTC via Vercel cron or on-demand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Derives&lt;/strong&gt; monitored countries from actual inventory — no hardcoded country list&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fetches&lt;/strong&gt; SRI scores + tariff comparisons in parallel for all sourcing countries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranks&lt;/strong&gt; decisions by urgency: &lt;code&gt;urgencyScore = risk × financial impact&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-builds&lt;/strong&gt; the exact chat query for each decision (ready for one-click execution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diffs&lt;/strong&gt; against the previous brief to surface what changed overnight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emails&lt;/strong&gt; an HTML brief via nodemailer if SMTP is configured&lt;/li&gt;
&lt;/ol&gt;
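&lt;p&gt;The ranking and diff steps can be sketched as pure functions. The &lt;code&gt;Decision&lt;/code&gt; shape, the normalization of both factors to a 0..1 range, and the id-based diff are assumptions layered on top of the &lt;code&gt;risk × financial impact&lt;/code&gt; rule:&lt;/p&gt;

```typescript
type Decision = { id: string; sri: number; annualImpactUsd: number };

// urgencyScore = risk × financial impact, with both inputs normalized to 0..1 here.
function urgencyScore(decision: Decision, maxImpactUsd: number): number {
  const risk = decision.sri / 100;
  const impact = maxImpactUsd > 0 ? decision.annualImpactUsd / maxImpactUsd : 0;
  return risk * impact;
}

// Highest urgency first; the input array is left untouched.
function rankDecisions(decisions: Decision[]): Decision[] {
  const maxImpact = Math.max(0, ...decisions.map((d) => d.annualImpactUsd));
  return [...decisions].sort((a, b) => urgencyScore(b, maxImpact) - urgencyScore(a, maxImpact));
}

// Diff against the previous brief: which decision ids are new, and which dropped out.
function diffBriefs(previous: Decision[], current: Decision[]) {
  const previousIds = new Set(previous.map((d) => d.id));
  const currentIds = new Set(current.map((d) => d.id));
  return {
    added: current.filter((d) => !previousIds.has(d.id)).map((d) => d.id),
    resolved: previous.filter((d) => !currentIds.has(d.id)).map((d) => d.id),
  };
}

const ranked = rankDecisions([
  { id: "SKU-012", sri: 85, annualImpactUsd: 100000 }, // made-up numbers
  { id: "SKU-003", sri: 40, annualImpactUsd: 500000 },
  { id: "SKU-027", sri: 90, annualImpactUsd: 300000 },
]);
console.log(ranked.map((d) => d.id));
```

&lt;p&gt;Multiplying the two factors means a high-risk but low-spend SKU can rank below a moderate-risk, high-spend one, which is visible in the sample data.&lt;/p&gt;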

&lt;p&gt;The user gets a &lt;strong&gt;Decision Queue&lt;/strong&gt; with three options per item:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Approve&lt;/strong&gt; → auto-submits the pre-built query to the chat agent + emails the procurement team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defer&lt;/strong&gt; → moves to the bottom of the queue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dismiss&lt;/strong&gt; → removes the decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also a &lt;strong&gt;Triage Runner&lt;/strong&gt; that batch-executes the top 3 pending decisions sequentially through the chat agent with 1.8s spacing — so you can kick off a full analysis in one click.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: Responsible AI Design
&lt;/h2&gt;

&lt;p&gt;Privacy and safety were non-negotiable from day one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;On-device inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;By default, all LLM calls go through Foundry Local (phi-4-mini), so pricing and inventory data stay on the machine unless you opt into the Azure AI Foundry cloud fallback. Embeddings also run locally via Xenova.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sanctions guardrails&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The What-If simulator returns HTTP 403 for OFAC-sanctioned countries. InventoryAgent emits compliance warnings. The MCP sanctions tool flags OFAC countries for all agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Non-suppressible risk floors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Active conflict → SRI ≥ 85. These floors are data-driven from SQLite — they can't be overridden by absent news data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Grounded responses&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every agent embeds live data in the LLM's user turn as &lt;code&gt;[INVENTORY DATA — use ONLY items listed below]&lt;/code&gt;. The model is told to answer only from that block, which sharply limits (but can't fully rule out) references to data it wasn't given.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source attribution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All responses are grounded on explicit &lt;code&gt;[INVENTORY DATA]&lt;/code&gt;, &lt;code&gt;[TARIFF DATA]&lt;/code&gt;, &lt;code&gt;[LIVE MARKET DATA]&lt;/code&gt; blocks with instructions to reference only provided data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Circuit breaker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External API failures don't cascade — the circuit trips open and returns cached data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full tracing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every LLM call is tagged with a UUID, agent name, and timing via a &lt;code&gt;withTracing()&lt;/code&gt; decorator. Spans are queryable at &lt;code&gt;/api/trace/[id]&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
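&lt;p&gt;The tracing row can be pictured as a higher-order function. This sketch wraps a synchronous call for brevity, and the &lt;code&gt;Span&lt;/code&gt; shape and the in-memory &lt;code&gt;spans&lt;/code&gt; array are assumptions; the project's actual &lt;code&gt;withTracing()&lt;/code&gt; signature is not shown in this post:&lt;/p&gt;

```typescript
import { randomUUID } from "node:crypto";

type Span = { id: string; agent: string; durationMs: number; ok: boolean };

const spans: Span[] = []; // stand-in for whatever backs /api/trace/[id]

// Wraps a call so that every invocation records a span, whether it succeeds or throws.
function withTracing(agent: string, call: (prompt: string) => string) {
  return (prompt: string): string => {
    const id = randomUUID();
    const startedAt = Date.now();
    let ok = false;
    try {
      const result = call(prompt);
      ok = true;
      return result;
    } finally {
      spans.push({ id, agent, durationMs: Date.now() - startedAt, ok });
    }
  };
}

// Hypothetical usage: the wrapped function behaves like the original one.
const tracedTariffCall = withTracing("TariffAgent", (prompt) => `echo: ${prompt}`);
```

&lt;p&gt;Recording the span in &lt;code&gt;finally&lt;/code&gt; keeps failed calls visible in the trace, which is usually where tracing earns its keep.&lt;/p&gt;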

&lt;h3&gt;
  
  
  Additional Guardrails
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Registry&lt;/strong&gt; — all system prompts centralized in &lt;code&gt;src/prompts/prompts.json&lt;/code&gt; with co-located &lt;code&gt;max_tokens&lt;/code&gt; budgets. No scattered prompt strings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LRU cache&lt;/strong&gt; — 256 entries max with stale-while-revalidate and in-flight request coalescing to prevent cache stampedes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-connection SSE cleanup&lt;/strong&gt; — &lt;code&gt;riskEmitter.off()&lt;/code&gt; on disconnect, never &lt;code&gt;removeAllListeners&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model TTL&lt;/strong&gt; — Foundry Local model pinned for 2 hours via SDK to prevent auto-unload during idle periods&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 7: The RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;SourcingIntel uses a retrieval-augmented generation pipeline for inventory search:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingestion&lt;/strong&gt; (run once via &lt;code&gt;npx tsx data/seedLanceDb.ts&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse 65 SKUs from &lt;code&gt;inventory.csv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Build embedding text: &lt;code&gt;"{name} {category} {country} {hsCode}"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generate 384-dim vectors via Xenova &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (entirely on-device)&lt;/li&gt;
&lt;li&gt;Store in LanceDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Retrieval&lt;/strong&gt; — but with an important twist. The InventoryAgent doesn't blindly use vector search for everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Price queries&lt;/strong&gt; → &lt;code&gt;getAll()&lt;/code&gt; + sort in JS (cosine similarity can't rank by price)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Country queries&lt;/strong&gt; → &lt;code&gt;getByCountry()&lt;/code&gt; exact match (no vectors needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Category queries&lt;/strong&gt; → &lt;code&gt;getAll()&lt;/code&gt; + JS filter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic queries&lt;/strong&gt; → vector search with top-8 results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures the LLM always gets correctly-ranked data regardless of query type.&lt;/p&gt;
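&lt;p&gt;A compact sketch of the embedding template and the retrieval routing is shown below. The template string is the one from the ingestion step; the keyword rules, the &lt;code&gt;Strategy&lt;/code&gt; names, and the sample SKU are assumptions for illustration:&lt;/p&gt;

```typescript
type Sku = { name: string; category: string; country: string; hsCode: string };
type Strategy = "price-sort" | "country-exact" | "category-filter" | "vector-search";

// Same template as the ingestion step: "{name} {category} {country} {hsCode}".
function buildEmbeddingText(sku: Sku): string {
  return `${sku.name} ${sku.category} ${sku.country} ${sku.hsCode}`;
}

// Illustrative keyword rules; only queries that match none of them reach vector search.
function chooseStrategy(query: string, countries: string[], categories: string[]): Strategy {
  const q = query.toLowerCase();
  if (/\b(cheapest|most expensive|price|cost)\b/.test(q)) return "price-sort";
  if (countries.some((country) => q.includes(country.toLowerCase()))) return "country-exact";
  if (categories.some((category) => q.includes(category.toLowerCase()))) return "category-filter";
  return "vector-search";
}

const knownCountries = ["China", "Vietnam"];
const knownCategories = ["Electronics", "Textiles"];

console.log(buildEmbeddingText({ name: "Laptop Sleeve", category: "Electronics", country: "China", hsCode: "8471.30" }));
console.log(chooseStrategy("What do I source from Vietnam?", knownCountries, knownCategories));
```

&lt;p&gt;Checking price before country is deliberate in this sketch: a query such as "cheapest items from China" needs the sort, and the country can still be applied as a filter afterwards.&lt;/p&gt;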




&lt;h2&gt;
  
  
  Step 8: Running It Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aka.ms/foundry-local" rel="noopener noreferrer"&gt;Foundry Local&lt;/a&gt; installed&lt;/li&gt;
&lt;li&gt;Windows: Visual Studio Build Tools (for &lt;code&gt;better-sqlite3&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Start Foundry Local&lt;/span&gt;
foundry service start
foundry model run phi-4-mini-instruct-openvino-gpu:2

&lt;span class="c"&gt;# 2. Clone, install, and configure&lt;/span&gt;
git clone https://github.com/AKhil-codes/SourcingIntel
&lt;span class="nb"&gt;cd &lt;/span&gt;SourcingIntel
pnpm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env.local
&lt;span class="c"&gt;# Edit .env.local with your Foundry settings&lt;/span&gt;

&lt;span class="c"&gt;# 3. Seed databases (optional — demo data included)&lt;/span&gt;
&lt;span class="nv"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--max-old-space-size=4096"&lt;/span&gt; npx tsx data/seed.ts
&lt;span class="nv"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--max-old-space-size=4096"&lt;/span&gt; npx tsx data/seedLanceDb.ts

&lt;span class="c"&gt;# 4. Run&lt;/span&gt;
&lt;span class="nv"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--max-old-space-size=4096"&lt;/span&gt; pnpm dev
&lt;span class="c"&gt;# Open http://localhost:3000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo Walkthrough
&lt;/h3&gt;

&lt;p&gt;Try these queries in order:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Query / Action&lt;/th&gt;
&lt;th&gt;What Happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Which electronics do I source from China?"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;InventoryAgent → LanceDB semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"What is the tariff rate for China electronics HS 8471.30?"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;TariffAgent → SQLite lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Compare sourcing China vs Vietnam for electronics"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Full pipeline: InventoryAgent → TariffAgent → GeoRiskAgent → DashboardAgent → merged response with chart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"What is the geopolitical risk for Vietnam right now?"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;GeoRiskAgent → SRI score + signal breakdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"What is the current WTI oil price and shipping index?"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;MarketIntelAgent → MCP tools (Stooq + BDI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"Summarize today's conflict news"&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;NewsAggregatorAgent → GDELT + RSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Open What-If Simulator → drag the China tariff slider to 40%&lt;/td&gt;
&lt;td&gt;$M portfolio impact simulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Open Morning Brief (bell icon)&lt;/td&gt;
&lt;td&gt;Autonomous agent → Decision Queue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Testing Without an LLM
&lt;/h2&gt;

&lt;p&gt;All 69 tests run &lt;strong&gt;without&lt;/strong&gt; a live Foundry Local instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"--max-old-space-size=4096"&lt;/span&gt; npx vitest run &lt;span class="nt"&gt;--config&lt;/span&gt; vitest.config.mjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test File&lt;/th&gt;
&lt;th&gt;Tests&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;riskScorer.test.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;SRI floor rules, trend detection, weight correctness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents.test.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Circuit breaker, InventoryAgent grounding, Orchestrator routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;evals.test.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;Intent patterns, tariff accuracy (real SQLite), MCP sanctions, RAG grounding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rag.test.ts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;InventoryRetriever contract, precision@K evaluation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tests verify &lt;strong&gt;grounding data correctness&lt;/strong&gt;, not LLM output. The working assumption: if the right data reaches the LLM context window, the model is far more likely to ground its response correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Technical Decisions and Tradeoffs
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keyword-first classification&lt;/strong&gt; — saves 300–2000ms on clear intents at the cost of maintaining regex patterns. Worth it for a demo where responsiveness matters.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sequential LLM calls&lt;/strong&gt; — phi-4-mini on 16GB RAM can't handle concurrent inference. I chose reliability over speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dummy vector scans in LanceDB&lt;/strong&gt; — LanceDB v0.4's SQL filter is unreliable for full table scans. Using a dummy vector + JS filter is hacky but stable for 65 SKUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;InMemoryTransport for internal MCP&lt;/strong&gt; — running MCP in-process via &lt;code&gt;InMemoryTransport&lt;/code&gt; avoids spawning separate server processes while still routing all tool calls through the MCP protocol boundary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No LLM in ConflictClassifier&lt;/strong&gt; — keyword-based severity scoring preserves Foundry Local capacity for user-facing queries. The risk pipeline runs every 5 minutes; burning LLM tokens on news classification would starve the chat.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Beyond Supply Chain: Applying This Pattern to Other Domains
&lt;/h2&gt;

&lt;p&gt;While I built SourcingIntel for procurement, the underlying architecture — multi-agent orchestration + MCP tool layer + real-time risk scoring + agentic decision queues — is domain-agnostic. Here are concrete applications I see:&lt;/p&gt;

&lt;h3&gt;
  
  
  Financial Portfolio Risk Management
&lt;/h3&gt;

&lt;p&gt;Replace SKUs with equity positions. The GeoRiskAgent monitors geopolitical events affecting market sectors. The TariffAgent becomes a regulatory-change tracker (new SEC rules, EU MiFID updates). The Morning Brief surfaces portfolio rebalancing decisions: &lt;em&gt;"Emerging market exposure in Fund A exceeds threshold — 3 positions flagged, estimated VaR impact $2.1M."&lt;/em&gt; The same Approve/Defer/Dismiss workflow applies — portfolio managers triage instead of procurement leads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare Supply Chain &amp;amp; Drug Shortage Monitoring
&lt;/h3&gt;

&lt;p&gt;Hospitals face the same multi-signal convergence problem. An active pharmaceutical ingredient (API) shortage, a manufacturing plant shutdown in India, and an FDA warning letter: each is manageable alone, but together they mean your ICU runs out of a critical drug in 72 hours. The convergence detector pattern maps perfectly here. Floor rules become: &lt;em&gt;FDA recall → minimum severity 90. Single-source drug → minimum 80.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Energy Grid Operations
&lt;/h3&gt;

&lt;p&gt;Replace countries with grid regions, SKUs with generation assets. The risk pipeline monitors weather events, fuel price spikes, and equipment failure rates. The What-If simulator becomes: &lt;em&gt;"If natural gas prices hit $8/MMBtu, what's the cost impact of switching Region 3 to wind+battery vs. keeping gas turbines online?"&lt;/em&gt; The same SRI scoring applies: a hurricane on the Gulf Coast floors the risk index for that supply region.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agricultural Commodity Trading
&lt;/h3&gt;

&lt;p&gt;Replace SKUs with crop positions across growing regions. The MCP tool layer already handles commodity prices (cotton, copper, oil) — extending to wheat, corn, and soybean futures is a configuration change. The GDELT pipeline catches droughts, export bans (India's rice export restrictions), and port strikes. The Morning Brief becomes: &lt;em&gt;"Brazil soybean harvest 12% below forecast — evaluate switching Q3 soy contracts to Argentina. Urgency: High."&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Insurance Underwriting
&lt;/h3&gt;

&lt;p&gt;The risk scoring model translates directly. Replace country SRI with policyholder risk profiles. The convergence detector flags when multiple risk factors spike simultaneously for a portfolio segment. Floor rules become actuarial minimums. The decision queue surfaces renewal and pricing decisions.&lt;/p&gt;

&lt;p&gt;The common thread across all of these: &lt;strong&gt;multi-signal convergence detection + agentic decision queues + non-suppressible floor rules.&lt;/strong&gt; Any domain where professionals monitor multiple data streams and need to make time-sensitive decisions with incomplete information can benefit from this pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Build Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model support&lt;/strong&gt; — swap phi-4-mini for larger models when running on GPU-equipped machines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical SRI analytics&lt;/strong&gt; — persist risk scores to SQLite for trend analysis beyond the current session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supplier relationship scoring&lt;/strong&gt; — factor in quality, lead time, and payment terms alongside cost and risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaborative decision queues&lt;/strong&gt; — multi-user approval workflows with role-based access&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/AKhil-codes/SourcingIntel" rel="noopener noreferrer"&gt;SourcingIntel&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Built with Next.js 14, LangGraph, Microsoft Foundry Local, Model Context Protocol, LanceDB.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
