How CoinHawk runs a continuous AI security scan for every connected user using a single shared LLM call every 5 minutes.
The dumb version doesn't scale
Imagine you want every user's dashboard to display a live "security score" produced by an LLM. The first instinct is:
GET /api/security/scan → call OpenAI → return result
If 1,000 users hit the dashboard, you make 1,000 LLM calls. At ~$0.02 per call you've burned $20 in 30 seconds, OpenAI rate-limits you, and your p95 latency is now whatever GPT feels like today.
The second instinct is to cache the response per-user. That's a little better, but you still scale costs linearly with users, and the cache invalidation logic gets ugly fast.
The real answer for monitoring-style features is dead simple: the scan is global, so make exactly one global scan and serve it to everyone. Below is the production pattern I shipped in CoinHawk's "Sentinel" feature.
The mental model
- The data is shared, not user-specific. A "system security posture" doesn't care who's logged in.
- Compute on a timer, not on demand. Refresh every N minutes regardless of traffic.
- Reads are O(1) and instant. Every API request just returns the cached object.
- One in-flight refresh at a time. No matter how many things ask, only one LLM call runs.
This pattern works for any "expensive global signal you want to display everywhere": status pages, market summaries, anomaly detectors, AI-curated newsfeeds, etc.
The full implementation (~50 lines)
import { openai } from "@workspace/integrations-openai-ai-server";

export type SecurityScanResult = {
  riskScore: number;
  status: "secure" | "watch" | "elevated" | "critical";
  summary: string;
  risks: Array<{
    severity: "low" | "medium" | "high" | "critical";
    category: string;
    title: string;
    description: string;
    recommendation: string;
  }>;
  scannedAt: string;
};

// fallbackScan(), runSecurityScan(), and logger are defined elsewhere in the module.
let cachedScan: SecurityScanResult = fallbackScan();
let inFlight: Promise<SecurityScanResult> | null = null;
let schedulerStarted = false;

export function getCachedScan(): SecurityScanResult {
  return cachedScan; // O(1), no I/O, every request hits this
}

export async function refreshScan(): Promise<SecurityScanResult> {
  // Coalesce concurrent refresh requests into the same in-flight call.
  if (inFlight) return inFlight;
  inFlight = (async () => {
    try {
      const result = await runSecurityScan(); // the actual OpenAI call
      cachedScan = result;
      return result;
    } catch (err) {
      logger.error({ err }, "Sentinel scan failed");
      return cachedScan; // serve stale on failure, never break the dashboard
    } finally {
      inFlight = null;
    }
  })();
  return inFlight;
}

export function startSecurityScanScheduler(intervalMs = 5 * 60 * 1000) {
  if (schedulerStarted) return;
  schedulerStarted = true;
  setTimeout(() => refreshScan().catch(() => {}), 3000); // warm-up
  setInterval(() => refreshScan().catch(() => {}), intervalMs);
}
And the route:
router.get("/security/scan", (req, res) => {
  res.json(getCachedScan());
});
That's it. 1 user or 100,000 users, the cost is the same: one LLM call every 5 minutes.
The four design decisions that matter
1. Coalesce in-flight calls
The inFlight promise is the most important line in the file. Without it, two concurrent calls to refreshScan() (e.g. scheduler + manual admin trigger) become two LLM calls. With it, the second call just awaits the first. This single guard collapses thundering-herd risk to zero.
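To see the guard in isolation, here's a minimal sketch of the same coalescing move, with a hypothetical expensiveCall standing in for the OpenAI request:

```typescript
// How many times the "LLM" actually ran.
let calls = 0;
let inFlight: Promise<number> | null = null;

async function expensiveCall(): Promise<number> {
  calls++;
  await new Promise((r) => setTimeout(r, 50)); // simulate network latency
  return 42;
}

function refresh(): Promise<number> {
  if (inFlight) return inFlight; // second caller awaits the first
  inFlight = expensiveCall().finally(() => {
    inFlight = null; // allow the next refresh once this one settles
  });
  return inFlight;
}
```

Two concurrent calls to refresh() (scheduler plus admin trigger) resolve to the same value, and calls stays at 1.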
2. Serve stale on failure
The catch block returns cachedScan instead of throwing. If OpenAI has an outage, your dashboard keeps showing the last known scan instead of a red error banner. Users stay calm. You buy yourself time to fix things.
3. Warm up before the first user hits
setTimeout(() => refreshScan(), 3000);
Three seconds after boot, kick off the first scan in the background. Otherwise the very first dashboard request after deploy waits on an OpenAI cold start (3–10s).
4. Force structured output
const completion = await openai.chat.completions.create({
  model: "gpt-5.4",
  response_format: { type: "json_object" }, // <-- strict JSON, no markdown
  messages: [
    { role: "system", content: SECURITY_SYSTEM_PROMPT },
    { role: "user", content: "Run a fresh security scan." },
  ],
});
response_format: { type: "json_object" } plus a system prompt that includes the exact schema means you can JSON.parse() without regret. Then defensively coerce every field — never trust the model's types.
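The coercion step might look like this — a sketch, not CoinHawk's exact code; the field names come from the SecurityScanResult type above, and the clamping defaults are my assumptions:

```typescript
const STATUSES = ["secure", "watch", "elevated", "critical"] as const;
type Status = (typeof STATUSES)[number];

// Coerce a parsed model response into known-good shapes; never trust its types.
function coerceScan(raw: any): { riskScore: number; status: Status; summary: string } {
  const score = Number(raw?.riskScore);
  return {
    // Clamp to 0-100; fall back to a cautious midpoint on NaN.
    riskScore: Number.isFinite(score) ? Math.min(100, Math.max(0, score)) : 50,
    // If the model invents a status, degrade to "watch" rather than crash.
    status: STATUSES.includes(raw?.status) ? raw.status : "watch",
    summary: typeof raw?.summary === "string" ? raw.summary : "",
  };
}
```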
The system prompt that produces useful output
The system prompt is doing real work. It defines the schema, enumerates allowed values, sets distribution expectations ("most scans should mostly be low/medium with at most 1-2 high or critical items"), and forbids prose around the JSON.
The interesting trick: include the portfolio context (holdings, win rate, exchanges connected) directly in the system prompt rather than the user message. The model produces more grounded, specific risks ("watch your AVAX exposure on bridge X") instead of generic ones ("be careful of phishing").
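An illustrative sketch of that trick — the field names and wording are assumptions, not CoinHawk's actual prompt:

```typescript
type PortfolioContext = { holdings: string[]; winRate: number; exchanges: string[] };

function buildSystemPrompt(ctx: PortfolioContext): string {
  return [
    "You are Sentinel, CoinHawk's security scanner. Respond with ONLY a JSON object matching the schema below.",
    // Grounding context goes in the system prompt, not the user message.
    `Portfolio context: holdings=${ctx.holdings.join(", ")}; win rate=${ctx.winRate}%; exchanges=${ctx.exchanges.join(", ")}.`,
    "Most scans should be mostly low/medium severity, with at most 1-2 high or critical items.",
  ].join("\n");
}
```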
What about user-specific scans?
For per-user signals (e.g. "scan THIS wallet"), the same pattern applies — just key the cache:
// Note: in production, evict stale entries or this Map grows unbounded.
const cache = new Map<string, { result: SecurityScanResult; ts: number }>();

export async function getScanForWallet(addr: string) {
  const hit = cache.get(addr);
  if (hit && Date.now() - hit.ts < 60_000) return hit.result;
  const result = await runSecurityScan(addr);
  cache.set(addr, { result, ts: Date.now() });
  return result;
}
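One wrinkle the per-wallet version leaves out: two concurrent requests for the same cold wallet still trigger two LLM calls. A sketch that layers per-key coalescing on top of the TTL cache (makeKeyedCache is my name; fn stands in for runSecurityScan):

```typescript
type Entry<T> = { result: T; ts: number };

function makeKeyedCache<T>(fn: (key: string) => Promise<T>, ttlMs: number) {
  const cache = new Map<string, Entry<T>>();
  const inFlight = new Map<string, Promise<T>>();

  return async function get(key: string): Promise<T> {
    const hit = cache.get(key);
    if (hit && Date.now() - hit.ts < ttlMs) return hit.result; // fresh hit
    const pending = inFlight.get(key);
    if (pending) return pending; // coalesce concurrent misses per key
    const p = fn(key)
      .then((result) => {
        cache.set(key, { result, ts: Date.now() });
        return result;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, p);
    return p;
  };
}
```

Same guarantee as the global version, just scoped to the cache key.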
But for the global header pill that says "Sentinel: secure" on every dashboard? One scan. Forever.
Try it
Watch the Sentinel pill update on the live CoinHawk dashboard:
→ https://71554e3f-e544-4c13-9297-83c480d696c1-00-3dqa8py4myltw.worf.replit.dev/
Connect a wallet, watch the header. The pill is reading directly from the cached global scan that refreshes every 5 minutes — same code as above, in production right now.
If you liked this pattern, the launch story is here: How I built an AI crypto trading dashboard in a weekend with Replit + Base.
Built with OpenAI, Express, and a stubborn refusal to scale costs linearly with users.