DEV Community

Shuja

Posted on • Originally published at dev.to

I Built an Open-Source Security Middleware for LLMs, Here's How It Works

Most AI apps connect directly to OpenAI with zero middleware.

No PII filtering. No injection defense. No spend caps. User input goes straight to a third-party API: emails, phone numbers, API keys, all of it.

I built ShieldStack TS to fix this. It's a TypeScript middleware layer that sits between your app and any LLM provider, intercepting every request and response with sub-2ms overhead.

In this article, I'll walk through the architecture, show the code, and explain the trade-offs I made.


What It Does

ShieldStack runs as a pipeline. Every prompt passes through 4 checks before reaching the LLM:

User Prompt
    ↓
① Token Budget Check   (<0.1ms)
② Injection Detection  (<0.5ms)
③ Secrets Scanning     (<0.5ms)
④ PII Redaction        (<1ms)
    ↓
Clean Prompt → LLM

On the way back, streaming responses pass through a TransformStream that sanitizes PII and secrets chunk by chunk in real time.


Quick Setup

npm install shieldstack-ts

import { ShieldStack } from 'shieldstack-ts';

const shield = new ShieldStack({
  pii: {
    policy: 'redact',
    emails: true,
    creditCards: true,
    phoneNumbers: true,
  },
  injectionDetection: {
    threshold: 0.8,
  },
  tokenLimiter: {
    maxTokens: 10000,
    windowMs: 3600000,
  },
});

const safePrompt = await shield.evaluateRequest(
  userInput, userId, tokenEstimate
);

// safePrompt is clean — send it to your LLM

The Architecture: Why Order Matters

The pipeline order isn't random. It's sorted by cost:

Budget check runs first because it's a single Map lookup. If a user is over their limit, why waste CPU on regex? Block them in 0.1ms and move on.

Injection detection runs second because if someone is trying to jailbreak your system, you don't want to spend time redacting their PII. Just block them.

Secrets and PII run last because regex engines are the most expensive operations in the pipeline. By this point, we know the request is legitimate and within budget.

This is a general principle for middleware design: cheapest checks first, most expensive last.
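The principle can be sketched in a few lines. This is my illustration of cheapest-first short-circuiting, not ShieldStack's actual internals; the check names, patterns, and budget numbers are made up for the example:

```typescript
// Each check returns a block reason or null. The pipeline stops at the
// first failure, so expensive regex work never runs for bad requests.
type Check = (prompt: string, userId: string) => string | null;

const budgets = new Map<string, number>(); // tokens used per user (toy store)

const checks: Check[] = [
  // ① cheapest: a single Map lookup
  (_p, userId) => ((budgets.get(userId) ?? 0) > 10_000 ? 'over_budget' : null),
  // ② cheap pattern scan for obvious jailbreak phrasing
  (p) => (/ignore previous instructions/i.test(p) ? 'injection' : null),
  // ③ heavier regex work runs only for requests that got this far
  (p) => (/\bsk-[A-Za-z0-9]{20,}\b/.test(p) ? 'secret_detected' : null),
];

function runPipeline(prompt: string, userId: string): string | null {
  for (const check of checks) {
    const reason = check(prompt, userId);
    if (reason) return reason; // short-circuit: later checks never run
  }
  return null; // clean
}
```

An over-budget user never touches a regex engine, which is exactly the property the ordering above buys.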


Real-Time Stream Sanitization

This was the hardest problem to solve.

When an LLM streams a response, it sends chunks like:

Chunk 1: "The customer email is user@exa"
Chunk 2: "mple.com and their phone..."

If you scan each chunk independently, user@exa doesn't match any email regex. The PII slips through.

Most solutions buffer the entire response and scan at the end. That kills the streaming UX.

My approach uses the Web Streams API:

createStream(): TransformStream<Uint8Array, Uint8Array> {
  const pii = this.piiRedactor;
  const secrets = this.secretsDetector;
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();

  // Overlapping sliding window: keep the last HOLDBACK characters in the
  // buffer instead of flushing it, so a pattern split across chunks
  // ("user@exa" + "mple.com") is still intact when the next chunk lands.
  // HOLDBACK must be at least the length of the longest pattern we redact.
  const HOLDBACK = 64;
  let buf = '';

  const sanitize = (text: string): string => {
    let { redactedText } = pii.redact(text);
    ({ redactedText } = secrets.redact(redactedText));
    return redactedText;
  };

  return new TransformStream({
    transform(chunk, controller) {
      buf = sanitize(buf + decoder.decode(chunk, { stream: true }));

      // Emit everything except the tail. A partial match at the end of
      // the buffer is shorter than HOLDBACK, so it never leaks early.
      // Re-scanning the held-back text later is safe: redaction tokens
      // like [REDACTED_EMAIL] don't match the patterns again.
      if (buf.length > HOLDBACK) {
        controller.enqueue(encoder.encode(buf.slice(0, -HOLDBACK)));
        buf = buf.slice(-HOLDBACK);
      }
    },
    flush(controller) {
      const tail = sanitize(buf + decoder.decode());
      if (tail) controller.enqueue(encoder.encode(tail));
    }
  });
}

TransformStream<Uint8Array, Uint8Array> works natively on Node.js 18+, Bun, and Cloudflare Workers. Zero platform-specific code.
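To see the cross-chunk behavior end to end, here's a self-contained sketch. The sanitizer() below is a simplified stand-in for the real createStream() that only redacts emails with one assumed regex, but it uses the same carry-buffer trick and the same pipeThrough wiring a consumer would use:

```typescript
// Stand-in sanitizer: hold back the tail of the text so a pattern split
// across chunks still matches once the next chunk arrives.
function sanitizer(): TransformStream<Uint8Array, Uint8Array> {
  const dec = new TextDecoder();
  const enc = new TextEncoder();
  const HOLDBACK = 64; // longer than any email we expect
  let buf = '';
  const redact = (s: string) =>
    s.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[REDACTED_EMAIL]');

  return new TransformStream({
    transform(chunk, controller) {
      buf = redact(buf + dec.decode(chunk, { stream: true }));
      if (buf.length > HOLDBACK) {
        controller.enqueue(enc.encode(buf.slice(0, -HOLDBACK)));
        buf = buf.slice(-HOLDBACK);
      }
    },
    flush(controller) {
      // Redacting again is idempotent: [REDACTED_EMAIL] has no '@'
      controller.enqueue(enc.encode(redact(buf + dec.decode())));
    },
  });
}

// Simulated LLM byte stream that splits an email across two chunks
const enc = new TextEncoder();
const source = new ReadableStream<Uint8Array>({
  start(c) {
    c.enqueue(enc.encode('Reach me at user@exa'));
    c.enqueue(enc.encode('mple.com any time.'));
    c.close();
  },
});

const cleaned = source.pipeThrough(sanitizer());
```

The same `pipeThrough` call works on a `fetch()` response body, which is why targeting Web Streams instead of Node's Transform pays off across runtimes.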


Denial-of-Wallet Prevention

A startup got a $47,000 OpenAI bill in one weekend. One user wrote a script that hit their /chat endpoint in a loop. No rate limiting. No token caps.

ShieldStack prevents this with per-user token budgets:

const shield = new ShieldStack({
  tokenLimiter: {
    maxTokens: 10000,
    windowMs: 3600000, // 1 hour
  },
});

The limiter uses an async sliding-window token bucket. Each request checks the budget before anything else; if you're over, you get a 403 in under 0.1ms.
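The idea behind a sliding-window limiter fits in a short class. This is my simplified, synchronous, in-memory sketch; the real limiter is async and pluggable, and this TokenLimiter is not the library's actual class:

```typescript
// Sliding-window token budget: sum the tokens a user consumed inside the
// window, reject if the new request would push them over the cap.
type Usage = { at: number; tokens: number };

class TokenLimiter {
  private usage = new Map<string, Usage[]>();

  constructor(private maxTokens: number, private windowMs: number) {}

  // Returns true and records the spend if the request fits the budget.
  tryConsume(userId: string, tokens: number, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop entries that slid out of the window
    const recent = (this.usage.get(userId) ?? []).filter(u => u.at > cutoff);
    const used = recent.reduce((sum, u) => sum + u.tokens, 0);
    if (used + tokens > this.maxTokens) {
      this.usage.set(userId, recent);
      return false; // over budget: caller responds 403
    }
    recent.push({ at: now, tokens });
    this.usage.set(userId, recent);
    return true;
  }
}
```

The happy path is one Map lookup plus a filter over a small array, which is why an in-memory check stays well under a millisecond.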

For multi-server deployments, plug in Redis:

import { RedisStore } from 'shieldstack-ts';

const shield = new ShieldStack({
  tokenLimiter: {
    maxTokens: 10000,
    windowMs: 3600000,
    store: new RedisStore(redisClient),
  },
});

The Redis adapter uses duck typing: any client that implements get(), set(), and del() works. No forced peer dependencies. ioredis, Upstash, and node-redis all work out of the box.
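Structurally, the contract might look like this. The interface shape below is my assumption based on the three methods named above; check the repo for the exact GenericRedisClient definition (TTL arguments, return types, etc.):

```typescript
// Assumed minimal shape: three promise-returning methods.
interface GenericRedisClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
  del(key: string): Promise<unknown>;
}

// Any object with those methods satisfies the interface structurally.
// Here, a trivial in-memory fake, usable the same way a real client is.
class FakeRedis implements GenericRedisClient {
  private data = new Map<string, string>();
  async get(key: string) { return this.data.get(key) ?? null; }
  async set(key: string, value: string) { this.data.set(key, value); return 'OK'; }
  async del(key: string) { return this.data.delete(key) ? 1 : 0; }
}
```

Because TypeScript checks structure rather than class identity, ioredis and Upstash clients satisfy the interface without the library importing either package.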


Framework Adapters

The core is a pure TypeScript class. It never imports a framework. Thin adapters wrap it:

Express:

import { expressShield } from 'shieldstack-ts';

app.post('/chat', expressShield(shield), (req, res) => {
  // req.body is already sanitized
});

Next.js App Router:

import { withShield } from 'shieldstack-ts';

export const POST = withShield(shield, chatHandler);

Hono (Cloudflare Workers):

import { honoShield } from 'shieldstack-ts';

app.use('/chat', honoShield(shield));

Trade-Offs I Made

Heuristic scoring over ML for injection detection. A weighted regex system detects jailbreak attempts in under 0.5ms. No model loading, no ONNX runtime, no cold starts. An ML classifier is on the roadmap, but for v0.1, I wanted zero overhead and zero dependencies.
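A weighted-regex scorer of the kind described might look like this. The patterns and weights here are my illustration, not ShieldStack's actual rule set:

```typescript
// Heuristic injection scoring: each pattern carries a weight, and the
// capped sum is compared against the configured threshold (e.g. 0.8).
const RULES: Array<{ pattern: RegExp; weight: number }> = [
  { pattern: /ignore (all |any )?previous instructions/i, weight: 0.9 },
  { pattern: /you are now in developer mode/i,            weight: 0.8 },
  { pattern: /reveal (?:your )?system prompt/i,           weight: 0.7 },
  { pattern: /base64|rot13/i,                             weight: 0.2 },
];

function injectionScore(prompt: string): number {
  let score = 0;
  for (const { pattern, weight } of RULES) {
    if (pattern.test(prompt)) score += weight;
  }
  return Math.min(score, 1); // cap at 1 so thresholds stay comparable
}
```

A handful of regex tests is microseconds of work, which is how the check stays under 0.5ms with no model load.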

Duck-typed Redis over hard dependency. The GenericRedisClient interface is 3 methods. This means no version conflicts, no bundle bloat, and edge compatibility (Upstash works on Workers, ioredis works on Node).

Web Standards over Node.js APIs. TransformStream instead of Node.js Transform. This sacrifices some Node-specific optimizations but gains universal runtime support without polyfills.


Performance

Operation                         Overhead
Token limit check (in-memory)     < 0.1ms
Token limit check (Redis)         1–3ms
Injection detection               < 0.5ms
PII redaction                     < 1ms
Stream sanitization per chunk     < 0.2ms
Total end-to-end                  < 2ms

LLM calls take 500ms–5s. ShieldStack adds less than 0.4% overhead.


Try It

git clone https://github.com/ShujaSN/shieldstack-ts.git
cd shieldstack-ts
npm install
npm run test

Run the demo:

cd examples/demo
npm install
npm run dev

Send a prompt with a fake email and watch [REDACTED_EMAIL] appear in the stream. Try "Ignore previous instructions" and watch it get blocked.


The repo is MIT licensed. If you're building AI features and want a drop-in security layer, give it a try.

GitHub: ShieldStack TS

If you have questions about the architecture or want to contribute, open an issue or drop a comment below.

Top comments (1)

freerave

Love the architecture and the performance-first mindset. However, from a security engineering perspective, there is a critical vulnerability in your TransformStream implementation that allows PII to leak entirely.
By executing buf = ''; after scanning the current chunk, you are clearing the state. If the LLM naturally chunks an email as ['admin@', 'company.com'], or if an attacker manipulates the context to force specific token generation, chunk 1 and chunk 2 will be evaluated independently. The regex will fail on both, and the PII will stream directly to the user.
To fix this, you need an 'Overlapping Sliding Window' buffer, not a flush-and-forget mechanism, to ensure cross-chunk patterns are caught without breaking the stream.