DEV Community

optml

Securing AI Agents with 42 Built-in Plugins

In Part 1, we covered why MCP gateways matter. In Part 2, we set up ContextForge and executed tool calls. Now let's talk about what makes ContextForge genuinely different from other MCP proxies: the plugin pipeline.

ContextForge ships with 42 built-in plugins covering security, performance, content processing, input validation, and policy enforcement. In this post, we'll enable them, see them in action, and understand how they protect AI agents in production.


How the Plugin Pipeline Works

Every tool call and resource fetch passes through a chain of plugins, organized by hooks:

Agent Request
     ↓
[tool_pre_invoke]     ← Validate, filter, rate-limit BEFORE the tool runs
     ↓
  Tool Execution
     ↓
[tool_post_invoke]    ← Filter, compress, audit AFTER the tool returns
     ↓
Response to Agent

The same pattern applies to resources (resource_pre_fetch / resource_post_fetch) and prompts (prompt_pre_fetch / prompt_post_fetch).

Plugins execute in priority order within each hook band. Same-priority plugins run in parallel for performance.
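The ordering model can be sketched in a few lines of Python. This is an illustrative sketch of the idea, not ContextForge's actual implementation: plugins are sorted by priority, and each equal-priority band is dispatched concurrently before the next band starts.

```python
import asyncio
from itertools import groupby

async def run_hook(plugins, payload):
    """Run one hook band: ascending priority, equal priorities in parallel."""
    ordered = sorted(plugins, key=lambda p: p["priority"])
    for _, band in groupby(ordered, key=lambda p: p["priority"]):
        # gather() lets same-priority plugins overlap their I/O
        await asyncio.gather(*(p["fn"](payload) for p in band))
    return payload

# Hypothetical plugins, recording execution order for demonstration
calls = []
def make(name):
    async def fn(payload):
        calls.append(name)
    return fn

plugins = [
    {"name": "audit", "priority": 50, "fn": make("audit")},
    {"name": "pii",   "priority": 10, "fn": make("pii")},
    {"name": "rate",  "priority": 10, "fn": make("rate")},
]
asyncio.run(run_hook(plugins, {"tool": "get_current_time"}))
# pii and rate (priority 10) run before audit (priority 50)
```

Lower priority numbers run first, so security filters can short-circuit a request before heavier plugins spend any work on it.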

Plugin Modes

| Mode | Behavior |
| --- | --- |
| `enforce` | Block requests that violate the plugin's rules |
| `permissive` | Log violations but allow the request through |
| `disabled` | Plugin is not loaded |

The 42 Plugins at a Glance

| Category | Count | Key Plugins |
| --- | --- | --- |
| Security & Compliance | 13 | PII Filter, Secrets Detection, SQL Sanitizer, VirusTotal, Content Moderation |
| Performance & Optimization | 8 | Rate Limiter, Circuit Breaker, Cache, Watchdog, TOON Encoder |
| Content Processing | 9 | HTML→Markdown, JSON Repair, Code Formatter, Summarizer |
| Input Validation | 6 | Schema Guard, Argument Normalizer, SPARC Validator |
| Networking | 4 | Header Injector, Vault, Webhook Notification |
| Policy Engine | 1 | Unified PDP (Cedar + OPA + RBAC + MAC combined) |

Plus 3 Rust-powered plugins (PyO3) for performance-critical paths: PII Filter, Secrets Detection, Encoded Exfil Detection.

Plugin Management UI (42 total)


Deep Dive: Key Plugins in Action

PII Filter — Mask Sensitive Data

The PII Filter scans tool responses for personally identifiable information and masks it before it reaches the LLM.

What it catches:

  • Social Security Numbers: 123-45-6789 → ***-**-****
  • Email addresses: john@company.com → ****@****
  • Credit card numbers, phone numbers, IP addresses

Rust-powered variant: The PIIDetectorRust plugin uses PyO3 bindings for 10x faster detection on high-throughput paths.

# Verified: Rust PII detector in action
from pii_filter import PIIDetectorRust

detector = PIIDetectorRust()
result = detector.detect("My email is john@test.com and SSN is 123-45-6789")
# Returns: [PIIFinding(type="email", ...), PIIFinding(type="ssn", ...)]

DenyList — Block Prohibited Content

Define words or patterns that must never appear in tool responses:

# plugins/config.yaml
- name: "DenyList"
  mode: "enforce"
  config:
    denied_words: ["innovative", "revolutionary", "groundbreaking"]
    action: "block"

In enforce mode, any tool response containing these words is immediately blocked. Useful for preventing AI hallucination buzzwords, competitor names, or regulated terms.
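The core of a deny-list check is small. Here is a minimal sketch assuming whole-word, case-insensitive matching; ContextForge's actual matching rules may differ:

```python
import re

DENIED = ["innovative", "revolutionary", "groundbreaking"]

def check_denylist(text: str, mode: str = "enforce"):
    """Return matched denied words; in enforce mode, raise to block the response."""
    hits = [w for w in DENIED
            if re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)]
    if hits and mode == "enforce":
        raise ValueError(f"Blocked: response contains denied words {hits}")
    return hits  # permissive mode only reports violations

print(check_denylist("A truly groundbreaking result", mode="permissive"))
# ['groundbreaking']
```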

TOON Encoder — Save 30-70% on LLM Tokens

TOON (Tool Output Optimized Notation) is a custom encoding that compresses JSON tool results before they're sent to the LLM:

Standard JSON:  {"name": "John", "role": "admin", "active": true}
TOON encoded:   n:John|r:admin|a:1

Real-world measurement: 15% reduction on small JSON, 30-70% on larger payloads. This directly reduces LLM API costs.
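The intuition behind the encoding can be sketched as key abbreviation plus value compaction. The key map below is invented for illustration; the real TOON encoder is more sophisticated:

```python
import json

# Hypothetical abbreviation table for this example only
KEY_MAP = {"name": "n", "role": "r", "active": "a"}

def toonish_encode(obj: dict) -> str:
    """Shorten keys and compact values into a pipe-delimited string."""
    parts = []
    for k, v in obj.items():
        short = KEY_MAP.get(k, k)
        if isinstance(v, bool):
            v = int(v)  # true -> 1, false -> 0
        parts.append(f"{short}:{v}")
    return "|".join(parts)

record = {"name": "John", "role": "admin", "active": True}
print(toonish_encode(record))  # n:John|r:admin|a:1
print(len(json.dumps(record)), "bytes as JSON vs",
      len(toonish_encode(record)), "bytes encoded")
```

The savings grow with payload size because repeated keys in arrays of objects compress especially well.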

Rate Limiter — Per-Team, Per-User Throttling

- name: "RateLimiter"
  mode: "enforce"
  config:
    requests_per_minute: 60
    burst: 10

Prevent runaway agents from overwhelming backend systems or burning through API quotas.
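The `requests_per_minute` + `burst` pair in the config maps naturally onto a token bucket. A sketch of that idea, assuming the values above (not ContextForge's internal implementation):

```python
import time

class TokenBucket:
    def __init__(self, requests_per_minute: int, burst: int):
        self.rate = requests_per_minute / 60.0  # tokens refilled per second
        self.capacity = burst                   # max tokens held at once
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(requests_per_minute=60, burst=10)
results = [bucket.allow() for _ in range(12)]
# the first 10 calls pass (the burst), the rest are throttled until tokens refill
```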

URL Reputation — Block Malicious Endpoints

Checks URLs in tool arguments against threat intelligence feeds:

Agent tries to call: fetch_url("http://malicious.example.com/payload")
     ↓
URL Reputation Plugin: ⛔ BLOCKED (known malicious domain)

Cached Tool Results — Avoid Redundant Calls

If an agent calls the same tool with the same arguments within the cache TTL, the cached result is returned instantly:

First call:  get_current_time(timezone="UTC")  → 150ms (real call)
Second call: get_current_time(timezone="UTC")  → 2ms (cached)
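A cache like this keys on the tool name plus a canonicalized form of the arguments, so `{"timezone": "UTC"}` hits regardless of key order. A hypothetical sketch, not the plugin's real code:

```python
import json
import time

class ToolCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.store = {}

    def _key(self, tool: str, args: dict) -> str:
        # sort_keys makes the key stable across argument orderings
        return tool + "|" + json.dumps(args, sort_keys=True)

    def get(self, tool: str, args: dict):
        entry = self.store.get(self._key(tool, args))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # miss or expired

    def put(self, tool: str, args: dict, result):
        self.store[self._key(tool, args)] = (time.monotonic(), result)

cache = ToolCache(ttl_seconds=60)
cache.put("get_current_time", {"timezone": "UTC"}, "2025-01-01T00:00:00Z")
print(cache.get("get_current_time", {"timezone": "UTC"}))  # cache hit
```

Note the TTL matters: a time-of-day tool cached too long would return stale answers, so per-tool TTL tuning is worth the effort.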

Summarizer — LLM-Powered Response Compression

For large tool responses (documentation, logs, data dumps), the Summarizer plugin calls a secondary LLM to compress the content:

- name: "Summarizer"
  mode: "enforce"
  config:
    provider: "anthropic"
    anthropic:
      model: "claude-haiku-4-5-20251001"
      max_tokens: 256
    threshold_chars: 500

Responses exceeding the threshold are automatically summarized before reaching the primary agent.

Unified PDP — Multi-Engine Policy Decisions

The Unified PDP plugin integrates four policy engines into one interface:

  • Cedar — AWS's policy language
  • OPA — Open Policy Agent (Rego)
  • RBAC — Native role-based access control
  • MAC — Mandatory Access Control (Bell-LaPadula model)

- name: "UnifiedPDPPlugin"
  mode: "enforce"
  config:
    engines: ["native_rbac"]  # Start simple, add Cedar/OPA as needed

Running 28 Plugins Simultaneously

We tested 28 plugins running at the same time. Here's the verified configuration:

Total plugins: 42
Enabled: 28  |  Disabled: 14

Hooks distribution:
├─ tool_pre_invoke:     17 plugins
├─ tool_post_invoke:    26 plugins
├─ resource_pre_fetch:   7 plugins
├─ resource_post_fetch: 14 plugins
├─ prompt_pre_fetch:    10 plugins
└─ prompt_post_fetch:    7 plugins

Modes: enforce=18, permissive=10

Performance impact: With 28 plugins active, health endpoint response stayed at 4ms, gateway queries at 7ms. The parallel execution model keeps latency low.


Observability: See Everything

Prometheus Metrics

curl http://localhost:8000/metrics/prometheus
# 801 lines, 44 metric definitions

Key metrics include:

  • mcp_tool_calls_total — per-tool call counts
  • mcp_plugin_executions_total — per-plugin execution counts
  • mcp_tool_call_duration_seconds — latency histograms
  • mcp_active_sessions — concurrent session gauge

Aggregated JSON Metrics

curl http://localhost:8000/metrics
# Structured summary: tools, resources, servers, prompts, a2a_agents

Tool Annotations

Each tool exposes metadata about its behavior:

{
  "name": "get_current_time",
  "annotations": {
    "readOnlyHint": true,
    "openWorldHint": true
  }
}

Agents can use these hints to make smarter decisions about tool usage.
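For example, an agent planning a read-only research task could restrict its tool set to annotated read-only tools. A hypothetical agent-side filter over the annotation shape shown above:

```python
tools = [
    {"name": "get_current_time",
     "annotations": {"readOnlyHint": True, "openWorldHint": True}},
    {"name": "delete_record",
     "annotations": {"readOnlyHint": False}},
]

def read_only_tools(tools):
    """Keep only tools whose annotations mark them as read-only."""
    return [t["name"] for t in tools
            if t.get("annotations", {}).get("readOnlyHint")]

print(read_only_tools(tools))  # ['get_current_time']
```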

System Metrics Dashboard


Enabling Plugins

All plugin configuration lives in plugins/config.yaml:

plugins:
  - name: "PIIFilterPlugin"
    mode: "enforce"        # enforce | permissive | disabled
    priority: 10           # Lower = runs first
    hooks:
      - "tool_post_invoke"
      - "resource_post_fetch"
    config:
      patterns:
        - type: "ssn"
          regex: "\\d{3}-\\d{2}-\\d{4}"
          mask: "***-**-****"
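The masking step configured above boils down to a regex substitution per pattern. A minimal sketch using the SSN pattern from the config:

```python
import re

# (pattern, mask) pairs, mirroring the ssn entry in plugins/config.yaml
PATTERNS = [(r"\d{3}-\d{2}-\d{4}", "***-**-****")]

def mask_pii(text: str) -> str:
    """Apply each configured pattern's mask to the text."""
    for pattern, mask in PATTERNS:
        text = re.sub(pattern, mask, text)
    return text

print(mask_pii("SSN is 123-45-6789"))  # SSN is ***-**-****
```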

Enable the plugin system in .env:

PLUGINS_ENABLED=true
PLUGINS_CONFIG_FILE=plugins/config.yaml

Check the status via Admin API:

curl http://localhost:8000/admin/plugins/stats \
  -H "Authorization: Bearer ${TOKEN}"

Load Testing Results

We ran Locust with 10 concurrent users for 30 seconds against a gateway with 28 active plugins:

| Endpoint | Avg Response | p99 |
| --- | --- | --- |
| /health | 4ms | 12ms |
| /gateways | 7ms | 22ms |
| /servers | 13ms | 38ms |
| /admin/ | 43ms | 95ms |

~290 requests total, zero failures. The plugin pipeline adds minimal overhead.


Infrastructure We Verified

| Component | Details |
| --- | --- |
| HTTPS/TLS | TLS 1.3, AEAD-AES256-GCM-SHA384, RSA-4096 self-signed certs |
| PostgreSQL | Alembic migrations, 60 tables created, full CRUD verified |
| Redis | v8.6.1, PING/PONG connectivity, caching layer |
| Rust Plugins | PIIDetectorRust built via maturin, detect/mask verified |
| Load Test | Locust 10 users / 30s, 4-43ms avg response |

The Full Picture

Admin Dashboard with System Overview

AI Agent
   ↓
[ContextForge Gateway]
   ├─ JWT Authentication
   ├─ Token Scoping (what can you see?)
   ├─ RBAC (what can you do?)
   ├─ Plugin Pipeline:
   │   ├─ PII Filter (mask sensitive data)
   │   ├─ Rate Limiter (throttle per team)
   │   ├─ DenyList (block prohibited content)
   │   ├─ SQL Sanitizer (prevent injection)
   │   ├─ TOON Encoder (compress for LLM)
   │   ├─ URL Reputation (block malicious URLs)
   │   ├─ Watchdog (track response times)
   │   └─ ... 35 more plugins
   ├─ Prometheus Metrics
   └─ Audit Logging
   ↓
Backend Tools (MCP, REST, gRPC, A2A)

Summary

ContextForge isn't just an MCP proxy — it's a governance layer for AI agents. The 42 built-in plugins give you:

  • Data protection without modifying agent code
  • Cost optimization through TOON compression and caching
  • Compliance with PII filtering, audit logs, and policy engines
  • Reliability with rate limiting, circuit breakers, and watchdogs
  • Visibility with Prometheus metrics and structured logging

All open source, all configurable, all running with minimal latency overhead.


ContextForge is open source under Apache 2.0.

GitHub: IBM / mcp-context-forge

An AI Gateway, registry, and proxy that sits in front of any MCP, A2A, or REST/gRPC APIs, exposing a unified endpoint with centralized discovery, guardrails and management. It provides centralized governance, discovery, and observability across your AI infrastructure:

  • Tools Gateway — MCP, REST, gRPC-to-MCP translation, and TOON compression
  • Agent Gateway — A2A protocol, OpenAI-compatible and Anthropic agent routing
  • API Gateway — Rate limiting, auth, retries, and reverse proxy for REST services
  • Plugin Extensibility — 40+ plugins for additional transports, protocols, and integrations
  • Observability — OpenTelemetry tracing with Phoenix, Jaeger, Zipkin, and other OTLP backends

It runs as a fully compliant MCP server, deployable via PyPI or Docker, and scales to multi-cluster environments on Kubernetes with Redis-backed federation and caching.