# Building Your Own AI Proxy: Route, Cache, and Monitor LLM Requests in TypeScript
In the rapidly evolving world of AI, Large Language Models (LLMs) have become indispensable tools for a myriad of applications. However, integrating and managing these powerful models in production environments comes with its own set of challenges: spiraling costs, vendor lock-in, inconsistent APIs, and a lack of observability. This is where an AI proxy becomes a game-changer.
At Juspay, a fintech company dealing with high-volume, mission-critical transactions, we've learned the hard way that robust infrastructure is paramount. Our experience building and scaling payment systems has directly informed our approach to AI integration, leading to the creation of NeuroLink—our universal AI development platform. NeuroLink isn't just an SDK; it's the foundation upon which you can build sophisticated AI infrastructure, including your own AI proxy.
This article will guide you through the process of building a powerful AI proxy using NeuroLink in TypeScript, covering key components like routing, caching, rate limiting, cost tracking, and logging.
## Why Teams Build AI Proxies
Before diving into the "how," let's understand the "why." Why do engineering teams, especially in enterprise environments, invest in building their own AI proxies?
- **Cost Control and Optimization:** LLM usage can get expensive, fast. A proxy allows you to implement intelligent routing to the cheapest available model for a given task, enforce rate limits to prevent accidental overspending, and track costs per user or project.
- **Multi-Tenancy and Access Control:** For platforms serving multiple users or internal teams, a proxy can manage API keys, enforce usage quotas, and isolate access, ensuring fair usage and preventing resource contention.
- **Vendor Abstraction and Resilience:** Relying on a single LLM provider creates vendor lock-in and a single point of failure. A proxy abstracts away provider-specific APIs, allowing you to seamlessly switch between models (e.g., OpenAI, Anthropic, Google Gemini, AWS Bedrock) or even implement failover to a different provider if one goes down. NeuroLink, with its unified API across 13+ providers, makes this abstraction a core feature.
- **Audit Logs and Observability:** Understanding how LLMs are being used is crucial for debugging, compliance, and performance optimization. A proxy acts as a central point to log all requests and responses, track latency, monitor errors, and gain insights into usage patterns.
- **Data Governance and Security:** In sensitive environments, proxies can sanitize requests, redact Personally Identifiable Information (PII) from prompts and responses, and enforce data residency policies.
- **Performance Enhancement:** Caching LLM responses for common or deterministic queries can significantly reduce latency and API calls, improving user experience and cutting costs.
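The failover idea from the resilience point above can be sketched in a few lines, independent of any SDK. The `GenerateFn` signature below is a stand-in for a provider call, not NeuroLink's actual API:

```typescript
// Failover across providers: try each generate function in order and return
// the first success. GenerateFn is an illustrative stand-in for a provider call.
type GenerateFn = (prompt: string) => Promise<string>;

async function generateWithFailover(
  prompt: string,
  providers: GenerateFn[],
): Promise<string> {
  let lastError: unknown;
  for (const generate of providers) {
    try {
      return await generate(prompt);
    } catch (err) {
      lastError = err; // Remember the failure and try the next provider
    }
  }
  // Every provider failed: surface the last error to the caller
  throw lastError ?? new Error("no providers configured");
}
```

In a real proxy you would wrap each provider's client in a `GenerateFn`-shaped adapter, possibly adding per-provider timeouts before falling through to the next one.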
## Key Components of an AI Proxy
A robust AI proxy typically comprises several core components:
- **Request Router:** Directs incoming LLM requests to the appropriate provider and model based on predefined rules (e.g., cost, latency, capability).
- **Caching Layer:** Stores responses for frequently asked or deterministic queries to reduce latency and API costs.
- **Rate Limiting:** Prevents abuse and controls spending by limiting the number of requests within a given timeframe.
- **Cost Tracking:** Monitors token usage and API costs, providing granular insights.
- **Logging and Monitoring:** Captures detailed logs of all interactions, errors, and performance metrics.
- **Security & Data Sanitization:** Handles API key management, input validation, and output redaction.
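These components typically compose into a single request pipeline: each stage can rewrite the request, answer it directly (cache hit, rate-limit rejection), or pass it along. The sketch below shows one way to chain such stages; all names here (`Stage`, `compose`, `ProxyRequest`) are illustrative, not NeuroLink APIs:

```typescript
// A proxy as a pipeline of stages, middleware-style. Each stage receives the
// request plus a `next` continuation and decides whether to call it.
interface ProxyRequest { userId: string; prompt: string; model?: string }
interface ProxyResponse { text: string; fromCache?: boolean }

type Handler = (req: ProxyRequest) => Promise<ProxyResponse>;
type Stage = (req: ProxyRequest, next: Handler) => Promise<ProxyResponse>;

// Fold the stages right-to-left around a terminal handler (the actual LLM call),
// so stages run in array order on the way in.
function compose(stages: Stage[], terminal: Handler): Handler {
  return stages.reduceRight<Handler>(
    (next, stage) => (req) => stage(req, next),
    terminal,
  );
}
```

A logging stage, a cache stage, and a rate-limit stage would each be a `Stage`, and `compose([...], callLLM)` yields the complete proxy handler.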
## Building One with NeuroLink as the Foundation
NeuroLink is designed to be the "pipe layer for the AI nervous system," making it an ideal foundation for an AI proxy. Its key features—unified API, multi-provider support, middleware system, and built-in telemetry—directly address the needs of proxy development.
Let's explore how to build some of these components using NeuroLink.
### Initial Setup
First, ensure you have NeuroLink installed:
```bash
npm install @juspay/neurolink
```
Then, configure your NeuroLink instance with the LLM providers you want to proxy. NeuroLink allows you to define multiple providers and will intelligently select the best one.
```typescript
// src/proxy.ts
import { NeuroLink } from "@juspay/neurolink";
import { type IncomingMessage, type ServerResponse } from "http";

// Initialize NeuroLink with your desired providers
const neurolink = new NeuroLink({
  // Configure providers with their API keys (ideally from environment variables)
  openai: { apiKey: process.env.OPENAI_API_KEY },
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  googleAI: { apiKey: process.env.GOOGLE_AI_API_KEY },
  // ... add other providers
});

console.log("NeuroLink AI Proxy initialized.");

// This will be our HTTP server handler
async function handleRequest(req: IncomingMessage, res: ServerResponse) {
  // ... proxy logic goes here
}

// Example of a simple HTTP server (can be integrated with Express, Fastify, etc.)
// import * as http from 'http';
// const server = http.createServer(handleRequest);
// server.listen(3000, () => {
//   console.log('AI Proxy listening on port 3000');
// });
```
### 1. Middleware for Logging and Monitoring
NeuroLink's middleware system is perfect for implementing cross-cutting concerns like logging, cost tracking, and performance monitoring.
Let's create a logging middleware:
```typescript
// src/middleware/logging.ts
import { type Middleware, type GenerateOptions, type GenerateResult } from "@juspay/neurolink";

export const loggingMiddleware: Middleware = {
  name: "logging-middleware",

  async onBeforeGenerate(options: GenerateOptions) {
    const startTime = Date.now();
    console.log(`[${this.name}] Request received:`, {
      model: options.model,
      provider: options.provider,
      input: options.input?.text?.substring(0, 100) + "...", // Log first 100 chars
      // ... other relevant options
    });
    return { ...options, __startTime: startTime }; // Attach startTime for later use
  },

  async onAfterGenerate(result: GenerateResult, options: GenerateOptions & { __startTime: number }) {
    const duration = Date.now() - options.__startTime;
    console.log(`[${this.name}] Request completed in ${duration}ms:`, {
      model: options.model,
      provider: options.provider,
      output: result.output?.text?.substring(0, 100) + "...", // Log first 100 chars
      // ... other relevant results
    });
    // Here, you could send metrics to an observability platform like OpenTelemetry, Langfuse, etc.
    return result;
  },

  async onError(error: Error, options: GenerateOptions) {
    console.error(`[${this.name}] Request failed:`, {
      model: options.model,
      provider: options.provider,
      input: options.input?.text?.substring(0, 100) + "...",
      error: error.message,
    });
    throw error; // Re-throw so callers still see the failure
  },
};

// Apply the middleware to your NeuroLink instance
// neurolink.use(loggingMiddleware);
```
You can extend this middleware to track token usage (from `result.usage`), record costs, and send data to your observability platform of choice. NeuroLink also supports OpenTelemetry integration natively.
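Token usage can feed straight into a cost tracker. The sketch below is self-contained; the `PRICES_PER_MILLION` values, the `Usage` shape, and the helper names are illustrative assumptions, not NeuroLink APIs, and real prices change, so look them up before relying on this:

```typescript
// Illustrative cost tracker. Prices below are placeholders, and the
// inputTokens/outputTokens field names are assumptions about the usage shape.
type Usage = { inputTokens: number; outputTokens: number };

// Hypothetical USD prices per 1M tokens, for illustration only
const PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "claude-sonnet": { input: 3, output: 15 },
};

function estimateCostUSD(model: string, usage: Usage): number {
  const price = PRICES_PER_MILLION[model];
  if (!price) return 0; // Unknown model: report zero rather than guess
  return (
    (usage.inputTokens / 1_000_000) * price.input +
    (usage.outputTokens / 1_000_000) * price.output
  );
}

// Accumulate per-user spend, e.g. from inside onAfterGenerate
const spendByUser = new Map<string, number>();

function recordSpend(userId: string, model: string, usage: Usage): number {
  const total = (spendByUser.get(userId) ?? 0) + estimateCostUSD(model, usage);
  spendByUser.set(userId, total);
  return total;
}
```

Swap the in-memory `Map` for a database or metrics backend in production so per-user totals survive restarts.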
### 2. Caching Layer
A caching layer is crucial for optimizing performance and cost. NeuroLink's MCP (Model Context Protocol) enhancements include a built-in ToolCache. While primarily for tool calls, you can adapt a similar pattern for LLM responses or implement a custom middleware.
Here's a simplified caching middleware example:
```typescript
// src/middleware/caching.ts
import { type Middleware, type GenerateOptions, type GenerateResult } from "@juspay/neurolink";
import { LRUCache } from "lru-cache"; // npm install lru-cache (v7+ uses a named export)

interface CacheEntry {
  result: GenerateResult;
  timestamp: number;
}

const cache = new LRUCache<string, CacheEntry>({
  max: 1000, // Max 1000 entries
  ttl: 1000 * 60 * 5, // Cache for 5 minutes
});

export const cachingMiddleware: Middleware = {
  name: "caching-middleware",

  async onBeforeGenerate(options: GenerateOptions) {
    // Generate a cache key based on the request.
    // Note: JSON.stringify is sensitive to property order; you may need a
    // more sophisticated key generation scheme for complex scenarios.
    const cacheKey = JSON.stringify({
      input: options.input,
      model: options.model,
      provider: options.provider,
    });
    const cached = cache.get(cacheKey); // LRUCache drops expired entries itself
    if (cached) {
      console.log(`[${this.name}] Cache hit for key: ${cacheKey}`);
      return { ...options, __cachedResult: cached.result }; // Short-circuit with the cached result
    }
    console.log(`[${this.name}] Cache miss for key: ${cacheKey}`);
    return options; // Proceed with generation
  },

  async onAfterGenerate(result: GenerateResult, options: GenerateOptions & { __cachedResult?: GenerateResult }) {
    if (options.__cachedResult) {
      return options.__cachedResult; // Return the result that was found in cache
    }
    // If not from cache, store the new result
    const cacheKey = JSON.stringify({
      input: options.input,
      model: options.model,
      provider: options.provider,
    });
    cache.set(cacheKey, { result, timestamp: Date.now() });
    console.log(`[${this.name}] Stored new result in cache for key: ${cacheKey}`);
    return result;
  },
};

// Add to NeuroLink:
// neurolink.use(cachingMiddleware);
```
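One subtlety worth fixing in practice: `JSON.stringify` is sensitive to property insertion order, so two logically identical requests can miss each other in the cache. A key builder that sorts keys recursively avoids this; the helper names below are illustrative:

```typescript
// Order-independent serialization: sort object keys recursively so that
// logically identical values always produce the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value); // primitives
}

// Build a cache key from the request fields that determine the response
function cacheKeyFor(options: { input?: unknown; model?: string; provider?: string }): string {
  return stableStringify({
    input: options.input,
    model: options.model,
    provider: options.provider,
  });
}
```

For very long prompts you may also want to hash the resulting string (e.g. with Node's `crypto.createHash`) to keep keys short.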
### 3. Request Router
NeuroLink's core functionality includes intelligent provider selection. You can configure it to automatically pick the cheapest or fastest model, or implement a custom routing logic within a middleware.
For example, to prioritize a specific model or provider for certain requests:
```typescript
// src/middleware/routing.ts
import { type Middleware, type GenerateOptions } from "@juspay/neurolink";

export const routingMiddleware: Middleware = {
  name: "routing-middleware",

  async onBeforeGenerate(options: GenerateOptions) {
    // Example: Route specific keywords to a powerful but expensive model
    if (options.input?.text?.toLowerCase().includes("financial analysis")) {
      console.log(`[${this.name}] Routing "financial analysis" to gpt-4o.`);
      return { ...options, provider: "openai", model: "gpt-4o" };
    }

    // Example: Route shorter requests to a cheaper, faster model
    if (options.input?.text && options.input.text.length < 50) {
      console.log(`[${this.name}] Routing short request to gemini-3-flash.`);
      return { ...options, provider: "googleAI", model: "gemini-3-flash" };
    }

    // Fall back to NeuroLink's auto-selection or the provider/model already in options
    return options;
  },
};

// neurolink.use(routingMiddleware);
```
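Keyword rules like the above can be generalized into table-driven, cost-based selection: pick the cheapest model whose capabilities cover the request. Everything in this sketch, including the model names and prices, is an illustrative assumption rather than NeuroLink configuration:

```typescript
// Cost-based model selection over a declarative table.
interface ModelOption {
  provider: string;
  model: string;
  costPerMillionTokens: number; // blended price, for comparison only
  supportsLongContext: boolean;
}

// Illustrative table; replace with your own providers, models, and prices
const MODEL_TABLE: ModelOption[] = [
  { provider: "googleAI", model: "cheap-fast-model", costPerMillionTokens: 0.5, supportsLongContext: false },
  { provider: "openai", model: "frontier-model", costPerMillionTokens: 10, supportsLongContext: true },
];

// Return the cheapest model that satisfies the request's requirements
function pickCheapestModel(
  models: ModelOption[],
  needsLongContext: boolean,
): ModelOption | undefined {
  return models
    .filter((m) => !needsLongContext || m.supportsLongContext)
    .reduce<ModelOption | undefined>(
      (best, m) => (!best || m.costPerMillionTokens < best.costPerMillionTokens ? m : best),
      undefined,
    );
}
```

A routing middleware could call `pickCheapestModel` in `onBeforeGenerate` and spread the winner's `provider` and `model` into the options it returns.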
NeuroLink also has a ToolRouter within its MCP enhancements that supports various strategies (e.g., capability-based, round-robin). While this is for tool calls, the principles can be applied to LLM routing.
### 4. Security: API Key Management & Request Sanitization
Your proxy should manage API keys securely and potentially sanitize user inputs.
For API key management, ensure keys are loaded from secure environment variables or a secrets manager, not hardcoded. NeuroLink handles this by default when you initialize it with `process.env.YOUR_API_KEY`.
For request sanitization, you can add another middleware:
```typescript
// src/middleware/sanitization.ts
import { type Middleware, type GenerateOptions } from "@juspay/neurolink";

export const sanitizationMiddleware: Middleware = {
  name: "sanitization-middleware",

  async onBeforeGenerate(options: GenerateOptions) {
    if (options.input?.text) {
      // Simple example: redact common PII patterns
      const sanitizedText = options.input.text
        .replace(/\b\d{16}\b/g, "[CREDIT_CARD_NUMBER]") // Credit card numbers
        .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"); // Social Security Numbers
      // More robust PII detection requires NLP libraries or dedicated services

      // Prevent prompt injection (basic example)
      if (sanitizedText.toLowerCase().includes("ignore previous instructions")) {
        console.warn(`[${this.name}] Potential prompt injection detected. Blocking request.`);
        throw new Error("Invalid input: Potential prompt injection detected.");
      }

      return { ...options, input: { ...options.input, text: sanitizedText } };
    }
    return options;
  },
};

// neurolink.use(sanitizationMiddleware);
```
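The rate-limiting component listed earlier deserves a sketch too. A per-user token bucket allows short bursts while capping sustained throughput; the class and function names below are illustrative, and timestamps are passed explicitly so the logic is easy to test:

```typescript
// Per-user token bucket: each user gets `capacity` requests of burst and
// refills at `refillPerSecond`. A request is allowed if a token is available.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  tryRemove(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per user/API key
const buckets = new Map<string, TokenBucket>();

function allowRequest(userId: string, now: number = Date.now()): boolean {
  let bucket = buckets.get(userId);
  if (!bucket) {
    bucket = new TokenBucket(10, 1, now); // burst of 10, 1 request/second sustained
    buckets.set(userId, bucket);
  }
  return bucket.tryRemove(now);
}
```

Inside a middleware's `onBeforeGenerate`, a failed `allowRequest` check would throw a 429-style error back to the client. For multi-instance deployments, the in-memory `Map` would need to move to a shared store such as Redis.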
## When to Build vs. Buy
Building an AI proxy provides ultimate control and customization, which is critical for complex enterprise needs, stringent security requirements, or highly specialized routing logic. However, it requires development and maintenance effort.
For many teams, especially those starting out or with simpler needs, commercial solutions like Portkey, Helicone, OpenPipe, or LiteLLM Proxy offer off-the-shelf capabilities that cover many common proxy use cases (caching, logging, cost tracking). NeuroLink itself can be seen as an SDK that complements these, allowing you to integrate with them or build similar features on top.
Consider building if:
- You have unique routing logic or business rules.
- You need deep integration with existing internal systems (e.g., identity, billing, audit).
- You have strict compliance or security requirements that off-the-shelf solutions don't fully meet.
- You want complete control over the infrastructure and data flow.
- You are already using NeuroLink for unified AI access and want to leverage its ecosystem.
Consider buying if:
- You need a quick, managed solution.
- Your requirements are standard (basic caching, rate limiting, logging).
- You want to offload infrastructure maintenance.
## Conclusion
Building your own AI proxy with NeuroLink in TypeScript empowers you to gain granular control over your LLM infrastructure. From optimizing costs through intelligent routing and caching to enhancing observability with comprehensive logging and ensuring security through input sanitization, a custom proxy addresses the complex challenges of production AI.
By leveraging NeuroLink's unified API and powerful middleware system, you can develop a robust, resilient, and cost-effective AI gateway tailored to your specific needs, enabling your team to build and scale AI applications with confidence.
NeuroLink — The Universal AI SDK for TypeScript
- GitHub: github.com/juspay/neurolink
- Install: `npm install @juspay/neurolink`
- Docs: docs.neurolink.ink
- Blog: blog.neurolink.ink — 150+ technical articles