<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ChrisRemo</title>
    <description>The latest articles on DEV Community by ChrisRemo (@chrisremo85).</description>
    <link>https://dev.to/chrisremo85</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840876%2F98df10df-c3a0-4db0-a253-672563f62e89.jpg</url>
      <title>DEV Community: ChrisRemo</title>
      <link>https://dev.to/chrisremo85</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrisremo85"/>
    <language>en</language>
    <item>
      <title>EU AI Act and LLM Proxies - What Your Infrastructure Layer Needs to Know</title>
      <dc:creator>ChrisRemo</dc:creator>
      <pubDate>Thu, 02 Apr 2026 12:38:07 +0000</pubDate>
      <link>https://dev.to/chrisremo85/eu-ai-act-and-llm-proxies-what-your-infrastructure-layer-needs-to-know-4jil</link>
      <guid>https://dev.to/chrisremo85/eu-ai-act-and-llm-proxies-what-your-infrastructure-layer-needs-to-know-4jil</guid>
      <description>&lt;p&gt;The EU AI Act is &lt;a href="https://artificialintelligenceact.eu/implementation-timeline/" rel="noopener noreferrer"&gt;rolling out in stages&lt;/a&gt;. Prohibited practices since February 2025, GPAI rules since August 2025, high-risk system obligations from August 2026. Full enforcement across all risk categories hits in August 2027.&lt;/p&gt;

&lt;p&gt;If you're running a proxy or gateway between your applications and LLM providers, some of this applies to you - maybe not in the way you'd expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who is responsible for what?
&lt;/h2&gt;

&lt;p&gt;The AI Act doesn't assign blame to a single layer. It distributes responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model providers&lt;/strong&gt; (OpenAI, Anthropic, or your self-hosted vLLM) handle model safety, training data, and GPAI obligations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt; (your proxy/gateway) provides secure access control, usage tracking, and routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your application&lt;/strong&gt; owns the use case, risk classification, transparency, and monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End users&lt;/strong&gt; use the service within the terms you set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A proxy doesn't train models or decide how outputs are used. But it's not a completely neutral pipe either - RBAC controls, rate limits, and routing decisions shape the compliance landscape. The legal classification of AI infrastructure providers is still being refined. This is where things stand as of Q2 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: prompt logging
&lt;/h2&gt;

&lt;p&gt;Here's where it gets practical. Most LLM gateways log prompts and responses. Some by default, some as an opt-in "observability" feature, some because it was easier to log everything than to think about what to exclude.&lt;/p&gt;

&lt;p&gt;Under GDPR, prompt content becomes personal data the moment someone types a name, email address, or anything identifiable. Once your proxy stores that, you're on the hook for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A legal basis to process it&lt;/li&gt;
&lt;li&gt;Records of processing activities&lt;/li&gt;
&lt;li&gt;Data subject access requests (someone asks "what data do you have on me?")&lt;/li&gt;
&lt;li&gt;Right to deletion ("delete everything about me")&lt;/li&gt;
&lt;li&gt;Breach notification scope (a data breach now includes prompt content)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't theoretical. If your proxy vendor stores prompts in a database, all of this has to be reflected in your Data Processing Agreement (DPA).&lt;/p&gt;

&lt;h2&gt;
  
  
  What the AI Act actually requires for logging
&lt;/h2&gt;

&lt;p&gt;Article 12 requires "automatic recording of events" for traceability - but primarily for high-risk AI systems. The key word here is &lt;em&gt;events&lt;/em&gt;, not &lt;em&gt;content&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What the regulation expects: who made the request, when, which model, how many tokens, what happened. Metadata.&lt;/p&gt;

&lt;p&gt;What most proxies do: store the full conversation. That's not what was asked for, and it creates obligations that weren't necessary.&lt;/p&gt;
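&lt;p&gt;To make the distinction concrete, an event-level usage record might look like this (the field names are illustrative, not a prescribed schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Traceability metadata only - there is no field where content could even go.
function buildUsageEvent(req, resp) {
  return {
    timestamp: new Date().toISOString(),
    keyId: req.keyId,                        // who made the request
    model: req.model,                        // which model
    promptTokens: resp.promptTokens,         // how many tokens
    completionTokens: resp.completionTokens,
    latencyMs: resp.latencyMs,
    status: resp.status                      // what happened
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;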

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Relevant for compliance&lt;/th&gt;
&lt;th&gt;Common in proxies&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Usage tracking (who, when, which model)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limiting per user/team&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Sometimes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit logs for admin actions&lt;/td&gt;
&lt;td&gt;Yes (high-risk)&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model access control (RBAC)&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Basic or none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt content logging&lt;/td&gt;
&lt;td&gt;Not the baseline requirement&lt;/td&gt;
&lt;td&gt;Often default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost tracking per team/org&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Sometimes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In certain high-risk deployments, more detailed documentation may be required for incident investigation. But that obligation sits with the deployer's application layer - the proxy doesn't have the context to decide what needs to be recorded for a specific use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data minimization as architecture
&lt;/h2&gt;

&lt;p&gt;GDPR Article 5(1)(c) requires data minimization - collect only what you need. Article 25 requires data protection by design and by default - build it into the system, don't bolt it on later.&lt;/p&gt;

&lt;p&gt;For a proxy, this means: if you don't need prompt content, don't store it. Not "we have a toggle to disable logging" - there should be no logging code to toggle. The proxy reads the &lt;code&gt;model&lt;/code&gt; field for routing, streams bytes to the provider and back, tracks metadata, and forgets the content.&lt;/p&gt;
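&lt;p&gt;In sketch form, with a hypothetical &lt;code&gt;forward&lt;/code&gt; function standing in for the upstream call (an illustration of the flow, not VoidLLM's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const events = [];
function recordMetadata(event) { events.push(event); }

// The only field the proxy parses is "model" (for routing). The body is
// forwarded as-is and never written anywhere; only metadata survives.
function handleRequest(rawBody, forward) {
  const model = JSON.parse(rawBody).model;
  const started = Date.now();
  const response = forward(model, rawBody);      // stream bytes to provider
  recordMetadata({ model: model, latencyMs: Date.now() - started });
  return response;                               // passed through, not stored
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;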

&lt;p&gt;This is the approach we took with &lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;VoidLLM&lt;/a&gt;. Not because the AI Act forced us - we made this decision before the regulation was finalized - but because zero-knowledge is the right architecture for infrastructure that handles other people's data.&lt;/p&gt;

&lt;p&gt;The practical benefit: your Data Processing Agreement covers metadata only. No prompt content in scope means no content-related breach notifications, no content-related access requests, no content retention policies for the proxy layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a proxy should give you
&lt;/h2&gt;

&lt;p&gt;For AI Act readiness at the infrastructure level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access control&lt;/strong&gt; - org/team/user/key hierarchy so you know who has access to what&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage tracking&lt;/strong&gt; - metadata per request, not content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits and token budgets&lt;/strong&gt; - constrain usage at every level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logs&lt;/strong&gt; - track administrative actions (who changed permissions, created keys, modified models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt; - Prometheus or similar for monitoring and alerting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health checks&lt;/strong&gt; - know when your upstream providers are degraded&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What a proxy should not do
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Validate AI outputs (that's the model provider's or your responsibility)&lt;/li&gt;
&lt;li&gt;Detect prohibited use cases (that's governance, not infrastructure)&lt;/li&gt;
&lt;li&gt;Inject AI disclosure labels (your app handles user-facing transparency)&lt;/li&gt;
&lt;li&gt;Assess risk levels (the use case determines classification, not the proxy)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A useful principle
&lt;/h2&gt;

&lt;p&gt;Log content where you understand and take responsibility for it. For most architectures, that's your application - where you have context about the use case, the user, and what needs to be recorded. Not your proxy, which sees bytes flowing through.&lt;/p&gt;

&lt;p&gt;Some regulated industries may want content logging for their own reasons. That's a legitimate choice based on risk assessment. The point is it should be a conscious decision, not a proxy default you didn't know was on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We're engineers, not lawyers. If you're evaluating proxies for a regulated environment, the architecture matters more than the feature list. We wrote a more detailed analysis with diagrams on our &lt;a href="https://voidllm.ai/blog/eu-ai-act-llm-proxies-where-voidllm-stands" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>privacy</category>
      <category>compliance</category>
      <category>llm</category>
    </item>
    <item>
      <title>VoidLLM vs LiteLLM - An Honest Comparison from the Builder's Perspective</title>
      <dc:creator>ChrisRemo</dc:creator>
      <pubDate>Wed, 01 Apr 2026 18:36:21 +0000</pubDate>
      <link>https://dev.to/chrisremo85/voidllm-vs-litellm-an-honest-comparison-from-the-builders-perspective-405c</link>
      <guid>https://dev.to/chrisremo85/voidllm-vs-litellm-an-honest-comparison-from-the-builders-perspective-405c</guid>
      <description>&lt;p&gt;If you're running LLMs in production, you've probably evaluated LiteLLM. It's the most popular gateway out there - 100+ providers, massive community, used by companies like Stripe and Netflix.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;VoidLLM&lt;/a&gt; with a different set of priorities. Here's an honest comparison - including where LiteLLM is ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built something different
&lt;/h2&gt;

&lt;p&gt;We were running self-hosted models in Kubernetes, hitting vLLM directly. No proxy; network policies were the only access control. It worked until we needed to know which team was burning through GPU hours.&lt;/p&gt;

&lt;p&gt;LiteLLM was the obvious first choice, but the Python runtime, startup time, and dependency tree felt heavy for what we needed. We also had a hard GDPR requirement - no prompt content could be stored anywhere.&lt;/p&gt;

&lt;p&gt;So we built VoidLLM in Go.&lt;/p&gt;

&lt;h2&gt;
  
  
  What VoidLLM does differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Privacy by architecture.&lt;/strong&gt; There's no "disable content logging" toggle - because there's no content logging code. The proxy reads the &lt;code&gt;model&lt;/code&gt; field from the request body, streams bytes between client and upstream, and forgets. Usage events track who, which model, how many tokens - nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single binary.&lt;/strong&gt; One Go binary (~25MB) with the admin UI embedded. No Python, no pip, no virtualenv. Download, configure, run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance.&lt;/strong&gt; Under 500 microseconds of proxy overhead at 2000 RPS. We benchmarked this with Vegeta at sustained load on a 12-core machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built-in admin UI.&lt;/strong&gt; Key management, usage tracking, model configuration, playground, team management - all embedded in the binary. Not a separate service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Gateway.&lt;/strong&gt; VoidLLM doubles as an MCP gateway - register external MCP servers, proxy tool calls with scoped access control. Plus Code Mode: AI agents write JavaScript that orchestrates multiple MCP tool calls in a single WASM-sandboxed execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RBAC.&lt;/strong&gt; Org/team/user/key hierarchy with four roles. Rate limits, token budgets, and model access control at every level. Most-restrictive-wins inheritance.&lt;/p&gt;
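&lt;p&gt;Most-restrictive-wins can be sketched as taking the smallest limit defined anywhere in the hierarchy (a simplified model, not the exact semantics):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Simplified most-restrictive-wins: the effective limit is the minimum of
// whatever limits are defined at the org, team, user, and key levels.
function effectiveLimit(levels) {
  var result = Infinity;
  for (const level of levels) {
    if (level.rpm !== undefined) {
      result = Math.min(result, level.rpm);
    }
  }
  return result;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An org limit of 1000 RPM, a team limit of 200, and a key limit of 500 resolve to 200.&lt;/p&gt;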

&lt;p&gt;&lt;strong&gt;Load balancing.&lt;/strong&gt; Multi-deployment models with round-robin, least-latency, weighted, and priority routing. Automatic failover with per-deployment circuit breakers.&lt;/p&gt;
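&lt;p&gt;For illustration, least-latency selection with circuit breakers reduces to something like this (simplified; the deployment fields are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Skip deployments whose circuit breaker is open, then pick the one with
// the lowest observed latency. Returns null if nothing is available.
function pickDeployment(deployments) {
  var best = null;
  for (const d of deployments) {
    if (d.circuitOpen) { continue; }
    if (best === null) { best = d; continue; }
    if (d.p50Ms === Math.min(d.p50Ms, best.p50Ms)) { best = d; }
  }
  return best;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;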

&lt;h2&gt;
  
  
  Where LiteLLM is better
&lt;/h2&gt;

&lt;p&gt;I'll be honest about this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider coverage.&lt;/strong&gt; LiteLLM supports 100+ providers. VoidLLM supports 6 (OpenAI, Anthropic, Azure, Ollama, vLLM, custom). If you need native Bedrock, VertexAI, or Cohere integration, LiteLLM has us beat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community.&lt;/strong&gt; Thousands of users, extensive docs, large contributor base. VoidLLM is new. Our docs are solid but our community is just getting started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python SDK.&lt;/strong&gt; If your stack is Python-native and you want a library you can import directly, LiteLLM's SDK is a natural fit. VoidLLM is a standalone proxy - you point your SDK at it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability integrations.&lt;/strong&gt; LiteLLM connects to Langfuse, Lunary, MLflow for request-level observability. VoidLLM deliberately avoids content-level logging - that's the privacy trade-off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;VoidLLM&lt;/th&gt;
&lt;th&gt;LiteLLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy overhead&lt;/td&gt;
&lt;td&gt;&amp;lt; 500us P50&lt;/td&gt;
&lt;td&gt;~8ms P95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Providers&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content logging&lt;/td&gt;
&lt;td&gt;Never (by design)&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Single binary&lt;/td&gt;
&lt;td&gt;Python runtime + deps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Admin UI&lt;/td&gt;
&lt;td&gt;Embedded&lt;/td&gt;
&lt;td&gt;Separate service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Gateway&lt;/td&gt;
&lt;td&gt;Built-in + Code Mode&lt;/td&gt;
&lt;td&gt;Recent addition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RBAC&lt;/td&gt;
&lt;td&gt;Org/team/user/key&lt;/td&gt;
&lt;td&gt;Virtual keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancing&lt;/td&gt;
&lt;td&gt;4 strategies + failover&lt;/td&gt;
&lt;td&gt;Retry/fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;BSL 1.1&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Switching is easy
&lt;/h2&gt;

&lt;p&gt;Both are OpenAI-compatible. Switching from LiteLLM to VoidLLM (or back) is a base URL change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Before (LiteLLM)
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://litellm:4000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# After (VoidLLM)
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://voidllm:8080/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vl_uk_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your application code stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;If you need coverage for 100+ LLM providers and a Python SDK with a large ecosystem, use LiteLLM.&lt;/p&gt;

&lt;p&gt;If you care about privacy by design, want zero operational overhead (one binary, SQLite default), need sub-millisecond proxy performance, or want an MCP gateway built in - take a look at &lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;VoidLLM&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Different problems, different trade-offs. Pick what fits.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>proxy</category>
      <category>go</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Code Mode: Batching MCP Tool Calls in a WASM Sandbox to Cut LLM Token Usage by 30-80%</title>
      <dc:creator>ChrisRemo</dc:creator>
      <pubDate>Sun, 29 Mar 2026 00:19:42 +0000</pubDate>
      <link>https://dev.to/chrisremo85/code-mode-batching-mcp-tool-calls-in-a-wasm-sandbox-to-cut-llm-token-usage-by-30-80-18g7</link>
      <guid>https://dev.to/chrisremo85/code-mode-batching-mcp-tool-calls-in-a-wasm-sandbox-to-cut-llm-token-usage-by-30-80-18g7</guid>
      <description>&lt;h3&gt;
  
  
  The Problem: One Tool Call Per Turn Is Expensive
&lt;/h3&gt;

&lt;p&gt;If you've worked with LLMs and tool use, you know the pattern. The model decides it needs to call a tool. It emits a tool call. Your system executes it, returns the result. The model reads the result, reasons about it, and decides it needs another tool call. Repeat.&lt;/p&gt;

&lt;p&gt;Every round trip burns tokens. The model re-reads the entire conversation history each time. For workflows that touch 5-10 tools — think "look up the customer, check their subscription, fetch recent invoices, calculate usage, draft a summary" — you're paying for the same context window over and over. The token cost adds up fast, and latency compounds with each turn.&lt;/p&gt;
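&lt;p&gt;A back-of-the-envelope model, assuming the full history is re-sent on every turn:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough cost model: with one tool call per turn, the model re-reads the
// growing history each time, so prompt tokens scale roughly quadratically.
function totalPromptTokens(turns, baseContext, tokensPerResult) {
  var total = 0;
  for (var i = 0; i !== turns; i++) {
    total += baseContext + i * tokensPerResult;  // history grows each turn
  }
  return total;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Ten tool-call turns on a 2,000-token context with 500-token tool results cost 42,500 prompt tokens; a single batched turn pays for the 2,000-token context once. The numbers are made up - the quadratic growth is the point.&lt;/p&gt;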

&lt;h3&gt;
  
  
  The Solution: Let the LLM Write the Orchestration
&lt;/h3&gt;

&lt;p&gt;Code Mode flips the pattern. Instead of one tool call per LLM turn, the model writes a short JavaScript program that orchestrates multiple tool calls in a single execution. The model gets the results all at once and reasons over the complete picture.&lt;/p&gt;

&lt;p&gt;This is inspired by &lt;a href="https://blog.cloudflare.com/code-mode-mcp/" rel="noopener noreferrer"&gt;Cloudflare's Code Mode concept&lt;/a&gt;. The difference: VoidLLM's implementation is fully self-hosted, runs in a WASM-sandboxed runtime, and integrates with any MCP server you're already running.&lt;/p&gt;

&lt;p&gt;In practice, this reduces token usage by 30-80% depending on the complexity of the tool workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;The execution pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The LLM receives auto-generated TypeScript type declarations describing all available MCP tools&lt;/li&gt;
&lt;li&gt;The LLM emits a JavaScript block that calls the tools it needs&lt;/li&gt;
&lt;li&gt;VoidLLM executes the JS inside a &lt;strong&gt;QuickJS runtime compiled to WebAssembly&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Tool calls within the JS are dispatched to upstream MCP servers via Streamable HTTP&lt;/li&gt;
&lt;li&gt;Results are collected and returned to the LLM in a single response&lt;/li&gt;
&lt;/ol&gt;
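&lt;p&gt;Step 1 might look roughly like this - flattening an MCP tool's JSON Schema input into a signature string the model can read (a hypothetical sketch, not the generator VoidLLM ships):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: map a tool's JSON Schema properties into a
// TypeScript-style declaration for the LLM's system context.
function toDeclaration(alias, tool) {
  const props = tool.inputSchema.properties;
  const params = Object.keys(props).map(function (name) {
    return name + ": " + props[name].type;
  });
  return "tools." + alias + "." + tool.name +
         "(args: { " + params.join("; ") + " }): any";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;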

&lt;p&gt;The WASM layer is powered by &lt;a href="https://wazero.io/" rel="noopener noreferrer"&gt;Wazero&lt;/a&gt;, a pure Go WebAssembly runtime. No CGO, no external dependencies. VoidLLM stays a single static binary.&lt;/p&gt;

&lt;p&gt;Tools are exposed through an &lt;strong&gt;ES6 Proxy pattern&lt;/strong&gt; — the LLM can call any tool by name without per-tool bindings.&lt;/p&gt;
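&lt;p&gt;In sketch form (simplified relative to the real implementation), the pattern looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Simplified ES6 Proxy: any tools.server.tool_name(args) call resolves
// dynamically, so no per-tool binding has to be generated.
function makeTools(dispatch) {
  return new Proxy({}, {
    get: function (target, serverAlias) {
      return new Proxy({}, {
        get: function (inner, toolName) {
          return function (args) {
            return dispatch(serverAlias, toolName, args);
          };
        }
      });
    }
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any &lt;code&gt;tools.server.tool_name(args)&lt;/code&gt; expression resolves at call time, so new tools on an upstream server are callable without regenerating bindings.&lt;/p&gt;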

&lt;h3&gt;
  
  
  Code Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;crm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_customer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cust_8a3f&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;billing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_invoices&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cust_8a3f&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;churnRisk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inv&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;inv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;overdue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_usage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cust_8a3f&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;30d&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;churnRisk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;trend&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;declining&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Evaluated &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; invoices`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;churnRisk&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tool calls, conditional logic, parallel execution — all in one LLM turn instead of three. The &lt;code&gt;console.log&lt;/code&gt; output is captured and returned alongside the result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;The QuickJS WASM runtime has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No filesystem access&lt;/strong&gt; — cannot read or write files on the host&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No network access&lt;/strong&gt; — only dispatched MCP tool calls go through VoidLLM's controlled proxy layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No host access&lt;/strong&gt; — the WASM module runs in an isolated memory space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On top of the sandbox, admins get a &lt;strong&gt;per-tool blocklist&lt;/strong&gt;. You can allow Code Mode access to your CRM tools but block it from calling &lt;code&gt;database.execute_raw_sql&lt;/code&gt;. Configuration is managed through VoidLLM's admin API and UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Add your MCP servers to &lt;code&gt;voidllm.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mcp_servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS Knowledge&lt;/span&gt;
    &lt;span class="na"&gt;alias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://knowledge-mcp.global.api.aws&lt;/span&gt;
    &lt;span class="na"&gt;auth_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;

&lt;span class="na"&gt;settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mcp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;code_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;pool_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;max_tool_calls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Code Mode endpoint lives at &lt;code&gt;/api/v1/mcp&lt;/code&gt;. Connect your IDE (Claude Code, Cursor, Windsurf) and the LLM will have &lt;code&gt;list_servers&lt;/code&gt;, &lt;code&gt;search_tools&lt;/code&gt;, and &lt;code&gt;execute_code&lt;/code&gt; available alongside your regular tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;p&gt;Being upfront about what Code Mode doesn't do yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSE transport not supported&lt;/strong&gt; — only Streamable HTTP. Deprecated SSE servers are auto-detected and flagged.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No OAuth for upstream MCP servers&lt;/strong&gt; — API keys and custom headers only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single instance only&lt;/strong&gt; — WASM pool is in-memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try It
&lt;/h3&gt;

&lt;p&gt;VoidLLM is a lightweight LLM proxy written in Go — less than 2ms overhead, org/team/user hierarchy, key management, and usage tracking. Code Mode is the newest addition.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;https://github.com/voidmind-io/voidllm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
    <item>
      <title>Why I built a privacy-first LLM proxy</title>
      <dc:creator>ChrisRemo</dc:creator>
      <pubDate>Tue, 24 Mar 2026 00:05:58 +0000</pubDate>
      <link>https://dev.to/chrisremo85/why-i-built-a-privacy-first-llm-proxy-4llj</link>
      <guid>https://dev.to/chrisremo85/why-i-built-a-privacy-first-llm-proxy-4llj</guid>
      <description>&lt;p&gt;Every LLM gateway I evaluated had the same problem: they logged my prompts.&lt;/p&gt;

&lt;p&gt;I'd spin up a proxy, route my team's requests through it, check the dashboard - and there they were. Full request bodies, full response bodies, sitting in someone's database. Sometimes on someone else's infrastructure.&lt;/p&gt;

&lt;p&gt;For a lot of use cases that's fine. But when you're working with customer data, internal documents, or anything remotely sensitive, "we store everything by default" isn't a feature. It's a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I needed something simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Route LLM requests through a single endpoint&lt;/li&gt;
&lt;li&gt;Manage API keys so developers don't share raw provider keys&lt;/li&gt;
&lt;li&gt;Track who's using what, how much it costs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Never touch the actual prompts&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sounds basic, right? But every solution I found either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Logged everything&lt;/strong&gt; - full request/response bodies in their database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Charged per-request&lt;/strong&gt; - on top of what I'm already paying the provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Were way too complex&lt;/strong&gt; - sprawling microservice architectures, dozens of config files, hours of setup for something that should take minutes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I just wanted a proxy. A dumb pipe with access control and a dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I built one
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;VoidLLM&lt;/a&gt; is a single Go binary that sits between your apps and LLM providers. It's OpenAI-compatible - change your base URL, keep your SDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;VOIDLLM_ADMIN_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-hex&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;VOIDLLM_ENCRYPTION_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/voidmind-io/voidllm:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. SQLite database, admin UI, API - all in a 63MB Docker image.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Access control:&lt;/strong&gt; Org → Team → Key hierarchy with 4 RBAC roles. Create API keys per user, per team, or per service account. Keys are HMAC-SHA256 hashed — we never store the raw key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usage tracking:&lt;/strong&gt; Every request logs who made it, which model, how many tokens, how long it took, what it cost. No prompt content. No response content. Just metadata.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u2klcj0oxhp7krnl4rb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u2klcj0oxhp7krnl4rb.jpg" alt="VoidLLM Usage Analytics" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting:&lt;/strong&gt; Per org, per team, per key. Token budgets (daily/monthly) and request limits (per minute/per day). In-memory for single instance, Redis for distributed.&lt;/p&gt;
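&lt;p&gt;The token-budget idea in miniature - a single-goroutine Go sketch with hypothetical names (the real limiter adds locking, window resets, and the Redis backend):&lt;/p&gt;

```go
package main

import "fmt"

// budget tracks tokens consumed per key against a fixed cap.
// Sketch only: no locking, no daily reset.
type budget struct {
	used  map[string]int
	limit int
}

// allow reserves tokens if the key stays under its cap.
func (b budget) allow(key string, tokens int) bool {
	if b.used[key]+tokens > b.limit {
		return false
	}
	b.used[key] += tokens
	return true
}

func main() {
	b := budget{used: map[string]int{}, limit: 1000}
	fmt.Println(b.allow("team-a", 800)) // true
	fmt.Println(b.allow("team-a", 300)) // false: would blow the cap
	fmt.Println(b.allow("team-b", 300)) // true: budgets are per key
}
```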

&lt;p&gt;&lt;strong&gt;Provider adapters:&lt;/strong&gt; OpenAI (passthrough), Anthropic (full message format translation), Azure OpenAI (URL mapping), Ollama, vLLM, or any OpenAI-compatible endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it doesn't do
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;It never sees your prompts.&lt;/strong&gt; This isn't a toggle. There's no "enable content logging" option. The proxy reads the request body only to extract the &lt;code&gt;model&lt;/code&gt; field for routing, then passes it through. Request and response bodies exist in memory for the duration of the request - they're never written to disk, never logged, never stored anywhere.&lt;/p&gt;

&lt;p&gt;This is a hard architectural constraint, not a policy. You can audit the code - there's no code path that persists content.&lt;/p&gt;
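&lt;p&gt;The routing step above fits in a few lines - a hedged sketch of the idea, not the actual VoidLLM code:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routeTarget decodes only the "model" field from the request body.
// Everything else - messages, prompts, attachments - is never parsed,
// and the raw body is forwarded upstream untouched.
func routeTarget(body []byte) (string, error) {
	var probe struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &probe); err != nil {
		return "", err
	}
	return probe.Model, nil
}

func main() {
	body := []byte(`{"model":"gpt-4o","messages":[{"role":"user","content":"sensitive prompt"}]}`)
	model, _ := routeTarget(body)
	fmt.Println(model) // gpt-4o
}
```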

&lt;h2&gt;
  
  
  The privacy angle
&lt;/h2&gt;

&lt;p&gt;I keep emphasizing this because it matters more than people think.&lt;/p&gt;

&lt;p&gt;If you're in the EU, GDPR applies to LLM prompts that contain personal data. If your proxy logs those prompts, congratulations - you now have a data processing operation that needs a legal basis, a retention policy, a deletion mechanism, and probably a DPIA.&lt;/p&gt;

&lt;p&gt;Or you could just... not log them.&lt;/p&gt;

&lt;p&gt;VoidLLM is GDPR-compliant by architecture. There's nothing to delete because there's nothing stored. The DPO's favorite proxy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech decisions
&lt;/h2&gt;

&lt;p&gt;A few choices that might be interesting if you're building something similar:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go + Fiber v3:&lt;/strong&gt; Fiber runs on fasthttp, not net/http. The proxy overhead is under 2ms. For a pass-through proxy where every millisecond counts (especially with streaming), this matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite default:&lt;/strong&gt; Zero dependencies to get started. &lt;code&gt;modernc.org/sqlite&lt;/code&gt; is pure Go — no CGO, no shared libraries. For single-instance deployments (which is most people), it just works. PostgreSQL is there when you need to scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedded UI:&lt;/strong&gt; The React admin dashboard is compiled into the Go binary via &lt;code&gt;embed.FS&lt;/code&gt;. No separate frontend deployment, no CORS, no reverse proxy config. One binary, one port.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HMAC-SHA256 for API keys:&lt;/strong&gt; Not bcrypt. The auth check is on the hot path of every proxy request. HMAC with a server-side secret gives O(1) lookup with constant-time comparison. Bcrypt would add 100ms+ per request.&lt;/p&gt;
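&lt;p&gt;For illustration, the HMAC approach looks roughly like this (function and variable names are mine, not VoidLLM's):&lt;/p&gt;

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashKey derives a deterministic digest from the raw API key using a
// server-side secret, so the database stores only the digest.
func hashKey(serverSecret []byte, rawKey string) string {
	mac := hmac.New(sha256.New, serverSecret)
	mac.Write([]byte(rawKey))
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyKey recomputes the digest and compares in constant time.
func verifyKey(serverSecret []byte, rawKey, storedDigest string) bool {
	computed := hashKey(serverSecret, rawKey)
	return subtle.ConstantTimeCompare([]byte(computed), []byte(storedDigest)) == 1
}

func main() {
	secret := []byte("server-side-secret")
	digest := hashKey(secret, "sk-example-123")
	fmt.Println(len(digest))                                 // 64: hex of SHA-256
	fmt.Println(verifyKey(secret, "sk-example-123", digest)) // true
	fmt.Println(verifyKey(secret, "sk-wrong", digest))       // false
}
```

Because the digest is deterministic, the lookup is a plain indexed query rather than a per-row bcrypt comparison.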

&lt;p&gt;&lt;strong&gt;Ed25519 license keys:&lt;/strong&gt; Enterprise features are gated by a signed JWT that's verified offline. No license server call on the hot path. Daily heartbeat refreshes the key in the background.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The project is still very early. Some things on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model routing / fallback chains&lt;/li&gt;
&lt;li&gt;More provider adapters&lt;/li&gt;
&lt;li&gt;Documentation site&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running LLMs (self-hosted or managed) and need access control without the prompt logging, give it a try:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/voidmind-io/voidllm" rel="noopener noreferrer"&gt;github.com/voidmind-io/voidllm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'm looking for early adopters.&lt;/strong&gt; If you're willing to test it in your setup and share honest feedback, I'll give you a free Enterprise license.&lt;br&gt;
Open an issue or reach out directly — I want to know what breaks, what's missing, and what you'd actually need before you'd trust this in production.&lt;/p&gt;

&lt;p&gt;It's BSL 1.1 licensed — source available, self-hosting permitted. Converts to Apache 2.0 after 4 years.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This project was built with significant assistance from AI (Claude by Anthropic).&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>go</category>
      <category>selfhosted</category>
      <category>proxy</category>
    </item>
  </channel>
</rss>
