I was frustrated.
Every major LLM API provider logs your prompts by default.
OpenAI. Anthropic. Google. All of them.
For teams building on sensitive data —
healthcare, fintech, legal — this is a blocker.
So this morning I built NullLog.
A private LLM inference API with zero data retention.
Not a policy. Architecture. Nothing is ever written to storage.
Here's exactly how I built it in a few hours.
The stack
- Cloudflare Workers — edge routing, auth, key management
- Cloudflare Workers AI — inference (free tier covers a lot)
- Cloudflare KV — API key storage only (no prompts, no responses)
- Stripe — payments, instant API key delivery
- Resend — transactional email
Total infra cost to run: near zero.
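For anyone wiring this up themselves, the bindings above map to a `wrangler.toml` along these lines. This is an illustrative sketch, not the actual config: the names and date are my assumptions, and the KV namespace id is whatever `wrangler kv namespace create` returns for you.

```toml
name = "nulllog"
main = "src/index.js"
compatibility_date = "2025-01-01"

# Workers AI binding (env.AI in the worker)
[ai]
binding = "AI"

# KV namespace for API keys only (env.KEYS) — no prompts or responses stored
[[kv_namespaces]]
binding = "KEYS"
id = "<your-kv-namespace-id>"
```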
How it works
Customer pays via Stripe
→ Webhook fires to Cloudflare Worker
→ Worker generates API key
→ Key stored in KV (email + tier only, no usage logs)
→ Customer gets key by email in 60 seconds
→ They hit /v1/chat/completions
→ Worker routes to inference
→ Response returned
→ Nothing written anywhere
The worker (simplified)
```javascript
export default {
  async fetch(request, env) {
    const url = new URL(request.url)
    if (url.pathname === '/v1/chat/completions') {
      const apiKey = request.headers
        .get('Authorization')
        ?.replace('Bearer ', '')
      // Reject requests with no key before touching KV
      if (!apiKey) {
        return Response.json(
          { error: 'Missing API key' },
          { status: 401 }
        )
      }
      // Validate the key exists and is active
      const keyData = await env.KEYS.get(apiKey, 'json')
      if (!keyData?.active) {
        return Response.json(
          { error: 'Invalid API key' },
          { status: 401 }
        )
      }
      // Route to inference — nothing logged
      const body = await request.json()
      const response = await env.AI.run(
        '@cf/meta/llama-4-scout-17b-16e-instruct',
        { messages: body.messages }
      )
      return Response.json({
        choices: [{
          message: {
            role: 'assistant',
            content: response.response
          }
        }]
      })
    }
    // Any other path: no handler, no logging
    return new Response('Not found', { status: 404 })
  }
}
```
Zero database writes in the inference path.
The only thing stored is whether your API key is valid.
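Concretely, the entire stored state per customer can be pictured as one small KV record. The field names here are my guess at the shape implied above (email, tier, and an active flag), not the actual schema.

```javascript
// Illustrative KV record: the only thing stored per customer.
const keyData = {
  email: 'customer@example.com',
  tier: 'pro',
  active: true,
};

// The whole "database" read in the inference path: is this key active?
function isValidKey(data) {
  return Boolean(data && data.active);
}
```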
Models available
Running the latest from Cloudflare's edge network:
- Kimi K2.5 — 256k context, just launched March 2026
- GPT-OSS 120B — OpenAI's open weights
- Llama 4 Scout 17B — multimodal, MoE
- Nemotron 120B — NVIDIA, just added March 2026
- DeepSeek R1 32B — strong reasoning
- Mistral Small 3.1 24B
- Qwen 2.5 Coder 32B — great for code
- Llama 3.3 70B
Drop-in OpenAI replacement
One line change:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-nulllog-key",
    base_url="https://api.sparsitron.com/v1"
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}]
)
```
Works with LangChain, LlamaIndex, any OpenAI SDK integration.
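Because the endpoint speaks the OpenAI chat-completions format, plain `fetch` works too, no SDK needed. A sketch, assuming the endpoint and model name from this post; `buildChatRequest` and `chat` are illustrative helpers, not part of any SDK:

```javascript
// Minimal OpenAI-compatible chat call using plain fetch.
function buildChatRequest(apiKey, model, messages) {
  return {
    url: 'https://api.sparsitron.com/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function chat(apiKey, model, messages) {
  const { url, options } = buildChatRequest(apiKey, model, messages);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  // Same response shape as the OpenAI API
  return data.choices[0].message.content;
}
```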
What I learned building this
Zero logging is an architecture decision, not a policy.
Most providers say "we don't train on your data" —
but they still log. Logging and training are separate things.
True privacy means nothing written to persistent storage
anywhere in the request path.
Compliance unlocks enterprise.
GDPR, HIPAA, SOC 2 — these aren't just checkboxes.
They're why many enterprises can't send sensitive data to hosted LLM APIs directly.
Private inference is a multi-billion-dollar market that's still mostly unsolved.
Cloudflare Workers AI is surprisingly powerful.
Running frontier models at the edge with near-zero
infra cost. The credit system is generous for early products.
Try it
Live at api.sparsitron.com
Free trial with code PHLAUNCH at
api.sparsitron.com/redeem
Would love feedback from the dev.to community —
especially on the zero-log architecture approach
and whether this solves a real pain you've faced.
I'm also building IntelliCortex — a novel neural
architecture to replace transformers. Sparsitron™
is our sparse computation approach. NullLog is
the infra layer we built to run our own experiments
privately. Patent filed.