I was frustrated.
Every major LLM API provider logs your prompts by default.
OpenAI. Anthropic. Google. All of them.
For teams building on sensitive data —
healthcare, fintech, legal — this is a blocker.
So this morning I built NullLog.
A private LLM inference API with zero data retention.
Not a policy. Architecture. Nothing is ever written to storage.
Here's exactly how I built it in a few hours.
The stack
- Cloudflare Workers — edge routing, auth, key management
- Cloudflare Workers AI — inference (free tier covers a lot)
- Cloudflare KV — API key storage only (no prompts, no responses)
- Stripe — payments, instant API key delivery
- Resend — transactional email
Total infra cost to run: near zero.
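For anyone wiring this up themselves, the bindings above map to a `wrangler.toml` along these lines. This is an illustrative sketch, not the actual config: the names and date are my assumptions, and the KV namespace id is whatever `wrangler kv namespace create` returns for you.

```toml
name = "nulllog"
main = "src/index.js"
compatibility_date = "2025-01-01"

# Workers AI binding (env.AI in the worker)
[ai]
binding = "AI"

# KV namespace for API keys only (env.KEYS) — no prompts or responses stored
[[kv_namespaces]]
binding = "KEYS"
id = "<your-kv-namespace-id>"
```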
How it works
Customer pays via Stripe
→ Webhook fires to Cloudflare Worker
→ Worker generates API key
→ Key stored in KV (email + tier only, no usage logs)
→ Customer gets key by email in 60 seconds
→ They hit /v1/chat/completions
→ Worker routes to inference
→ Response returned
→ Nothing written anywhere
The worker (simplified)
```javascript
export default {
  async fetch(request, env) {
    const url = new URL(request.url)
    if (url.pathname === '/v1/chat/completions') {
      const apiKey = request.headers
        .get('Authorization')
        ?.replace('Bearer ', '')
      // Reject requests with no key before touching KV
      if (!apiKey) {
        return Response.json(
          { error: 'Missing API key' },
          { status: 401 }
        )
      }
      // Validate the key exists and is active
      const keyData = await env.KEYS.get(apiKey, 'json')
      if (!keyData?.active) {
        return Response.json(
          { error: 'Invalid API key' },
          { status: 401 }
        )
      }
      // Route to inference — nothing logged
      const body = await request.json()
      const response = await env.AI.run(
        '@cf/meta/llama-4-scout-17b-16e-instruct',
        { messages: body.messages }
      )
      return Response.json({
        choices: [{
          message: {
            role: 'assistant',
            content: response.response
          }
        }]
      })
    }
    // Any other path: no handler, no logging
    return new Response('Not found', { status: 404 })
  }
}
```
Zero database writes in the inference path.
The only thing stored is whether your API key is valid.
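Concretely, the entire stored state per customer can be pictured as one small KV record. The field names here are my guess at the shape implied above (email, tier, and an active flag), not the actual schema.

```javascript
// Illustrative KV record: the only thing stored per customer.
const keyData = {
  email: 'customer@example.com',
  tier: 'pro',
  active: true,
};

// The whole "database" read in the inference path: is this key active?
function isValidKey(data) {
  return Boolean(data && data.active);
}
```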
Models available
Running the latest from Cloudflare's edge network:
- Kimi K2.5 — 256k context, just launched March 2026
- GPT-OSS 120B — OpenAI's open weights
- Llama 4 Scout 17B — multimodal, MoE
- Nemotron 120B — NVIDIA, just added March 2026
- DeepSeek R1 32B — strong reasoning
- Mistral Small 3.1 24B
- Qwen 2.5 Coder 32B — great for code
- Llama 3.3 70B
Drop-in OpenAI replacement
One line change:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-nulllog-key",
    base_url="https://api.sparsitron.com/v1"
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}]
)
```
Works with LangChain, LlamaIndex, any OpenAI SDK integration.
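Because the endpoint speaks the OpenAI chat-completions format, plain `fetch` works too, no SDK needed. A sketch, assuming the endpoint and model name from this post; `buildChatRequest` and `chat` are illustrative helpers, not part of any SDK:

```javascript
// Minimal OpenAI-compatible chat call using plain fetch.
function buildChatRequest(apiKey, model, messages) {
  return {
    url: 'https://api.sparsitron.com/v1/chat/completions',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function chat(apiKey, model, messages) {
  const { url, options } = buildChatRequest(apiKey, model, messages);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  // Same response shape as the OpenAI API
  return data.choices[0].message.content;
}
```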
What I learned building this
Zero logging is an architecture decision, not a policy.
Most providers say "we don't train on your data" —
but they still log. Logging and training are separate things.
True privacy means nothing written to persistent storage
anywhere in the request path.
Compliance unlocks enterprise.
GDPR, HIPAA, SOC 2 — these aren't just checkboxes.
They're why many enterprises can't send sensitive data to hosted LLM APIs directly.
Private inference is a multi-billion-dollar market that's still mostly unsolved.
Cloudflare Workers AI is surprisingly powerful.
Running frontier models at the edge with near-zero
infra cost. The credit system is generous for early products.
Try it
Live at api.sparsitron.com
Free trial with code PHLAUNCH at
api.sparsitron.com/redeem
Would love feedback from the dev.to community —
especially on the zero-log architecture approach
and whether this solves a real pain you've faced.
I'm also building IntelliCortex — a novel neural
architecture to replace transformers. Sparsitron™
is our sparse computation approach. NullLog is
the infra layer we built to run our own experiments
privately. Patent filed.