DEV Community

Nico Domino


llama-dash - Local LLM Ops

I've been building llama-dash, a single-pane dashboard and logging proxy for a self-hosted local inference stack.

I run llama-swap + llama.cpp on a box at home and got tired of having zero visibility: no request log, no idea which model was loaded when, and no way to hand out scoped access without exposing the raw backend.

So llama-dash sits in front as the single public port. It proxies the OpenAI/Anthropic-compatible /v1/* endpoints unchanged (streaming SSE passes straight through), logs every request with token counts and cost estimates, and adds what llama-swap lacks: hashed API keys, per-key rate limits and model allow-lists, routing rules, and loading/unloading models from the UI.
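The access-control side can be pictured as a small check in front of every /v1/* request. Here's a minimal Python sketch, assuming SHA-256 digests for stored keys and a token bucket for the per-key rate limit. llama-dash's actual hashing scheme and limiter aren't described in this post, and KeyPolicy, issue_key, authorize, and the "sk-" key prefix are hypothetical names.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass


def hash_key(raw_key: str) -> str:
    # Only the digest is stored. SHA-256 is an assumption, not llama-dash's documented scheme.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


@dataclass
class KeyPolicy:
    allowed_models: set[str]
    rate_per_sec: float           # token-bucket refill rate (requests per second)
    burst: float                  # bucket capacity
    tokens: float | None = None   # current fill; starts full on first use
    updated: float | None = None  # timestamp of the last refill

    def take(self, now: float) -> bool:
        if self.tokens is None:
            self.tokens = self.burst
        elapsed = 0.0 if self.updated is None else max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def issue_key(store: dict[str, KeyPolicy], policy: KeyPolicy) -> str:
    """Create a random key, keep only its hash, and hand the raw key back once."""
    raw = "sk-" + secrets.token_urlsafe(24)  # hypothetical key format
    store[hash_key(raw)] = policy
    return raw


def authorize(store: dict[str, KeyPolicy], presented: str, model: str, now: float) -> int:
    """Map a request to an HTTP status: 200 ok, 401 unknown key, 403 model not allowed, 429 rate limited."""
    policy = store.get(hash_key(presented))
    if policy is None:
        return 401
    if model not in policy.allowed_models:
        return 403
    return 200 if policy.take(now) else 429
```

Keying the store by digest means the raw key is never persisted. Since these keys are long random strings rather than human-chosen passwords, a fast hash plus a plain dict lookup is a common choice. In a real proxy, now would come from time.monotonic(); it's a parameter here so the limiter is easy to test.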

The bit I like most is that you can point Claude Code at it via ANTHROPIC_BASE_URL and watch your own usage flow through. It ships as a Docker Compose stack, with the backend reachable only from inside the stack.
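Claude Code streams its responses over the Anthropic Messages API, so a proxy in this position can forward the SSE bytes untouched and still read token counts off the stream. A minimal sketch of such a tap in Python follows. It assumes Anthropic's documented streaming events (message_start carries the input token count, message_delta carries a running output-token total). The function names and the per-million-token prices are hypothetical, and this is not llama-dash's own code.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def tap_anthropic_sse(chunks: Iterable[bytes], usage: dict[str, int]) -> Iterator[bytes]:
    """Forward SSE bytes unchanged while recording token usage from the stream."""
    buffer = b""
    for chunk in chunks:
        yield chunk  # the client receives exactly what upstream sent
        buffer += chunk
        # Parse only complete lines; a partial line waits for the next chunk.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            line = line.rstrip(b"\r")
            if not line.startswith(b"data:"):
                continue  # skip "event:" lines and blank separators
            try:
                event = json.loads(line[len(b"data:"):])
            except ValueError:
                continue  # non-JSON payloads such as "[DONE]"
            if not isinstance(event, dict):
                continue
            if event.get("type") == "message_start":
                start = event.get("message", {}).get("usage", {})
                usage["input_tokens"] = start.get("input_tokens", 0)
                usage["output_tokens"] = start.get("output_tokens", 0)
            elif event.get("type") == "message_delta":
                # output_tokens here is a running total, not an increment.
                delta_usage = event.get("usage", {})
                if "output_tokens" in delta_usage:
                    usage["output_tokens"] = delta_usage["output_tokens"]


def estimate_cost(usage: dict[str, int], usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Hypothetical pricing in dollars per million input/output tokens."""
    return (usage.get("input_tokens", 0) * usd_per_mtok_in
            + usage.get("output_tokens", 0) * usd_per_mtok_out) / 1_000_000


# Usage: an upstream stream whose event line is split across two chunks.
upstream = [
    b'event: message_start\ndata: {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}}\n\nevent: mess',
    b'age_delta\ndata: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 15}}\n\n',
]
usage: dict[str, int] = {}
forwarded = b"".join(tap_anthropic_sse(upstream, usage))
print(usage)  # {'input_tokens': 25, 'output_tokens': 15}
```

Yielding each chunk before parsing it keeps the passthrough byte-for-byte identical, and buffering partial lines handles events split across network reads. An OpenAI-compatible stream reports usage in a different shape, so a proxy serving both APIs would need a second branch.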

https://github.com/ndom91/llama-dash
