Most "I tried AI-generated infrastructure" posts end one of two ways: either everything worked perfectly (it didn't), or it burned everything down (also didn't happen). Mine landed somewhere more useful than either of those.
I wrote a detailed prompt, pointed Claude Code at a fresh Ubuntu Server 24.04 VM running on VMware ESXi 8, and walked away. No approvals. No babysitting. One unattended session to build a four-service AI inference stack from nothing.
Here's what came out, what broke, and the five fixes that mattered.
What I Was Building
The goal: a fully self-hosted AI stack I could use to test local models and experiment with an LLM gateway. Four services, all running in Docker on a single internal bridge network:
- Ollama for local LLM inference
- LiteLLM as an OpenAI-compatible proxy with key management and spend tracking
- Open WebUI as the chat frontend
- PostgreSQL 16 as LiteLLM's backend database
The network design was intentional. Open WebUI talks to LiteLLM, not directly to Ollama. LiteLLM routes to local models or Ollama Cloud depending on what's selected. Ollama has no host port binding at all; the only way to reach it from outside the Docker network is through the gateway. Only ports 3000 and 4000 are exposed, and UFW blocks everything else.
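Condensed to just the topology, the compose file looks roughly like this (Postgres, volumes, and healthchecks omitted; image tags and port mappings are illustrative):

networks:
  ai-stack:
    driver: bridge

services:
  ollama:
    image: ollama/ollama
    networks: [ai-stack]            # no ports: block, so it's unreachable from the host

  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    networks: [ai-stack]
    ports:
      - "4000:4000"                 # the only path to Ollama from outside

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    networks: [ai-stack]
    ports:
      - "3000:8080"                 # Open WebUI listens on 8080 inside the container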
This is a CPU-only dev environment (the RTX 3060 lives in my homelab box, not this VM), so I wasn't expecting blazing inference speed. I wanted a working stack I could actually build on.
The Prompt Did Most of the Work
Before running anything, I spent real time writing the prompt. Turns out this was the part that mattered most.
I covered directory structure, secret generation strategy, the full Docker Compose configuration, healthcheck logic, UFW rules, and required a credential summary at the end. A few things I was deliberate about:
Secrets. All passwords and API keys generated with openssl rand, stored in a .env file immediately chmod 600'd, never hardcoded anywhere. LiteLLM's config.yaml uses os.environ/ references throughout. The .env gets passed to containers via Docker's env_file: directive, which injects values as environment variables rather than mounting the file anywhere web-accessible.
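In practice that boils down to something like this (a sketch; key names beyond POSTGRES_PASSWORD are illustrative):

# Generate secrets and lock down the env file before anything else reads it
LITELLM_MASTER_KEY="sk-$(openssl rand -hex 32)"
POSTGRES_PASSWORD="$(openssl rand -hex 24)"
WEBUI_SECRET_KEY="$(openssl rand -hex 32)"
{
  echo "LITELLM_MASTER_KEY=$LITELLM_MASTER_KEY"
  echo "POSTGRES_PASSWORD=$POSTGRES_PASSWORD"
  echo "WEBUI_SECRET_KEY=$WEBUI_SECRET_KEY"
} > /opt/ai-stack/.env
chmod 600 /opt/ai-stack/.env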
Non-interactive mode. This single paragraph changed everything:
You have full sudo access. Execute every step autonomously without pausing to ask for confirmation, approval, or clarification. Treat every step as pre-approved.
Without it, Claude Code gates on nearly every tool use. File writes, sudo commands, service restarts — all of it pauses for approval. That one block is the difference between a fully autonomous run and you clicking "yes" for twenty minutes.
What It Did on Its Own
I ran the prompt and left it. The sequence, with zero input from me:
- Installed Docker Engine from the official apt repo
- Created /opt/ai-stack/ with the full directory structure
- Generated all secrets with openssl rand, then wrote and immediately locked down .env
- Wrote litellm/config.yaml using environment variable references (no secrets hardcoded)
- Created prometheus/prometheus.yml as a required placeholder file (if this doesn't exist as a file, Docker creates it as a directory and compose fails; good catch, and see the sketch after this list)
- Wrote docker-compose.yml with all four services, including healthchecks and dependency ordering
- Configured UFW with the SSH rule added first, then enabled it with --force
- Ran docker compose pull, then docker compose up -d
- Verified each service and printed a full credential summary
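That prometheus.yml detail deserves a line of explanation. When compose bind-mounts a host path that doesn't exist yet, Docker creates the path as a directory, and a directory where a config file should be means the stack fails to come up. Pre-creating the file sidesteps it:

mkdir -p /opt/ai-stack/prometheus
touch /opt/ai-stack/prometheus/prometheus.yml   # must exist as a file before docker compose up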
One session. No prompts. No intervention.
90% There. Five Fixes Required.
The core infrastructure was solid. Containers came up in the right order, healthchecks resolved, the dependency chain worked, UFW was sane, secrets were handled correctly. I verified externally that nothing sensitive was web-accessible.
The issues that needed fixing weren't architecture problems. They were configuration details — three of which are useful to know regardless of how you built the stack.
Fix 1: The LiteLLM healthcheck was broken in two ways. The compose file used /health with curl. Problem: /health on LiteLLM requires an API key, so Docker got a 401 and interpreted it as a failed healthcheck. Also, curl isn't installed in the LiteLLM image. The fix was switching to /health/liveliness (no auth required) and replacing curl with a Python one-liner:
healthcheck:
  test: ["CMD-SHELL", "python3 -c \"import urllib.request; urllib.request.urlopen('http://localhost:4000/health/liveliness')\""]
  interval: 15s
  timeout: 10s
  retries: 5
  start_period: 30s
This one mattered most. Open WebUI's depends_on condition is service_healthy for LiteLLM. Broken healthcheck means Open WebUI never starts. Fix the healthcheck and everything downstream resolves.
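For reference, that dependency in the compose file looks like this (condensed):

open-webui:
  depends_on:
    litellm:
      condition: service_healthy   # won't start until the healthcheck above passes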
Fix 2: ENABLE_OLLAMA_API was set to false. Local models pulled into the Ollama container weren't showing up in the model selector. Setting it to "true" gives Open WebUI a direct connection to Ollama for listing local models, while still using LiteLLM as the API gateway. Simple, but invisible until you're staring at an empty model list wondering what happened.
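The resulting Open WebUI environment block looks roughly like this (the ollama and litellm hostnames assume the compose service names used here):

open-webui:
  environment:
    ENABLE_OLLAMA_API: "true"                     # list local models straight from Ollama
    OLLAMA_BASE_URL: http://ollama:11434          # internal Docker network hostname
    OPENAI_API_BASE_URL: http://litellm:4000/v1   # chat requests still go through the gateway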
Fix 3: DATABASE_URL leaked into Open WebUI via env_file. This is the one worth writing down somewhere.
Docker's env_file: passes every variable in the file to the container — every one. DATABASE_URL (LiteLLM's PostgreSQL connection string) was in the shared .env. Open WebUI picked it up, tried to connect to LiteLLM's Postgres instance, and crashed immediately on missing tables. The UI stopped loading entirely.
The fix: remove DATABASE_URL from .env. Set it inline on the LiteLLM service only:
litellm:
  environment:
    DATABASE_URL: postgresql://litellm:${POSTGRES_PASSWORD}@postgres:5432/litellm
Open WebUI then correctly fell back to its own SQLite database. Any admin account created during the misconfiguration was lost (stored in the wrong DB), so a fresh account was needed after the fix. Minor, but worth knowing ahead of time.
This pattern applies well outside this specific stack. Shared .env files and service-specific secrets don't mix cleanly. Anything owned by one service should be set inline on that service.
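One way to enforce that scoping, since env_file: accepts a list (the file names here are hypothetical):

litellm:
  env_file:
    - .env.shared    # values every service may see
    - .env.litellm   # DATABASE_URL and other LiteLLM-only secrets

open-webui:
  env_file:
    - .env.shared    # never sees LiteLLM's database credentials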
Fix 4: Ollama Cloud api_base was wrong. LiteLLM's OpenAI provider appends /chat/completions to whatever api_base you give it. The path needs to already include /v1. https://ollama.com constructed https://ollama.com/chat/completions — 404. https://ollama.com/v1 constructed https://ollama.com/v1/chat/completions — 200. One path component, all Ollama Cloud calls failing.
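In litellm/config.yaml terms, the corrected entry looks something like this (model alias and key name are illustrative; os.environ/ is LiteLLM's syntax for reading env vars):

model_list:
  - model_name: ollama-cloud-model            # illustrative alias
    litellm_params:
      model: openai/llama3.2                  # routed via the OpenAI-compatible provider
      api_base: https://ollama.com/v1         # must already include /v1
      api_key: os.environ/OLLAMA_API_KEY      # env var name is an assumption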
Fix 5: Ollama is in Docker. Its CLI is too. I ran ollama pull llama3.2 on the host and got command not found. The binary isn't on the host. All Ollama operations in a Dockerized install go through docker exec:
docker exec -it ollama ollama pull llama3.2
docker exec ollama ollama list
No functional impact. Just the kind of thing you forget in the moment.
The Part That Stuck With Me
Prompt quality determines output quality, full stop. The reason this went as well as it did is that the prompt was thorough. Gaps in the prompt become gaps in the output. That's not a Claude Code-specific observation; it's just how agentic automation works, the same as writing a runbook for a junior tech. Vague instructions, vague results.
The env_file scope thing is the most broadly useful takeaway from this whole experiment, because it has nothing to do with AI-generated configs specifically. It's Docker behavior that bites people in hand-written compose files too.
And honestly? 90% correct on first autonomous run for a four-service stack with network isolation, secrets management, and firewall configuration is a result I'd take. The five fixes were configuration details. Nothing had to be rebuilt. The services came up, the network worked, the firewall was sane.
For a CPU-only dev environment on a clean VM, that's a solid starting point.
Stack: Ollama + LiteLLM + Open WebUI + PostgreSQL | Docker Compose | Ubuntu Server 24.04 | VMware ESXi 8