<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: chiruwonder</title>
    <description>The latest articles on DEV Community by chiruwonder (@nestaibychirai).</description>
    <link>https://dev.to/nestaibychirai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847550%2Fbbddfb6a-06e7-4a2a-9fbd-0c9455488e89.png</url>
      <title>DEV Community: chiruwonder</title>
      <link>https://dev.to/nestaibychirai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nestaibychirai"/>
    <language>en</language>
    <item>
      <title>How I built an OpenAI-compatible API layer on top of Ollama (and what broke along the way)</title>
      <dc:creator>chiruwonder</dc:creator>
      <pubDate>Sat, 28 Mar 2026 11:31:10 +0000</pubDate>
      <link>https://dev.to/nestaibychirai/how-i-built-an-openai-compatible-api-layer-on-top-of-ollama-and-what-broke-along-the-way-217m</link>
      <guid>https://dev.to/nestaibychirai/how-i-built-an-openai-compatible-api-layer-on-top-of-ollama-and-what-broke-along-the-way-217m</guid>
      <description>&lt;p&gt;I've been building NestAI for the past few months — a platform that deploys private Ollama + Open WebUI servers for teams in about 33 minutes. Recently shipped an OpenAI-compatible API layer on top of it and wanted to share what the journey looked like, including the parts that broke silently at 2am.&lt;/p&gt;

&lt;h2&gt;Why OpenAI-compatible&lt;/h2&gt;

&lt;p&gt;The obvious reason: adoption. Most developers already have OpenAI code: LangChain integrations, existing chatbots, internal tools. If switching to a private AI stack means rewriting everything, most teams won't bother. So we made it a one-line change:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from openai import OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After — everything else stays identical
client = OpenAI(
    base_url="https://nestai.chirai.dev/api/v1",
    api_key="YOUR_NESTAI_KEY",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Same SDK. Same methods. Same response format. Just your own infrastructure.&lt;/p&gt;
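&lt;p&gt;Under the hood, a compatibility layer like this is mostly shape translation between two JSON dialects. Here's a minimal sketch of that mapping, from OpenAI's &lt;code&gt;/v1/chat/completions&lt;/code&gt; body to Ollama's &lt;code&gt;/api/chat&lt;/code&gt; and back. The function names and field choices are illustrative, not NestAI's actual code:&lt;/p&gt;

```python
# Sketch of the request/response translation an OpenAI-compatible shim over
# Ollama performs. Illustrative only, not NestAI's implementation.

def openai_to_ollama(payload: dict) -> dict:
    """Map an OpenAI /v1/chat/completions body to Ollama's /api/chat body."""
    return {
        "model": payload["model"],
        "messages": payload["messages"],  # same role/content shape in both APIs
        "stream": payload.get("stream", False),
        "options": {
            # OpenAI's flat sampling params live under "options" in Ollama
            k: payload[k] for k in ("temperature", "top_p") if k in payload
        },
    }

def ollama_to_openai(resp: dict) -> dict:
    """Map a non-streaming Ollama /api/chat response back to OpenAI's shape."""
    return {
        "object": "chat.completion",
        "model": resp["model"],
        "choices": [{
            "index": 0,
            "message": resp["message"],
            "finish_reason": "stop" if resp.get("done") else None,
        }],
        "usage": {
            "prompt_tokens": resp.get("prompt_eval_count", 0),
            "completion_tokens": resp.get("eval_count", 0),
        },
    }
```

&lt;p&gt;The real layer also has to handle streaming chunks and error shapes, but the core idea is exactly this: same messages array, different envelope.&lt;/p&gt;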

&lt;h2&gt;The stack&lt;/h2&gt;

&lt;p&gt;Each NestAI server is a dedicated Hetzner Cloud VM running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama — local model inference&lt;/li&gt;
&lt;li&gt;Open WebUI — chat interface + API layer&lt;/li&gt;
&lt;li&gt;nginx — reverse proxy + SSL termination&lt;/li&gt;
&lt;li&gt;certbot — SSL certificates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend that provisions these is Express/Node on another Hetzner server, using the Hetzner Cloud API to spin up VMs via cloud-init.&lt;/p&gt;
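&lt;p&gt;For reference, the provisioning call itself is small. Here's a sketch against the standard Hetzner Cloud API (&lt;code&gt;POST /v1/servers&lt;/code&gt;); the server type, image, and helper names are assumptions for illustration, not NestAI's exact setup:&lt;/p&gt;

```python
# Sketch of VM provisioning via the Hetzner Cloud API with a cloud-init
# user_data script. Server type and image are example choices.
import json
import urllib.request

HCLOUD_API = "https://api.hetzner.cloud/v1/servers"

def build_server_request(name: str, cloud_init: str) -> dict:
    """Assemble the JSON body Hetzner expects for server creation."""
    return {
        "name": name,
        "server_type": "cpx41",   # example: 8 vCPU / 16 GB
        "image": "ubuntu-24.04",
        "user_data": cloud_init,  # cloud-init script runs on first boot
    }

def create_server(token: str, body: dict) -> dict:
    req = urllib.request.Request(
        HCLOUD_API,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

&lt;p&gt;Everything interesting happens inside that &lt;code&gt;user_data&lt;/code&gt; script: installing nginx, pulling models, and all the failure modes below.&lt;/p&gt;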

&lt;h2&gt;What actually broke&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Certbot rewrites your nginx config silently.&lt;/strong&gt; This one got me badly. The flow was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Install nginx and write a config with &lt;code&gt;proxy_pass&lt;/code&gt; to Open WebUI&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;certbot --nginx --redirect&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Certbot rewrites the config — and silently removes the location blocks you added&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fix: after certbot runs, rewrite the nginx config programmatically with the correct SSL + proxy setup. Don't trust certbot to preserve your config.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;certbot --nginx --redirect -d $DOMAIN --non-interactive --agree-tos -m $EMAIL

# Immediately overwrite with the correct config after certbot
cat &amp;gt; /etc/nginx/sites-available/webui &amp;lt;&amp;lt; 'NGINX'
server {
    listen 443 ssl;
    location / {
        proxy_pass http://localhost:3000;
        proxy_read_timeout 600;
        proxy_buffering off;  # critical for streaming
    }
}
NGINX
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
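&lt;p&gt;"Rewrite the config programmatically" can be as small as rendering the site file from a template and refusing to reload unless &lt;code&gt;nginx -t&lt;/code&gt; passes. A sketch; the paths and template are assumptions about a typical certbot/Ubuntu layout, not NestAI's exact files:&lt;/p&gt;

```python
# Sketch: regenerate the nginx site config after certbot has run, validate
# it, then reload. Paths assume a standard Ubuntu + certbot layout.
import subprocess
from string import Template

SITE_TEMPLATE = Template("""\
server {
    listen 443 ssl;
    server_name $domain;
    ssl_certificate /etc/letsencrypt/live/$domain/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$domain/privkey.pem;
    location / {
        proxy_pass http://localhost:3000;
        proxy_read_timeout 600;
        proxy_buffering off;
    }
}
""")

def render_site(domain: str) -> str:
    return SITE_TEMPLATE.substitute(domain=domain)

def install_site(domain: str, path="/etc/nginx/sites-available/webui"):
    with open(path, "w") as f:
        f.write(render_site(domain))
    # Refuse to reload if the generated config does not parse
    subprocess.run(["nginx", "-t"], check=True)
    subprocess.run(["systemctl", "reload", "nginx"], check=True)
```

&lt;p&gt;The &lt;code&gt;nginx -t&lt;/code&gt; gate is the important part: a bad template then fails loudly at provisioning time instead of silently serving the wrong config.&lt;/p&gt;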

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model pull completes, but Open WebUI doesn't know yet.&lt;/strong&gt; Ollama pulls the model fine, but Open WebUI caches the model list on startup. So you'd pull llama3.1 and it wouldn't appear in the UI until a restart.&lt;/p&gt;
&lt;p&gt;Fix: restart Open WebUI after the model pull completes, or hit the &lt;code&gt;/api/tags&lt;/code&gt; endpoint to trigger a refresh.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Streaming breaks without &lt;code&gt;proxy_buffering off&lt;/code&gt;.&lt;/strong&gt; If you're using the streaming API and responses cut off randomly, it's almost certainly nginx buffering. Two directives fix it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;proxy_buffering off;
proxy_read_timeout 600;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without &lt;code&gt;proxy_buffering off&lt;/code&gt;, nginx collects the whole streamed response before sending it to the client, so streaming looks broken. Took me an embarrassingly long time to find this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;First user becomes WebUI admin unintentionally.&lt;/strong&gt; Open WebUI makes the first signup the admin. If a team member visits before the owner sets up their account, they become admin.&lt;/p&gt;
&lt;p&gt;Fix: auto-create an admin account via the API immediately after WebUI starts, before any users can reach it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SIGNUP_RESP=$(curl -sf -X POST http://localhost:3000/api/v1/auths/signup \
  -H "Content-Type: application/json" \
  -d '{"name":"Admin","email":"admin@yourserver.local","password":"'"$ADMIN_PASS"'"}')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Store those credentials somewhere — you'll need them for analytics collection later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Swap space, or the model just won't load.&lt;/strong&gt; Running a 7B model on a server with 8GB RAM and no swap configured? It'll silently fail or crash mid-inference with no useful error.&lt;/p&gt;
&lt;p&gt;Always configure swap before starting Ollama:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;fallocate -l 8G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' &amp;gt;&amp;gt; /etc/fstab
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
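&lt;p&gt;The model-visibility fix from point 2 can be automated during provisioning: poll Ollama's &lt;code&gt;/api/tags&lt;/code&gt; until the pulled model shows up, then restart Open WebUI. The endpoint and response shape are standard Ollama; the retry policy here is an arbitrary choice:&lt;/p&gt;

```python
# Sketch: wait until a pulled model is actually listed by Ollama before
# restarting Open WebUI. /api/tags returns {"models": [{"name": ...}, ...]}.
import json
import time
import urllib.request

def model_listed(tags: dict, model: str) -> bool:
    """True if any installed model matches, with or without an explicit tag."""
    return any(
        m["name"] == model or m["name"].startswith(model + ":")
        for m in tags.get("models", [])
    )

def wait_for_model(model: str, base="http://localhost:11434", tries=30):
    for _ in range(tries):
        with urllib.request.urlopen(base + "/api/tags") as resp:
            if model_listed(json.load(resp), model):
                return True
        time.sleep(2)
    return False
```

&lt;p&gt;Note the tag handling: pulling &lt;code&gt;llama3.1&lt;/code&gt; registers it as &lt;code&gt;llama3.1:latest&lt;/code&gt;, so an exact-match check would spin forever.&lt;/p&gt;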

&lt;h2&gt;The API in practice&lt;/h2&gt;

&lt;p&gt;Once it's running, the OpenAI compatibility means you can drop it into basically anything.&lt;/p&gt;

&lt;p&gt;LangChain:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://nestai.chirai.dev/api/v1",
    api_key="your-key",
    model="llama3.1",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Streaming in Node.js:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://nestai.chirai.dev/api/v1",
  apiKey: "your-key",
})

const stream = await client.chat.completions.create({
  model: "llama3.1",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No rate limits — unlike OpenAI's RPM/TPM caps, you're hitting your own server. The only limit is what your VM can handle. A 7B model on 8 cores does about 20-30 tok/s, which is fine for most internal tooling.&lt;/p&gt;
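&lt;p&gt;If you want to sanity-check that tok/s figure on your own VM, you can count delta chunks in the SSE stream; with Ollama one chunk is roughly one token, so treat the result as a ballpark rather than a benchmark. A sketch of the counting logic, assuming the standard OpenAI-style &lt;code&gt;data:&lt;/code&gt; stream format:&lt;/p&gt;

```python
# Sketch: estimate throughput by counting content-bearing chunks in an
# OpenAI-style SSE stream. One chunk is roughly one token with Ollama.
import json

def count_content_chunks(sse_lines) -> int:
    """Count 'data:' chunks whose delta actually carries content."""
    n = 0
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            n += 1
    return n

def tokens_per_second(chunks: int, elapsed: float) -> float:
    return chunks / elapsed if elapsed else 0.0
```

&lt;p&gt;Feed it the raw stream lines and the wall-clock time of the request, and you get a number you can compare against the 20-30 tok/s claim above.&lt;/p&gt;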

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Start with the API layer from day one.&lt;/strong&gt; I added it later and had to retrofit some things. If you're building on Ollama for teams, the API is the product — the chat UI is just one consumer of it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Log everything during provisioning.&lt;/strong&gt; Cloud-init runs in the dark. Add verbose logging at every step and send it back to your backend via callbacks. You'll thank yourself at 2am when a deployment fails silently.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test on a fresh VM every time.&lt;/strong&gt; Your local Ollama setup has state accumulated over months. A fresh VM will surface issues your local environment hides.&lt;/li&gt;
&lt;/ul&gt;
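&lt;p&gt;The "send it back via callbacks" tip looks something like this in practice: every provisioning step posts a small status payload to the backend, and logging failures are swallowed so they can never break the deploy itself. The callback URL and payload fields here are hypothetical:&lt;/p&gt;

```python
# Sketch of a provisioning phone-home: each step reports its status to a
# backend callback. URL and payload fields are hypothetical.
import json
import time
import urllib.request

def step_payload(server_id: str, step: str, status: str, detail: str = "") -> dict:
    return {
        "server_id": server_id,
        "step": step,       # e.g. "install_nginx", "pull_model"
        "status": status,   # "started" / "ok" / "failed"
        "detail": detail,
        "ts": int(time.time()),
    }

def report(callback_url: str, payload: dict):
    req = urllib.request.Request(
        callback_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=10)
    except OSError:
        pass  # never let logging break provisioning itself
```

&lt;p&gt;With timestamps on every step you also get deployment-duration data for free, which is how you find the slow parts of a 33-minute provision.&lt;/p&gt;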

&lt;h2&gt;Where it is now&lt;/h2&gt;

&lt;p&gt;NestAI is live at nestai.chirai.dev — it deploys a private Ollama server for your team in ~33 minutes, OpenAI-compatible API included, starting at $40/month (₹3,499). There's a $2/₹99 trial if you want to kick the tyres. The full API docs are at nestai.chirai.dev/docs/api.&lt;/p&gt;

&lt;p&gt;Happy to answer questions about any of the above — especially the nginx/certbot stuff, which I've seen trip up a lot of people.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>selfhosted</category>
      <category>pgaichallenge</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
