Originally published on my portfolio → dainwi.vercel.app
There's a moment every indie developer dreads.
You've been building for weeks. The app is finally starting to feel real — the UI is clean, the auth works, the database is humming. And then you open the OpenAI pricing page.
That was me, sometime around midnight, staring at API costs and doing rough math in my head. InterviewAI — the AI-powered interview prep platform I was building — needed a language model at its core. It needed to generate interview questions tailored to a job role and evaluate spoken answers in real time.
The OpenAI API could do all of that beautifully. But at scale, even a modest number of daily users would rack up a bill I couldn't justify as a third-year CS student building a side project between assignments.
So I asked myself: what if I just ran the model myself?
Enter Ollama
I'd heard of Ollama before but always dismissed it as a "hobbyist" tool — something you'd use to chat with a local model for fun, not something you'd build a real feature on.
I was wrong.
Ollama lets you pull and run open-source LLMs locally with a single command. No API key. No rate limits. No usage bill. It spins up a local REST API on http://localhost:11434 that you can call from any backend.
What I Used It For
InterviewAI has one core AI-powered feature, and Ollama powers it.
Generating Interview Questions
When a user enters a job role (say, "Frontend Developer at a fintech startup"), InterviewAI generates a set of tailored interview questions — behavioural, technical, and situational.
I used gpt-oss:120b-cloud — the -cloud tag means the model doesn't run on your machine at all. Ollama automatically offloads it to their cloud infrastructure, so you skip the 65GB download and the GPU requirement entirely. But from your code's perspective, it still looks like a local call to localhost:11434 — same API, zero changes to your integration.
```javascript
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-oss:120b-cloud",
    messages: [
      {
        role: "user",
        content: `Generate 5 interview questions for a ${role} position.
Include 2 technical, 2 behavioural, and 1 situational question.
Return as a JSON array.`
      }
    ],
    stream: false
  })
});

const data = await response.json();
// /api/chat puts the reply in message.content (not data.response,
// which is what /api/generate uses)
const questions = JSON.parse(data.message.content);
```
⚠️ Gotcha: Ollama's `/api/chat` returns the model's reply as a plain string in `data.message.content` (the `data.response` field belongs to `/api/generate`). If you're expecting JSON, you need to parse it yourself, and explicitly prompt the model to return only raw JSON with no markdown fences. Otherwise you'll get triple-backtick blocks wrapping your output and `JSON.parse` will throw.
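Even with that prompt, models occasionally wrap their output in fences anyway, so a small defensive parser is worth having. Here's a minimal sketch; `extractJson` is my own helper name, not part of Ollama's API:

```javascript
// Strip an optional markdown code fence before parsing model output as JSON.
// extractJson is a hypothetical helper, not something Ollama provides.
function extractJson(raw) {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // drop a leading ``` or ```json fence
    .replace(/```$/, "")              // drop a trailing ``` fence
    .trim();
  return JSON.parse(cleaned);
}

// Works whether or not the model added fences:
extractJson('["What is a closure?"]');
extractJson('```json\n["What is a closure?"]\n```');
```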
The Catch: Ollama Still Needs to Be Running Locally
Here's the part nobody tells you upfront.
Even with cloud models, Ollama acts as a local proxy — your code calls localhost:11434, Ollama routes it to their cloud servers behind the scenes. That means Ollama itself still needs to be running on the machine making the call.
When I deployed InterviewAI to Vercel, the calls silently failed. Vercel serverless functions have no localhost:11434 — there's no Ollama process running there.
My solution was to keep the Ollama-powered features working in a local/self-hosted mode, while the deployed version handles it differently. For a portfolio project, this is perfectly fine — document it clearly and users who want the full AI features run it locally.
If you want cloud models to work in a fully deployed setup, Ollama offers a direct cloud API:
```javascript
const response = await fetch("https://ollama.com/api/chat", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${process.env.OLLAMA_API_KEY}`
  },
  body: JSON.stringify({
    model: "gpt-oss:120b",
    messages: [{ role: "user", content: prompt }],
    stream: false
  })
});
```
This is the production path — and it's still free-tier friendly compared to OpenAI.
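To keep one code path serving both the local and deployed modes, you can switch the endpoint on an environment variable. This is a sketch of my own convention, not an official Ollama pattern — the `OLLAMA_API_KEY` env-var check and the `ollamaConfig` helper are assumptions:

```javascript
// Pick between the local Ollama proxy and Ollama's cloud API
// based on whether an API key is configured. Hypothetical helper:
// the env-var convention is this project's, not Ollama's.
function ollamaConfig(env) {
  if (env.OLLAMA_API_KEY) {
    // Deployed (e.g. Vercel): call the cloud API with a bearer token.
    return {
      url: "https://ollama.com/api/chat",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${env.OLLAMA_API_KEY}`
      }
    };
  }
  // Local/self-hosted: talk to the Ollama process on this machine.
  return {
    url: "http://localhost:11434/api/chat",
    headers: { "Content-Type": "application/json" }
  };
}

const { url, headers } = ollamaConfig(process.env);
```

The rest of the request body (model, messages, stream) stays identical either way, which is exactly what makes Ollama's "same API everywhere" design pleasant to build on.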
Quick Setup (Try It in 2 Minutes)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Sign in (required for cloud models)
ollama signin

# Pull the cloud model (no 65GB download)
ollama pull gpt-oss:120b-cloud

# Test it
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:120b-cloud",
  "messages": [{"role": "user", "content": "What is a closure in JavaScript?"}],
  "stream": false
}'
```
That's it. Your local GPT is running.
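One detail worth knowing before you go further: the examples above pass `stream: false` for simplicity, but Ollama streams by default. With streaming on, `/api/chat` sends newline-delimited JSON, one object per line, each carrying a partial `message.content`, with a final line where `done` is true. A sketch of reassembling those chunks — `joinStreamChunks` is my own helper, not part of Ollama:

```javascript
// Ollama's streaming /api/chat responses are newline-delimited JSON:
// each line holds a partial message.content; the last line has done: true.
// joinStreamChunks is a hypothetical helper for rebuilding the full reply.
function joinStreamChunks(ndjson) {
  let text = "";
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;            // skip blank lines
    const chunk = JSON.parse(line);
    if (chunk.message?.content) text += chunk.message.content;
    if (chunk.done) break;                 // final bookkeeping chunk
  }
  return text;
}
```

In a real app you'd feed this from `response.body`'s reader as bytes arrive; splitting a complete buffer keeps the sketch simple.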
Was It Worth It?
Completely.
For a project where I needed to move fast, experiment freely, and not worry about burning through API credits every time I tested a new prompt — running Ollama locally was the right call. I iterated on prompts dozens of times without a second thought about cost.
gpt-oss isn't a compromise. For structured tasks — generating questions from a template, scoring an answer against a rubric — it's genuinely good enough. And "good enough" that's free beats "perfect" that costs money you don't have.
If you're a student or indie dev building something AI-powered, give Ollama a real shot before reaching for the OpenAI dashboard. You might be surprised how far it gets you.
About Me & What I'm Building
I'm Dainwi, a third-year CS student at Galgotias University building full-stack and mobile apps.
Here's what I'm currently working on:
- 🎙️ InterviewAI — AI-powered interview prep (Next.js, Cloudflare D1, Ollama)
- 📝 Opus — A Flutter task manager with offline-first sync (like Things 3 for Android)
- 🌐 Portfolio — All my projects, blog posts, and experience
If this helped you, check out my other posts and projects at dainwi.vercel.app — I write about real problems I hit while building, not theory.
And if you're into dev content in Hinglish, follow me on Instagram @iamdainwichoudhary 🇮🇳
Have you tried Ollama? Drop your experience in the comments — I'm curious what models others are using for production-ish projects.