Ever wondered what it feels like to have your own ChatGPT running locally? Let me take you on a journey from being completely dependent on cloud APIs to having my own AI assistant running on my laptop.
The "Aha!" Moment π‘
Picture this: You're deep in a coding session at 2 AM, your OpenAI credits just ran out, and you desperately need help debugging that stubborn piece of code. Sound familiar? This exact scenario pushed me to explore local AI solutions, and boy, was I in for a treat!
Enter GPT-OSS - an open-source language model that you can actually run on consumer hardware. No more API limits, no more internet dependency, and definitely no more surprise bills!
Why Go Local?
Before we dive into the technical bits, let me share why running AI locally changed my development workflow:
- Privacy First: Your code and conversations never leave your machine
- Cost Control: No more surprise API bills (looking at you, OpenAI)
- Offline Capability: Code on planes, trains, or that coffee shop with terrible WiFi
- Learning Experience: Understanding how these models actually work under the hood
The Hardware Reality Check
Let's be real about hardware requirements. GPT-OSS comes in two flavors:
GPT-OSS-20B (The Laptop Friendly)
- Memory needed: ≥16GB VRAM or unified memory
- Sweet spot: RTX 4070/4080, Apple M1/M2/M3 Macs
- My experience: Runs smoothly on my MacBook Pro M2 with 32GB
GPT-OSS-120B (The Workstation Beast)
- Memory needed: ≥60GB VRAM or unified memory
- Target: Multi-GPU setups, workstations, or that gaming rig you justified for "development"
- Reality check: This is serious hardware territory
💡 Pro tip: Both models come MXFP4 quantized out of the box. You can offload to CPU if you're short on VRAM, but expect slower responses.
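If you do need to spill onto the CPU, one knob worth knowing about is Ollama's num_gpu parameter (roughly, how many layers stay on the GPU). Here's a rough sketch of adjusting it in an interactive session - treat the exact syntax as something to verify against the Ollama docs for your version:

# Start an interactive session, then lower the number of GPU-resident layers
ollama run gpt-oss:20b
>>> /set parameter num_gpu 8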
Setting Up Your Local AI Assistant 🛠️
Step 1: Get Ollama (Your New Best Friend)
Ollama is like Docker for AI models - it makes everything stupidly simple.
# Install Ollama (check their website for your OS)
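# (Assuming macOS with Homebrew or Linux - the one-liners below are the usual
#  install paths, but double-check ollama.com for your OS before running them)
brew install ollama                              # macOS
curl -fsSL https://ollama.com/install.sh | sh    # Linux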
# Then pull your model of choice
# For the 20B model (recommended for most setups)
ollama pull gpt-oss:20b
# For the 120B model (if you've got the hardware)
ollama pull gpt-oss:120b
Step 2: Your First Conversation
ollama run gpt-oss:20b
And just like that, you're chatting with AI running entirely on your machine! The first time I saw this work, I literally said "No way!" out loud.
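Prefer scripting over an interactive session? Ollama also listens on port 11434 by default, so you can hit its native REST API directly. A quick sketch (see the Ollama API docs for the full set of fields):

# One-shot generation against the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Write a haiku about local AI",
  "stream": false
}'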
Making It Pretty: Enter Open WebUI 🎨
Now, terminal chat is cool and all, but let's be honest - sometimes we want that sleek ChatGPT-like interface. This is where Open WebUI becomes your new favorite tool.
Think of Open WebUI as your personal ChatGPT interface, but for local models. It supports multiple models, has RAG capabilities, and honestly looks better than some paid services I've used.
Quick Setup Options:
The Python Way (My Preferred Method):
pip install open-webui
open-webui serve
The Docker Way (For the Container Enthusiasts):
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Once it's running, open your browser to http://localhost:3000 (for the Docker setup above, which maps port 3000 to the container's 8080; the pip-installed version serves on http://localhost:8080 by default) and boom - you've got your own ChatGPT-like interface! Select your GPT-OSS model from the dropdown and start chatting.
If you're more of a visual learner, I found this fantastic tutorial that walks through the entire Ollama and Open WebUI setup process:
The video covers everything from installation to getting your first chat session running - perfect if you prefer following along visually!
For the API Lovers: Seamless Integration
Here's where it gets really exciting. Ollama exposes a Chat Completions-compatible API, which means minimal code changes if you're already using OpenAI's SDK.
from openai import OpenAI

# Just point to your local Ollama instance
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Dummy key, but required
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You're a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in JavaScript like I'm 5"}
    ]
)

print(response.choices[0].message.content)
The first time I ran this code and realized I was getting responses from my own hardware instead of OpenAI's servers... chef's kiss 👨‍🍳
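Because the endpoint is Chat Completions-compatible, streaming works the same way it does against OpenAI: flip stream=True and iterate over the chunks. A minimal sketch using the same client as above:

# Stream tokens as they arrive instead of waiting for the full response
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Write a haiku about local AI"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()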
Function Calling: Because Why Not? 🛠️
GPT-OSS supports function calling out of the box. Here's a simple weather example:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The city name"}
                },
                "required": ["city"]
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather like in Tokyo?"}],
    tools=tools
)

print(response.choices[0].message)
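The response above only tells you which function the model wants to call - executing it and feeding the result back is still up to you. A minimal round trip looks roughly like this, assuming a local get_current_weather implementation of your own and that the endpoint accepts the standard tool-result messages:

import json

def get_current_weather(city: str) -> str:
    # Stand-in implementation; in real code you'd call a weather API here
    return f"Sunny and 72°F in {city}"

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]
message = response.choices[0].message

if message.tool_calls:
    # Echo the assistant's tool request back into the conversation
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_current_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

    # Ask the model to turn the tool output into a final answer
    final = client.chat.completions.create(
        model="gpt-oss:20b",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)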
Advanced: Agents SDK Integration 🤖
For those wanting to build more complex AI applications, GPT-OSS plays nicely with OpenAI's Agents SDK through LiteLLM:
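You'll need the Agents SDK with its LiteLLM extra installed first (package name as I understand it - double-check the Agents SDK docs):

pip install "openai-agents[litellm]"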
import asyncio

from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel

@function_tool
def get_weather(city: str):
    return f"The weather in {city} is sunny and 72°F."

async def main():
    agent = Agent(
        name="WeatherBot",
        instructions="You're a helpful weather assistant.",
        model=LitellmModel(model="ollama/gpt-oss:20b"),
        tools=[get_weather],
    )
    result = await Runner.run(agent, "What's the weather in San Francisco?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
My Real-World Experience
After running GPT-OSS locally for a few months, here's what I've noticed:
The Good:
- Response times are actually pretty decent (especially on Apple Silicon)
- No more worrying about API costs during experimentation
- Perfect for code review and debugging sessions
- Great for learning and understanding how these models work
The Challenges:
- Initial setup requires some technical know-how
- Quality isn't quite GPT-4 level (but getting close!)
- Hardware requirements can be limiting
The Surprising:
- The model "personality" feels different but equally helpful
- Perfect for when you need AI help but don't want your code leaving your machine
Wrapping Up 🎬
Running your own AI locally isn't just about saving money or working offline - it's about understanding the technology, maintaining privacy, and having complete control over your tools.
Is it perfect? No. Is it worth trying? Absolutely.
Whether you're a privacy-conscious developer, a cost-conscious startup, or just someone who loves playing with cutting-edge tech, GPT-OSS with Ollama and Open WebUI is definitely worth your time.
Have you tried running AI models locally? What's been your experience? Drop your thoughts in the comments below!
Found this helpful? Give it a ❤️ and follow for more AI and development content!