Varshith V Hegde

Introducing GPT-OSS: Run Your Own Open-Source GPT Model Locally

Ever wondered what it feels like to have your own ChatGPT running locally? Let me take you on my journey from being completely dependent on cloud APIs to having my own AI assistant running on my laptop.

The "Aha!" Moment πŸ’‘

Picture this: You're deep in a coding session at 2 AM, your OpenAI credits just ran out, and you desperately need help debugging that stubborn piece of code. Sound familiar? This exact scenario pushed me to explore local AI solutions, and boy, was I in for a treat!

Enter GPT-OSS - an open-source language model that you can actually run on consumer hardware. No more API limits, no more internet dependency, and definitely no more surprise bills!

Why Go Local? 🏠

Before we dive into the technical bits, let me share why running AI locally changed my development workflow:

  • Privacy First: Your code and conversations never leave your machine
  • Cost Control: No more surprise API bills (looking at you, OpenAI πŸ‘€)
  • Offline Capability: Code on planes, trains, or that coffee shop with terrible WiFi
  • Learning Experience: Understanding how these models actually work under the hood

The Hardware Reality Check πŸ“Š

Let's be real about hardware requirements. GPT-OSS comes in two flavors:

GPT-OSS-20B (The Laptop Friendly)

  • Memory needed: β‰₯16GB VRAM or unified memory
  • Sweet spot: RTX 4070/4080, Apple M1/M2/M3 Macs
  • My experience: Runs smoothly on my MacBook Pro M2 with 32GB

GPT-OSS-120B (The Workstation Beast)

  • Memory needed: β‰₯60GB VRAM or unified memory
  • Target: Multi-GPU setups, workstations, or that gaming rig you justified for "development"
  • Reality check: This is serious hardware territory

πŸ’‘ Pro tip: Both models come MXFP4 quantized out of the box. You can offload to CPU if you're short on VRAM, but expect slower responses.

Setting Up Your Local AI Assistant πŸ› οΈ

Step 1: Get Ollama (Your New Best Friend)

Ollama is like Docker for AI models - it makes everything stupidly simple.

# Install Ollama (check their website for your OS)
# Then pull your model of choice

# For the 20B model (recommended for most setups)
ollama pull gpt-oss:20b

# For the 120B model (if you've got the hardware)
ollama pull gpt-oss:120b
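Once the download finishes, it's worth confirming the model actually landed - a quick ollama list in the terminal does it, or you can script the check against Ollama's local REST API. Here's a minimal Python sketch, assuming Ollama is on its default port 11434 and that the requests package is installed:

import requests

# Ollama lists every model you've pulled at /api/tags
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()

for model in resp.json().get("models", []):
    size_gb = model["size"] / 1e9  # size is reported in bytes
    print(f"{model['name']}  (~{size_gb:.1f} GB on disk)")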

Step 2: Your First Conversation

ollama run gpt-oss:20b

And just like that, you're chatting with AI running entirely on your machine! The first time I saw this work, I literally said "No way!" out loud.
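Under the hood, that terminal chat is just talking to a local server, and the same server exposes a small REST API - so you can get completions from a script without any extra SDK. A minimal sketch using requests, again assuming the default port:

import requests

payload = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Explain recursion in one sentence."}],
    "stream": False,  # return the whole reply as a single JSON response
}

resp = requests.post("http://localhost:11434/api/chat", json=payload)
resp.raise_for_status()

# The assistant's reply lives under the "message" key
print(resp.json()["message"]["content"])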

Making It Pretty: Enter Open WebUI 🎨

Now, terminal chat is cool and all, but let's be honest - sometimes we want that sleek ChatGPT-like interface. This is where Open WebUI becomes your new favorite tool.

Think of Open WebUI as your personal ChatGPT interface, but for local models. It supports multiple models, has RAG capabilities, and honestly looks better than some paid services I've used.

Quick Setup Options:

The Python Way (My Preferred Method):

pip install open-webui
open-webui serve

The Docker Way (For the Container Enthusiasts):

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Once it's running, open your browser to http://localhost:3000 and boom - you've got your own ChatGPT-like interface! Select your GPT-OSS model from the dropdown and start chatting.

If you're more of a visual learner, I found this fantastic tutorial that walks through the entire Ollama and Open WebUI setup process:

The video covers everything from installation to getting your first chat session running - perfect if you prefer following along visually!

For the API Lovers: Seamless Integration πŸ”Œ

Here's where it gets really exciting. Ollama exposes a Chat Completions-compatible API, which means minimal code changes if you're already using OpenAI's SDK.

from openai import OpenAI

# Just point to your local Ollama instance
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Dummy key, but required
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "You're a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in JavaScript like I'm 5"}
    ]
)

print(response.choices[0].message.content)

The first time I ran this code and realized I was getting responses from my own hardware instead of OpenAI's servers... chef's kiss πŸ‘¨β€πŸ³πŸ’‹
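One tweak worth knowing: the same endpoint supports streaming, so you can print tokens as they're generated instead of staring at a blank prompt while the whole answer is produced. A small sketch using the same local client setup:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Write a haiku about local AI"}],
    stream=True,  # yields chunks as tokens are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content  # each chunk carries a small piece of text
    if delta:
        print(delta, end="", flush=True)
print()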

Function Calling: Because Why Not? πŸ› οΈ

GPT-OSS supports function calling out of the box. Here's a simple weather example:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The city name"}
                },
                "required": ["city"]
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather like in Tokyo?"}],
    tools=tools
)

print(response.choices[0].message)
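The snippet above stops at printing the model's tool call. To actually answer the question, you execute the function yourself and feed the result back for a final response. Here's a minimal round-trip sketch that reuses the tools list from above - note that get_current_weather is a made-up stub rather than a real weather API, and this assumes your Ollama version accepts tool messages on the compatible endpoint:

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def get_current_weather(city: str) -> str:
    # Hypothetical stub - swap in a real weather lookup if you need one
    return f"It's 22Β°C and clear in {city}."

messages = [{"role": "user", "content": "What's the weather like in Tokyo?"}]
response = client.chat.completions.create(model="gpt-oss:20b", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant's tool call in the history
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_current_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    # Second pass: the model turns the tool output into a natural-language answer
    final = client.chat.completions.create(model="gpt-oss:20b", messages=messages, tools=tools)
    print(final.choices[0].message.content)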

Advanced: Agents SDK Integration πŸ€–

For those wanting to build more complex AI applications, GPT-OSS plays nicely with OpenAI's Agents SDK through LiteLLM:

import asyncio
from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel

@function_tool
def get_weather(city: str):
    return f"The weather in {city} is sunny and 72Β°F."

async def main():
    agent = Agent(
        name="WeatherBot",
        instructions="You're a helpful weather assistant.",
        model=LitellmModel(model="ollama/gpt-oss:20b"),
        tools=[get_weather],
    )

    result = await Runner.run(agent, "What's the weather in San Francisco?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
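Two small things I'd flag from my setup (verify both against the current Agents SDK docs, since they're assumptions on my part): the LiteLLM integration ships as an optional extra of the SDK, and if Ollama isn't on the default localhost:11434 you can point LitellmModel at it explicitly:

# Assumed install step - check the Agents SDK docs for the exact extra name:
#   pip install "openai-agents[litellm]"

from agents.extensions.models.litellm_model import LitellmModel

# base_url here is an assumption about LitellmModel's signature; the host is a
# hypothetical machine on your network running Ollama.
model = LitellmModel(
    model="ollama/gpt-oss:20b",
    base_url="http://192.168.1.50:11434",
)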

My Real-World Experience πŸ“ˆ

After running GPT-OSS locally for a few months, here's what I've noticed:

The Good:

  • Response times are actually pretty decent (especially on Apple Silicon)
  • No more worrying about API costs during experimentation
  • Perfect for code review and debugging sessions
  • Great for learning and understanding how these models work

The Challenges:

  • Initial setup requires some technical know-how
  • Quality isn't quite GPT-4 level (but getting close!)
  • Hardware requirements can be limiting

The Surprising:

  • The model "personality" feels different but equally helpful
  • Perfect for when you need AI help but don't want your code leaving your machine

Wrapping Up 🎬

Running your own AI locally isn't just about saving money or working offline - it's about understanding the technology, maintaining privacy, and having complete control over your tools.

Is it perfect? No. Is it worth trying? Absolutely.

Whether you're a privacy-conscious developer, a cost-conscious startup, or just someone who loves playing with cutting-edge tech, GPT-OSS with Ollama and Open WebUI is definitely worth your time.

Have you tried running AI models locally? What's been your experience? Drop your thoughts in the comments below!


Found this helpful? Give it a ❀️ and follow for more AI and development content!

Top comments (1)

Anik Sikder

I've been meaning to explore local AI setups, and your breakdown just gave me the push I needed. The way you laid out the trade-offs between the 20B and 120B models, the role of Ollama, and the seamless pairing with Open WebUI was super clear and practical.