KAMAL KISHOR

Running GPT-OSS Locally with JavaScript and Ollama

Cloud AI is powerful, but it comes with tradeoffs—costs, privacy concerns, and internet dependency. That’s where GPT-OSS, OpenAI’s open-weight model, changes the game.

Available in two versions—gpt-oss-120b and gpt-oss-20b—this model family can run directly on your machine. The 20B variant only needs ~16GB of RAM, making it practical for local experimentation without enterprise hardware.

By pairing GPT-OSS with Ollama, we can build offline-first, private AI assistants using plain JavaScript.


🚀 Why JavaScript?

  • Universality: Runs everywhere (frontend, backend, desktop, mobile).
  • Simplicity: Fetch API makes HTTP calls trivial.
  • Ecosystem: Perfect for integrating into web apps, chatbots, and agents.

With Ollama exposing a REST API, JavaScript becomes one of the easiest ways to integrate GPT-OSS into your projects.
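Before writing any code, you can sanity-check that the API is reachable. The sketch below assumes Ollama's default port (11434) and Node 18+'s built-in fetch; save it as check.mjs and run node check.mjs:

// List the models your local Ollama server has available.
// GET /api/tags returns { models: [{ name, ... }] }.
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();
console.log(models.map((m) => m.name)); // e.g. [ 'gpt-oss:20b' ]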


🛠 What You’ll Need

  1. A system with 16GB+ RAM and a GPU (or Apple Silicon Mac).
  2. Node.js (v18 or later) installed.
  3. Ollama installed and running.
  4. GPT-OSS model pulled locally:
   ollama pull gpt-oss:20b
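To verify the download finished, list your local models:

ollama list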

⚙️ Step 1: Initialize a Node.js Project

In your terminal:

mkdir gptoss-node
cd gptoss-node
npm init -y
npm pkg set type=module
npm install node-fetch

The npm pkg set type=module step marks the project as an ES module, so the import syntax used in the next step works without renaming files to .mjs. (Node 18+ also ships a global fetch, so node-fetch is strictly optional; we install it here to be explicit about the dependency.)

📦 Step 2: Write Your Chat Client

Create a new file chat.js and add:

import fetch from "node-fetch";

async function chat() {
  const apiUrl = "http://localhost:11434/api/chat";
  const history = [];

  console.log("Local GPT-OSS Chat — type 'exit' to quit.\n");

  // Setup readline for interactive chat
  const readline = await import("node:readline/promises");
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  while (true) {
    const userInput = await rl.question("You: ");

    if (userInput.toLowerCase() === "exit") {
      rl.close();
      break;
    }

    // Push user input into history
    history.push({ role: "user", content: userInput });

    // Call Ollama API. stream: false makes Ollama return one JSON object;
    // the default is a stream of newline-delimited JSON chunks, which
    // res.json() cannot parse.
    const res = await fetch(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        messages: history,
        stream: false
      })
    });

    const data = await res.json();
    const reply = data.message.content;

    console.log("Assistant:", reply, "\n");

    // Append assistant response to history
    history.push({ role: "assistant", content: reply });
  }
}

chat();

This script keeps a rolling chat history, so the model remembers context across turns.
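One caveat: the history array grows without bound, while the model's context window is finite. A simple mitigation is a sliding window over recent turns; this is a sketch with an arbitrary cutoff (MAX_TURNS is an example value, not a tuned one):

    // Keep only the last MAX_TURNS user/assistant exchanges so the prompt
    // stays within the model's context window.
    const MAX_TURNS = 10;
    if (history.length > MAX_TURNS * 2) {
      history.splice(0, history.length - MAX_TURNS * 2);
    }

Dropping the oldest messages loses early context, so for long sessions you might instead summarize older turns into a single system message.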


▶️ Step 3: Run Your Chat

Make sure Ollama is running in the background, then start your chat app:

node chat.js

You’ll now have a fully local chatbot powered by GPT-OSS, generating responses entirely on your own machine—no API keys, no internet required.
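The version above waits for the complete reply before printing it. Ollama’s chat endpoint can also stream its answer as newline-delimited JSON, which lets you print tokens as they arrive. Here’s a sketch of that variant, replacing everything from the fetch call through the history.push in the loop (it assumes node-fetch, whose res.body is an async-iterable Node stream):

    // Streaming variant: each NDJSON line carries a partial message.
    const res = await fetch(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        messages: history,
        stream: true
      })
    });

    let reply = "";
    let buffered = "";
    process.stdout.write("Assistant: ");
    for await (const chunk of res.body) {
      buffered += chunk.toString();
      const lines = buffered.split("\n");
      buffered = lines.pop(); // keep a possibly incomplete trailing line
      for (const line of lines) {
        if (!line.trim()) continue;
        const part = JSON.parse(line);
        if (part.message?.content) {
          process.stdout.write(part.message.content); // print as it arrives
          reply += part.message.content;
        }
      }
    }
    console.log("\n");
    history.push({ role: "assistant", content: reply });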


🔧 What’s Next?

This basic chat loop is just the start. You can extend it into agentic applications:

  • 🗂 Document Q&A — plug in embeddings + vector search (RAG).
  • 🔗 Tool calling — connect GPT-OSS to APIs or databases.
  • 💻 Code assistant — generate snippets and explanations locally.
  • 🤖 Multi-agent systems — orchestrate multiple GPT-OSS instances for teamwork.

JavaScript also makes it trivial to wrap this into a web app using frameworks like Next.js or Express.
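As a taste of that, here’s a minimal Express sketch exposing the same chat call as an HTTP endpoint. It assumes Express is installed (npm install express) and reuses Node 18+’s global fetch; the /chat route and port are arbitrary choices:

import express from "express";

const app = express();
app.use(express.json());

// POST /chat with { messages: [...] } forwards the conversation to Ollama.
app.post("/chat", async (req, res) => {
  const ollamaRes = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-oss:20b",
      messages: req.body.messages,
      stream: false
    })
  });
  const data = await ollamaRes.json();
  res.json({ reply: data.message.content });
});

app.listen(3000, () => console.log("Chat API on http://localhost:3000"));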


✅ Summary

In this post, you learned how to:

  1. Set up a Node.js project.
  2. Install dependencies (node-fetch).
  3. Write a chat loop that communicates with GPT-OSS through Ollama.
  4. Maintain conversation history for contextual replies.
  5. Plan next steps toward agents, RAG, and web integrations.

With GPT-OSS and Ollama, the future of AI isn’t just in the cloud—it’s on your laptop. JavaScript developers now have the power to build private, cost-free, offline-capable AI assistants with just a few lines of code.
