KAMAL KISHOR

Running GPT-OSS Locally with JavaScript and Ollama

Cloud AI is powerful, but it comes with tradeoffs—costs, privacy concerns, and internet dependency. That’s where GPT-OSS, OpenAI’s open-weight model, changes the game.

Available in two versions—gpt-oss-120b and gpt-oss-20b—this model family can run directly on your machine. The 20B variant only needs ~16GB of RAM, making it practical for local experimentation without enterprise hardware.

By pairing GPT-OSS with Ollama, we can build offline-first, private AI assistants using plain JavaScript.


🚀 Why JavaScript?

  • Universality: Runs everywhere (frontend, backend, desktop, mobile).
  • Simplicity: Fetch API makes HTTP calls trivial.
  • Ecosystem: Perfect for integrating into web apps, chatbots, and agents.

With Ollama exposing a REST API, JavaScript becomes one of the easiest ways to integrate GPT-OSS into your projects.
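Before writing any code, you can sanity-check that the API is reachable. The sketch below assumes Ollama's default port (11434) and Node 18+'s built-in fetch; save it as check.mjs and run node check.mjs:

// List the models your local Ollama server has available.
// GET /api/tags returns { models: [{ name, ... }] }.
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();
console.log(models.map((m) => m.name)); // e.g. [ 'gpt-oss:20b' ]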


🛠 What You’ll Need

  1. A system with 16GB+ RAM and a GPU (or Apple Silicon Mac).
  2. Node.js (v18 or later) installed.
  3. Ollama installed and running.
  4. GPT-OSS model pulled locally:
   ollama pull gpt-oss:20b
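To verify the download finished, list your local models:

ollama list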

⚙️ Step 1: Initialize a Node.js Project

In your terminal:

mkdir gptoss-node
cd gptoss-node
npm init -y
npm pkg set type=module
npm install node-fetch

The npm pkg set type=module step marks the project as an ES module, so the import syntax used in the next step works without renaming files to .mjs. (Node 18+ also ships a global fetch, so node-fetch is strictly optional; we install it here to be explicit about the dependency.)

📦 Step 2: Write Your Chat Client

Create a new file chat.js and add:

import fetch from "node-fetch";

async function chat() {
  const apiUrl = "http://localhost:11434/api/chat";
  const history = [];

  console.log("Local GPT-OSS Chat — type 'exit' to quit.\n");

  // Setup readline for interactive chat
  const readline = await import("node:readline/promises");
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  while (true) {
    const userInput = await rl.question("You: ");

    if (userInput.toLowerCase() === "exit") {
      rl.close();
      break;
    }

    // Push user input into history
    history.push({ role: "user", content: userInput });

    // Call Ollama API. stream: false makes Ollama return one JSON object;
    // the default is a stream of newline-delimited JSON chunks, which
    // res.json() cannot parse.
    const res = await fetch(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        messages: history,
        stream: false
      })
    });

    const data = await res.json();
    const reply = data.message.content;

    console.log("Assistant:", reply, "\n");

    // Append assistant response to history
    history.push({ role: "assistant", content: reply });
  }
}

chat();

This script keeps a rolling chat history, so the model remembers context across turns.
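One caveat: the history array grows without bound, while the model's context window is finite. A simple mitigation is a sliding window over recent turns; this is a sketch with an arbitrary cutoff (MAX_TURNS is an example value, not a tuned one):

    // Keep only the last MAX_TURNS user/assistant exchanges so the prompt
    // stays within the model's context window.
    const MAX_TURNS = 10;
    if (history.length > MAX_TURNS * 2) {
      history.splice(0, history.length - MAX_TURNS * 2);
    }

Dropping the oldest messages loses early context, so for long sessions you might instead summarize older turns into a single system message.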


▶️ Step 3: Run Your Chat

Make sure Ollama is running in the background, then start your chat app:

node chat.js

You’ll now have a fully local chatbot powered by GPT-OSS, generating responses entirely on your own machine—no API keys, no internet required.
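The version above waits for the complete reply before printing it. Ollama’s chat endpoint can also stream its answer as newline-delimited JSON, which lets you print tokens as they arrive. Here’s a sketch of that variant, replacing everything from the fetch call through the history.push in the loop (it assumes node-fetch, whose res.body is an async-iterable Node stream):

    // Streaming variant: each NDJSON line carries a partial message.
    const res = await fetch(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        messages: history,
        stream: true
      })
    });

    let reply = "";
    let buffered = "";
    process.stdout.write("Assistant: ");
    for await (const chunk of res.body) {
      buffered += chunk.toString();
      const lines = buffered.split("\n");
      buffered = lines.pop(); // keep a possibly incomplete trailing line
      for (const line of lines) {
        if (!line.trim()) continue;
        const part = JSON.parse(line);
        if (part.message?.content) {
          process.stdout.write(part.message.content); // print as it arrives
          reply += part.message.content;
        }
      }
    }
    console.log("\n");
    history.push({ role: "assistant", content: reply });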


🔧 What’s Next?

This basic chat loop is just the start. You can extend it into agentic applications:

  • 🗂 Document Q&A — plug in embeddings + vector search (RAG).
  • 🔗 Tool calling — connect GPT-OSS to APIs or databases.
  • 💻 Code assistant — generate snippets and explanations locally.
  • 🤖 Multi-agent systems — orchestrate multiple GPT-OSS instances for teamwork.

JavaScript also makes it trivial to wrap this into a web app using frameworks like Next.js or Express.
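As a taste of that, here’s a minimal Express sketch exposing the same chat call as an HTTP endpoint. It assumes Express is installed (npm install express) and reuses Node 18+’s global fetch; the /chat route and port are arbitrary choices:

import express from "express";

const app = express();
app.use(express.json());

// POST /chat with { messages: [...] } forwards the conversation to Ollama.
app.post("/chat", async (req, res) => {
  const ollamaRes = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-oss:20b",
      messages: req.body.messages,
      stream: false
    })
  });
  const data = await ollamaRes.json();
  res.json({ reply: data.message.content });
});

app.listen(3000, () => console.log("Chat API on http://localhost:3000"));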


✅ Summary

In this post, you learned how to:

  1. Set up a Node.js project.
  2. Install dependencies (node-fetch).
  3. Write a chat loop that communicates with GPT-OSS through Ollama.
  4. Maintain conversation history for contextual replies.
  5. Plan next steps toward agents, RAG, and web integrations.

With GPT-OSS and Ollama, the future of AI isn’t just in the cloud—it’s on your laptop. JavaScript developers now have the power to build private, cost-free, offline-capable AI assistants with just a few lines of code.
