Cloud AI is powerful, but it comes with tradeoffs—costs, privacy concerns, and internet dependency. That’s where GPT-OSS, OpenAI’s open-weight model, changes the game.
Available in two versions—gpt-oss-120b and gpt-oss-20b—this model family can run directly on your machine. The 20B variant only needs ~16GB of RAM, making it practical for local experimentation without enterprise hardware.
By pairing GPT-OSS with Ollama, we can build offline-first, private AI assistants using plain JavaScript.
🚀 Why JavaScript?
- Universality: Runs everywhere (frontend, backend, desktop, mobile).
- Simplicity: Fetch API makes HTTP calls trivial.
- Ecosystem: Perfect for integrating into web apps, chatbots, and agents.
With Ollama exposing a REST API, JavaScript becomes one of the easiest ways to integrate GPT-OSS into your projects.
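To see how small that surface is, here's a preview of a one-off completion against Ollama's REST API (a minimal sketch, assuming the setup from the steps below: Ollama on its default port 11434, the model already pulled, and an ES module so top-level `await` works):

```js
// One-shot prompt via Ollama's /api/generate endpoint
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-oss:20b",
    prompt: "Say hello in one sentence.",
    stream: false // return a single JSON object rather than NDJSON chunks
  })
});
const { response } = await res.json();
console.log(response);
```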
🛠 What You’ll Need
- A system with 16GB+ RAM and a GPU (or Apple Silicon Mac).
- Node.js (v18 or later) installed.
- Ollama installed and running.
- GPT-OSS model pulled locally:
```bash
ollama pull gpt-oss:20b
```
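The pull is a multi-gigabyte download; once it finishes, `ollama list` should show `gpt-oss:20b` among your installed models.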
⚙️ Step 1: Initialize a Node.js Project
In your terminal:
```bash
mkdir gptoss-node
cd gptoss-node
npm init -y
npm pkg set type=module
npm install node-fetch
```
Setting `"type": "module"` in package.json lets Node run the ES-module `import` syntax used in the next step. (On Node 18+, `fetch` is also built in, so `node-fetch` is optional.)
📦 Step 2: Write Your Chat Client
Create a new file `chat.js` and add:
```js
import fetch from "node-fetch"; // on Node 18+, the built-in global fetch also works
import readline from "node:readline/promises";

async function chat() {
  const apiUrl = "http://localhost:11434/api/chat";
  const history = [];

  console.log("Local GPT-OSS Chat — type 'exit' to quit.\n");

  // Set up readline for interactive input on the terminal
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  while (true) {
    const userInput = await rl.question("You: ");
    if (userInput.toLowerCase() === "exit") {
      rl.close();
      break;
    }

    // Push user input into the rolling history
    history.push({ role: "user", content: userInput });

    // Call the Ollama chat API; stream: false returns one JSON object
    // instead of newline-delimited chunks, so res.json() below works
    const res = await fetch(apiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        messages: history,
        stream: false
      })
    });

    const data = await res.json();
    const reply = data.message.content;
    console.log("Assistant:", reply, "\n");

    // Append the assistant's response so the next turn has full context
    history.push({ role: "assistant", content: reply });
  }
}

chat();
```
This script keeps a rolling chat history, so the model remembers context across turns.
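One caveat: the history grows without bound, and every message in it is re-sent on each turn. A simple guard keeps the prompt from outgrowing the model's context window (a sketch; the 20-message cap is an arbitrary assumption, placed right after the `history.push` for user input):

```js
// Keep only the most recent exchanges (20 messages ≈ 10 user/assistant turns)
const MAX_MESSAGES = 20;
if (history.length > MAX_MESSAGES) {
  history.splice(0, history.length - MAX_MESSAGES);
}
```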
▶️ Step 3: Run Your Chat
Make sure Ollama is running in the background, then start your chat app:
```bash
node chat.js
```
You’ll now have a fully local chatbot powered by GPT-OSS, generating responses on your own machine—no API keys, no internet required.
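One refinement: the script above waits for the whole reply before printing. If you'd rather see tokens as they're generated, set `stream: true` and Ollama returns newline-delimited JSON chunks. A minimal sketch of the request-handling part (assuming Node 18+'s built-in fetch, whose response body is an async-iterable stream):

```js
// Stream the reply token-by-token instead of waiting for the full response
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "gpt-oss:20b", messages: history, stream: true })
});

const decoder = new TextDecoder();
let buffer = "";
let reply = "";
for await (const chunk of res.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    // Each complete line is one JSON object: { message: { content }, done, ... }
    const data = JSON.parse(line);
    if (data.message?.content) {
      reply += data.message.content;
      process.stdout.write(data.message.content);
    }
  }
}
console.log("\n");
```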
🔧 What’s Next?
This basic chat loop is just the start. You can extend it into agentic applications:
- 🗂 Document Q&A — plug in embeddings + vector search (RAG); see the sketch after this list.
- 🔗 Tool calling — connect GPT-OSS to APIs or databases.
- 💻 Code assistant — generate snippets and explanations locally.
- 🤖 Multi-agent systems — orchestrate multiple GPT-OSS instances for teamwork.
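As a taste of the first item, Ollama also exposes an embeddings endpoint you can call the same way. A minimal sketch (assuming an embedding model such as `nomic-embed-text` has been pulled; the vector store and similarity search are left out):

```js
// Fetch an embedding vector for a piece of text from Ollama
async function embed(text) {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text })
  });
  const { embedding } = await res.json();
  return embedding; // an array of floats to index in a vector store
}
```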
JavaScript also makes it trivial to wrap this into a web app using frameworks like Next.js or Express.
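For instance, a minimal Express wrapper might look like this (a sketch assuming `npm install express` and Node 18+'s built-in fetch; the route name and port are arbitrary choices):

```js
import express from "express";

const app = express();
app.use(express.json()); // parse JSON request bodies

// POST /chat with { messages: [...] }; responds with the model's reply
app.post("/chat", async (req, res) => {
  const ollamaRes = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-oss:20b",
      messages: req.body.messages,
      stream: false
    })
  });
  const data = await ollamaRes.json();
  res.json({ reply: data.message.content });
});

app.listen(3000, () => console.log("Chat API listening on http://localhost:3000"));
```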
✅ Summary
In this post, you learned how to:
- Set up a Node.js project.
- Install dependencies (`node-fetch`).
- Write a chat loop that communicates with GPT-OSS through Ollama.
- Maintain conversation history for contextual replies.
- Plan next steps toward agents, RAG, and web integrations.
With GPT-OSS and Ollama, the future of AI isn’t just in the cloud—it’s on your laptop. JavaScript developers now have the power to build private, cost-free, offline-capable AI assistants with just a few lines of code.