DEV Community

Patrick DeVos

Add Streaming AI Chat to Any App in 10 Minutes (OpenAI-Compatible API)

You want AI chat in your app. You don't want to manage model hosting, GPU infrastructure, or OpenAI billing surprises. This guide gets you from zero to streaming AI chat in under 10 minutes using the LocalLLM Chat API — a hosted DeepSeek endpoint with session management and Server-Sent Events streaming.

What You Get

  • Streaming chat via SSE (tokens appear as they're generated)
  • Session management — each conversation has isolated context
  • OpenAI-compatible — easy to swap in if you're already using the OpenAI SDK pattern
  • No separate model-provider API key: your RapidAPI key is the only credential you need

Step 1: Get Your API Key

  1. Go to RapidAPI and create a free account
  2. Search "LocalLLM Chat"
  3. Subscribe to the free BASIC plan
  4. Copy your X-RapidAPI-Key
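
The examples in this guide paste the key straight into the source for brevity. A minimal sketch of reading it from the environment instead is shown here; the variable name RAPIDAPI_KEY and the helper build_headers are assumptions for illustration, not part of the API.

```python
import os

HOST = "localllm-chat.p.rapidapi.com"  # host used throughout this guide


def build_headers(env_var: str = "RAPIDAPI_KEY") -> dict:
    """Build the RapidAPI request headers from an environment variable.

    The variable name is an assumption for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "X-RapidAPI-Key": key,
        "X-RapidAPI-Host": HOST,
        "Content-Type": "application/json",
    }
```

Calling build_headers() in place of the hardcoded HEADERS dict keeps the key out of version control.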

Step 2: Python — Streaming Chat

import requests
import json

KEY = "YOUR_RAPIDAPI_KEY"
HOST = "localllm-chat.p.rapidapi.com"
BASE = f"https://{HOST}"
HEADERS = {
    "X-RapidAPI-Key": KEY,
    "X-RapidAPI-Host": HOST,
    "Content-Type": "application/json"
}


def create_session(title: str = "My Chat") -> str:
    r = requests.post(f"{BASE}/sessions",
                      json={"title": title, "backend": "deepseek"},
                      headers=HEADERS,
                      timeout=30)  # requests has no default timeout
    r.raise_for_status()
    return r.json()["id"]


def chat_stream(session_id: str, message: str):
    """Stream tokens as they arrive via SSE."""
    r = requests.post(
        f"{BASE}/sessions/{session_id}/chat",
        json={"message": message},
        headers=HEADERS,
        stream=True,
        timeout=(10, 60),  # (connect, read) seconds
    )
    r.raise_for_status()

    for line in r.iter_lines():
        if not line:
            continue
        raw = line.decode("utf-8")
        if raw.startswith("data:"):
            payload = raw[5:].strip()
            try:
                data = json.loads(payload)
                if data.get("event") == "token":
                    print(data["data"]["text"], end="", flush=True)
                elif data.get("event") == "done":
                    print()  # newline after stream ends
                    break
            except json.JSONDecodeError:
                pass


# Main loop
session = create_session("Support Bot")
print(f"Session: {session}\n")

while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit"):
        break
    print("AI: ", end="")
    chat_stream(session, user_input)

Save the script as chat.py and run it:

python chat.py

Example output:

Session: a1b2c3d4

You: What's the capital of France?
AI: The capital of France is Paris.

You: What's the best way to learn Python?
AI: The best way to learn Python is through hands-on practice...
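
If you want the tokens in your own code rather than printed to the terminal, the parsing step can be pulled out into a generator. This is a sketch that assumes the same event shape used by chat_stream above ("token" events carrying data.text, then a "done" event); the name iter_tokens is made up for this example.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield token text from SSE lines until a "done" event arrives."""
    for line in lines:
        if not line:
            continue  # blank lines separate SSE events
        raw = line.decode("utf-8")
        if not raw.startswith("data:"):
            continue  # skip comments and other SSE fields
        try:
            event = json.loads(raw[5:].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue  # ignore JSON that is not an event object
        if event.get("event") == "token":
            yield event["data"]["text"]
        elif event.get("event") == "done":
            return


# Offline check with hand-written sample lines (no network needed)
sample = [
    b'data: {"event": "token", "data": {"text": "Hel"}}',
    b"",
    b'data: {"event": "token", "data": {"text": "lo"}}',
    b'data: {"event": "done"}',
]
reply = "".join(iter_tokens(sample))
print(reply)
```

Against the live API you would pass r.iter_lines() from a streaming response instead of the sample list, then forward each token to a websocket, a UI widget, or a log.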

Step 3: JavaScript / Node.js

const RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY";
const BASE = "https://localllm-chat.p.rapidapi.com";
const HEADERS = {
  "X-RapidAPI-Key": RAPIDAPI_KEY,
  "X-RapidAPI-Host": "localllm-chat.p.rapidapi.com",
  "Content-Type": "application/json",
};

async function createSession(title = "JS Chat") {
  const res = await fetch(`${BASE}/sessions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ title, backend: "deepseek" }),
  });
  if (!res.ok) throw new Error(`Session request failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.id;
}

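async function streamChat(sessionId, message) {
  const res = await fetch(`${BASE}/sessions/${sessionId}/chat`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Chat request failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last piece may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      try {
        const data = JSON.parse(line.slice(5));
        if (data.event === "token") process.stdout.write(data.data.text);
        if (data.event === "done") {
          process.stdout.write("\n");
          await reader.cancel(); // stop reading once the reply is complete
          return;
        }
      } catch {
        // ignore lines that are not valid JSON
      }
    }
  }
}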

// Usage (top-level await: run as an ES module, e.g. chat.mjs, on Node 18+ for built-in fetch)
const sessionId = await createSession();
await streamChat(sessionId, "Explain async/await in one paragraph.");

Session Management

Each session maintains its own conversation history. You can list your sessions and delete them when you are finished:

# List all sessions
r = requests.get(f"{BASE}/sessions", headers=HEADERS, timeout=30)
r.raise_for_status()
for s in r.json():
    print(f"{s['id']}: {s['title']} (last active: {s['last_active']})")

# Delete a session when done (session_id is the ID returned by create_session)
requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS, timeout=30)
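
To make sure temporary sessions are always cleaned up, even when an error interrupts the conversation, the create and delete calls can be wrapped in a context manager. This is a sketch: managed_session is a hypothetical helper, and it takes the create and delete steps as plain callables so it does not assume anything about the API beyond what is shown above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def managed_session(create: Callable[[], str],
                    delete: Callable[[str], None]) -> Iterator[str]:
    """Create a session, yield its ID, and always delete it afterwards."""
    session_id = create()
    try:
        yield session_id
    finally:
        # Runs on normal exit and when the body raises
        delete(session_id)
```

With the functions from Step 2, create could be lambda: create_session("Temp") and delete could be a small function that issues the requests.delete call for the given ID; inside the with block you then call chat_stream as usual.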

Building a Customer Support Bot

SYSTEM_PROMPT = """You are a helpful customer support agent for AcmeCorp.
You help customers with orders, returns, and product questions.
Be concise, friendly, and always offer to escalate to a human if needed."""

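def support_bot():
    session = create_session("support")

    # Prime the session with system context. This goes out as a regular chat
    # message, so the model's acknowledgement is streamed to the terminal.
    chat_stream(session, f"[SYSTEM]: {SYSTEM_PROMPT}")
    print("Support bot ready.\n")

    while True:
        customer_msg = input("Customer: ").strip()
        if not customer_msg:
            continue
        if customer_msg.lower() in ("exit", "quit"):
            break
        print("Bot: ", end="")
        chat_stream(session, customer_msg)

support_bot()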

Rate Limits & Pricing

Plan     Requests/month    Price
BASIC    100               Free
PRO      1,000             $9.99/mo

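The free tier is enough for prototyping. PRO's 1,000 requests/month averages about 33 requests per day, so it covers roughly 30 conversations/day only if each conversation is a single request. Assuming every chat message counts as one request, a production support bot with multi-turn conversations should budget for fewer.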
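
A quick way to size a plan is to divide the monthly quota by the requests each conversation consumes. The sketch below assumes that every chat message is one billed request and that creating the session costs one more; neither is confirmed by the pricing table, so adjust the parameters to match what you observe on your RapidAPI dashboard.

```python
def conversations_per_day(monthly_requests: int,
                          messages_per_conversation: int,
                          overhead_requests: int = 1,
                          days_per_month: int = 30) -> float:
    """Estimate how many conversations per day a monthly quota supports.

    Assumes one billed request per chat message, plus overhead_requests
    for session creation (both are assumptions, not documented behavior).
    """
    per_conversation = messages_per_conversation + overhead_requests
    return monthly_requests / per_conversation / days_per_month


for plan, quota in (("BASIC", 100), ("PRO", 1000)):
    estimate = conversations_per_day(quota, messages_per_conversation=5)
    print(f"{plan}: about {estimate:.1f} five-message conversations/day")
```

Under these assumptions, longer conversations reduce the daily capacity quickly, which is worth checking before committing to a plan.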

What's Under the Hood

The API runs DeepSeek, an open-weight LLM family that reports benchmark results competitive with GPT-4-class models on reasoning and code tasks. Each session ID has its own isolated context, so concurrent users don't share conversation history as long as each one gets a separate session.
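
In a multi-user app, that isolation only helps if each user is mapped to their own session. One way to do this is a small in-memory registry, sketched below; SessionRegistry is a hypothetical helper, and it receives the session-creating function as an argument (for example create_session from Step 2) rather than calling the API itself.

```python
import threading
from typing import Callable, Dict


class SessionRegistry:
    """Map each app user to their own chat session ID (hypothetical helper)."""

    def __init__(self, create_session: Callable[[str], str]):
        self._create = create_session
        self._sessions: Dict[str, str] = {}
        self._lock = threading.Lock()

    def session_for(self, user_id: str) -> str:
        """Return the user's session ID, creating a session on first use."""
        # The lock is held while the session is created, which keeps the
        # sketch simple but serializes first-time users.
        with self._lock:
            if user_id not in self._sessions:
                self._sessions[user_id] = self._create(f"user-{user_id}")
            return self._sessions[user_id]
```

A web handler would call registry.session_for(user_id) and pass the result to chat_stream. The mapping lives in memory only, so a real deployment would likely persist it in a database or cache.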


To get started, search RapidAPI for "LocalLLM Chat" by Circle of Wizards. The free tier needs no credit card.
