Add Streaming AI Chat to Any App in 10 Minutes (OpenAI-Compatible API)

#webdev #python #ai #tutorial

You want AI chat in your app. You don't want to manage model hosting, GPU infrastructure, or OpenAI billing surprises. This guide gets you from zero to streaming AI chat in under 10 minutes using the LocalLLM Chat API — a hosted DeepSeek endpoint with session management and Server-Sent Events streaming.

What You Get

Streaming chat via SSE (tokens appear as they're generated)
Session management — each conversation has isolated context
OpenAI-compatible — easy to swap in if you're already using the OpenAI SDK pattern
No API key required for the model — just your RapidAPI key

Step 1: Get Your API Key

Go to RapidAPI and create a free account
Search "LocalLLM Chat"
Subscribe to the free BASIC plan
Copy your X-RapidAPI-Key
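
Pasting the key straight into source code makes it easy to commit by accident. A minimal sketch of one alternative, assuming you export the key as an environment variable: the name RAPIDAPI_KEY and the build_headers helper are choices made for this example, not something RapidAPI requires. It returns the same three headers that every request in this guide sends.

```python
import os

HOST = "localllm-chat.p.rapidapi.com"  # host used throughout this guide


def build_headers(env=None) -> dict:
    """Build the RapidAPI headers, reading the key from the environment.

    RAPIDAPI_KEY is an assumed variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("RAPIDAPI_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the RAPIDAPI_KEY environment variable first")
    return {
        "X-RapidAPI-Key": key,
        "X-RapidAPI-Host": HOST,
        "Content-Type": "application/json",
    }
```

Passing a plain dict as env keeps the helper easy to exercise without touching your real environment.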

Step 2: Python — Streaming Chat

import requests
import json

KEY = "YOUR_RAPIDAPI_KEY"
HOST = "localllm-chat.p.rapidapi.com"
BASE = f"https://{HOST}"
HEADERS = {
    "X-RapidAPI-Key": KEY,
    "X-RapidAPI-Host": HOST,
    "Content-Type": "application/json"
}


def create_session(title: str = "My Chat") -> str:
    """Create a chat session and return its ID."""
    r = requests.post(f"{BASE}/sessions",
                      json={"title": title, "backend": "deepseek"},
                      headers=HEADERS,
                      timeout=30)  # requests never times out by default
    r.raise_for_status()
    return r.json()["id"]


def chat_stream(session_id: str, message: str) -> None:
    """Stream tokens as they arrive via SSE and print them."""
    # The context manager releases the connection even when we break early.
    with requests.post(
        f"{BASE}/sessions/{session_id}/chat",
        json={"message": message},
        headers=HEADERS,
        stream=True,
        timeout=(10, 120),  # (connect, read) timeouts in seconds
    ) as r:
        r.raise_for_status()

        for line in r.iter_lines():
            if not line:
                continue  # blank lines separate SSE events
            raw = line.decode("utf-8")
            if not raw.startswith("data:"):
                continue
            try:
                data = json.loads(raw[5:].strip())
            except json.JSONDecodeError:
                continue  # skip payloads that are not JSON
            if data.get("event") == "token":
                print(data["data"]["text"], end="", flush=True)
            elif data.get("event") == "done":
                print()  # newline after stream ends
                break


# Main loop
session = create_session("Support Bot")
print(f"Session: {session}\n")

while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit"):
        break
    print("AI: ", end="")
    chat_stream(session, user_input)

Run it:

python chat.py

Session: a1b2c3d4

You: What's the capital of France?
AI: The capital of France is Paris.

You: What's the best way to learn Python?
AI: The best way to learn Python is through hands-on practice...
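
If you want the reply as a string rather than printed output (to render it in a web page, say), the parsing can be pulled out of chat_stream into a generator. This is a sketch under the assumption that events have the shape the code above reads, `{"event": "token", "data": {"text": ...}}` followed by a `done` event; iter_tokens is a name introduced here, not part of the API.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield token text from raw SSE lines until a "done" event arrives."""
    for line in lines:
        if not line:
            continue  # blank lines separate SSE events
        raw = line.decode("utf-8")
        if not raw.startswith("data:"):
            continue  # comments and other SSE fields are ignored
        try:
            event = json.loads(raw[5:].strip())
        except json.JSONDecodeError:
            continue
        if event.get("event") == "token":
            yield event["data"]["text"]
        elif event.get("event") == "done":
            return


# Offline check with canned lines instead of a live response
sample = [
    b'data: {"event": "token", "data": {"text": "Par"}}',
    b"",
    b'data: {"event": "token", "data": {"text": "is"}}',
    b": keep-alive",
    b'data: {"event": "done"}',
    b'data: {"event": "token", "data": {"text": "ignored"}}',
]
reply = "".join(iter_tokens(sample))
print(reply)  # Paris
```

Against a live response you would pass `r.iter_lines()` in place of the sample list.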

Step 3: JavaScript / Node.js

const RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY";
const BASE = "https://localllm-chat.p.rapidapi.com";
const HEADERS = {
  "X-RapidAPI-Key": RAPIDAPI_KEY,
  "X-RapidAPI-Host": "localllm-chat.p.rapidapi.com",
  "Content-Type": "application/json",
};

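async function createSession(title = "JS Chat") {
  const res = await fetch(`${BASE}/sessions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ title, backend: "deepseek" }),
  });
  // fetch only rejects on network errors, so check the HTTP status explicitly
  if (!res.ok) throw new Error(`Session request failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.id;
}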

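async function streamChat(sessionId, message) {
  const res = await fetch(`${BASE}/sessions/${sessionId}/chat`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Chat request failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      let data;
      try {
        data = JSON.parse(line.slice(5));
      } catch {
        continue; // skip payloads that are not JSON
      }
      if (data.event === "token") process.stdout.write(data.data.text);
      if (data.event === "done") process.stdout.write("\n");
    }
  }
}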

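// Usage: top-level await needs an ES module (for example, save the file as
// chat.mjs), and the global fetch requires Node.js 18 or newer
const sessionId = await createSession();
await streamChat(sessionId, "Explain async/await in one paragraph.");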

Session Management

Each session maintains its own conversation history:

# List all sessions
r = requests.get(f"{BASE}/sessions", headers=HEADERS)
sessions = r.json()
for s in sessions:
    print(f"{s['id']} — {s['title']} (last active: {s['last_active']})")

# Delete a session when done (`session` is the ID returned by create_session)
requests.delete(f"{BASE}/sessions/{session}", headers=HEADERS)
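
Listing and deleting combine naturally into a cleanup pass over idle sessions. The sketch below only does the selection step. It assumes last_active is an ISO 8601 timestamp string, which this guide does not confirm, so adjust the parsing to whatever the endpoint actually returns. stale_session_ids and the seven-day cutoff are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def stale_session_ids(sessions, now, max_idle=timedelta(days=7)):
    """Return IDs of sessions idle for longer than max_idle.

    Assumes each session dict has "id" and an ISO 8601 "last_active" string.
    """
    stale = []
    for s in sessions:
        # Older Python versions reject a trailing "Z", so normalise it first
        last = datetime.fromisoformat(s["last_active"].replace("Z", "+00:00"))
        if last.tzinfo is None:
            last = last.replace(tzinfo=timezone.utc)  # treat unmarked times as UTC
        if now - last > max_idle:
            stale.append(s["id"])
    return stale


# Sample data standing in for the JSON returned by GET /sessions
sessions = [
    {"id": "a1", "title": "old", "last_active": "2024-01-01T00:00:00Z"},
    {"id": "b2", "title": "new", "last_active": "2024-01-09T12:00:00+00:00"},
]
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(stale_session_ids(sessions, now))  # ['a1']
```

Each returned ID can then be passed to the DELETE call shown above.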

Building a Customer Support Bot

SYSTEM_PROMPT = """You are a helpful customer support agent for AcmeCorp.
You help customers with orders, returns, and product questions.
Be concise, friendly, and always offer to escalate to a human if needed."""

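def support_bot():
    session = create_session("support")

    # Prime the session with system context. The prompt is sent as an ordinary
    # first message, so the model's reply to it is printed as well.
    chat_stream(session, f"[SYSTEM]: {SYSTEM_PROMPT}")
    print("Support bot ready.\n")

    while True:
        customer_msg = input("Customer: ").strip()
        if not customer_msg:
            continue
        if customer_msg.lower() in ("exit", "quit"):
            break
        print("Bot: ", end="")
        chat_stream(session, customer_msg)

support_bot()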

Rate Limits & Pricing

Plan	Requests/month	Price
BASIC	100	Free
PRO	1,000	$9.99/mo

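The free tier is enough for prototyping. At 1,000 requests/month, PRO averages about 33 requests/day. That covers roughly 30 conversations/day only if each conversation is a single message; assuming every message counts as one request, multi-turn conversations reduce the figure proportionally.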
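
To size a plan for your own traffic, the arithmetic is quota divided by days divided by messages per conversation. A quick sketch, assuming a 30-day month and that each chat message consumes exactly one request (the pricing table does not say how requests are counted); daily_conversations is a helper name made up for this example.

```python
def daily_conversations(requests_per_month: int,
                        turns_per_conversation: int,
                        days_per_month: int = 30) -> float:
    """Conversations per day a quota sustains, at one request per message."""
    if turns_per_conversation < 1:
        raise ValueError("turns_per_conversation must be at least 1")
    return requests_per_month / days_per_month / turns_per_conversation


for plan, quota in (("BASIC", 100), ("PRO", 1000)):
    for turns in (1, 5):
        rate = daily_conversations(quota, turns)
        print(f"{plan}, {turns} turn(s): {rate:.1f} conversations/day")

# BASIC, 1 turn(s): 3.3 conversations/day
# BASIC, 5 turn(s): 0.7 conversations/day
# PRO, 1 turn(s): 33.3 conversations/day
# PRO, 5 turn(s): 6.7 conversations/day
```

A priming message like the one in the support bot above would count as one more turn per conversation under the same assumption.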

What's Under the Hood

The API runs DeepSeek, an open-weight LLM whose published benchmarks are competitive with GPT-4 on reasoning and code tasks. Each session ID has its own context, so concurrent users don't share conversation history.

To get started, search RapidAPI for "LocalLLM Chat" by Circle of Wizards. The free tier needs no credit card.
