You want AI chat in your app. You don't want to manage model hosting, GPU infrastructure, or OpenAI billing surprises. This guide gets you from zero to streaming AI chat in under 10 minutes using the LocalLLM Chat API — a hosted DeepSeek endpoint with session management and Server-Sent Events streaming.
What You Get
- Streaming chat via SSE (tokens appear as they're generated)
- Session management — each conversation has isolated context
- OpenAI-compatible — easy to swap in if you're already using the OpenAI SDK pattern
- No API key required for the model — just your RapidAPI key
Step 1: Get Your API Key
- Go to RapidAPI and create a free account
- Search "LocalLLM Chat"
- Subscribe to the free BASIC plan
- Copy your X-RapidAPI-Key
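Pasting the key straight into source works for a first run, but it is easy to commit by accident. A minimal sketch of loading it from the environment instead. The variable name `RAPIDAPI_KEY` and the helpers `build_headers` and `headers_from_env` are illustrative choices for this example, not part of the API.

```python
import os

HOST = "localllm-chat.p.rapidapi.com"


def build_headers(key: str, host: str = HOST) -> dict:
    """Return the headers sent with every request in this guide."""
    if not key:
        raise ValueError("RapidAPI key is empty; set RAPIDAPI_KEY first")
    return {
        "X-RapidAPI-Key": key,
        "X-RapidAPI-Host": host,
        "Content-Type": "application/json",
    }


def headers_from_env(var: str = "RAPIDAPI_KEY") -> dict:
    # RAPIDAPI_KEY is an assumed variable name, chosen for this example.
    return build_headers(os.environ.get(var, ""))
```

With this in place, `HEADERS = headers_from_env()` could stand in for the hardcoded dictionary, and a missing key fails fast with a clear message rather than an HTTP error.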
Step 2: Python — Streaming Chat
import requests
import json

KEY = "YOUR_RAPIDAPI_KEY"
HOST = "localllm-chat.p.rapidapi.com"
BASE = f"https://{HOST}"
HEADERS = {
    "X-RapidAPI-Key": KEY,
    "X-RapidAPI-Host": HOST,
    "Content-Type": "application/json"
}


def create_session(title: str = "My Chat") -> str:
    r = requests.post(f"{BASE}/sessions",
                      json={"title": title, "backend": "deepseek"},
                      headers=HEADERS)
    r.raise_for_status()
    return r.json()["id"]


def chat_stream(session_id: str, message: str):
    """Stream tokens as they arrive via SSE."""
    r = requests.post(
        f"{BASE}/sessions/{session_id}/chat",
        json={"message": message},
        headers=HEADERS,
        stream=True
    )
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        raw = line.decode("utf-8")
        if raw.startswith("data:"):
            payload = raw[5:].strip()
            try:
                data = json.loads(payload)
                if data.get("event") == "token":
                    print(data["data"]["text"], end="", flush=True)
                elif data.get("event") == "done":
                    print()  # newline after stream ends
                    break
            except json.JSONDecodeError:
                pass


# Main loop
session = create_session("Support Bot")
print(f"Session: {session}\n")
while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue
    if user_input.lower() in ("exit", "quit"):
        break
    print("AI: ", end="")
    chat_stream(session, user_input)
Save the code as chat.py and run it:
python chat.py
Session: a1b2c3d4
You: What's the capital of France?
AI: The capital of France is Paris.
You: What's the best way to learn Python?
AI: The best way to learn Python is through hands-on practice...
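Printing inside `chat_stream` is convenient for a terminal demo, but an app usually wants the reply as a value it can store or render. A sketch of the same parsing written as a generator, assuming the event shape used above (a `token` event carrying `data.text`, then a `done` event). `iter_tokens` is a hypothetical helper name, and it accepts any iterable of byte lines so it can be tried without a network call.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield token text from SSE lines until a 'done' event arrives."""
    for line in lines:
        if not line:
            continue  # blank lines separate SSE events
        raw = line.decode("utf-8")
        if not raw.startswith("data:"):
            continue  # skip comments and other SSE fields
        try:
            event = json.loads(raw[5:].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        if event.get("event") == "token":
            yield event["data"]["text"]
        elif event.get("event") == "done":
            return


# Simulated stream; a real call would pass r.iter_lines() instead.
fake_stream = [
    b'data: {"event": "token", "data": {"text": "Par"}}',
    b"",
    b'data: {"event": "token", "data": {"text": "is"}}',
    b'data: {"event": "done"}',
    b'data: {"event": "token", "data": {"text": "ignored"}}',
]
reply = "".join(iter_tokens(fake_stream))
print(reply)  # Paris
```

Keeping the parser free of I/O also makes it straightforward to unit test against recorded streams.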
Step 3: JavaScript / Node.js
const RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY";
const BASE = "https://localllm-chat.p.rapidapi.com";
const HEADERS = {
  "X-RapidAPI-Key": RAPIDAPI_KEY,
  "X-RapidAPI-Host": "localllm-chat.p.rapidapi.com",
  "Content-Type": "application/json",
};

async function createSession(title = "JS Chat") {
  const res = await fetch(`${BASE}/sessions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ title, backend: "deepseek" }),
  });
  if (!res.ok) throw new Error(`Session creation failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.id;
}

async function streamChat(sessionId, message) {
  const res = await fetch(`${BASE}/sessions/${sessionId}/chat`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Chat request failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial line until the rest of it arrives
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last element may be an incomplete line
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      try {
        const data = JSON.parse(line.slice(5));
        if (data.event === "token") process.stdout.write(data.data.text);
        if (data.event === "done") process.stdout.write("\n");
      } catch {
        // ignore payloads that are not JSON
      }
    }
  }
}

// Usage (top-level await needs an ES module, e.g. a .mjs file on Node 18+)
const sessionId = await createSession();
await streamChat(sessionId, "Explain async/await in one paragraph.");
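When reading a raw byte stream like this, chunks do not line up with line boundaries: one `data:` line can arrive split across two reads, and a multi-byte character can be cut in half. The remedy is to buffer until a newline shows up. A small sketch of that idea, written in Python to match the earlier examples. `SSELineBuffer` is a hypothetical class for illustration, and in Python `requests` already does this for you through `iter_lines()`.

```python
import codecs
from typing import List


class SSELineBuffer:
    """Reassemble complete text lines from arbitrarily split byte chunks."""

    def __init__(self) -> None:
        # The incremental decoder holds back incomplete multi-byte sequences.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk and return the lines it completed."""
        self._pending += self._decoder.decode(chunk)
        # Everything before the last newline is complete; the tail is kept.
        *complete, self._pending = self._pending.split("\n")
        return complete


buf = SSELineBuffer()
payload = 'data: {"text": "café"}\n'.encode("utf-8")
# Split in the middle of the two-byte "é" to mimic an awkward chunk boundary.
cut = payload.index(b"\xc3") + 1
first = buf.feed(payload[:cut])
second = buf.feed(payload[cut:])
print(first)   # []
print(second)  # ['data: {"text": "café"}']
```

The first call returns nothing because the line is not finished yet; the second returns the whole line with the accented character intact.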
Session Management
Each session maintains its own conversation history:
# List all sessions
r = requests.get(f"{BASE}/sessions", headers=HEADERS)
sessions = r.json()
for s in sessions:
    print(f"{s['id']} — {s['title']} (last active: {s['last_active']})")

# Delete a session when done
requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS)
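Sessions accumulate if nothing deletes them, so a periodic cleanup can be useful. A sketch of picking stale sessions out of the list response, assuming `last_active` is an ISO 8601 timestamp with a UTC offset. The guide does not show the field's format, so check a real response before relying on this. `stale_session_ids` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_session_ids(sessions: List[Dict], now: datetime,
                      max_age: timedelta) -> List[str]:
    """Return IDs of sessions whose last activity is older than max_age."""
    stale = []
    for s in sessions:
        # Assumption: last_active parses with datetime.fromisoformat.
        last_active = datetime.fromisoformat(s["last_active"])
        if now - last_active > max_age:
            stale.append(s["id"])
    return stale


# Sample data shaped like the list response above (values are made up).
sample = [
    {"id": "a1", "title": "old", "last_active": "2024-01-01T00:00:00+00:00"},
    {"id": "b2", "title": "new", "last_active": "2024-01-09T12:00:00+00:00"},
]
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(stale_session_ids(sample, now, timedelta(days=7)))  # ['a1']
```

Each returned ID can then be passed to the delete call. Bear in mind that every delete is itself an API request.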
Building a Customer Support Bot
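SYSTEM_PROMPT = """You are a helpful customer support agent for AcmeCorp.
You help customers with orders, returns, and product questions.
Be concise, friendly, and always offer to escalate to a human if needed."""


def support_bot():
    session = create_session("support")
    # Prime the session with system context. The calls shown in this guide
    # expose no separate system role, so the prompt goes in as an ordinary
    # first message and the model's reply to it is printed as well.
    chat_stream(session, f"[SYSTEM]: {SYSTEM_PROMPT}")
    print("Support bot ready.\n")
    while True:
        customer_msg = input("Customer: ").strip()
        if not customer_msg:
            continue
        if customer_msg.lower() in ("exit", "quit"):
            break
        print("Bot: ", end="")
        chat_stream(session, customer_msg)


support_bot()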
Rate Limits & Pricing
| Plan | Requests/month | Price |
|---|---|---|
| BASIC | 100 | Free |
| PRO | 1,000 | $9.99/mo |
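The free tier is enough for prototyping. PRO's 1,000 requests/month works out to about 33 requests per day, which covers roughly 30 conversations/day only if each conversation is a single request. Assuming every API call counts against the quota, a multi-turn conversation uses one request per message plus one to create the session, so a production support bot will handle fewer conversations than that.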
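To size a plan, it helps to turn the monthly quota into conversations per day. A back-of-the-envelope sketch, assuming every API call counts as one request, so a conversation costs one request to create the session plus one per user message. How RapidAPI actually meters this API is not stated in this guide, so treat the numbers as estimates. `conversations_per_day` is a hypothetical helper.

```python
def conversations_per_day(monthly_quota: int, messages_per_conversation: int,
                          days_per_month: int = 30) -> float:
    """Estimate how many conversations per day a monthly quota can sustain."""
    # Assumed cost: one request to create the session, one per user message.
    requests_per_conversation = 1 + messages_per_conversation
    return monthly_quota / days_per_month / requests_per_conversation


# PRO plan (1,000 requests/month) at 1, 3, and 5 messages per conversation.
for messages in (1, 3, 5):
    print(messages, round(conversations_per_day(1000, messages), 1))
# 1 16.7
# 3 8.3
# 5 5.6
```

Reusing one session across a customer's visits, instead of creating a new one each time, saves the session-creation request.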
What's Under the Hood
The API runs DeepSeek, an open-weight LLM family whose published benchmark results are competitive with GPT-4 on reasoning and code tasks. Sessions are isolated per ID, so concurrent users don't share context.
Search "LocalLLM Chat" on RapidAPI by Circle of Wizards to get started. Free tier, no credit card needed.