Adding LLM chat to an app usually means one of two things: paying OpenAI per token indefinitely, or running your own model and dealing with GPU provisioning, quantization, and VRAM math at 1am. Neither is great when you just want AI chat in your app.
The LocalLLM Chat API is a third option: DeepSeek-powered, hosted, streaming, and session-aware. It also handles a privacy problem that hosted LLM APIs often skip: session state is never shared between subscribers.
What It Does
- Persistent chat sessions — create a session, send messages, sessions remember the full conversation history
- SSE streaming — tokens arrive in real-time, same as ChatGPT's streaming interface
- Per-user isolation — each API subscriber's sessions are completely invisible to every other subscriber; no cross-contamination of conversation history
- DeepSeek backend — runs deepseek-chat, one of the strongest open-weight models at any price point
- Simple REST — standard JSON in, text/event-stream out
Quick Start
Sign up at RapidAPI, search "LocalLLM Chat" by Circle of Wizards, subscribe to the free BASIC plan, and grab your X-RapidAPI-Key.
pip install requests sseclient-py
Step 1: Create a Session
A session holds conversation history. Each message you send is added to the context automatically — no need to re-send prior turns yourself.
import requests

KEY = "YOUR_RAPIDAPI_KEY"
HOST = "localllm-chat.p.rapidapi.com"
BASE = f"https://{HOST}"
HEADERS = {
    "X-RapidAPI-Key": KEY,
    "X-RapidAPI-Host": HOST,
    "Content-Type": "application/json",
}

def create_session() -> str:
    r = requests.post(f"{BASE}/sessions", headers=HEADERS,
                      json={"backend": "deepseek"})
    r.raise_for_status()
    data = r.json()
    print(f"Session {data['id']} | model: {data['model']}")
    return data["id"]

session_id = create_session()
Response:
{
"id": "43962dfa",
"created_at": "2026-06-06T23:41:32.167345+00:00",
"last_active": "2026-06-06T23:41:32.167384+00:00",
"backend": "deepseek",
"model": "deepseek-chat"
}
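The timestamps in this response are ISO 8601 strings with a UTC offset, so the standard library can parse them. The following is a minimal sketch, assuming every session response carries the five fields shown above. The `Session` dataclass and `parse_session` helper are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Session:
    """Typed view of the session JSON shown above (hypothetical helper)."""
    id: str
    created_at: datetime
    last_active: datetime
    backend: str
    model: str


def parse_session(data: dict) -> Session:
    # datetime.fromisoformat accepts the "+00:00" offset form used in the sample.
    return Session(
        id=data["id"],
        created_at=datetime.fromisoformat(data["created_at"]),
        last_active=datetime.fromisoformat(data["last_active"]),
        backend=data["backend"],
        model=data["model"],
    )


# The sample response from above, used here so the sketch runs offline.
sample = {
    "id": "43962dfa",
    "created_at": "2026-06-06T23:41:32.167345+00:00",
    "last_active": "2026-06-06T23:41:32.167384+00:00",
    "backend": "deepseek",
    "model": "deepseek-chat",
}
session = parse_session(sample)
print(session.id, session.model, session.last_active - session.created_at)
```

Parsing into timezone-aware `datetime` objects makes it straightforward to compare `last_active` against the current time later, for example when deciding which sessions to clean up.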
Step 2: Stream a Response
Chat uses Server-Sent Events. Each token arrives as a separate event, so your UI can render progressively instead of waiting for the full response.
import json

import sseclient

def chat(session_id: str, message: str) -> str:
    r = requests.post(
        f"{BASE}/sessions/{session_id}/chat",
        headers={**HEADERS, "Accept": "text/event-stream"},
        json={"message": message},
        stream=True,  # keep the connection open and read events as they arrive
    )
    r.raise_for_status()
    full_text = ""
    client = sseclient.SSEClient(r)
    for event in client.events():
        if event.event == "token":
            token = json.loads(event.data)["text"]
            print(token, end="", flush=True)
            full_text += token
        elif event.event == "done":
            print()  # newline after stream ends
            break
    return full_text

response = chat(session_id, "In one sentence, what is quantum entanglement?")
Raw SSE stream:
event: token
data: {"text": "Quantum"}

event: token
data: {"text": " entanglement"}

event: token
data: {"text": " is"}

event: token
data: {"text": " a"}

event: token
data: {"text": " physical"}

event: token
data: {"text": " phenomenon"}

...

event: done
data: {}
The done event signals end of stream — check for it explicitly so you don't hang waiting for more tokens.
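If you would rather not depend on sseclient-py, the wire format is simple enough to parse by hand: in Server-Sent Events, each event is a group of `field: value` lines terminated by a blank line. Below is a minimal sketch that handles only the `event:` and `data:` fields seen in the stream above. The helper names `iter_sse_events` and `collect_text` are made up for illustration, and the sketch is more lenient than the SSE specification (it flushes a trailing event that lacks a closing blank line).

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _field_value(line: str, field: str) -> str:
    # Drop "field:" and, per the SSE format, at most one leading space.
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from already-decoded SSE lines."""
    event, data = "message", []  # "message" is the default SSE event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))
    if data:
        # Lenient: flush a final event that had no closing blank line.
        yield event, "\n".join(data)


def collect_text(lines: Iterable[str]) -> str:
    """Join token payloads until the explicit `done` event arrives."""
    parts: List[str] = []
    for event, data in iter_sse_events(lines):
        if event == "token":
            parts.append(json.loads(data)["text"])
        elif event == "done":
            break
    return "".join(parts)


# A shortened copy of the stream above, so the sketch runs offline.
sample = (
    'event: token\ndata: {"text": "Quantum"}\n\n'
    'event: token\ndata: {"text": " entanglement"}\n\n'
    "event: done\ndata: {}\n\n"
)
print(collect_text(sample.splitlines()))  # prints "Quantum entanglement"
```

In a real client the `lines` argument would come from the streamed HTTP response, decoded to text, rather than from a string literal.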
Step 3: Multi-Turn Conversation
Sessions handle history server-side. Just keep sending to the same session ID:
session_id = create_session()
# Turn 1
chat(session_id, "My name is Alex. Remember that.")
# → "Of course, Alex! I'll remember your name..."
# Turn 2
chat(session_id, "What's my name?")
# → "Your name is Alex, as you mentioned earlier."
# Turn 3
chat(session_id, "Give me a haiku about that.")
# → "Alex speaks today / A name carried through the stream / Echo finds its mark"
The model has full context of every prior turn. You don't re-send history. Sessions persist as long as the subscription is active.
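Because history lives on the server, a client can resume a conversation after a restart as long as it still knows the session ID. Here is a minimal sketch of saving that ID to a local file. The `load_or_create_session` helper and the `session.json` file name are hypothetical, and the sketch assumes the saved session has not been deleted in the meantime. A real client would also need to handle whatever error the API returns for an unknown ID.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_create_session(path: Path, create: Callable[[], str]) -> str:
    """Return the session ID saved at `path`, creating one on first use."""
    if path.exists():
        return json.loads(path.read_text())["session_id"]
    session_id = create()
    path.write_text(json.dumps({"session_id": session_id}))
    return session_id


# Stand-in for create_session() so the sketch runs without network access.
calls = []

def fake_create() -> str:
    calls.append(1)
    return "43962dfa"


state_file = Path(tempfile.mkdtemp()) / "session.json"
first = load_or_create_session(state_file, fake_create)   # creates and saves
second = load_or_create_session(state_file, fake_create)  # reads the saved ID
print(first == second, len(calls))  # prints "True 1"
```

In the tutorial's code you would pass the real `create_session` function instead of `fake_create`.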
JavaScript / Browser Integration
For frontend apps, use fetch with streaming:
// Note: a key embedded in frontend code is visible to anyone who opens
// dev tools. For production, route these calls through your own backend.
const KEY = "YOUR_RAPIDAPI_KEY";
const HOST = "localllm-chat.p.rapidapi.com";
const BASE = `https://${HOST}`;

// Create session
async function createSession() {
  const res = await fetch(`${BASE}/sessions`, {
    method: "POST",
    headers: {
      "X-RapidAPI-Key": KEY,
      "X-RapidAPI-Host": HOST,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ backend: "deepseek" }),
  });
  if (!res.ok) throw new Error(`Create session failed: ${res.status}`);
  const data = await res.json();
  return data.id;
}

// Stream chat response
async function chat(sessionId, message, onToken) {
  const res = await fetch(`${BASE}/sessions/${sessionId}/chat`, {
    method: "POST",
    headers: {
      "X-RapidAPI-Key": KEY,
      "X-RapidAPI-Host": HOST,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  // Declared outside the read loop so the event type survives when the
  // "event:" and "data:" lines of one event arrive in different chunks.
  let eventType = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep incomplete line

    for (const line of lines) {
      if (line.startsWith("event: ")) {
        eventType = line.slice(7).trim();
      } else if (line.startsWith("data: ") && eventType === "token") {
        const token = JSON.parse(line.slice(6)).text;
        onToken(token);
      } else if (line.startsWith("data: ") && eventType === "done") {
        await reader.cancel(); // explicit end of stream
        return;
      }
    }
  }
}

// Usage (top-level await requires an ES module)
const sessionId = await createSession();
let output = "";
await chat(sessionId, "Explain WebSockets in 2 sentences.", (token) => {
  output += token;
  document.getElementById("output").textContent = output; // live update
});
Session Management
List your sessions — you only see your own. Other subscribers' sessions are invisible to you:
def list_sessions() -> list:
    r = requests.get(f"{BASE}/sessions", headers=HEADERS)
    r.raise_for_status()
    return r.json()

sessions = list_sessions()
for s in sessions:
    print(f"{s['id']} | {s['message_count']} messages | last: {s['last_active'][:10]}")
43962dfa | 3 messages | last: 2026-06-06
d09998f4 | 3 messages | last: 2026-06-06
Delete a session when done:
def delete_session(session_id: str):
    r = requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS)
    r.raise_for_status()

delete_session(session_id)
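Listing and deleting combine naturally into a cleanup routine. The sketch below picks out sessions whose `last_active` is older than a cutoff and deletes them. It assumes the list endpoint returns `last_active` in the same ISO 8601 format as the create response. The helper names, the session IDs, and the timestamps in the demo are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


def stale_session_ids(sessions: Iterable[dict], max_age: timedelta,
                      now: datetime) -> List[str]:
    """Return IDs of sessions last active more than `max_age` before `now`."""
    cutoff = now - max_age
    return [
        s["id"] for s in sessions
        if datetime.fromisoformat(s["last_active"]) < cutoff
    ]


def prune(sessions: Iterable[dict], max_age: timedelta, now: datetime,
          delete: Callable[[str], None]) -> List[str]:
    """Call `delete` for every stale session and return the deleted IDs."""
    ids = stale_session_ids(sessions, max_age, now)
    for session_id in ids:
        delete(session_id)
    return ids


# Made-up sample data standing in for list_sessions() output.
sessions = [
    {"id": "aaaa1111", "message_count": 3,
     "last_active": "2026-06-06T23:41:32.167384+00:00"},
    {"id": "bbbb2222", "message_count": 5,
     "last_active": "2026-06-01T08:00:00+00:00"},
]
now = datetime(2026, 6, 7, tzinfo=timezone.utc)

deleted: List[str] = []
prune(sessions, timedelta(days=1), now, deleted.append)
print(deleted)  # prints "['bbbb2222']"
```

With the functions from this tutorial, the call would look like `prune(list_sessions(), timedelta(days=7), datetime.now(timezone.utc), delete_session)`.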
A Note on Privacy
Many LLM proxy APIs run all users through shared conversation state or log everything to a central store. LocalLLM isolates sessions by API subscriber identity, taken from the X-RapidAPI-User header that the gateway injects. Your sessions never appear in another subscriber's session list, and vice versa.
This matters if you're building multi-tenant apps, handling user conversations with any sensitivity, or just don't want to accidentally bleed one user's context into another's chat history.
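One consequence for multi-tenant apps: as described, isolation is keyed on the API subscriber, so if all of your end users go through a single RapidAPI key, keeping their histories apart is presumably still your app's job. The simplest approach is one session per end user. Here is a minimal sketch, where `SessionRegistry` is a hypothetical in-memory helper and the lambda stands in for `create_session()`. A production app would keep this mapping in its own database.

```python
import itertools
from typing import Callable, Dict


class SessionRegistry:
    """Maps each end user of your app to their own chat session ID."""

    def __init__(self, create: Callable[[], str]):
        self._create = create
        self._by_user: Dict[str, str] = {}

    def session_for(self, user_id: str) -> str:
        # Create a session the first time a user is seen, then reuse it.
        if user_id not in self._by_user:
            self._by_user[user_id] = self._create()
        return self._by_user[user_id]


# Offline demo: a counter generates fake session IDs.
counter = itertools.count(1)
registry = SessionRegistry(lambda: f"session-{next(counter)}")

alice = registry.session_for("alice")
bob = registry.session_for("bob")
print(alice, bob, registry.session_for("alice"))
# prints "session-1 session-2 session-1"
```

Each chat call then uses `registry.session_for(current_user)` as its session ID, so one user's messages never land in another user's history.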
Building a Simple CLI Chatbot
Put it all together:
import requests
import sseclient
import json

KEY = "YOUR_RAPIDAPI_KEY"
HOST = "localllm-chat.p.rapidapi.com"
BASE = f"https://{HOST}"
HEADERS = {
    "X-RapidAPI-Key": KEY,
    "X-RapidAPI-Host": HOST,
    "Content-Type": "application/json",
}

def create_session() -> str:
    r = requests.post(f"{BASE}/sessions", headers=HEADERS,
                      json={"backend": "deepseek"})
    r.raise_for_status()
    return r.json()["id"]

def chat_stream(session_id: str, message: str):
    r = requests.post(
        f"{BASE}/sessions/{session_id}/chat",
        headers={**HEADERS, "Accept": "text/event-stream"},
        json={"message": message},
        stream=True,
    )
    r.raise_for_status()
    for event in sseclient.SSEClient(r).events():
        if event.event == "token":
            print(json.loads(event.data)["text"], end="", flush=True)
        elif event.event == "done":
            print()
            return

def main():
    print("LocalLLM Chat (DeepSeek) — type 'quit' to exit\n")
    session_id = create_session()
    print(f"Session: {session_id}\n")
    while True:
        try:
            user_input = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nBye.")
            break
        if user_input.lower() in ("quit", "exit", "q"):
            break
        if not user_input:
            continue
        print("AI: ", end="")
        chat_stream(session_id, user_input)

if __name__ == "__main__":
    main()
Run it, start a conversation — context carries across every turn automatically.
Why DeepSeek?
DeepSeek-V3 (deepseek-chat) is reported to score competitively with GPT-4o on many coding and reasoning benchmarks, at a fraction of the inference cost. For apps that don't need OpenAI branding, it's a straightforward swap.
The BASIC plan covers personal projects and prototypes. PRO ($9.99/mo) unlocks higher request limits for production traffic.
The API is live on RapidAPI — search "LocalLLM Chat" by Circle of Wizards. Free to try. If you build something on top of it — a chatbot, a writing tool, a customer support prototype — drop a link in the comments.