What We're Building
A production-ready chatbot with:
- Streaming responses (tokens appear in real-time)
- Conversation memory
- Error handling and retries
- Cost tracking
Prerequisites
- Python 3.10+
- An API key (get one at Token China, which includes 100K free tokens)
Step 1: Install Dependencies
pip install openai
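Before wiring up the client, it may help to keep the API key out of the source file. The sketch below reads it from an environment variable. The variable name `TOKEN_CHINA_API_KEY` and the helper `load_api_key` are assumptions made for this example, not names the provider prescribes.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 var_name: str = "TOKEN_CHINA_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise if it is missing.

    `TOKEN_CHINA_API_KEY` is a hypothetical name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The result can then be passed as `api_key=load_api_key()` when constructing the `OpenAI` client in Step 2.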
Step 2: Basic Chatbot
from openai import OpenAI

# Point the OpenAI client at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.token-china.cc/v1"
)

class ChatBot:
    def __init__(self, system_prompt="You are a helpful assistant."):
        self.client = client
        self.model = "deepseek-v4-pro"
        # The full history is sent on every request; this is the bot's memory.
        self.messages = [{"role": "system", "content": system_prompt}]
        self.total_tokens = 0
        self.total_cost = 0.0

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                temperature=0.7,
                max_tokens=2000
            )
            assistant_msg = response.choices[0].message.content
            # Track usage, priced at $2 per 1M tokens for input and output.
            usage = response.usage
            self.total_tokens += usage.total_tokens
            self.total_cost += (usage.prompt_tokens / 1_000_000 * 2.0 +
                                usage.completion_tokens / 1_000_000 * 2.0)
            self.messages.append({"role": "assistant", "content": assistant_msg})
            return assistant_msg
        except Exception as e:
            # Drop the unanswered user message so the history stays consistent.
            self.messages.pop()
            return f"Error: {e}"
# Usage
bot = ChatBot("You are a Python expert who gives concise answers.")
while True:
    user_input = input("\nYou: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        break
    response = bot.chat(user_input)
    print(f"\nBot: {response}")
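Because `self.messages` grows with every turn, a long session sends an ever larger prompt on each request. One possible way to bound it, sketched here under the assumption that keeping the system prompt plus the most recent turns is acceptable, is a small trimming helper. `trim_history` and `max_messages` are hypothetical names, not part of the `openai` library, and the limit counts messages rather than tokens.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_messages: int = 20) -> List[Message]:
    """Keep every system message plus the last `max_messages` other messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```

Calling `self.messages = trim_history(self.messages)` after each completed turn would be one place to apply it. An even `max_messages` keeps whole user/assistant pairs when the history ends on an assistant reply.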
Step 3: Add Streaming
Streaming prints tokens as they arrive instead of waiting for the full reply, which makes the bot feel much more responsive:
# Add this method to the ChatBot class from Step 2.
def chat_stream(self, user_input: str):
    self.messages.append({"role": "user", "content": user_input})
    try:
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=2000,
            stream=True
        )
        full_response = ""
        for chunk in stream:
            # Skip chunks that carry no text (empty choices or empty delta).
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                print(content, end="", flush=True)
        print()  # finish the line once the stream ends
        self.messages.append({"role": "assistant", "content": full_response})
        return full_response
    except Exception as e:
        self.messages.pop()  # drop the unanswered user message
        print(f"\nError: {e}")
        return None
Step 4: Production Considerations
Error Handling
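import time
from openai import RateLimitError, APITimeoutError

# Add this method to the ChatBot class from Step 2.
def chat_with_retry(self, user_input: str, max_retries=3):
    # chat() catches every exception and returns an error string, so wrapping
    # it would never see these errors. Call the API directly instead.
    self.messages.append({"role": "user", "content": user_input})
    for attempt in range(max_retries):
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                temperature=0.7,
                max_tokens=2000
            )
            assistant_msg = response.choices[0].message.content
            self.messages.append({"role": "assistant", "content": assistant_msg})
            return assistant_msg
        except RateLimitError:
            wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APITimeoutError:
            print(f"Timeout on attempt {attempt + 1}. Retrying...")
            time.sleep(1)
    self.messages.pop()  # all attempts failed; drop the unanswered message
    return "Sorry, I'm having trouble connecting."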
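The backoff idea can also be written as a reusable helper that does not depend on the chatbot class. This is a minimal sketch: `retry_with_backoff`, its parameters, and the added random jitter are choices made for this example, not part of the `openai` library. Passing `sleep` as an argument keeps the helper testable without real waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` up to `max_retries` times.

    After each retryable failure except the last, sleep for
    base_delay * 2**attempt plus up to 10% random jitter.
    The final failure is re-raised to the caller.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With the chatbot, this could wrap the raw API call, for example `retry_with_backoff(lambda: client.chat.completions.create(...), retryable=(RateLimitError, APITimeoutError))`. The `OpenAI` client constructor also accepts a `max_retries` argument for its own built-in retries, so the two should be configured with each other in mind.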
Why DeepSeek for Chatbots?
- Cost-effective - $2/1M tokens vs $15/1M for GPT-4o
- Fast - DeepSeek V4 Flash is optimized for speed
- 128K context - handle long conversations
- OpenAI compatible - use existing tools and libraries
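To make the price comparison in the list above concrete, the arithmetic is tokens divided by one million, multiplied by the price per million. The sketch below applies the $2 and $15 per-million figures quoted above as flat rates for both input and output tokens, which is a simplifying assumption. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_in_per_million: float,
                  price_out_per_million: float) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    return (prompt_tokens / 1_000_000 * price_in_per_million
            + completion_tokens / 1_000_000 * price_out_per_million)


# Example: 3M prompt tokens and 1M completion tokens at each flat rate.
cheap = estimate_cost(3_000_000, 1_000_000, 2.0, 2.0)
pricey = estimate_cost(3_000_000, 1_000_000, 15.0, 15.0)
print(f"${cheap:.2f} vs ${pricey:.2f}")  # prints "$8.00 vs $60.00"
```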
Try It Yourself
Get a free API key at Token China and start building. You get 100K free tokens to test with.
Built something cool with DeepSeek? Share it in the comments!