DEV Community

Jesse


Build a Production-Ready Chatbot with DeepSeek and Python in 10 Minutes

What We're Building

A production-ready chatbot with:

  • Streaming responses (tokens appear in real time)
  • Conversation memory
  • Error handling and retries
  • Cost tracking

Prerequisites

  • Python 3.10+
  • An API key (get one at Token China - free 100K tokens)

Step 1: Install Dependencies

pip install openai

Step 2: Basic Chatbot

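import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it in source.
# TOKEN_CHINA_API_KEY is an illustrative variable name, not an official one.
client = OpenAI(
    api_key=os.environ.get("TOKEN_CHINA_API_KEY", "your-api-key"),
    base_url="https://api.token-china.cc/v1"
)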

class ChatBot:
    def __init__(self, system_prompt="You are a helpful assistant."):
        self.client = client
        self.model = "deepseek-v4-pro"
        self.messages = [{"role": "system", "content": system_prompt}]
        self.total_tokens = 0
        self.total_cost = 0.0

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                temperature=0.7,
                max_tokens=2000
            )

            assistant_msg = response.choices[0].message.content
            self.messages.append({"role": "assistant", "content": assistant_msg})

            # Track usage at a flat $2 per 1M tokens for both input and output
            usage = response.usage
            if usage is not None:
                self.total_tokens += usage.total_tokens
                self.total_cost += (usage.prompt_tokens / 1_000_000 * 2.0 +
                                    usage.completion_tokens / 1_000_000 * 2.0)

            return assistant_msg

        except Exception as e:
            # Drop the unanswered user message so the history stays consistent
            self.messages.pop()
            return f"Error: {e}"

# Usage
bot = ChatBot("You are a Python expert who gives concise answers.")

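while True:
    user_input = input("\nYou: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        # Report the running totals that chat() has been accumulating
        print(f"Tokens used: {bot.total_tokens}, estimated cost: ${bot.total_cost:.4f}")
        break

    response = bot.chat(user_input)
    print(f"\nBot: {response}")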
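
Conversation memory here is just the growing messages list, so every request resends the whole history and the prompt cost rises with each turn. One way to bound that is to keep the system prompt plus only the most recent turns. The following is a minimal sketch, not part of the original bot: trim_history and max_turns are hypothetical names, and counting turns (rather than tokens) is a simplifying assumption.

```python
def trim_history(messages, max_turns=10):
    """Return the system prompt(s) plus the last `max_turns` user/assistant pairs.

    Hypothetical helper: it counts messages, not tokens, so it only
    approximates a context-size limit.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Each turn is one user message and one assistant reply
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []

    # Avoid starting the kept window with an orphaned assistant reply
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]

    return system + recent
```

Inside ChatBot.chat, this could be applied as self.messages = trim_history(self.messages) before each request. The function returns a new list and leaves its input untouched.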

Step 3: Add Streaming

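Streaming prints tokens as they arrive instead of waiting for the full reply, which makes the bot feel much more responsive. Add this method to the ChatBot class: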

def chat_stream(self, user_input: str):
    self.messages.append({"role": "user", "content": user_input})

    try:
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
            max_tokens=2000,
            stream=True
        )

        full_response = ""
        for chunk in stream:
            # Some chunks carry no choices or no text; skip them
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                print(content, end="", flush=True)

        self.messages.append({"role": "assistant", "content": full_response})
        return full_response

    except Exception as e:
        # Drop the unanswered user message so the history stays consistent
        self.messages.pop()
        print(f"\nError: {e}")
        return None
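
The accumulation loop in chat_stream can be pulled out into a small function, which makes it possible to exercise the streaming logic without calling the API. This is a sketch under assumptions: collect_stream and fake_chunk are hypothetical names, and the fake chunks only mimic the choices[0].delta.content shape that the code above reads.

```python
from types import SimpleNamespace


def _print_token(token):
    # Default sink: print each token immediately, without a newline
    print(token, end="", flush=True)


def collect_stream(chunks, on_token=_print_token):
    """Join the text deltas from an iterable of streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # a chunk may arrive with an empty choices list
        content = chunk.choices[0].delta.content
        if content:  # skip None and empty deltas
            parts.append(content)
            on_token(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout (for testing only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]
    reply = collect_stream(demo)
    print()
    print(repr(reply))
```

With this in place, chat_stream could pass the stream object straight to collect_stream and append the returned string to the history.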

Step 4: Production Considerations

Error Handling

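import time
from openai import APIError, RateLimitError, APITimeoutError

def chat_with_retry(self, user_input: str, max_retries=3):
    # Add this method to ChatBot. It calls the API directly because chat()
    # catches every exception itself, so a retry wrapper around it never fires.
    self.messages.append({"role": "user", "content": user_input})
    for attempt in range(max_retries):
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                temperature=0.7,
                max_tokens=2000
            )
            assistant_msg = response.choices[0].message.content
            self.messages.append({"role": "assistant", "content": assistant_msg})
            return assistant_msg
        except RateLimitError:
            wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APITimeoutError:
            print(f"Timeout on attempt {attempt + 1}. Retrying...")
            time.sleep(1)
        except APIError as e:
            print(f"API error: {e}")  # other API errors are not retried
            break
    self.messages.pop()  # drop the unanswered user message
    return "Sorry, I'm having trouble connecting."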
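
The backoff idea above can also be written as a standalone helper that is independent of the OpenAI client, which makes it easy to test. The sketch below is one possible shape, not the post's implementation: retry_with_backoff and its parameters are hypothetical, and the added random jitter is an extra technique (it spreads out retries from many clients) that the original code does not use.

```python
import random
import time


def retry_with_backoff(fn, retryable=(Exception,), max_retries=3,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on `retryable` errors with exponential backoff.

    The delay doubles each attempt (base_delay, 2x, 4x, ...), is capped at
    max_delay, and gets up to 50% random jitter added on top.
    The last failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A call might look like retry_with_backoff(lambda: bot.chat_raw(text), retryable=(RateLimitError, APITimeoutError)), where chat_raw stands for a hypothetical method that lets API exceptions propagate. The sleep parameter exists so tests can substitute a fake and run instantly.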

Why DeepSeek for Chatbots?

  1. Cost-effective - $2 per 1M tokens vs $15 per 1M for GPT-4o
  2. Fast - DeepSeek V4 Flash is optimized for speed
  3. 128K context - room for long conversations before the history needs trimming
  4. OpenAI-compatible - works with the openai SDK and existing tools and libraries
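
To make the cost point concrete, the per-request formula from Step 2 can be turned into a small function and applied to both prices. This is a worked illustration only: estimate_cost is a hypothetical helper, the $2 and $15 figures are the ones quoted in the list above (not independently checked, and real pricing often differs between input and output tokens), and the 500K/500K token split is an arbitrary example.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price, output_price):
    """Cost in dollars, with prices given per 1M tokens."""
    return (prompt_tokens / 1_000_000 * input_price
            + completion_tokens / 1_000_000 * output_price)


# Example workload: 500K prompt tokens and 500K completion tokens
deepseek = estimate_cost(500_000, 500_000, 2.0, 2.0)    # 2.0
gpt4o = estimate_cost(500_000, 500_000, 15.0, 15.0)     # 15.0

print(f"DeepSeek: ${deepseek:.2f}, GPT-4o: ${gpt4o:.2f}, ratio: {gpt4o / deepseek:.1f}x")
```

At these quoted rates the same 1M-token workload costs $2.00 versus $15.00, a 7.5x difference.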

Try It Yourself

Get a free API key at Token China and start building. You get 100K free tokens to test with.


Built something cool with DeepSeek? Share it in the comments!
