Jon Ibañez del campo


How to Build a Code Assistant Chatbot with the Claude API and Python

You already use Claude every day.

For debugging Python. For understanding C++ errors. For summarizing papers before an exam. At some point you stop going to the browser and start wondering — can I just have this inside my terminal, in my workflow, without switching windows every five minutes?

Yes. That's what this tutorial builds.

A terminal chatbot that lives in your code environment. You paste code, ask questions, get answers, and keep the conversation going — all without leaving your terminal.

This is part two of a series. If you haven't set up the Claude API yet, start with part one.


What We're Building

A terminal chatbot with three behaviors:

  • You paste code and it reviews it — finds bugs, suggests improvements, explains what's happening
  • You ask follow-up questions and it remembers the full context of what you shared
  • You type exit and it stops

No UI. No framework. Just Python and the Claude API.


Setup

Same as part one. If you already have the environment ready, skip this.

mkdir code-assistant
cd code-assistant
python -m venv venv

Activate:

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

Install dependencies:

pip install anthropic python-dotenv

Create your .env file:

ANTHROPIC_API_KEY=your-key-here

The Core: A Loop That Remembers

The key difference between a single API call and a chatbot is memory.

In part one, every call was independent. Here, we keep a history list and pass it on every request — so Claude always knows what was said before.

from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()

history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=(
            "You are a code review assistant. "
            "When the user shares code, review it: identify bugs, explain what each part does, "
            "and suggest improvements. Be direct and specific. "
            "When the user asks follow-up questions, refer back to the code they shared."
        ),
        messages=history
    )

    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})

    return reply

max_tokens is set to 2048 here instead of 1024. Code reviews tend to be longer — give Claude room to explain properly.
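
One way to tell whether a reply actually ran out of room is to look at the response's stop_reason, which the Messages API sets to "max_tokens" when generation stops at the limit. A minimal sketch: the helper name was_truncated is made up for illustration, and the SimpleNamespace object only stands in for a real response.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the reply stopped because it hit the max_tokens limit."""
    return response.stop_reason == "max_tokens"


# Stand-in for a real API response, used only to exercise the helper
fake_response = SimpleNamespace(stop_reason="max_tokens")
if was_truncated(fake_response):
    print("Reply was cut off. Consider raising max_tokens.")
```

Inside chat(), the same check could run on the real response before returning the reply.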


The Terminal Loop

Now wrap that function in a loop that keeps running until the user decides to stop:

def main():
    print("Code Assistant — type 'exit' to quit\n")

    while True:
        user_input = input("You: ").strip()

        if not user_input:
            continue

        if user_input.lower() == "exit":
            break

        response = chat(user_input)
        print(f"\nClaude: {response}\n")

if __name__ == "__main__":
    main()

strip() removes accidental whitespace. The empty input check prevents sending blank messages to the API and wasting tokens.
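
The loop above exits cleanly on "exit", but Ctrl+C or Ctrl+D would end it with a traceback. A small sketch of one way to handle that, assuming a hypothetical helper called read_line (not part of the original script) that returns None when the user interrupts:

```python
def read_line(prompt: str, reader=input):
    """Return the stripped input, or None if the user pressed Ctrl+C or Ctrl+D."""
    try:
        return reader(prompt).strip()
    except (EOFError, KeyboardInterrupt):
        # input() raises EOFError on Ctrl+D and KeyboardInterrupt on Ctrl+C
        return None
```

In main(), user_input = read_line("You: ") followed by a check for None (then break) would replace the bare input() call. The reader parameter exists only so the helper can be tried without a real terminal.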


The Full Script

Put it all together in assistant.py:

from dotenv import load_dotenv
from anthropic import Anthropic, APIError, RateLimitError, APIConnectionError

load_dotenv()
client = Anthropic()

history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=(
                "You are a code review assistant. "
                "When the user shares code, review it: identify bugs, explain what each part does, "
                "and suggest improvements. Be direct and specific. "
                "When the user asks follow-up questions, refer back to the code they shared."
            ),
            messages=history
        )

        reply = response.content[0].text
        history.append({"role": "assistant", "content": reply})
        return reply

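    except RateLimitError:
        history.pop()  # drop the unanswered user turn so it isn't resent later
        return "Rate limit reached. Wait a moment and try again."

    except APIConnectionError:
        history.pop()
        return "Connection failed. Check your internet."

    except APIError as e:
        history.pop()
        # Not every APIError carries a status code, so fall back to "unknown"
        return f"API error {getattr(e, 'status_code', 'unknown')}."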

def main():
    print("Code Assistant — type 'exit' to quit\n")

    while True:
        user_input = input("You: ").strip()

        if not user_input:
            continue

        if user_input.lower() == "exit":
            break

        response = chat(user_input)
        print(f"\nClaude: {response}\n")

if __name__ == "__main__":
    main()

Run it:

python assistant.py

How It Behaves

A real session looks like this:

Code Assistant — type 'exit' to quit

You: def divide(a, b): return a / b

Claude: The function works for most inputs but has one critical issue:
it will raise a ZeroDivisionError if b is 0. There's no input
validation. Here's a safer version:

def divide(a, b):
    if b == 0:
        return None  # or raise ValueError("Cannot divide by zero")
    return a / b

Also worth adding type hints if this is going into a larger codebase:

def divide(a: float, b: float) -> float | None:
    if b == 0:
        return None
    return a / b

You: why use None instead of raising an error?

Claude: It depends on how the function is used. Returning None is
better when the caller expects to handle missing results gracefully —
for example, in data pipelines where a failed calculation shouldn't
stop everything. Raising ValueError is better when divide(a, 0) should
never happen and you want to catch it early during development...

Claude remembers the code from the first message. The follow-up question gets a specific answer, not a generic one.


What's Happening Under the Hood

Every time you send a message, the full history goes with it:

Call 1: [user: "def divide(a, b)..."]
Call 2: [user: "def divide...", assistant: "...", user: "why use None?"]
Call 3: [user: "def divide...", assistant: "...", user: "why use None?", assistant: "...", user: "next question"]

The history grows with every turn. For long sessions this means more tokens per call — and higher cost. For a personal tool used in short bursts, it's not a problem. For a production app with many users, you'd want to trim old history at some point.
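
Trimming can be as simple as keeping only the most recent messages. The sketch below is one possible approach, not the only one: trim_history and the default of 20 messages are assumptions made for illustration, and it assumes the conversation should still open with a user turn after the cut.

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Return the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # Drop a leading assistant reply left over from the cut,
    # so the conversation still begins with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Passing messages=trim_history(history) to client.messages.create would cap what is sent, while the full history list stays available locally. The trade-off is that Claude no longer sees the oldest turns, including code pasted early in the session.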


Adapting the System Prompt

The system prompt is what makes this a code assistant instead of a general chatbot. Change it and you get a completely different tool.

A Python tutor:

system="You are a Python tutor for university students. Explain concepts clearly, use analogies, and always show working examples."

A code translator:

system="You are a code translator. When the user shares code in any language, rewrite it in Python. Explain the key differences."

A documentation writer:

system="You are a technical writer. When the user shares a function, write clear docstrings and usage examples for it."

Same script. Different system prompt. Completely different tool.
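
To switch between these without editing the script each time, the prompts could live in a dictionary keyed by a mode name. This is a sketch under assumptions: the mode names and the pick_system_prompt helper are invented here, and the "review" entry is the shortened prompt from the simplified script.

```python
SYSTEM_PROMPTS = {
    "review": "You are a code review assistant. Be direct and specific.",
    "tutor": (
        "You are a Python tutor for university students. Explain concepts "
        "clearly, use analogies, and always show working examples."
    ),
    "translate": (
        "You are a code translator. When the user shares code in any language, "
        "rewrite it in Python. Explain the key differences."
    ),
    "docs": (
        "You are a technical writer. When the user shares a function, write "
        "clear docstrings and usage examples for it."
    ),
}


def pick_system_prompt(mode: str) -> str:
    """Return the prompt for a mode, falling back to code review."""
    return SYSTEM_PROMPTS.get(mode, SYSTEM_PROMPTS["review"])
```

The mode could come from the command line, for example sys.argv[1] when the script is started as python assistant.py tutor, with the result passed as the system argument.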


Common Mistakes

Pasting multiline code directly into the terminal
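input() reads a single line, so a multiline paste arrives as several separate messages, one per line, and Claude ends up reviewing each fragment on its own. For anything longer than a one-liner, read the code from a file instead, or collect lines until an end marker before sending them as one message.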

History growing too large
Long sessions accumulate a lot of tokens, because every request resends the whole conversation. If responses start slowing down or costs seem high, restart the script or call history.clear() to begin a fresh conversation.

max_tokens too low for code reviews
Code explanations are longer than regular answers. With max_tokens at 1024, a detailed review can be cut off mid-answer. 2048 is a safer starting point for this use case.
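
For the multiline paste problem, one workaround is to keep reading lines until a marker line appears and send everything as a single message. A minimal sketch, assuming a hypothetical read_multiline helper and an arbitrary "END" marker (neither is part of the original script):

```python
def read_multiline(reader=input, end_marker: str = "END") -> str:
    """Collect lines until one equals end_marker, then return them joined."""
    lines = []
    while True:
        line = reader()
        if line.strip() == end_marker:
            break
        # Keep the line untouched so the code's indentation survives
        lines.append(line)
    return "\n".join(lines)
```

A session would then look like: paste the code, type END on its own line, press Enter. The choice of marker matters only in that it must not appear as a line of the pasted code.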


What's Next

This is a working tool you can use today. From here, a few natural extensions:

Read code directly from files instead of pasting — open("script.py").read() and pass the content as the message.
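
A slightly fuller sketch of the file idea, assuming a hypothetical message_from_file helper and a default question chosen here for illustration:

```python
from pathlib import Path


def message_from_file(path: str, question: str = "Please review this code.") -> str:
    """Build a single chat message from a question and a file's contents."""
    source = Path(path)
    code = source.read_text(encoding="utf-8")
    # Include the file name so follow-up questions can refer to it
    return f"{question}\n\nFile: {source.name}\n\n{code}"
```

The result is a plain string, so it can go straight into chat(message_from_file("script.py")).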

Add streaming so responses appear word by word instead of all at once — the same pattern from part one works here.
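
Part one's streaming code isn't reproduced here, so the following is a sketch built on the SDK's client.messages.stream helper rather than a copy of that pattern. The stream_reply function name and its parameters are assumptions; the client is passed in so the function makes no API call on import.

```python
def stream_reply(client, history: list[dict], system: str,
                 model: str = "claude-sonnet-4-6", max_tokens: int = 2048) -> str:
    """Print the reply as it arrives and return the full text."""
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=history,
    ) as stream:
        for text in stream.text_stream:
            # Show each chunk immediately instead of waiting for the whole reply
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

The returned string still needs to be appended to history as the assistant turn, exactly as in the non-streaming version.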

Save the conversation history to a file so you can review past sessions.
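
Since history is already a list of plain dictionaries, it can be written out as JSON. A minimal sketch, where save_history, the sessions folder, and the timestamped file name are all choices made for this example:

```python
import json
from datetime import datetime
from pathlib import Path


def save_history(history: list[dict], directory: str = "sessions") -> Path:
    """Write the conversation to a timestamped JSON file and return its path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"session-{datetime.now():%Y%m%d-%H%M%S}.json"
    # ensure_ascii=False keeps non-English text readable in the saved file
    path.write_text(json.dumps(history, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Calling save_history(history) right after the loop in main() ends would capture each session on exit.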

The full Claude API documentation is at platform.claude.com/docs.


# The whole thing, simplified

from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()
history = []

while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue  # skip blank lines instead of sending an empty message
    if user_input.lower() == "exit":
        break

    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a code review assistant. Be direct and specific.",
        messages=history
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    print(f"\nClaude: {reply}\n")
