You already use Claude every day.
For debugging Python. For understanding C++ errors. For summarizing papers before an exam. At some point you stop going to the browser and start wondering — can I just have this inside my terminal, in my workflow, without switching windows every five minutes?
Yes. That's what this tutorial builds.
A terminal chatbot that lives in your code environment. You paste code, ask questions, get answers, and keep the conversation going — all without leaving your terminal.
This is part two of a series. If you haven't set up the Claude API yet, start with part one.
What We're Building
A terminal chatbot with three behaviors:
- You paste code and it reviews it — finds bugs, suggests improvements, explains what's happening
- You ask follow-up questions and it remembers the full context of what you shared
- You type exit and it stops
No UI. No framework. Just Python and the Claude API.
Setup
Same as part one. If you already have the environment ready, skip this.
mkdir code-assistant
cd code-assistant
python -m venv venv
Activate:
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
Install dependencies:
pip install anthropic python-dotenv
Create your .env file:
ANTHROPIC_API_KEY=your-key-here
The Core: A Loop That Remembers
The key difference between a single API call and a chatbot is memory.
In part one, every call was independent. Here, we keep a history list and pass it on every request — so Claude always knows what was said before.
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()  # reads ANTHROPIC_API_KEY from .env
client = Anthropic()
history = []  # every user and assistant message so far

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=(
            "You are a code review assistant. "
            "When the user shares code, review it: identify bugs, explain what each part does, "
            "and suggest improvements. Be direct and specific. "
            "When the user asks follow-up questions, refer back to the code they shared."
        ),
        messages=history,  # the whole conversation, not just the latest message
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
max_tokens is set to 2048 here instead of 1024. Code reviews tend to be longer — give Claude room to explain properly.
The Terminal Loop
Now wrap that function in a loop that keeps running until the user decides to stop:
def main():
    print("Code Assistant — type 'exit' to quit\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "exit":
            break
        response = chat(user_input)
        print(f"\nClaude: {response}\n")

if __name__ == "__main__":
    main()
strip() removes accidental whitespace. The empty input check prevents sending blank messages to the API and wasting tokens.
The Full Script
Put it all together in assistant.py:
from dotenv import load_dotenv
from anthropic import Anthropic, APIError, RateLimitError, APIConnectionError

load_dotenv()  # reads ANTHROPIC_API_KEY from .env
client = Anthropic()
history = []  # every user and assistant message so far

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=(
                "You are a code review assistant. "
                "When the user shares code, review it: identify bugs, explain what each part does, "
                "and suggest improvements. Be direct and specific. "
                "When the user asks follow-up questions, refer back to the code they shared."
            ),
            messages=history,
        )
    except RateLimitError:
        history.pop()  # drop the unanswered message so a retry does not send it twice
        return "Rate limit reached. Wait a moment and try again."
    except APIConnectionError:
        history.pop()
        return "Connection failed. Check your internet."
    except APIError as e:
        history.pop()
        # Not every APIError carries a status code, so fall back to "unknown".
        return f"API error {getattr(e, 'status_code', 'unknown')}."
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

def main():
    print("Code Assistant — type 'exit' to quit\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "exit":
            break
        response = chat(user_input)
        print(f"\nClaude: {response}\n")

if __name__ == "__main__":
    main()
Run it:
python assistant.py
How It Behaves
A real session looks like this:
Code Assistant — type 'exit' to quit
You: def divide(a, b): return a / b
Claude: The function works for most inputs but has one critical issue:
it will raise a ZeroDivisionError if b is 0. There's no input
validation. Here's a safer version:
    def divide(a, b):
        if b == 0:
            return None  # or raise ValueError("Cannot divide by zero")
        return a / b
Also worth adding type hints if this is going into a larger codebase:
    def divide(a: float, b: float) -> float | None:
        if b == 0:
            return None
        return a / b
You: why use None instead of raising an error?
Claude: It depends on how the function is used. Returning None is
better when the caller expects to handle missing results gracefully —
for example, in data pipelines where a failed calculation shouldn't
stop everything. Raising ValueError is better when divide(a, 0) should
never happen and you want to catch it early during development...
Claude remembers the code from the first message. The follow-up question gets a specific answer, not a generic one.
What's Happening Under the Hood
Every time you send a message, the full history goes with it:
Call 1: [user: "def divide(a, b)..."]
Call 2: [user: "def divide...", assistant: "...", user: "why use None?"]
Call 3: [user: "def divide...", assistant: "...", user: "why use None?", assistant: "...", user: "next question"]
The history grows with every turn. For long sessions this means more tokens per call — and higher cost. For a personal tool used in short bursts, it's not a problem. For a production app with many users, you'd want to trim old history at some point.
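One way to cap that growth is to send only the most recent turns. The following is a minimal sketch, not part of the original script: the helper name trim_history and the limit of 20 messages are assumptions chosen for illustration. It keeps the tail of the list and drops any leading assistant message so the conversation still opens with a user turn.

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Return the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # Drop leading assistant messages so the list opens with a user turn.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

In chat(), you could pass messages=trim_history(history) instead of messages=history. The trade-off is that code pasted early in a long session eventually falls out of the window, and Claude can no longer refer back to it.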
Adapting the System Prompt
The system prompt is what makes this a code assistant instead of a general chatbot. Change it and you get a completely different tool.
A Python tutor:
system="You are a Python tutor for university students. Explain concepts clearly, use analogies, and always show working examples."
A code translator:
system="You are a code translator. When the user shares code in any language, rewrite it in Python. Explain the key differences."
A documentation writer:
system="You are a technical writer. When the user shares a function, write clear docstrings and usage examples for it."
Same script. Different system prompt. Completely different tool.
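If you switch between these often, one option is to pick the prompt from a command-line argument. This is a sketch under assumptions: the mode names (review, tutor, translate, docs) and the helper pick_system_prompt are hypothetical, and the prompt texts are copied from the examples above.

```python
# Mode names are hypothetical; prompt texts come from the examples in this post.
SYSTEM_PROMPTS = {
    "review": "You are a code review assistant. Be direct and specific.",
    "tutor": (
        "You are a Python tutor for university students. Explain concepts "
        "clearly, use analogies, and always show working examples."
    ),
    "translate": (
        "You are a code translator. When the user shares code in any language, "
        "rewrite it in Python. Explain the key differences."
    ),
    "docs": (
        "You are a technical writer. When the user shares a function, write "
        "clear docstrings and usage examples for it."
    ),
}


def pick_system_prompt(argv: list[str]) -> str:
    """Return the prompt named by the first CLI argument, or the review prompt."""
    mode = argv[1] if len(argv) > 1 else "review"
    if mode not in SYSTEM_PROMPTS:
        choices = ", ".join(SYSTEM_PROMPTS)
        raise SystemExit(f"Unknown mode {mode!r}. Choose from: {choices}")
    return SYSTEM_PROMPTS[mode]
```

With this in place, the script could call pick_system_prompt(sys.argv) once at startup (after import sys), pass the result as system=, and be launched as python assistant.py tutor.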
Common Mistakes
Pasting multiline code directly into the terminal
input() reads one line at a time, so a multiline paste is split at every newline and each line is sent as its own message. For anything longer than a one-liner, read the code from a file instead of pasting it.
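One workaround for multiline pastes is to keep reading lines until a sentinel appears. This is a sketch, assuming a hypothetical helper read_multiline and an arbitrary end marker END; the read_line parameter exists only so the function can be exercised without a real terminal.

```python
from typing import Callable


def read_multiline(
    prompt: str = "You: ",
    end_marker: str = "END",
    read_line: Callable[[str], str] = input,
) -> str:
    """Collect lines until a line containing only end_marker, then join them."""
    lines = []
    line = read_line(prompt)
    while line.strip() != end_marker:
        lines.append(line)  # keep the line as typed so indentation survives
        line = read_line("... ")  # continuation prompt for the following lines
    return "\n".join(lines)
```

The cost of this design is that every message, including a short question, has to be finished with END on its own line.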
History growing too large
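Long sessions accumulate a lot of tokens. The history list lives only in memory, so each run of the script starts empty. If responses start slowing down or costs seem high mid-session, restart the script or empty the list with history.clear().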
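Clearing the list without restarting could be wired in as an extra command next to exit. A small sketch, assuming a hypothetical clear command and helper name handle_command, neither of which is in the original script:

```python
def handle_command(user_input: str, history: list[dict]) -> bool:
    """Return True if the input was a command and has been handled."""
    if user_input.lower() == "clear":
        history.clear()  # empties the list in place, so chat() sees the reset
        print("History cleared.")
        return True
    return False
```

Inside the loop in main(), a line such as if handle_command(user_input, history): continue placed before the call to chat() would keep the command from being sent to the API.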
max_tokens too low for code reviews
Code explanations run longer than regular answers. A 1024-token limit can cut a review off mid-answer; 2048 is safer for this use case.
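The response also reports why generation stopped, so a cut-off reply can be flagged instead of going unnoticed. A sketch, assuming a hypothetical helper reply_with_notice; it relies on the response's stop_reason field, which is "max_tokens" when the limit was hit.

```python
def reply_with_notice(response) -> str:
    """Return the reply text, with a note appended if it was cut off."""
    reply = response.content[0].text
    if response.stop_reason == "max_tokens":
        # The model ran out of room before finishing its answer.
        reply += "\n[Reply cut off at max_tokens. Ask Claude to continue, or raise the limit.]"
    return reply
```

In chat(), this could stand in for the line that reads response.content[0].text.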
What's Next
This is a working tool you can use today. From here, a few natural extensions:
Read code directly from files instead of pasting — open("script.py").read() and pass the content as the message.
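A minimal sketch of that idea, assuming a hypothetical helper file_message and a default question of "Review this file."; it reads the file as UTF-8 and includes the file name so Claude can refer to it.

```python
from pathlib import Path


def file_message(path: str, question: str = "Review this file.") -> str:
    """Build a user message containing the file's name and contents."""
    source = Path(path).read_text(encoding="utf-8")
    return f"{question}\n\nFile: {Path(path).name}\n\n{source}"
```

A call such as chat(file_message("script.py")) would then send the whole file in one message, with indentation intact.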
Add streaming so responses appear word by word instead of all at once — the same pattern from part one works here.
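Part one is not reproduced here, so the following is a sketch built on the SDK's messages.stream() helper rather than the series' exact pattern. The function name stream_chat and its parameters are assumptions; the client is passed in as an argument so the function does not depend on a global.

```python
def stream_chat(
    client,
    history: list[dict],
    user_message: str,
    model: str = "claude-sonnet-4-6",
    max_tokens: int = 2048,
    system: str = "You are a code review assistant. Be direct and specific.",
) -> str:
    """Print the reply as it arrives and return the full text."""
    history.append({"role": "user", "content": user_message})
    chunks = []
    # messages.stream() is a context manager; text_stream yields text as it is generated.
    with client.messages.stream(
        model=model, max_tokens=max_tokens, system=system, messages=history
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            chunks.append(text)
    print()  # finish the line once the reply is complete
    reply = "".join(chunks)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Because the text is printed while it streams, the loop in main() would call stream_chat(client, history, user_input) and skip its own print of the response.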
Save the conversation history to a file so you can review past sessions.
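Since history is already a list of plain dictionaries, it can be written out as JSON. A sketch, assuming a hypothetical helper save_history, a sessions directory, and a timestamped file name; none of these come from the original script.

```python
import json
from datetime import datetime
from pathlib import Path


def save_history(history: list[dict], directory: str = "sessions") -> Path:
    """Write the conversation to a timestamped JSON file and return its path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"session-{datetime.now():%Y%m%d-%H%M%S}.json"
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    path.write_text(json.dumps(history, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Calling save_history(history) just after the loop in main() ends would record each session when you type exit.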
The full Claude API documentation is at platform.claude.com/docs.
# The whole thing, simplified
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()
client = Anthropic()
history = []

while True:
    user_input = input("You: ").strip()
    if not user_input:
        continue  # skip blank lines instead of sending an empty message
    if user_input.lower() == "exit":
        break
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a code review assistant. Be direct and specific.",
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    print(f"\nClaude: {reply}\n")