# LangChain + Ollama: A Practical Guide to Building AI Agents with Python
This guide teaches you how to build real, working AI agents using Ollama and LangChain.
## What You'll Learn

In this guide, you'll discover:

- ✅ How to set up Ollama + LangChain (10 minutes)
- ✅ When to use `ollama.chat()` vs `ChatOllama()` (quick decision tree)
- ✅ How to build agents that remember things (persistent storage)
- ✅ Real, working examples (copy & paste ready)
- ✅ Performance tuning for your machine
- ✅ How to deploy to production
## Quick Decision: Which Tool to Use?

```text
┌──────────────────────────────────────────┐
│ Want to use AI in your Python code?      │
└────────────┬─────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────┐
│ Building a multi-step AI agent that      │
│ makes decisions and uses tools?          │
└────────────┬─────────────────────────────┘
             │
     YES ────┴──── NO
      │             │
      ▼             ▼
Use ChatOllama()    Use ollama.chat()
✅ For agents       ✅ For simple queries
✅ For tools        ✅ For streaming
✅ For state mgmt   ✅ For speed
✅ For production   ✅ For prototyping
```
## Performance at a Glance
| Operation | Time | Notes |
|---|---|---|
| ollama.chat() response | 15-25ms | Fastest |
| ChatOllama() response | 35-55ms | More features |
| Streaming first token | 5-20ms | Real-time feedback |
| Tool execution | 2-12ms | Overhead varies |
**Real-world:** On a laptop with 8GB RAM, you'll get responses in under 100ms most of the time. For local AI, this is blazingly fast (cloud APIs add 500ms+ of network latency on top).
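These numbers depend heavily on your hardware, model size, and prompt length, so measure on your own machine. A quick sketch that times a plain `ollama.chat()` call with `time.perf_counter`:

```python
import time

import ollama

start = time.perf_counter()
response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
elapsed_ms = (time.perf_counter() - start) * 1000

# Wall-clock time for the full (non-streaming) response
print(f"Response in {elapsed_ms:.0f} ms: {response['message']['content']}")
```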
## Part 1: Simple Queries with `ollama.chat()`

**When to use:** You just need to ask the AI something and get an answer.
### Setup (2 minutes)

First, make sure Ollama is running:

```bash
# Terminal 1: Start Ollama
ollama serve
```

Now you're ready to code.
### Your First Query

```python
import ollama

response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

print(response['message']['content'])
# Output: "2 + 2 equals 4"
```
### Streaming (See Responses as They Generate)

Want to see the AI think in real-time?

```python
import ollama

print("AI: ", end="", flush=True)

for chunk in ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[
        {"role": "user", "content": "Write a haiku about code"}
    ],
    stream=True
):
    print(chunk['message']['content'], end="", flush=True)

print()  # Newline at end
```
Output:

```text
AI: Lines of logic dance,
Bugs and fixes both take turns—
Code shapes the future.
```
### Multi-Turn Conversation (Remember Context)

Ask follow-up questions:

```python
import ollama

messages = []

while True:
    user_input = input("You: ")

    # Add your message to the history
    messages.append({"role": "user", "content": user_input})

    # Get a response using the full history
    response = ollama.chat(
        model="qwen2.5-coder:latest",
        messages=messages
    )

    ai_response = response['message']['content']
    print(f"\nAI: {ai_response}\n")

    # Add the AI's response so it remembers context
    messages.append({"role": "assistant", "content": ai_response})
```
Try this conversation:

```text
You: What is a lambda function in Python?
AI: A lambda function is a small anonymous function...

You: How is it different from a regular function?
AI: Great question! The key differences are...
```
Notice how the AI knows you're talking about Python: it remembers the context. Keep in mind that the message list grows with every turn; once the model's context limit is reached, expect errors or degraded answers, so trim old messages in long conversations.
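A minimal way to keep the history bounded is a fixed-size window over the most recent messages. This is just a sketch; the `MAX_MESSAGES` value is an arbitrary choice you should tune to your model's context length:

```python
MAX_MESSAGES = 20  # arbitrary window size; tune for your model's context length

def trim_history(messages: list[dict]) -> list[dict]:
    """Keep only the most recent messages so the prompt stays within the context limit."""
    # A fancier version would summarize the dropped messages instead
    return messages[-MAX_MESSAGES:]

# Inside the chat loop, before calling ollama.chat():
# messages = trim_history(messages)
```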
## Part 2: Building AI Agents with `ChatOllama()`

**When to use:** You're building something more sophisticated: agents that make decisions, use tools, and manage state.
### Setup (5 minutes)

```bash
pip install langchain-ollama langchain langgraph
```
### Your First Agent
An agent is an AI that can:
- ✅ Make decisions
- ✅ Use tools to accomplish tasks
- ✅ Keep track of conversation state
- ✅ Handle multiple steps
Let's build one that can tell time:
```python
from datetime import datetime

from langchain_ollama import ChatOllama
from langchain.tools import tool
from langchain.agents import create_agent

# Step 1: Create a tool
@tool
def get_current_time() -> str:
    """Get the current time."""
    return datetime.now().strftime("%H:%M:%S")

# Step 2: Create the AI
llm = ChatOllama(
    model="qwen2.5-coder:latest",
    temperature=0.0  # Be deterministic
)

# Step 3: Create the agent
agent = create_agent(
    llm,
    tools=[get_current_time],
    system_prompt="You are a helpful time assistant."
)

# Step 4: Use it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What time is it right now?"}]
})

# The agent returns its message state; the final message is the answer
print(result["messages"][-1].content)
# Output: "It is currently 14:23:45"
```
**What just happened?**

- You asked the agent what time it is
- The agent decided it needed to use the `get_current_time` tool
- It called the tool and got the time
- It gave you a friendly response

The agent made the decision. You just provided the tools.
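If you want to watch that decision happen, the returned state carries the full message trace, including the model's tool calls. A minimal sketch, assuming the LangGraph-style `messages` state shown above:

```python
# Walk the trace: human message, the model's tool call,
# the tool's result, then the final answer.
for message in result["messages"]:
    print(f"{type(message).__name__}: {getattr(message, 'content', '')}")
    # AI messages carry tool_calls when the model decided to use a tool
    for call in getattr(message, "tool_calls", []):
        print(f"  -> tool call: {call['name']}({call['args']})")
```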
### Adding Multiple Tools
Tools let your agent accomplish real things:
```python
from langchain.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

@tool
def multiply_numbers(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

# Create an agent with multiple tools
agent = create_agent(
    llm,
    tools=[add_numbers, multiply_numbers, get_current_time],
    system_prompt="You are a helpful math assistant."
)

# The agent will decide which tool to use
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's 25 * 4?"}]
})

print(result["messages"][-1].content)
# Output: "25 * 4 equals 100"
```
The agent automatically chose the `multiply_numbers` tool!

If you need an audit trail, add logging inside each tool function so you can track exactly which tools the agent called. The same hook is where you can protect sensitive tools: ask the user to confirm before the tool acts, so the agent can't take a wrong action unprompted. A sketch of both ideas follows.
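Here's a minimal sketch of that pattern. The tool itself is hypothetical (`delete_file` just stands in for any sensitive action); the logging call and the `input()` confirmation are the parts to copy:

```python
import logging
import os

from langchain.tools import tool

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.tools")

@tool
def delete_file(path: str) -> str:
    """Delete a file from disk (sensitive: requires user confirmation)."""
    logger.info("Agent requested tool: delete_file(path=%r)", path)

    # Human-in-the-loop guard: refuse to act without explicit confirmation
    answer = input(f"Agent wants to delete {path!r}. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "User denied the request; the file was not deleted."

    os.remove(path)
    return f"Deleted {path}"
```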
### Agents that Remember Things
What if you want the agent to remember user preferences or conversation history?
```python
from agent_workspace.hybrid_store import HybridStore

# Create persistent storage
store = HybridStore(
    storage_dir="agent_workspace/storage"
)

# Tool that saves preferences
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """Save a user preference that persists."""
    store = runtime.store
    store.put(("preferences",), key, {"value": value})
    return f"Saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """Retrieve a saved preference."""
    store = runtime.store
    pref = store.get(("preferences",), key)
    if pref:
        return f"Your {key} is: {pref.value['value']}"
    return "No preference found"

# Create an agent WITH persistent storage
agent = create_agent(
    llm,
    tools=[save_preference, get_preference],
    store=store,  # Connect the storage
    system_prompt="You help manage user preferences."
)

# Session 1: Save a preference
print("=== Session 1 ===")
result1 = agent.invoke({
    "messages": [{"role": "user", "content": "Remember that my favorite color is blue"}]
})
print(result1["messages"][-1].content)

# Session 2: Retrieve the preference (even after a restart!)
print("\n=== Session 2 (After Restart) ===")
result2 = agent.invoke({
    "messages": [{"role": "user", "content": "What's my favorite color?"}]
})
print(result2["messages"][-1].content)
# Output: "Your favorite color is: blue"
```
**The magic:** Data saved in Session 1 is still there in Session 2, even if you restart your computer! `HybridStore` will ship in the MagicPythong library; it is a custom-made class that saves and restores LangChain's runtime store to a file.
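If you don't have `HybridStore` yet, a file-backed stand-in is easy to sketch. This toy mirrors the `put`/`get` calls and the `.value` attribute used in the tools above; it is an assumption about that interface, not the real class:

```python
import json
from pathlib import Path
from types import SimpleNamespace

class FileStore:
    """Toy stand-in for HybridStore: persists namespaced key/value pairs as JSON."""

    def __init__(self, storage_dir: str = "agent_workspace/storage"):
        self.path = Path(storage_dir) / "store.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self._data.setdefault("/".join(namespace), {})[key] = value
        self.path.write_text(json.dumps(self._data, indent=2))  # persist immediately

    def get(self, namespace: tuple, key: str):
        value = self._data.get("/".join(namespace), {}).get(key)
        # Mimic the `.value` attribute the tools above read
        return SimpleNamespace(value=value) if value is not None else None
```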
## Part 3: Real-World Examples

### Example 1: A Personal Code Assistant
```python
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama

@tool
def check_python_syntax(code: str) -> str:
    """Check if Python code is valid."""
    try:
        compile(code, '<string>', 'exec')
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

@tool
def explain_code(code: str) -> str:
    """Provide a simple explanation of what code does."""
    # In a real app, you'd call the LLM here (see the sketch below)
    return "This code does X, Y, and Z"

llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)

agent = create_agent(
    llm,
    tools=[check_python_syntax, explain_code],
    system_prompt="You are a Python code assistant. Help the user write and understand code."
)

# Usage
code = """
def greet(name):
    print(f"Hello, {name}!")
"""

result = agent.invoke({
    "messages": [{"role": "user", "content": f"Is this Python code valid?\n\n{code}"}]
})

print(result["messages"][-1].content)
# Output: "Yes, this Python code is valid..."
```
### Example 2: A Data Analysis Agent
```python
import json

from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# Sample data
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35}
]

@tool
def get_sales_data() -> str:
    """Get the latest sales data."""
    return json.dumps(SALES_DATA)

@tool
def save_report(summary: str, runtime) -> str:
    """Save an analysis report."""
    store = runtime.store
    store.put(("reports",), "latest", {"summary": summary})
    return "Report saved!"

@tool
def get_saved_report(runtime) -> str:
    """Retrieve the latest saved report."""
    store = runtime.store
    report = store.get(("reports",), "latest")
    if report:
        return f"Latest report: {report.value['summary']}"
    return "No report found"

llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()

agent = create_agent(
    llm,
    tools=[get_sales_data, save_report, get_saved_report],
    store=store,
    system_prompt="You are a data analyst. Help users understand their sales data."
)

# Usage
result = agent.invoke({
    "messages": [{"role": "user", "content": "Analyze our sales data and give me a summary"}]
})

print(result["messages"][-1].content)
```
## Part 4: Choosing the Right Model

Ollama offers models in several sizes. Pick one based on your machine:
### If you have 4GB or less RAM

Use Qwen2.5-Coder 1.5B:

```python
llm = ChatOllama(model="qwen2.5-coder:1.5b")
```

- ✅ Fast
- ⚠️ Less capable

### If you have 8GB RAM

Use Qwen2.5-Coder 7B:

```python
llm = ChatOllama(model="qwen2.5-coder:7b")
```

- ✅ Good balance
- ✅ Handles most tasks

### If you have 16GB+ RAM

Use Qwen3-Coder 30B:

```python
llm = ChatOllama(model="qwen3-coder:30b")
```

- ✅ Most capable
- ⚠️ Slower
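If you'd rather let the code decide, here is a small sketch that picks a model tag from total RAM. It assumes the `psutil` package (`pip install psutil`); the thresholds simply mirror the tiers above:

```python
import psutil
from langchain_ollama import ChatOllama

def pick_model() -> str:
    """Choose a model tag based on total system RAM (thresholds from the tiers above)."""
    ram_gb = psutil.virtual_memory().total / (1024 ** 3)
    if ram_gb >= 16:
        return "qwen3-coder:30b"
    if ram_gb >= 8:
        return "qwen2.5-coder:7b"
    return "qwen2.5-coder:1.5b"

llm = ChatOllama(model=pick_model())
```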
Pull a model:

```bash
ollama pull qwen2.5-coder:7b
```
## Part 5: Tuning Performance

### Make responses faster

```python
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.0,  # ← Deterministic (faster)
    num_predict=128,  # ← Shorter responses
)
```

### Make responses more creative

```python
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.7,  # ← More creative
    num_predict=512,  # ← Longer responses
)
```

### Use a GPU (if you have NVIDIA)

```python
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_gpu=35,  # ← Number of layers to offload to the GPU
)
```
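One more knob worth knowing about: `num_ctx` sets the context window Ollama allocates, so longer conversations fit at the cost of more memory (4096 below is just an example value):

```python
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_ctx=4096,  # ← Context window in tokens; larger uses more memory
)
```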
## Part 6: Common Issues & Fixes

### Issue 1: "Connection refused"

**Problem:** Getting an error when trying to use the AI.

**Fix:**

```bash
# Terminal 1: Start Ollama
ollama serve
```

Then run your Python code in a different terminal.
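You can also check from Python whether the server is up before making calls, by hitting Ollama's default endpoint (`http://localhost:11434`). A small standard-library sketch:

```python
import urllib.error
import urllib.request

def ollama_is_running(url: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama server responds on its default port."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200  # the root endpoint answers "Ollama is running"
    except (urllib.error.URLError, OSError):
        return False

if not ollama_is_running():
    print("Start the server first: ollama serve")
```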
Issue 2: "Model not found"
Problem: Error says the model doesn't exist
Fix:
# Download the model
ollama pull qwen2.5-coder:latest
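You can also list the models already downloaded, to check the exact tag to pass as `model=`:

```bash
# Show locally available models and their tags
ollama list
```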
Issue 3: "Out of memory"
Problem: "CUDA out of memory" or system slows down
Fix: Use a smaller model
# Instead of 32B
llm = ChatOllama(model="qwen2.5-coder:7b")
### Issue 4: Slow responses

**Problem:** It takes too long to get a response.

**Fix:**

```python
llm = ChatOllama(
    model="qwen2.5-coder:1.5b",  # Smaller model
    temperature=0.0,             # Deterministic
    num_predict=128,             # Shorter output
)
```
## Part 7: Next Steps
You now have enough to build:
- ✅ Chat bots
- ✅ Code assistants
- ✅ Data analysis agents
- ✅ Personal AI assistants
### Resources
- Ollama: https://ollama.ai
- LangChain: https://langchain.com
- Qwen2.5-Coder: https://github.com/QwenLM/Qwen
Happy coding! 🚀