IO_Node
Hello again, here's a LangChain Ollama helper sheet :)

LangChain + Ollama: A Practical Guide to Building AI Agents with Python

This guide teaches you how to build real, working AI agents using Ollama and LangChain.


What You'll Learn

In this guide, you'll discover:

  • ✅ How to set up Ollama + LangChain (10 minutes)
  • ✅ When to use ollama.chat() vs ChatOllama() (quick decision tree)
  • ✅ How to build agents that remember things (persistent storage)
  • ✅ Real, working examples (copy & paste ready)
  • ✅ Performance tuning for your machine
  • ✅ How to deploy to production

Quick Decision: Which Tool to Use?

┌─────────────────────────────────────────┐
│  Want to use AI in your Python code?    │
└────────────┬────────────────────────────┘
             │
             ▼
┌──────────────────────────────────────────┐
│  Building a multi-step AI agent that     │
│  makes decisions and uses tools?         │
└────────────┬─────────────────────────────┘
             │
        YES  │  NO
             │   └──────────────────────┐
             │                          │
             ▼                          ▼
      Use ChatOllama()          Use ollama.chat()
      ✅ For agents             ✅ For simple queries
      ✅ For tools              ✅ For streaming
      ✅ For state mgmt         ✅ For speed
      ✅ For production         ✅ For prototyping

Performance at a Glance

Operation                 Time      Notes
ollama.chat() response    15-25ms   Fastest
ChatOllama() response     35-55ms   More features
Streaming first token     5-20ms    Real-time feedback
Tool execution            2-12ms    Overhead varies

Real-world: On a laptop with 8GB RAM, you'll get responses in under 100ms most of the time.

For local AI, this is blazingly fast. (Cloud APIs add 500ms+ of network latency)
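Latency varies a lot by hardware, so measure on your own machine. Here's a tiny helper (a sketch; `time_call` is not part of either library) you can wrap around any call, including `ollama.chat`:

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Wrap any function the same way you would wrap ollama.chat(...)
result, ms = time_call(sum, [1, 2, 3])
print(f"Got {result} in {ms:.2f}ms")
```

Run each query a few times before comparing numbers; the first call to a model also pays model-loading cost.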


Part 1: Simple Queries with ollama.chat()

When to use: You just need to ask the AI something and get an answer.

Setup (2 minutes)

First, make sure Ollama is running:

# Terminal 1: Start Ollama
ollama serve

Now you're ready to code.

Your First Query

import ollama

response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)

print(response['message']['content'])
# Output: "2 + 2 equals 4"

Streaming (See Responses as They Generate)

Want to see the AI think in real-time?

import ollama

print("AI: ", end="", flush=True)

for chunk in ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[
        {"role": "user", "content": "Write a haiku about code"}
    ],
    stream=True
):
    print(chunk['message']['content'], end="", flush=True)

print()  # Newline at end

Output:

AI: Lines of logic dance,
Bugs and fixes both take turns—
Code shapes the future.

Multi-Turn Conversation (Remember Context)

Ask follow-up questions:

import ollama

messages = []

while True:
    user_input = input("You: ")
    if user_input.lower() in ("quit", "exit"):
        break

    # Add your message
    messages.append({"role": "user", "content": user_input})

    # Get response
    response = ollama.chat(
        model="qwen2.5-coder:latest",
        messages=messages
    )

    ai_response = response['message']['content']
    print(f"\nAI: {ai_response}\n")

    # Add AI's response so it remembers context
    messages.append({"role": "assistant", "content": ai_response})

Try this conversation:

You: What is a lambda function in Python?
AI: A lambda function is a small anonymous function...

You: How is it different from a regular function?
AI: Great question! The key differences are...

Notice how the AI knows you're talking about Python: every prior message is sent with each request, so the model sees the full context.
Once the model's context limit is reached, expect errors or degraded answers, so trim older messages from the list before each request.
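One simple way to stay under the limit is to drop the oldest turns first. A minimal sketch (`trim_history` is a hypothetical helper; it uses a rough character budget as a stand-in for real token counting):

```python
def trim_history(messages, max_chars=8000):
    """Drop the oldest messages until the total content fits the budget.

    Character count is only a rough proxy for tokens; the newest
    messages are kept because they carry the most relevant context.
    """
    trimmed = list(messages)
    while trimmed and sum(len(m["content"]) for m in trimmed) > max_chars:
        trimmed.pop(0)  # the oldest message goes first
    return trimmed
```

Call `messages = trim_history(messages)` right before each `ollama.chat()` call in the loop above.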


Part 2: Building AI Agents with ChatOllama()

When to use: You're building something more sophisticated—agents that make decisions, use tools, and manage state.

Setup (5 minutes)

pip install langchain-ollama langchain langgraph

Your First Agent

An agent is an AI that can:

  1. ✅ Make decisions
  2. ✅ Use tools to accomplish tasks
  3. ✅ Keep track of conversation state
  4. ✅ Handle multiple steps
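Under the hood, an agent is just a loop: the model looks at the conversation, either answers or asks for a tool, and the loop feeds the tool's result back in. A rough sketch with a scripted stand-in for the model (`run_agent` and `llm_decide` are illustrative, not LangChain APIs):

```python
def run_agent(llm_decide, tools, question):
    """Minimal agent loop: the model either answers or names a tool.

    llm_decide stands in for the real model: given the question and the
    latest tool result (or None), it returns either ("answer", text)
    or ("tool", tool_name, args).
    """
    observation = None
    for _ in range(5):  # cap the number of reasoning steps
        decision = llm_decide(question, observation)
        if decision[0] == "answer":
            return decision[1]
        _, name, args = decision
        observation = tools[name](*args)  # execute the chosen tool
    return "Gave up after too many steps"
```

LangChain's agent machinery does essentially this, plus message formatting, tool-schema generation, and state management.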

Let's build one that can tell time:

from langchain_ollama import ChatOllama
from langchain.tools import tool
from langchain.agents import create_agent

# Step 1: Create a tool
@tool
def get_current_time() -> str:
    """Get the current time."""
    from datetime import datetime
    return datetime.now().strftime("%H:%M:%S")

# Step 2: Create the AI
llm = ChatOllama(
    model="qwen2.5-coder:latest",
    temperature=0.0  # Be deterministic
)

# Step 3: Create the agent
agent = create_agent(
    llm,
    tools=[get_current_time],
    system_prompt="You are a helpful time assistant."
)

# Step 4: Use it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What time is it right now?"}]
})

print(result['messages'][-1].content)
# Output: "It is currently 14:23:45"

What just happened?

  1. You asked the agent what time it is
  2. The agent decided it needed to use the get_current_time tool
  3. It called the tool and got the time
  4. It gave you a friendly response

The agent made the decision. You just provided the tools.

Adding Multiple Tools

Tools let your agent accomplish real things:

from langchain.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

@tool
def multiply_numbers(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

# Create agent with multiple tools
agent = create_agent(
    llm,
    tools=[add_numbers, multiply_numbers, get_current_time],
    system_prompt="You are a helpful math assistant."
)

# The agent will decide which tool to use
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's 25 * 4?"}]
})

print(result['messages'][-1].content)
# Output: "25 * 4 equals 100"

The agent automatically chose the multiply_numbers tool!
If you need visibility, add logging inside each tool function to track which tools the agent calls. The same hook is where you can guard sensitive tools: prompt the user to confirm before a tool runs, so the agent can't take the wrong action unattended.
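A sketch of such a guard (`guarded` is a hypothetical helper, not a LangChain API): it logs every call and runs a confirmation check before executing the tool. In a real CLI, the `confirm` callback could prompt with `input()`:

```python
def guarded(fn, confirm=lambda tool_name: True):
    """Log each tool call and require approval before running it."""
    def wrapper(*args, **kwargs):
        print(f"[tool] {fn.__name__} called with args={args} kwargs={kwargs}")
        if not confirm(fn.__name__):
            return f"{fn.__name__} was not approved by the user"
        return fn(*args, **kwargs)
    # Preserve the name and docstring the agent relies on
    wrapper.__name__ = fn.__name__
    wrapper.__doc__ = fn.__doc__
    return wrapper
```

Wrap the plain function with `guarded(...)` before registering it as a tool, and the agent only ever sees the guarded version.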

Agents that Remember Things

What if you want the agent to remember user preferences or conversation history?

from agent_workspace.hybrid_store import HybridStore

# Create persistent storage
store = HybridStore(
    storage_dir="agent_workspace/storage"
)

# Tool that saves preferences
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """Save a user preference that persists."""
    store = runtime.store
    store.put(("preferences",), key, {"value": value})
    return f"Saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """Retrieve a saved preference."""
    store = runtime.store
    pref = store.get(("preferences",), key)
    if pref:
        return f"Your {key} is: {pref.value['value']}"
    return "No preference found"

# Create agent WITH persistent storage
agent = create_agent(
    llm,
    tools=[save_preference, get_preference],
    store=store,  # Connect the storage
    system_prompt="You help manage user preferences."
)

# Session 1: Save preference
print("=== Session 1 ===")
result1 = agent.invoke({
    "messages": [{"role": "user", "content": "Remember that my favorite color is blue"}]
})
print(result1['messages'][-1].content)

# Session 2: Retrieve preference (even after restart!)
print("\n=== Session 2 (After Restart) ===")
result2 = agent.invoke({
    "messages": [{"role": "user", "content": "What's my favorite color?"}]
})
print(result2['messages'][-1].content)
# Output: "Your favorite color is: blue"

The magic: data saved in Session 1 is still there in Session 2, even if you restart your computer! HybridStore is a custom-made class (to be released in the MagicPythong library) that saves and restores LangChain's runtime store to a file.
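If you don't have HybridStore, the core idea is small enough to sketch yourself: flatten `(namespace, key)` into one string and persist the whole dict as JSON. `FileStore` below is a hypothetical stand-in with the same `put`/`get` shape used in the examples:

```python
import json
import os
from types import SimpleNamespace

class FileStore:
    """Minimal file-backed store: values survive process restarts."""

    def __init__(self, path="store.json"):
        self.path = path
        self.data = {}
        if os.path.exists(path):  # restore a previous session's data
            with open(path) as f:
                self.data = json.load(f)

    def _key(self, namespace, key):
        return "/".join(namespace) + "/" + key

    def put(self, namespace, key, value):
        self.data[self._key(namespace, key)] = value
        with open(self.path, "w") as f:  # persist on every write
            json.dump(self.data, f)

    def get(self, namespace, key):
        value = self.data.get(self._key(namespace, key))
        # Mimic the .value attribute the tool code above reads
        return SimpleNamespace(value=value) if value is not None else None
```

Writing the whole file on every `put` is fine for preferences-sized data; a real implementation would batch writes or use SQLite.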


Part 3: Real-World Examples

Example 1: A Personal Code Assistant

from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama

@tool
def check_python_syntax(code: str) -> str:
    """Check if Python code is valid."""
    try:
        compile(code, '<string>', 'exec')
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

@tool
def explain_code(code: str) -> str:
    """Provide a simple explanation of what code does."""
    # In a real app, you'd call the LLM here
    return "This code does X, Y, and Z"

llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)

agent = create_agent(
    llm,
    tools=[check_python_syntax, explain_code],
    system_prompt="You are a Python code assistant. Help the user write and understand code."
)

# Usage
code = """
def greet(name):
    print(f"Hello, {name}!")
"""

result = agent.invoke({
    "messages": [{"role": "user", "content": f"Is this Python code valid?\n\n{code}"}]
})

print(result['messages'][-1].content)
# Output: "Yes, this Python code is valid..."

Example 2: A Data Analysis Agent

import json
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# Sample data
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35}
]

@tool
def get_sales_data() -> str:
    """Get the latest sales data."""
    return json.dumps(SALES_DATA)

@tool
def save_report(summary: str, runtime) -> str:
    """Save analysis report."""
    store = runtime.store
    store.put(("reports",), "latest", {"summary": summary})
    return "Report saved!"

@tool
def get_saved_report(runtime) -> str:
    """Retrieve the latest saved report."""
    store = runtime.store
    report = store.get(("reports",), "latest")
    if report:
        return f"Latest report: {report.value['summary']}"
    return "No report found"

llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()

agent = create_agent(
    llm,
    tools=[get_sales_data, save_report, get_saved_report],
    store=store,
    system_prompt="You are a data analyst. Help users understand their sales data."
)

# Usage
result = agent.invoke({
    "messages": [{"role": "user", "content": "Analyze our sales data and give me a summary"}]
})

print(result['messages'][-1].content)

Part 4: Choosing the Right Model

Ollama models come in different sizes. Pick one based on your machine's RAM:

If you have 4GB or less RAM

Use Qwen2.5-Coder 1.5B

llm = ChatOllama(model="qwen2.5-coder:1.5b")

✅ Fast

⚠️ Less capable

If you have 8GB RAM

Use Qwen2.5-Coder 7B

llm = ChatOllama(model="qwen2.5-coder:7b")

✅ Good balance

✅ Handles most tasks

If you have 16GB+ RAM

Use Qwen3-Coder 30B

llm = ChatOllama(model="qwen3-coder:30b")

✅ Most capable

⚠️ Slower

Pull a model:

ollama pull qwen2.5-coder:7b

Part 5: Tuning Performance

Make responses faster

llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.0,      # ← Deterministic (faster)
    num_predict=128,      # ← Shorter responses
)

Make responses more creative

llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.7,      # ← More creative
    num_predict=512,      # ← Longer responses
)

Use GPU (if you have NVIDIA)

llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_gpu=35,           # ← Use GPU layers
)

Part 6: Common Issues & Fixes

Issue 1: "Connection refused"

Problem: Getting an error when trying to use the AI

Fix:

# Terminal 1: Start Ollama
ollama serve

Then run your Python code in a different terminal.
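You can also check programmatically whether the server is up before sending a query (`ollama_running` is a hypothetical helper; 11434 is Ollama's default port):

```python
import urllib.request
import urllib.error

def ollama_running(host="http://localhost:11434"):
    """Return True if an Ollama server answers on the given address."""
    try:
        with urllib.request.urlopen(host, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Call this at startup and print a friendly "start Ollama first" message instead of letting the connection error bubble up.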

Issue 2: "Model not found"

Problem: Error says the model doesn't exist

Fix:

# Download the model
ollama pull qwen2.5-coder:latest

Issue 3: "Out of memory"

Problem: "CUDA out of memory" or system slows down

Fix: Use a smaller model

# Instead of the 30B model
llm = ChatOllama(model="qwen2.5-coder:7b")

Issue 4: Slow responses

Problem: Takes too long to get a response

Fix:

llm = ChatOllama(
    model="qwen2.5-coder:1.5b",  # Smaller model
    temperature=0.0,              # Deterministic
    num_predict=128,              # Shorter output
)

Part 7: Next Steps

You now have enough to build:

  • ✅ Chat bots
  • ✅ Code assistants
  • ✅ Data analysis agents
  • ✅ Personal AI assistants

Happy coding! 🚀
