I stumbled upon Ollama, an open-source application that makes it easy to run large language models locally with minimal setup. Integrating LangChain with Ollama is straightforward: you can wire up a model with just a few lines of code.
Turning that integration into shippable code, something reliable enough to move beyond a demo, requires more care. Through trial and error, I ran into several pitfalls that made the difference between a quick prototype and a stable system.
For this article, I’ll use a simple scheduling agent as the example to walk through four key lessons.
1. Skipping Explicit Schemas
If you let the model “be concise,” it will drift into natural language instead of structured outputs. The fix is to define JSON schemas directly in the system prompt.
SYSTEM_PROMPT = """
You are a calendaring assistant. Actions:
1. create_meeting
{ "person": string, "datetime": string, "reason": string }
2. reschedule_meeting
{ "person": string, "new_datetime": string }
3. cancel_meeting
{ "person": string }
4. escalate_issue
{ "reason": string }
Output only valid JSON:
{ "action": "<name>", "input": {...} }
"""
2. Post-Processing Is Not Optional
Even with a schema, the model sometimes returns almost JSON: stray commas, comments, or text. That’s why you should always sanitize before parsing.
import re, json

# Remove JS-style comments and trailing commas before } or ]
def clean_json(payload: str) -> str:
    no_comments = re.sub(r'//.*', '', payload)
    return re.sub(r',(\s*[}\]])', r'\1', no_comments).strip()

# Sanitize, then parse into a dict
def parse_payload(raw: str):
    payload = clean_json(raw)
    return json.loads(payload)
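As a quick sanity check, here is a minimal sketch of the two failure modes the cleaner handles; the messy payload below is a made-up example (note the comment regex would also mangle `//` inside string values, e.g. URLs):

```python
import re, json

def clean_json(payload: str) -> str:
    # strip JS-style comments, then trailing commas before } or ]
    no_comments = re.sub(r'//.*', '', payload)
    return re.sub(r',(\s*[}\]])', r'\1', no_comments).strip()

messy = '''
{
  "action": "create_meeting",  // model added a comment
  "input": { "person": "Alex", "datetime": "1pm", "reason": "lunch", }
}
'''

cmd = json.loads(clean_json(messy))
print(cmd["action"])  # create_meeting
```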
3. No Timeouts/Retries
Network hiccups or model stalls will block your system if you don’t enforce limits. Ollama doesn’t provide retries or timeouts out of the box, so you need to add them at the call site.
from langchain_ollama import ChatOllama
from langchain.schema import SystemMessage, HumanMessage

llm = ChatOllama(model="llama3", temperature=0)

def call_with_retry(messages, retries=2):
    for attempt in range(retries):
        try:
            return llm.invoke(input=messages, timeout=10)  # enforce timeout
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
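Whether a `timeout` keyword is actually honored can depend on the client version, so a belt-and-braces option is to enforce a hard deadline yourself. Here is a sketch under that assumption; `call_with_deadline` and the 10-second default are hypothetical names, and the function takes the model as a parameter so it works with any object exposing `invoke`:

```python
from concurrent.futures import ThreadPoolExecutor

def call_with_deadline(llm, messages, retries=2, deadline=10.0):
    """Invoke llm with a hard per-attempt deadline (retries must be >= 1)."""
    last_err = None
    for attempt in range(retries):
        # a fresh single-worker pool per attempt, so a stalled call
        # from a previous attempt cannot block the next one
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(llm.invoke, messages).result(timeout=deadline)
        except Exception as err:
            last_err = err
        finally:
            # don't wait for a possibly-stuck worker thread
            pool.shutdown(wait=False)
    raise last_err
```

The trade-off: the abandoned worker thread keeps running in the background until the model call returns, but your caller is no longer blocked by it.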
4. Ignoring Drift
As prompts or models change, outputs can silently drift. A schema that worked last week may suddenly fail. Adding lightweight regression checks helps you catch this early.
def test_golden_case():
    messages = [SystemMessage(content=SYSTEM_PROMPT),
                HumanMessage(content="Book a lunch with Alex at 1pm")]
    ai_msg = call_with_retry(messages)
    cmd = parse_payload(ai_msg.content)
    assert "action" in cmd and "input" in cmd
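The golden test only checks that the keys exist. A stricter variant also rejects unknown actions and missing fields; this is a sketch, with `REQUIRED_FIELDS` and `validate_cmd` as hypothetical names mirroring the schemas in the system prompt:

```python
# required input fields per action, mirroring the system-prompt schemas
REQUIRED_FIELDS = {
    "create_meeting": {"person", "datetime", "reason"},
    "reschedule_meeting": {"person", "new_datetime"},
    "cancel_meeting": {"person"},
    "escalate_issue": {"reason"},
}

def validate_cmd(cmd: dict) -> dict:
    # reject unknown actions and inputs missing required fields
    action = cmd.get("action")
    if action not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown action: {action}")
    missing = REQUIRED_FIELDS[action] - set(cmd.get("input", {}))
    if missing:
        raise ValueError(f"{action} missing fields: {sorted(missing)}")
    return cmd
```

Calling `validate_cmd` right after `parse_payload` turns silent drift into a loud, diagnosable error.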
Putting It Together:
import json, re
from langchain_ollama import ChatOllama
from langchain.schema import SystemMessage, HumanMessage, AIMessage

# 1) Define the agent's meeting actions as JSON schemas in the system prompt
SYSTEM_PROMPT = """
You are a calendaring assistant. Actions:
1. create_meeting
{ "person": string, "datetime": string, "reason": string }
2. reschedule_meeting
{ "person": string, "new_datetime": string }
3. cancel_meeting
{ "person": string }
4. escalate_issue
{ "reason": string }
Output only valid JSON:
{ "action": "<name>", "input": {...} }
"""

# 2) Init the Ollama model
llm = ChatOllama(model="llama3", temperature=0)

# 3) Parse the AI message as cleaned-up JSON
def parse_payload(raw: str):
    payload = clean_json(raw)
    return json.loads(payload)

# 4) Remove any JS-style comments and trailing commas in objects/arrays
def clean_json(payload: str) -> str:
    no_comments = re.sub(r'//.*', '', payload)
    return re.sub(r',(\s*[}\]])', r'\1', no_comments).strip()

# 5) Invoke the model with a retry limit
def call_with_retry(messages, retries=2):
    for attempt in range(retries):
        try:
            return llm.invoke(input=messages, timeout=10)  # enforce timeout
        except Exception:
            if attempt == retries - 1:
                raise

# 6) Run the agent and dispatch on the returned action
def run_agent(user_request: str):
    messages = [SystemMessage(content=SYSTEM_PROMPT), HumanMessage(content=user_request)]
    ai_msg: AIMessage = call_with_retry(messages)
    cmd = parse_payload(ai_msg.content)
    action, params = cmd["action"], cmd["input"]
    if action == "create_meeting":
        return {"result": f"Created meeting with {params['person']} on {params['datetime']}"}
    elif action == "reschedule_meeting":
        return {"result": f"Rescheduled {params['person']} to {params['new_datetime']}"}
    elif action == "cancel_meeting":
        return {"result": f"Cancelled meeting with {params['person']}"}
    elif action == "escalate_issue":
        return {"result": f"Escalated due to: {params['reason']}"}
    else:
        return {"error": f"Unknown action: {action}"}

# 7) Optional - regression check
def test_golden_case():
    messages = [SystemMessage(content=SYSTEM_PROMPT),
                HumanMessage(content="Reschedule a lunch with Alex at 1pm")]
    ai_msg = call_with_retry(messages)
    cmd = parse_payload(ai_msg.content)
    print(cmd["action"])
    print(cmd["input"])
    assert "action" in cmd and "input" in cmd

# Application start
if __name__ == "__main__":
    query = "Book a call with Mr. Russell for next Thursday at 3 PST for a quick lunch"
    print(run_agent(query))
    # Optional validation step
    # test_golden_case()
Output: {'result': 'Created meeting with Mr. Russell on 2023-03-16T15:00:00-08:00'}
Takeaway
LangChain + Ollama is fast to set up, but brittle if you skip guardrails. These small investments turn fragile code into a service you can trust.
Open question
The updated Ollama library now supports tools, so would you wrap the dispatch logic into a proper toolset (using LangChain's Tool abstraction), or keep it closer to plain Python for control?