Building a Human-in-the-Loop AI App with LangGraph and Ollama

Have you ever wanted to hit the pause button on an AI's reasoning process, take a closer look at what it's doing, and steer it in a better direction before it continues? That's the core idea behind a Human-in-the-Loop (HITL) system, and in this tutorial, we'll build one from scratch using LangGraph for workflow orchestration, Ollama for local AI model inference, and Python for the glue that holds it all together. This setup allows you to create interactive AI applications where humans can intervene at key points, making the system more reliable and adaptable for real-world use cases like content generation, decision-making tools, or even debugging assistants.

This guide serves as a detailed companion to my YouTube walkthrough, where I demonstrate the entire process live with code explanations and troubleshooting tips.

📺 Watch the step-by-step video here Click for Video

By following along, you'll gain hands-on experience with several key concepts: designing modular AI workflows using LangGraph's graph-based architecture, leveraging large language models (LLMs) to refine user inputs automatically, implementing pauses for human oversight to ensure quality and alignment, and generating high-quality outputs with Ollama's efficient local models. These skills are increasingly valuable as AI systems evolve to incorporate more human-AI collaboration.


🚀 What We’re Building

At its heart, we're constructing a command-line interface (CLI) Python application that transforms a basic user prompt into a polished, AI-generated response through a structured pipeline. The app starts by accepting a simple prompt from the user, then uses an LLM to enhance it for clarity and effectiveness—think of this as prompt engineering on autopilot. Next, it halts the process to allow human review, where you can approve the improved prompt or tweak it manually. Finally, it feeds the refined prompt back into the LLM to produce a comprehensive final answer.

What makes this special is LangGraph's role in modeling the workflow as a directed graph: each phase (like prompt improvement or human review) becomes a "node," and the connections between them define the flow. This graph-based approach ensures the app is flexible, easy to extend (e.g., adding more nodes for additional checks), and stateful, meaning it remembers progress across steps. We'll use Ollama to run everything locally, keeping things fast, private, and cost-free compared to cloud-based APIs. By the end, you'll have a reusable template for building HITL systems that balance AI autonomy with human control.


🛠️ Step 1 — Setup and Imports

To get started, we need to install the necessary dependencies. If you haven't already, run pip install langgraph langchain-ollama rich in your terminal. These libraries provide the building blocks: LangGraph for the graph workflow, LangChain's Ollama integration for model interactions, and Rich for enhanced console output.

Here's the foundational code with our imports:

import uuid
from typing import TypedDict, Optional

from langgraph.graph import StateGraph, START, END
from langgraph.types import interrupt, Command
from langgraph.checkpoint.memory import InMemorySaver
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
from rich import print

Let's break this down: uuid helps generate unique identifiers for each app session, ensuring that multiple runs don't interfere with each other—especially useful if you expand this to a multi-user setup. StateGraph is the core of LangGraph, allowing us to define nodes and edges programmatically. The interrupt and Command types enable the pausing mechanism for human input. InMemorySaver acts as a simple checkpoint system to persist state between steps, which is crucial for resuming after interruptions. LangChain's HumanMessage and SystemMessage classes structure our interactions with the LLM, while ChatOllama connects to your locally running Ollama model. Finally, rich.print jazzes up the console with colors, emojis, and formatting for a more engaging user experience.

With imports in place, initialize the LLM like so:

llm = ChatOllama(model='phi4-mini', temperature=0.7)

We're using the 'phi4-mini' model here for its balance of speed and capability on everyday hardware, but you could swap it for something like 'llama3' if you need more power. The temperature=0.7 setting introduces a bit of creativity without making outputs too unpredictable—adjust this based on your needs for determinism versus innovation.
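
If you want to favor consistency over creativity, a lower temperature on a larger model is a one-line change. A hypothetical alternative configuration (assuming you've already pulled the model with Ollama) might look like this:

# Alternative setup: a larger model with a lower temperature for more deterministic output
# (assumes 'llama3' has been pulled locally, e.g. with `ollama pull llama3`)
llm = ChatOllama(model='llama3', temperature=0.2)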


🧩 Step 2 — Defining Application State

In LangGraph, the application's "memory" is a shared state dictionary that gets updated as it flows through the graph. This ensures every node has access to the latest data without redundant passing.

class PromptState(TypedDict):
    original_prompt: str
    improved_prompt: Optional[str]
    final_prompt: Optional[str]
    final_answer: Optional[str]
    step: str

This PromptState class tracks the evolution of the prompt and response: the original_prompt captures the user's raw input, improved_prompt holds the LLM's refined version, final_prompt is the post-human-review result, final_answer stores the ultimate output, and step is a simple tracker for debugging (e.g., knowing if we're at "prompt_improved" or "human_reviewed"). Using TypedDict with Optional fields adds type safety, helping catch errors early in development. This state design is extensible—if you wanted to add features like logging timestamps or confidence scores, you could easily expand this dictionary.
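
If you did want to extend it, a hypothetical variant might look like the sketch below; the extra field names are illustrative, not part of the app we build here:

# Hypothetical extension of the state; the extra fields are illustrative only
class ExtendedPromptState(PromptState):
    started_at: Optional[str]          # ISO timestamp recorded when the run begins
    confidence_score: Optional[float]  # e.g. a self-reported quality score from the LLM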


🧠 Step 3 — Improving the Prompt

The magic begins here: our first node uses the LLM to rewrite the user's prompt, making it more precise and effective. This step mimics advanced prompt engineering techniques, often leading to dramatically better results from the final AI generation.

def improve_prompt_node(state: PromptState) -> PromptState:
    print(f"🤖 Improving prompt with Ollama: {state['original_prompt']}")

    system_message = """You are a prompt engineering expert. Your goal is to take the user's original prompt and improve it for clarity, specificity, and effectiveness when used with an LLM. Make it more detailed, add context if needed, and ensure it's optimized for generating high-quality responses. Return only the improved prompt, no explanations."""

    improved_prompt = call_llm(state['original_prompt'], system_message)

    return {
        **state,
        "improved_prompt": improved_prompt,
        "step": "prompt_improved"
    }

In this function, we craft a system message that instructs the LLM on how to refine the prompt, emphasizing clarity and optimization without adding fluff. The call_llm function, a small helper that wraps llm.invoke with the system and user messages (a sketch follows below), handles the actual model call. For example, if the user inputs "Tell me about machine learning," the LLM might output something like "Provide a comprehensive overview of machine learning, including its history, key algorithms, real-world applications, and future trends." This automated improvement reduces user effort and boosts output quality.
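
Since call_llm isn't defined in the snippet above, here's a minimal sketch of what that helper might look like, assuming it simply pairs a system message with the user's prompt and returns the model's text:

def call_llm(prompt: str, system_message: str) -> str:
    """Send a system + user message pair to the local Ollama model and return the text reply."""
    messages = [
        SystemMessage(content=system_message),
        HumanMessage(content=prompt),
    ]
    response = llm.invoke(messages)
    return response.content.strip()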


✋ Step 4 — Human-in-the-Loop Review

This node introduces the "human" element, pausing the graph to solicit feedback. It's what makes the app truly interactive and prevents AI from running unchecked.

def human_review_node(state: PromptState) -> PromptState:
    print("👤 Requesting human review...")

    human_response = interrupt({
        "task": "Please review and edit the improved prompt if needed",
        "original_prompt": state["original_prompt"],
        "improved_prompt": state["improved_prompt"]
    })

    # Either accept or edit the improved prompt
    final_prompt = human_response.get("edited_prompt", state["improved_prompt"])

    return {
        **state,
        "final_prompt": final_prompt,
        "step": "human_reviewed"
    }

When executed, the app displays the improved prompt in the console and waits for input: pressing Enter accepts it as-is, or you can type modifications. LangGraph's interrupt mechanism halts execution here, saving the state so you can resume later. This is ideal for scenarios where AI might hallucinate or misinterpret, allowing human expertise to refine the direction—think content moderation or personalized advice systems.
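
Mechanically, the pause-and-resume cycle boils down to a few calls. The full wiring follows in Step 7, but a rough sketch (using the app, initial_state, and config objects defined there) looks like this:

# Rough shape of the pause/resume cycle (full version in Step 7)
result = app.invoke(initial_state, config=config)  # runs until interrupt() fires
review_request = result["__interrupt__"][0].value  # the dict we passed to interrupt()
app.invoke(Command(resume={"edited_prompt": "A better prompt"}), config=config)  # resumes inside human_review_node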


📝 Step 5 — Final Answer Generation

With the prompt finalized, this node generates the comprehensive response using Ollama.

def answer_prompt_node(state: PromptState) -> PromptState:
    print(f"🤖 Generating final answer with Ollama for: {state['final_prompt']}")

    system_message = """You are a helpful AI assistant. Provide a comprehensive answer to the user's prompt, drawing on accurate knowledge and clear explanations. Structure your response logically with sections if appropriate."""

    final_answer = call_llm(state['final_prompt'], system_message)

    return {
        **state,
        "final_answer": final_answer,
        "step": "completed"
    }

The system message guides the LLM toward structured, informative outputs. For instance, it might turn a refined prompt into a detailed essay with headings, examples, and references, ensuring the final answer is polished and user-friendly.


🔗 Step 6 — Building the Graph

Now we assemble the pieces into a cohesive workflow graph.

def create_app():
    builder = StateGraph(PromptState)

    builder.add_node("improve_prompt", improve_prompt_node)
    builder.add_node("human_review", human_review_node)
    builder.add_node("answer_prompt", answer_prompt_node)

    builder.add_edge(START, "improve_prompt")
    builder.add_edge("improve_prompt", "human_review")
    builder.add_edge("human_review", "answer_prompt")
    builder.add_edge("answer_prompt", END)

    return builder.compile(checkpointer=InMemorySaver())

This creates a linear flow from start to end, but LangGraph supports branches, loops, or conditionals if you want to add complexity (e.g., rerouting based on human feedback). The in-memory checkpointer ensures state persistence, making interruptions seamless.
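
As one hypothetical example of that extra complexity, you could swap the fixed human_review -> answer_prompt edge for a conditional edge that sends a rejected prompt back for another improvement pass. The needs_rework flag and routing labels below are illustrative, not part of the app above:

# Hypothetical routing function: assumes human_review_node sets a "needs_rework" flag in state
def route_after_review(state: PromptState) -> str:
    return "rework" if state.get("needs_rework") else "proceed"

# Replace the fixed edge with a conditional one
builder.add_conditional_edges(
    "human_review",
    route_after_review,
    {"rework": "improve_prompt", "proceed": "answer_prompt"},
)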


💻 Step 7 — Running It as a CLI

Tie it all together in a main function for easy execution.

def main():
    app = create_app()
    config = {"configurable": {"thread_id": str(uuid.uuid4())}}

    user_prompt = input("Enter your prompt: ").strip() or "Tell me about machine learning"

    initial_state = {"original_prompt": user_prompt, "step": "starting"}

    print(f"🚀 Processing with Ollama: {user_prompt}")
    result = app.invoke(initial_state, config=config)

    if "__interrupt__" in result:
        # The interrupt payload is the dict we passed to interrupt() in human_review_node
        review_request = result["__interrupt__"][0].value
        print(f"✏️ Improved prompt:\n{review_request['improved_prompt']}")

        # Pressing Enter approves the improved prompt; typing anything else replaces it
        edited = input("Edit prompt (or press Enter to approve): ").strip()
        human_response = {"edited_prompt": edited} if edited else {}

        final_result = app.invoke(Command(resume=human_response), config=config)

        print(f"Final prompt: {final_result['final_prompt']}")
        print(f"🎯 Answer:\n{final_result['final_answer']}")


if __name__ == "__main__":
    main()

This handles the full cycle, including resuming after the human-review interrupt: the first invoke runs until the pause, and the second resumes with your feedback. Run it with python your_script.py, enter a prompt, and watch the HITL magic unfold.


🎯 Wrapping Up

You've now constructed a robust Human-in-the-Loop AI application that automates prompt refinement, incorporates human oversight for accuracy, and delivers refined answers via Ollama—all orchestrated by LangGraph's powerful workflow engine. This approach combines the efficiency of AI with the nuance of human judgment, making it perfect for applications where reliability matters, such as educational tools, creative writing aids, or enterprise decision support.

To see a live demo and dive deeper into customizations, check out the companion YouTube video:

👉 Watch here Link to Video
