Midas126
Building a Local AI Agent with Ollama and LangChain: A Practical Guide


While cloud-based AI APIs dominate headlines, there's a quiet revolution happening on local machines. Developers are discovering the power of running large language models (LLMs) locally—no API keys, no usage limits, and complete data privacy. In this guide, I'll show you how to build a fully functional AI agent that runs entirely on your computer using Ollama and LangChain.

Why Go Local?

Before we dive into code, let's address the elephant in the room: why bother with local AI when cloud APIs are so convenient?

Privacy & Security: Your data never leaves your machine. This is crucial for sensitive documents, proprietary code, or personal information.

Cost Control: No surprise bills at the end of the month. Once you've downloaded a model, it's yours to use as much as you want.

Offline Capability: Work on planes, in remote locations, or anywhere without internet connectivity.

Customization: Fine-tune models on your specific data without worrying about vendor lock-in.

The Tech Stack: Ollama + LangChain

Our solution combines two powerful tools:

Ollama: A framework for running LLMs locally. It supports a growing library of models including Llama 2, Mistral, CodeLlama, and more.

LangChain: A framework for building applications with LLMs, providing tools for chaining, memory, and agent creation.
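
Under the hood, the LangChain wrapper talks to Ollama's local HTTP server (port 11434 by default). If you're curious what's on the wire, here's a minimal sketch using only the standard library; the model name is just an example, and the actual network call is commented out since it needs a running Ollama server:

```python
import json
from urllib import request

# Ollama exposes /api/generate for one-shot completions; "stream": False
# asks for a single JSON response instead of a token stream.
def build_generate_request(model: str, prompt: str) -> request.Request:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama2:7b", "Say hello in one word.")
print(req.full_url)  # http://localhost:11434/api/generate
# with request.urlopen(req) as resp:  # requires `ollama serve` to be running
#     print(json.loads(resp.read())["response"])
```

LangChain's `Ollama` class wraps exactly this kind of request, so everything the wrapper can do, you can also do with plain HTTP.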

Setting Up Your Environment

First, let's get everything installed:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai for Windows

# Install Python dependencies
pip install langchain langchain-community chromadb sentence-transformers

Now, let's pull a model. For this tutorial, I'll use llama2:7b (7 billion parameters), which runs well on most modern laptops:

ollama pull llama2:7b

Building a Document Q&A Agent

Let's create a practical application: a document question-answering system that can process your local files.

from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import DirectoryLoader, TextLoader
import os

class LocalDocumentAgent:
    def __init__(self, model_name="llama2:7b"):
        # Initialize the local LLM
        self.llm = Ollama(model=model_name)

        # Initialize embeddings (for vector search)
        self.embeddings = OllamaEmbeddings(model=model_name)

        # Persistent storage for our vector database
        self.persist_directory = "./chroma_db"

        self.vectorstore = None
        self.qa_chain = None

    def load_documents(self, directory_path):
        """Load all text documents from a directory"""
        print(f"Loading documents from {directory_path}...")

        loader = DirectoryLoader(
            directory_path,
            glob="**/*.txt",
            loader_cls=TextLoader,
            show_progress=True
        )

        documents = loader.load()

        # Split documents into chunks for processing
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
        )

        chunks = text_splitter.split_documents(documents)
        print(f"Created {len(chunks)} document chunks")

        # Create vector store from document chunks
        self.vectorstore = Chroma.from_documents(
            documents=chunks,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )

        # Persist the database to disk
        self.vectorstore.persist()

        # Create the Q&A chain
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever(
                search_kwargs={"k": 3}
            ),
            return_source_documents=True
        )

        return len(chunks)

    def query(self, question):
        """Ask a question about the loaded documents"""
        if not self.qa_chain:
            return "Please load documents first using load_documents()"

        result = self.qa_chain.invoke({"query": question})

        print("\n" + "="*50)
        print(f"Question: {question}")
        print("="*50)
        print(f"Answer: {result['result']}")
        print("\nSources:")
        for i, doc in enumerate(result['source_documents'], 1):
            print(f"{i}. {doc.metadata.get('source', 'Unknown')}")
        print("="*50)

        return result

# Usage example
if __name__ == "__main__":
    agent = LocalDocumentAgent()

    # Load documents from a directory
    agent.load_documents("./my_documents/")

    # Ask questions
    agent.query("What are the main topics discussed in these documents?")
    agent.query("Summarize the key findings from the research papers.")
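
To make the `chunk_size`/`chunk_overlap` settings above concrete, here's what overlapping fixed-size chunking does, in pure Python. This is a deliberately naive stand-in for `RecursiveCharacterTextSplitter`, which additionally prefers to split on paragraph and sentence boundaries rather than mid-word:

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive fixed-size chunking with overlap. Each chunk starts
    chunk_size - chunk_overlap characters after the previous one, so
    adjacent chunks share chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))  # 3
```

The overlap is what lets a sentence that straddles a chunk boundary still be retrieved whole: it appears complete in at least one of the two neighboring chunks.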

Creating a Code Analysis Assistant

Let's build something more specialized—a code analysis tool that can explain, debug, and refactor your code:

from langchain_community.llms import Ollama
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
import subprocess
import ast

class CodeAnalysisAgent:
    def __init__(self, model_name="codellama:7b"):
        self.llm = Ollama(model=model_name)
        self.memory = ConversationBufferMemory(memory_key="chat_history")

        # Define tools for our agent
        self.tools = [
            Tool(
                name="CodeExplainer",
                func=self.explain_code,
                description="Explains what a piece of code does"
            ),
            Tool(
                name="CodeDebugger",
                func=self.debug_code,
                description="Finds and explains bugs in code"
            ),
            Tool(
                name="CodeRefactor",
                func=self.refactor_code,
                description="Suggests improvements and refactoring for code"
            ),
            Tool(
                name="RunPython",
                func=self.run_python_code,
                description="Executes Python code and returns the output"
            )
        ]

        # Create the agent
        prompt = PromptTemplate.from_template(
            """You are a helpful code assistant. You have access to the following tools:

            {tools}

            Use the following format:

            Question: the input question you must answer
            Thought: you should always think about what to do
            Action: the action to take, should be one of [{tool_names}]
            Action Input: the input to the action
            Observation: the result of the action
            ... (this Thought/Action/Action Input/Observation can repeat N times)
            Thought: I now know the final answer
            Final Answer: the final answer to the original input question

            Begin!

            Previous conversation:
            {chat_history}

            Question: {input}
            Thought:{agent_scratchpad}"""
        )

        self.agent = create_react_agent(
            llm=self.llm,
            tools=self.tools,
            prompt=prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            memory=self.memory,
            verbose=True,
            handle_parsing_errors=True
        )

    def explain_code(self, code):
        """Explain what the code does"""
        prompt = f"""Explain this code in detail:

        ```python
        {code}
        ```

        Explain:
        1. What the code does
        2. Key functions and their purposes
        3. The algorithm or logic flow
        4. Any important edge cases

        Keep the explanation clear and concise."""

        return self.llm.invoke(prompt)

    def debug_code(self, code):
        """Find and explain bugs in code"""
        try:
            # First, try to parse the code
            ast.parse(code)
            syntax_ok = True
        except SyntaxError as e:
            syntax_ok = False
            syntax_error = str(e)

        prompt = f"""Analyze this code for bugs:

        ```python
        {code}
        ```

        {'Note: The code has syntax errors: ' + syntax_error if not syntax_ok else 'The code appears syntactically correct.'}

        Look for:
        1. Syntax errors
        2. Logical errors
        3. Potential runtime errors
        4. Bad practices or anti-patterns

        Provide specific fixes for any issues found."""

        return self.llm.invoke(prompt)

    def refactor_code(self, code):
        """Suggest improvements for code"""
        prompt = f"""Refactor and improve this code:

        ```python
        {code}
        ```

        Suggest improvements for:
        1. Readability and clarity
        2. Performance optimization
        3. Better error handling
        4. Following Python best practices
        5. Reducing complexity

        Provide the refactored code with explanations of changes."""

        return self.llm.invoke(prompt)

    def run_python_code(self, code):
        """Execute Python code safely"""
        # Basic safety check (in production, use proper sandboxing)
        dangerous_patterns = [
            "import os", "import sys", "__import__", "eval(", "exec(",
            "open(", "subprocess", "shutil", "socket"
        ]

        for pattern in dangerous_patterns:
            if pattern in code.lower():
                return f"Security warning: Code contains potentially dangerous pattern: {pattern}"

        try:
            result = subprocess.run(
                ["python", "-c", code],
                capture_output=True,
                text=True,
                timeout=5
            )
            return f"Output:\n{result.stdout}\n\nErrors:\n{result.stderr}"
        except subprocess.TimeoutExpired:
            return "Error: Code execution timed out (possible infinite loop)"
        except Exception as e:
            return f"Error executing code: {str(e)}"

    def analyze(self, question):
        """Main interface to ask questions about code"""
        return self.agent_executor.invoke({"input": question})

# Example usage
if __name__ == "__main__":
    code_agent = CodeAnalysisAgent()

    # Example code to analyze
    # Keep the snippet at top-level indentation so ast.parse() in debug_code accepts it
    sample_code = """
def calculate_average(numbers):
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total / len(numbers)

result = calculate_average([1, 2, 3, 4, 5])
print(f"The average is: {result}")
"""

    # Use the agent
    print("=== Code Explanation ===")
    explanation = code_agent.explain_code(sample_code)
    print(explanation)

    print("\n=== Code Refactoring ===")
    refactored = code_agent.refactor_code(sample_code)
    print(refactored)

    print("\n=== Interactive Analysis ===")
    response = code_agent.analyze(
        "What's wrong with this code and how can I fix it?\n" + sample_code
    )
    print(response["output"])
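
A word of caution on `run_python_code`: the substring blacklist is easy to bypass (e.g. `getattr(__builtins__, 'ev' + 'al')` contains none of the flagged patterns). A slightly stronger sketch walks the AST and rejects imports and known-dangerous names; the blocked set below is illustrative, not exhaustive, and real isolation still needs a sandboxed process or container rather than source inspection:

```python
import ast

BLOCKED_NAMES = {"eval", "exec", "open", "__import__", "compile"}  # illustrative only

def looks_unsafe(code: str) -> bool:
    """Reject code that imports modules or references blocked builtins.
    A sketch only -- parsing can't replace container/seccomp sandboxing."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return True  # refuse anything we can't parse
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return True
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            return True
    return False

print(looks_unsafe("import os"))     # True
print(looks_unsafe("print(1 + 1)"))  # False
```

Unlike the substring check, this catches aliased imports (`import os as o`) and can't be fooled by string concatenation in comments, though attribute tricks still get through.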

Performance Optimization Tips

Running LLMs locally requires some optimization. Here are my hard-earned tips:

1. Model Selection: Start with smaller models (7B parameters) and work your way up. mistral:7b and llama2:7b are great starting points.

2. Quantization: Use quantized versions (like llama2:7b-q4_0) for better performance with minimal quality loss.

3. Context Window Management: Be mindful of context limits. Use document chunking and summarization for long texts.

4. Batch Processing: Process multiple queries in batches when possible.

5. GPU Acceleration: If you have an NVIDIA GPU, Ollama can use CUDA for significant speedups.
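
Tips 2, 3, and 5 are really all knobs on the request. With Ollama's raw API (and, similarly, via keyword arguments on LangChain's `Ollama` wrapper), generation settings go in an `options` object: `num_ctx` caps the context window and `temperature` controls sampling randomness. A sketch of the payload, with illustrative values and an example quantized model tag:

```python
import json

# Ollama accepts per-request generation options in an "options" object.
# num_ctx caps the context window; temperature controls sampling randomness.
def generate_payload(model: str, prompt: str,
                     num_ctx: int = 2048, temperature: float = 0.2) -> str:
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    })

payload = json.loads(generate_payload("llama2:7b-q4_0", "Summarize this file..."))
print(payload["options"])  # {'num_ctx': 2048, 'temperature': 0.2}
```

Lowering `num_ctx` is one of the quickest wins on RAM-constrained machines, at the cost of how much retrieved context you can stuff into a prompt.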

Real-World Applications

I've used this setup for:

  • Documentation Search: Instant answers from internal docs and API references
  • Code Review: Automated analysis of pull requests
  • Learning Assistant: Explaining complex concepts from textbooks
  • Meeting Notes Analysis: Summarizing and extracting action items
  • Personal Knowledge Base: Querying my own notes and bookmarks

Challenges and Limitations

It's not all sunshine and rainbows. Be aware of:

  • Hardware Requirements: Larger models need substantial RAM (16GB+ recommended)
  • Speed: Inference is slower than cloud APIs (but improving rapidly)
  • Model Quality: Some local models aren't as capable as GPT-4 (yet)
  • Setup Complexity: More moving parts than a simple API call

The Future is Local (and Open)

The trend toward local AI isn't just about privacy or cost—it's about democratization. As models get better and hardware gets cheaper, we're moving toward a future where every developer can have their own AI assistant, customized to their needs and running on their terms.

Your Turn

Ready to build your own local AI agent? Start by:

  1. Installing Ollama and pulling a model
  2. Experimenting with the code examples above
  3. Customizing the agents for your specific use case
  4. Sharing what you build with the community

The best part? You own the entire stack. No vendor lock-in, no usage limits, just pure AI capability at your fingertips.

What will you build with your local AI agent? Share your projects and experiences in the comments below. Let's build the future of local AI together!


Want to dive deeper? Check out the Ollama GitHub repo and LangChain documentation for more advanced features and capabilities.
