Building a Local AI Agent with Ollama and LangChain: A Practical Guide
While cloud-based AI APIs dominate headlines, there's a quiet revolution happening on local machines. Developers are discovering the power of running large language models (LLMs) locally—no API keys, no usage limits, and complete data privacy. In this guide, I'll show you how to build a fully functional AI agent that runs entirely on your computer using Ollama and LangChain.
Why Go Local?
Before we dive into code, let's address the elephant in the room: why bother with local AI when cloud APIs are so convenient?
Privacy & Security: Your data never leaves your machine. This is crucial for sensitive documents, proprietary code, or personal information.
Cost Control: No surprise bills at the end of the month. Once you've downloaded a model, it's yours to use as much as you want.
Offline Capability: Work on planes, in remote locations, or anywhere without internet connectivity.
Customization: Fine-tune models on your specific data without worrying about vendor lock-in.
The Tech Stack: Ollama + LangChain
Our solution combines two powerful tools:
Ollama: A framework for running LLMs locally. It supports a growing library of models including Llama 2, Mistral, CodeLlama, and more.
LangChain: A framework for building applications with LLMs, providing tools for chaining, memory, and agent creation.
Setting Up Your Environment
First, let's get everything installed:
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai for Windows
# Install Python dependencies
pip install langchain langchain-community chromadb sentence-transformers
Now, let's pull a model. For this tutorial, I'll use llama2:7b (7 billion parameters), which runs well on most modern laptops:
ollama pull llama2:7b
Building a Document Q&A Agent
Let's create a practical application: a document question-answering system that can process your local files.
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import DirectoryLoader, TextLoader


class LocalDocumentAgent:
    def __init__(self, model_name="llama2:7b"):
        # Initialize the local LLM
        self.llm = Ollama(model=model_name)
        # Initialize embeddings (for vector search)
        self.embeddings = OllamaEmbeddings(model=model_name)
        # Persistent storage for our vector database
        self.persist_directory = "./chroma_db"
        self.vectorstore = None
        self.qa_chain = None

    def load_documents(self, directory_path):
        """Load all text documents from a directory"""
        print(f"Loading documents from {directory_path}...")
        loader = DirectoryLoader(
            directory_path,
            glob="**/*.txt",
            loader_cls=TextLoader,
            show_progress=True
        )
        documents = loader.load()

        # Split documents into chunks for processing
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
        )
        chunks = text_splitter.split_documents(documents)
        print(f"Created {len(chunks)} document chunks")

        # Create vector store from document chunks
        self.vectorstore = Chroma.from_documents(
            documents=chunks,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )
        # Persist the database to disk
        self.vectorstore.persist()

        # Create the Q&A chain
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever(
                search_kwargs={"k": 3}
            ),
            return_source_documents=True
        )
        return len(chunks)

    def query(self, question):
        """Ask a question about the loaded documents"""
        if not self.qa_chain:
            return "Please load documents first using load_documents()"
        result = self.qa_chain.invoke({"query": question})
        print("\n" + "=" * 50)
        print(f"Question: {question}")
        print("=" * 50)
        print(f"Answer: {result['result']}")
        print("\nSources:")
        for i, doc in enumerate(result['source_documents'], 1):
            print(f"{i}. {doc.metadata.get('source', 'Unknown')}")
        print("=" * 50)
        return result


# Usage example
if __name__ == "__main__":
    agent = LocalDocumentAgent()
    # Load documents from a directory
    agent.load_documents("./my_documents/")
    # Ask questions
    agent.query("What are the main topics discussed in these documents?")
    agent.query("Summarize the key findings from the research papers.")
Creating a Code Analysis Assistant
Let's build something more specialized—a code analysis tool that can explain, debug, and refactor your code:
from langchain_community.llms import Ollama
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
import subprocess
import ast


class CodeAnalysisAgent:
    def __init__(self, model_name="codellama:7b"):
        self.llm = Ollama(model=model_name)
        self.memory = ConversationBufferMemory(memory_key="chat_history")

        # Define tools for our agent
        self.tools = [
            Tool(
                name="CodeExplainer",
                func=self.explain_code,
                description="Explains what a piece of code does"
            ),
            Tool(
                name="CodeDebugger",
                func=self.debug_code,
                description="Finds and explains bugs in code"
            ),
            Tool(
                name="CodeRefactor",
                func=self.refactor_code,
                description="Suggests improvements and refactoring for code"
            ),
            Tool(
                name="RunPython",
                func=self.run_python_code,
                description="Executes Python code and returns the output"
            )
        ]

        # Create the agent
        prompt = PromptTemplate.from_template(
            """You are a helpful code assistant. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Previous conversation:
{chat_history}

Question: {input}
Thought:{agent_scratchpad}"""
        )
        self.agent = create_react_agent(
            llm=self.llm,
            tools=self.tools,
            prompt=prompt
        )
        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            memory=self.memory,
            verbose=True,
            handle_parsing_errors=True
        )
    def explain_code(self, code):
        """Explain what the code does"""
        prompt = f"""Explain this code in detail:

```python
{code}
```

Explain:
1. What the code does
2. Key functions and their purposes
3. The algorithm or logic flow
4. Any important edge cases

Keep the explanation clear and concise."""
        return self.llm.invoke(prompt)

    def debug_code(self, code):
        """Find and explain bugs in code"""
        syntax_error = ""
        try:
            # First, try to parse the code
            ast.parse(code)
            syntax_ok = True
        except SyntaxError as e:
            syntax_ok = False
            syntax_error = str(e)

        prompt = f"""Analyze this code for bugs:

```python
{code}
```

{'Note: The code has syntax errors: ' + syntax_error if not syntax_ok else 'The code appears syntactically correct.'}

Look for:
1. Syntax errors
2. Logical errors
3. Potential runtime errors
4. Bad practices or anti-patterns

Provide specific fixes for any issues found."""
        return self.llm.invoke(prompt)

    def refactor_code(self, code):
        """Suggest improvements for code"""
        prompt = f"""Refactor and improve this code:

```python
{code}
```

Suggest improvements for:
1. Readability and clarity
2. Performance optimization
3. Better error handling
4. Following Python best practices
5. Reducing complexity

Provide the refactored code with explanations of changes."""
        return self.llm.invoke(prompt)
    def run_python_code(self, code):
        """Execute Python code safely"""
        # Basic safety check (in production, use proper sandboxing)
        dangerous_patterns = [
            "import os", "import sys", "__import__", "eval(", "exec(",
            "open(", "subprocess", "shutil", "socket"
        ]
        for pattern in dangerous_patterns:
            if pattern in code.lower():
                return f"Security warning: Code contains potentially dangerous pattern: {pattern}"
        try:
            result = subprocess.run(
                ["python", "-c", code],
                capture_output=True,
                text=True,
                timeout=5
            )
            return f"Output:\n{result.stdout}\n\nErrors:\n{result.stderr}"
        except subprocess.TimeoutExpired:
            return "Error: Code execution timed out (possible infinite loop)"
        except Exception as e:
            return f"Error executing code: {str(e)}"

    def analyze(self, question):
        """Main interface to ask questions about code"""
        return self.agent_executor.invoke({"input": question})
# Example usage
if __name__ == "__main__":
    code_agent = CodeAnalysisAgent()

    # Example code to analyze
    sample_code = """
def calculate_average(numbers):
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total / len(numbers)

result = calculate_average([1, 2, 3, 4, 5])
print(f"The average is: {result}")
"""

    # Use the agent
    print("=== Code Explanation ===")
    explanation = code_agent.explain_code(sample_code)
    print(explanation)

    print("\n=== Code Refactoring ===")
    refactored = code_agent.refactor_code(sample_code)
    print(refactored)

    print("\n=== Interactive Analysis ===")
    response = code_agent.analyze(
        "What's wrong with this code and how can I fix it?\n" + sample_code
    )
    print(response["output"])
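One caveat on the RunPython tool above: a substring blocklist is easy to fool (extra whitespace in `import  os`, or `__import__` reached via `getattr`). A sturdier, though still not sandbox-grade, approach is to walk the code's AST. This is my own sketch, not part of the agent above, and the helper name `find_risky_nodes` is an assumption:

```python
import ast

# Modules and builtins we refuse to let generated code touch.
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}


def find_risky_nodes(code):
    """Return human-readable reasons the code looks unsafe,
    based on its AST rather than raw substring matching."""
    reasons = []
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    reasons.append(f"imports blocked module '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                reasons.append(f"imports from blocked module '{node.module}'")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
                reasons.append(f"calls blocked builtin '{node.func.id}'")
    return reasons


print(find_risky_nodes("import os\nprint(os.getcwd())"))
print(find_risky_nodes("total = sum([1, 2, 3])"))  # → []
```

Even this only raises the bar; for real isolation you'd run generated code in a container or a dedicated sandbox.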
Performance Optimization Tips
Running LLMs locally requires some optimization. Here are my hard-earned tips:
1. Model Selection: Start with smaller models (7B parameters) and work your way up. mistral:7b and llama2:7b are great starting points.
2. Quantization: Use quantized versions (like llama2:7b-q4_0) for better performance with minimal quality loss.
3. Context Window Management: Be mindful of context limits. Use document chunking and summarization for long texts.
4. Batch Processing: Process multiple queries in batches when possible.
5. GPU Acceleration: If you have an NVIDIA GPU, Ollama can use CUDA for significant speedups.
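To make tip 3 concrete, here's a minimal sketch of the sliding-window chunking idea. LangChain's RecursiveCharacterTextSplitter does this far more intelligently (it prefers to split on paragraph and sentence boundaries), but the core mechanics are just a window that advances by chunk_size minus chunk_overlap; the helper name `chunk_text` is mine:

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive fixed-size chunking with overlap, so context that
    straddles a chunk boundary still appears intact in one chunk."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij" * 250, chunk_size=1000, chunk_overlap=200)
print(len(chunks))                           # → 3
print(len(chunks[0]))                        # → 1000
print(chunks[0][-200:] == chunks[1][:200])   # → True: 200 chars shared
```

The overlap is what keeps retrieval robust: a sentence cut in half at a boundary still appears whole in the neighboring chunk.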
Real-World Applications
I've used this setup for:
- Documentation Search: Instant answers from internal docs and API references
- Code Review: Automated analysis of pull requests
- Learning Assistant: Explaining complex concepts from textbooks
- Meeting Notes Analysis: Summarizing and extracting action items
- Personal Knowledge Base: Querying my own notes and bookmarks
Challenges and Limitations
It's not all sunshine and rainbows. Be aware of:
- Hardware Requirements: Larger models need substantial RAM (16GB+ recommended)
- Speed: Inference is slower than cloud APIs (but improving rapidly)
- Model Quality: Some local models aren't as capable as GPT-4 (yet)
- Setup Complexity: More moving parts than a simple API call
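Some back-of-the-envelope math for the hardware point: memory for the weights alone is roughly parameter count times bytes per parameter, before counting the KV cache and runtime overhead. A quick sketch (the 4-bit figure corresponds to q4_0-style quantization; actual usage varies by runtime):

```python
def approx_weight_gb(params_billion, bits_per_param):
    """Approximate memory for model weights alone, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# A 7B model at different precisions:
print(f"fp16:  {approx_weight_gb(7, 16):.1f} GB")  # → 14.0 GB
print(f"8-bit: {approx_weight_gb(7, 8):.1f} GB")   # → 7.0 GB
print(f"4-bit: {approx_weight_gb(7, 4):.1f} GB")   # → 3.5 GB
```

This is why a quantized 7B model is comfortable on a 16GB laptop while the fp16 version is not.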
The Future is Local (and Open)
The trend toward local AI isn't just about privacy or cost—it's about democratization. As models get better and hardware gets cheaper, we're moving toward a future where every developer can have their own AI assistant, customized to their needs and running on their terms.
Your Turn
Ready to build your own local AI agent? Start by:
- Installing Ollama and pulling a model
- Experimenting with the code examples above
- Customizing the agents for your specific use case
- Sharing what you build with the community
The best part? You own the entire stack. No vendor lock-in, no usage limits, just pure AI capability at your fingertips.
What will you build with your local AI agent? Share your projects and experiences in the comments below. Let's build the future of local AI together!
Want to dive deeper? Check out the Ollama GitHub repo and LangChain documentation for more advanced features and capabilities.