# Running Local LLMs in Your Development Workflow
In 2026, developers are increasingly turning to local LLMs to address privacy, cost, and latency concerns. This guide shows you how to integrate Ollama into your day-to-day development workflow for practical tasks like code review, test generation, and documentation.
## Why Go Local?
Cloud AI assistants are powerful but come with tradeoffs:
- Data leaves your network
- Recurring API costs add up
- Latency impacts productivity
- Rate limits can interrupt work
Local LLMs solve these issues while providing surprisingly capable assistance for many development tasks.
## Getting Started with Ollama

### Installation

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows (native installer)
winget install Ollama.Ollama
```
### Pull Models

```bash
# For coding assistance
ollama pull qwen2.5-coder:7b

# For faster completions
ollama pull phi3:mini

# For reasoning tasks
ollama pull llama3.1:8b
```
### Start Server

```bash
ollama serve
```
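If you installed the desktop app, the server may already be running in the background. Either way, a quick way to confirm it's up is to list the locally available models; a minimal sketch using Python's requests library:

```python
import requests

# GET /api/tags lists models pulled locally; a 200 response means the server is up
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"])
```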
## Using Ollama as an OpenAI-Compatible API

Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`, so existing OpenAI SDK code works after changing only the base URL.
### Python Example

```python
from openai import OpenAI

# Point the OpenAI client at the local Ollama server;
# the api_key is required by the SDK but ignored by Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

response = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain the difference between == and === in JavaScript"}],
)
print(response.choices[0].message.content)
```
### JavaScript Example

```javascript
import OpenAI from "openai";

// Same idea in Node: any OpenAI SDK works against the local endpoint
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama",
});

const response = await client.chat.completions.create({
  model: "qwen2.5-coder:7b",
  messages: [{ role: "user", content: "Write a Python function to calculate fibonacci numbers" }],
});
console.log(response.choices[0].message.content);
```
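For longer completions, streaming makes the wait feel much shorter. Ollama's compatible endpoint honors the standard streaming flag; a minimal sketch reusing the Python client from above (the prompt is just an example):

```python
# Print tokens as they arrive instead of waiting for the full completion
stream = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain Python list comprehensions briefly"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```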
## Practical Development Use Cases

### Code Review Helper

Create a pre-commit hook that runs your staged changes through a local model:
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Requires jq for safe JSON construction

# Get staged changes
diff=$(git diff --cached)

if [ -n "$diff" ]; then
  echo "Running local AI code review..."
  # Build the payload with jq so quotes and newlines in the diff
  # don't break the JSON (naive string interpolation would)
  payload=$(jq -n --arg diff "$diff" '{
    model: "llama3.1:8b",
    prompt: ("Review this code for obvious bugs and security issues. Be brief.\n\n" + $diff),
    stream: false
  }')
  review=$(curl -s http://localhost:11434/api/generate -d "$payload" | jq -r '.response')
  echo "Review:"
  echo "$review"
fi
```
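Make the hook executable with `chmod +x .git/hooks/pre-commit`. As written it only prints the review; have the script exit non-zero if you want a failed review to block the commit.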
### Test Generation Assistant

Use local models to generate unit tests:

```python
def generate_tests(function_code):
    # Reuses the OpenAI-compatible `client` configured earlier
    prompt = (
        "Write pytest unit tests for this Python function.\n"
        "Include normal cases, edge cases, and error conditions.\n"
        "Return only the test code.\n\n"
        "Function:\n" + function_code
    )
    response = client.chat.completions.create(
        model="qwen2.5-coder:7b",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,  # keep output deterministic and focused
    )
    return response.choices[0].message.content
```
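A quick usage check (the function under test is purely illustrative):

```python
source = """
def divide(a, b):
    return a / b
"""
print(generate_tests(source))
```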
### Documentation Generator

```python
def generate_docstring(code_snippet):
    response = client.chat.completions.create(
        model="phi3:mini",  # a small, fast model is fine for short prose
        messages=[{"role": "user", "content": "Write a clear docstring for this code:\n\n" + code_snippet}],
        temperature=0.3,
    )
    return response.choices[0].message.content
```
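And the same kind of quick check (the snippet is illustrative):

```python
print(generate_docstring("def add(a, b):\n    return a + b"))
```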
## Choosing the Right Model
Different models excel at different tasks:
| Task | Model | Notes |
|---|---|---|
| Code completion | qwen2.5-coder:7b | Purpose-built for code |
| Quick answers | phi3:mini | Very fast, 2GB size |
| Reasoning | llama3.1:8b | Better for complex logic |
| SQL queries | qwen2.5-coder:7b | Excellent SQL understanding |
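Rather than hard-coding model names at every call site, one option is to centralize the choice in a small lookup; a sketch, with illustrative task names and the `client` from earlier:

```python
# Hypothetical routing table mirroring the guidance above
MODEL_FOR_TASK = {
    "code": "qwen2.5-coder:7b",
    "quick": "phi3:mini",
    "reasoning": "llama3.1:8b",
}

def ask(task: str, prompt: str) -> str:
    # Fall back to the general-purpose model for unknown task types
    response = client.chat.completions.create(
        model=MODEL_FOR_TASK.get(task, "llama3.1:8b"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```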
## Editor Integration

### VS Code with Continue

Install the Continue extension and add a model entry to Continue's config (`~/.continue/config.json`):
```json
{
  "models": [{
    "title": "Local Qwen2.5 Coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }]
}
```
### JetBrains IDEs

Use the JetBrains AI Assistant plugin with a custom Ollama endpoint, or try community plugins.
## Performance Optimization

- **GPU Acceleration**: Ollama automatically uses an available GPU
- **Keep Models Warm**: Prevent unloading between requests (see the sketch below)
- **Context Window**: A smaller context window means faster responses

You can shrink the context window inside an interactive session:

```bash
ollama run qwen2.5-coder:7b
>>> /set parameter num_ctx 2048
```
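For the warm-up point above, Ollama's native API accepts a keep_alive parameter; an empty generate request loads the model and holds it in memory (a sketch using requests; the 30-minute duration is just an example, and -1 keeps it loaded indefinitely):

```python
import requests

# A generate request with no prompt preloads the model;
# keep_alive controls how long it stays resident after each request
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5-coder:7b", "keep_alive": "30m"},
)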
## Realistic Expectations
Local models in 2026 are excellent for:
- Boilerplate generation
- Simple code transformations
- Test creation
- Documentation
- Learning explanations
They're less ideal for:
- Complex architectural decisions
- Debugging subtle production issues
- Cutting-edge research questions
- Multimodal tasks
## Getting Started Right Now

Try this 5-minute experiment:

1. Install Ollama
2. Run `ollama pull qwen2.5-coder:7b`
3. Start the server with `ollama serve`
4. Test with:

```bash
curl http://localhost:11434/api/generate -d '{"model":"qwen2.5-coder:7b","prompt":"Hello world","stream":false}'
```
You'll see that capable AI assistance can run entirely on your machine today.
What tasks are you hoping to offload to local AI? Share your experiments in the comments!