Running Local LLMs in Your Development Workflow

In 2026, developers are increasingly turning to local LLMs to address privacy, cost, and latency concerns. This guide shows you how to integrate Ollama into your real development workflow for practical tasks like code review, test generation, and documentation.

Why Go Local?

Cloud AI assistants are powerful but come with tradeoffs:

  • Data leaves your network
  • Recurring API costs add up
  • Latency impacts productivity
  • Rate limits can interrupt work

Local LLMs solve these issues while providing surprisingly capable assistance for many development tasks.

Getting Started with Ollama

Installation

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
winget install Ollama.Ollama

Pull Models

# For coding assistance
ollama pull qwen2.5-coder:7b

# For faster completions
ollama pull phi3:mini

# For reasoning tasks
ollama pull llama3.1:8b
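You can confirm the downloads at any time:

# List locally available models and their sizes
ollama list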

Start Server

ollama serve
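A quick request to the root endpoint confirms the server is up (it responds with "Ollama is running"):

curl http://localhost:11434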

Using Ollama as an OpenAI-Compatible API

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1.
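You can smoke-test the endpoint with curl before wiring up a client library:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b", "messages": [{"role": "user", "content": "Say hello"}]}'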

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required by the client library but ignored by Ollama
)

response = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain the difference between == and === in JavaScript"}]
)

print(response.choices[0].message.content)

JavaScript Example

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama" // required by the client library but ignored by Ollama
});

const response = await client.chat.completions.create({
  model: "qwen2.5-coder:7b",
  messages: [{ role: "user", content: "Write a Python function to calculate fibonacci numbers" }]
});

console.log(response.choices[0].message.content);

Practical Development Use Cases

Code Review Helper

Create a pre-commit hook that runs your code through a local model:

#!/bin/bash
# .git/hooks/pre-commit

# Get staged changes
diff=$(git diff --cached)

if [ -n "$diff" ]; then
  echo "Running local AI code review..."

  # Build the JSON payload with jq so quotes and newlines in the diff
  # don't break the request body
  payload=$(jq -n --arg diff "$diff" \
    '{model: "llama3.1:8b", prompt: ("Review this code for obvious bugs and security issues. Be brief.\n\n" + $diff), stream: false}')

  review=$(curl -s http://localhost:11434/api/generate -d "$payload" | jq -r '.response')

  echo "Review:"
  echo "$review"
fi
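Make the hook executable so Git actually runs it:

chmod +x .git/hooks/pre-commit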

Test Generation Assistant

Use local models to generate unit tests:

def generate_tests(function_code):
    prompt = (
        "Write pytest unit tests for this Python function.\n"
        "Include normal cases, edge cases, and error conditions.\n"
        "Return only the test code.\n\n"
        "Function:\n" + function_code
    )

    response = client.chat.completions.create(
        model="qwen2.5-coder:7b",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )

    return response.choices[0].message.content
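A quick usage sketch, assuming the client from the Python example above and a throwaway function to test:

source = """
def divide(a, b):
    return a / b
"""

# Print the generated tests; in practice you'd write them to a test file
print(generate_tests(source))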

Documentation Generator

def generate_docstring(code_snippet):
    response = client.chat.completions.create(
        model="phi3:mini",
        messages=[{"role": "user", "content": "Write a clear docstring for this code:\n\n" + code_snippet}],
        temperature=0.3
    )

    return response.choices[0].message.content

Choosing the Right Model

Different models excel at different tasks:

Task              Model               Notes
Code completion   qwen2.5-coder:7b    Purpose-built for code
Quick answers     phi3:mini           Very fast, ~2 GB download
Reasoning         llama3.1:8b         Better for complex logic
SQL queries       qwen2.5-coder:7b    Excellent SQL understanding
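If you script against several of these, a small lookup table keeps the choice in one place. A minimal sketch (the task names are just illustrative):

# Hypothetical mapping from task type to the models in the table above
MODEL_FOR_TASK = {
    "code_completion": "qwen2.5-coder:7b",
    "quick_answers": "phi3:mini",
    "reasoning": "llama3.1:8b",
    "sql": "qwen2.5-coder:7b",
}

def pick_model(task: str) -> str:
    # Fall back to the coder model for unknown task types
    return MODEL_FOR_TASK.get(task, "qwen2.5-coder:7b")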

Editor Integration

VS Code with Continue

Install the Continue extension, then add your local model to Continue's config.json:

{
  "models": [{
    "title": "Local Qwen2.5 Coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }]
}

JetBrains IDEs

Use the JetBrains AI Assistant plugin with a custom Ollama endpoint, or try community plugins.

Performance Optimization

  1. GPU Acceleration: Ollama automatically uses an available GPU; no configuration needed
  2. Keep Models Warm: Prevent unloading between requests with the keep_alive parameter (see the sketch after this list)
  3. Context Window: Smaller context = faster responses; set it per session from the interactive REPL:

ollama run qwen2.5-coder:7b
>>> /set parameter num_ctx 2048
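For point 2, Ollama's API accepts a keep_alive parameter controlling how long a model stays loaded after a request ("1h", "30m", or -1 for indefinitely). A request with no prompt simply loads the model:

# Load the model and keep it in memory for an hour
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5-coder:7b", "keep_alive": "1h"}'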

Realistic Expectations

Local models in 2026 are excellent for:

  • Boilerplate generation
  • Simple code transformations
  • Test creation
  • Documentation
  • Learning explanations

They're less ideal for:

  • Complex architectural decisions
  • Debugging subtle production issues
  • Cutting-edge research questions
  • Multimodal tasks

Getting Started Right Now

Try this 5-minute experiment:

  1. Install Ollama
  2. Run ollama pull qwen2.5-coder:7b
  3. Start server with ollama serve
  4. Test with: curl http://localhost:11434/api/generate -d '{"model":"qwen2.5-coder:7b","prompt":"Hello world","stream":false}'

You'll see that capable AI assistance can run entirely on your machine today.


What tasks are you hoping to offload to local AI? Share your experiments in the comments!
