# Running Local LLMs in Your Development Workflow
In 2026, developers are increasingly turning to local LLMs to address privacy, cost, and latency concerns. This guide shows you how to integrate Ollama into your day-to-day development workflow for practical tasks like code review, test generation, and documentation.
## Why Go Local?
Cloud AI assistants are powerful but come with tradeoffs:
- Data leaves your network
- Recurring API costs add up
- Latency impacts productivity
- Rate limits can interrupt work
Local LLMs solve these issues while providing surprisingly capable assistance for many development tasks.
## Getting Started with Ollama

### Installation

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows (native installer)
winget install Ollama.Ollama
```
### Pull Models

```bash
# For coding assistance
ollama pull qwen2.5-coder:7b

# For faster completions
ollama pull phi3:mini

# For reasoning tasks
ollama pull llama3.1:8b
```
### Start Server

```bash
ollama serve
```
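If you installed the desktop app, the server may already be running in the background. Either way, a quick way to confirm it's up is to list the locally available models; a minimal sketch using Python's requests library:

```python
import requests

# GET /api/tags lists models pulled locally; a 200 response means the server is up
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"])
```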
## Using Ollama as an OpenAI-Compatible API

Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`, so existing OpenAI SDK code works after changing only the base URL.
### Python Example

```python
from openai import OpenAI

# Point the OpenAI client at the local Ollama server;
# the api_key is required by the SDK but ignored by Ollama
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

response = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain the difference between == and === in JavaScript"}],
)
print(response.choices[0].message.content)
```
### JavaScript Example

```javascript
import OpenAI from "openai";

// Same idea in Node: any OpenAI SDK works against the local endpoint
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama",
});

const response = await client.chat.completions.create({
  model: "qwen2.5-coder:7b",
  messages: [{ role: "user", content: "Write a Python function to calculate fibonacci numbers" }],
});
console.log(response.choices[0].message.content);
```
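For longer completions, streaming makes the wait feel much shorter. Ollama's compatible endpoint honors the standard streaming flag; a minimal sketch reusing the Python client from above (the prompt is just an example):

```python
# Print tokens as they arrive instead of waiting for the full completion
stream = client.chat.completions.create(
    model="qwen2.5-coder:7b",
    messages=[{"role": "user", "content": "Explain Python list comprehensions briefly"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```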
## Practical Development Use Cases

### Code Review Helper

Create a pre-commit hook that runs your staged changes through a local model:
```bash
#!/bin/bash
# .git/hooks/pre-commit
# Requires jq for safe JSON construction

# Get staged changes
diff=$(git diff --cached)

if [ -n "$diff" ]; then
  echo "Running local AI code review..."
  # Build the payload with jq so quotes and newlines in the diff
  # don't break the JSON (naive string interpolation would)
  payload=$(jq -n --arg diff "$diff" '{
    model: "llama3.1:8b",
    prompt: ("Review this code for obvious bugs and security issues. Be brief.\n\n" + $diff),
    stream: false
  }')
  review=$(curl -s http://localhost:11434/api/generate -d "$payload" | jq -r '.response')
  echo "Review:"
  echo "$review"
fi
```
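Make the hook executable with `chmod +x .git/hooks/pre-commit`. As written it only prints the review; have the script exit non-zero if you want a failed review to block the commit.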
### Test Generation Assistant

Use local models to generate unit tests:

```python
def generate_tests(function_code):
    # Reuses the OpenAI-compatible `client` configured earlier
    prompt = (
        "Write pytest unit tests for this Python function.\n"
        "Include normal cases, edge cases, and error conditions.\n"
        "Return only the test code.\n\n"
        "Function:\n" + function_code
    )
    response = client.chat.completions.create(
        model="qwen2.5-coder:7b",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,  # keep output deterministic and focused
    )
    return response.choices[0].message.content
```
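A quick usage check (the function under test is purely illustrative):

```python
source = """
def divide(a, b):
    return a / b
"""
print(generate_tests(source))
```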
### Documentation Generator

```python
def generate_docstring(code_snippet):
    response = client.chat.completions.create(
        model="phi3:mini",  # a small, fast model is fine for short prose
        messages=[{"role": "user", "content": "Write a clear docstring for this code:\n\n" + code_snippet}],
        temperature=0.3,
    )
    return response.choices[0].message.content
```
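And the same kind of quick check (the snippet is illustrative):

```python
print(generate_docstring("def add(a, b):\n    return a + b"))
```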
## Choosing the Right Model
Different models excel at different tasks:
| Task | Model | Notes |
|---|---|---|
| Code completion | qwen2.5-coder:7b | Purpose-built for code |
| Quick answers | phi3:mini | Very fast, 2GB size |
| Reasoning | llama3.1:8b | Better for complex logic |
| SQL queries | qwen2.5-coder:7b | Excellent SQL understanding |
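Rather than hard-coding model names at every call site, one option is to centralize the choice in a small lookup; a sketch, with illustrative task names and the `client` from earlier:

```python
# Hypothetical routing table mirroring the guidance above
MODEL_FOR_TASK = {
    "code": "qwen2.5-coder:7b",
    "quick": "phi3:mini",
    "reasoning": "llama3.1:8b",
}

def ask(task: str, prompt: str) -> str:
    # Fall back to the general-purpose model for unknown task types
    response = client.chat.completions.create(
        model=MODEL_FOR_TASK.get(task, "llama3.1:8b"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```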
## Editor Integration

### VS Code with Continue

Install the Continue extension and add a model entry to Continue's config (`~/.continue/config.json`):
```json
{
  "models": [{
    "title": "Local Qwen2.5 Coder",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }]
}
```
### JetBrains IDEs

Use the JetBrains AI Assistant plugin with a custom Ollama endpoint, or try community plugins.
## Performance Optimization

- **GPU Acceleration**: Ollama automatically uses an available GPU
- **Keep Models Warm**: Prevent unloading between requests (see the sketch below)
- **Context Window**: A smaller context window means faster responses

You can shrink the context window inside an interactive session:

```bash
ollama run qwen2.5-coder:7b
>>> /set parameter num_ctx 2048
```
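For the warm-up point above, Ollama's native API accepts a keep_alive parameter; an empty generate request loads the model and holds it in memory (a sketch using requests; the 30-minute duration is just an example, and -1 keeps it loaded indefinitely):

```python
import requests

# A generate request with no prompt preloads the model;
# keep_alive controls how long it stays resident after each request
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5-coder:7b", "keep_alive": "30m"},
)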
## Realistic Expectations
Local models in 2026 are excellent for:
- Boilerplate generation
- Simple code transformations
- Test creation
- Documentation
- Learning explanations
They're less ideal for:
- Complex architectural decisions
- Debugging subtle production issues
- Cutting-edge research questions
- Multimodal tasks
## Getting Started Right Now

Try this 5-minute experiment:

1. Install Ollama
2. Run `ollama pull qwen2.5-coder:7b`
3. Start the server with `ollama serve`
4. Test with:

```bash
curl http://localhost:11434/api/generate -d '{"model":"qwen2.5-coder:7b","prompt":"Hello world","stream":false}'
```
You'll see that capable AI assistance can run entirely on your machine today.
What tasks are you hoping to offload to local AI? Share your experiments in the comments!