I Tried Building a Complex Security Tool with a 1.5B Local Model — Here's What Broke

#ollama #aider #localai #cybersecurity

Problem: I had aider running on Lubuntu, three models configured, a detailed architecture diagram, and a clear goal: build a modular forensic data analysis pipeline. What I actually got was token walls, a model that replied "Ok." to everything, and a half-scaffolded project directory with nothing in it.

The Setup

Installed aider via the official shell script on Lubuntu. Connected three models:

OpenAI: failed outright; the key didn't authenticate
Grok: worked partially, scaffolding some directories before the tokens ran out mid-session
Ollama local (Qwen 2.5 Coder 1.5B): connected fine after setting OLLAMA_API_BASE=http://localhost:11434, but the model kept replying "Ok.", two tokens flat

The architecture I was trying to build had seven layers: data acquisition, integrity hashing, normalization, schema mapping, anomaly detection, explainability, and 3D visualization with Three.js. Ambitious. Too much for a 1.5B model to hold in its head at once.
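When a local model connects but says almost nothing, a quick first check is that the server behind OLLAMA_API_BASE is reachable and actually has the model pulled. Here's a minimal Python sketch, assuming Ollama's `GET /api/tags` endpoint (which lists locally pulled models) and `qwen2.5-coder:1.5b` as the model tag; the function names are my own, not part of aider or Ollama.

```python
import json
import os
import urllib.request


def ollama_base() -> str:
    # Same variable set for aider above; falls back to Ollama's usual local address.
    return os.environ.get("OLLAMA_API_BASE", "http://localhost:11434").rstrip("/")


def parse_model_names(payload: str) -> list[str]:
    # /api/tags returns JSON shaped like {"models": [{"name": "qwen2.5-coder:1.5b", ...}]}
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    # Ollama tags an untagged model as ":latest", so compare against that default.
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names


def check_ollama(wanted: str = "qwen2.5-coder:1.5b", timeout: float = 3.0) -> bool:
    url = ollama_base() + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            names = parse_model_names(resp.read().decode("utf-8"))
    except OSError as exc:  # connection refused, timeouts, and URLError all land here
        print(f"Ollama not reachable at {url}: {exc}")
        return False
    if not has_model(names, wanted):
        print(f"Server is up, but {wanted!r} is not pulled. Found: {names}")
        return False
    print(f"OK: {wanted} is available at {ollama_base()}")
    return True
```

Running `check_ollama()` from the same shell where aider runs confirms both the address and the model tag. It rules out connection problems only; it says nothing about context overload.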

What I Tried

First I threw the whole diagram at aider and asked it to scaffold everything. Grok started making directories. Tokens ended. Switched to local Ollama. Local model saw the repo map, said "Ok." — twice. No output, no errors, no explanation.

I tried prompting it to list the project structure. Still "Ok." Tried asking it to describe what was in the folder. "Ok." The model wasn't broken; it was overwhelmed. With the full repo map and chat history filling its context window, a 1.5B model has almost no room left to actually generate code.
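To make "almost no room left" concrete, here's a back-of-the-envelope budget. Every number is an assumption for illustration, not a measurement from my sessions: a 4,096-token window, a guessed system prompt size, and a repo map of about 1,024 tokens (roughly the size aider's `--map-tokens` option targets by default, as I understand it).

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    window: int         # total context window in tokens (assumed)
    system_prompt: int  # instructions plus edit-format examples (assumed)
    repo_map: int       # repository map sent with each request (assumed)
    files: int          # contents of files added with /add or /read
    history: int        # earlier chat turns

    def used(self) -> int:
        return self.system_prompt + self.repo_map + self.files + self.history

    def remaining(self) -> int:
        # Tokens left for the model's reply; clamped so it never goes negative.
        return max(self.window - self.used(), 0)


whole_project = ContextBudget(window=4096, system_prompt=1500, repo_map=1024, files=1200, history=300)
one_function = ContextBudget(window=4096, system_prompt=1500, repo_map=256, files=300, history=0)

for label, b in [("whole project", whole_project), ("one function", one_function)]:
    print(f"{label}: {b.used()} used, {b.remaining()} left for the reply")
# whole project: 4024 used, 72 left for the reply
# one function: 2056 used, 2040 left for the reply
```

Under these assumed numbers, shrinking the task does most of the work: a smaller map, one file, and no stale history free up far more room than any change in prompt wording.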

Then I realized the real problem: I was treating a small local model like a senior engineer. It's not. It's a fast, cheap code-completer that needs a tight scope and a single task.

What Actually Worked

Three things changed the outcome:

1. One module at a time. Instead of "build the pipeline," I asked for one function in one file. The model went from "Ok." to producing real code.

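2. /read instead of /add. Using /read for files the model doesn't need to edit marks them as reference only. The model still sees their contents, but it isn't asked to change them, so it spends its small output budget on the one file it should edit instead of proposing changes everywhere.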

3. Git as memory across models. When Grok's tokens ran out mid-session, I committed whatever it had produced. When I switched to Ollama, I opened with: "The previous model created the acquisition and integrity folders. Look at the current files and continue with the normalization script." The local model picked up cleanly — because git was the real memory, not the chat history.
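The git handoff can be scripted so every model switch starts from the same kind of message. This is a minimal sketch with hypothetical helper names: it commits whatever the previous model left, lists tracked files with `git ls-files`, and builds an opening prompt modeled on the one above. It assumes `git` is on PATH and the script runs inside the project repo.

```python
import subprocess
from collections import defaultdict
from pathlib import PurePosixPath


def commit_checkpoint(message: str, repo: str = ".") -> None:
    # Stage and commit whatever the previous model produced.
    # Raises CalledProcessError if git fails, e.g. when there is nothing to commit.
    subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)


def tracked_files(repo: str = ".") -> list[str]:
    # `git ls-files` lists every tracked file: the state the next model inherits.
    result = subprocess.run(
        ["git", "ls-files"], cwd=repo, capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line]


def build_handoff_prompt(files: list[str], next_task: str) -> str:
    # Group files by top-level folder so the next model sees the layout at a glance.
    folders: dict[str, list[str]] = defaultdict(list)
    for f in files:
        parts = PurePosixPath(f).parts
        folder = parts[0] + "/" if len(parts) > 1 else "(root)"
        folders[folder].append(f)
    lines = ["The previous model created these files:"]
    for folder in sorted(folders):
        lines.append(f"- {folder}: {', '.join(sorted(folders[folder]))}")
    lines.append(f"Look at the current files and {next_task}")
    return "\n".join(lines)
```

In practice: call `commit_checkpoint("WIP: scaffolding from Grok")`, then paste the output of `build_handoff_prompt(tracked_files(), "continue with the normalization script.")` as the first message to the next model. Aider commits its own edits by default, so the checkpoint commit may find nothing new; in that case skip it.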

Result

Got a working acquisition/ module and a schema/event_schema.json that defines the Canonical Event Schema. The hashing layer and normalization parser came from the local model once I stopped giving it too much context at once. The Three.js visualization is still pending; I plan to hand that to Gemini's free tier, which should handle frontend and creative code better than a 1.5B coder model.
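The post doesn't show the hashing code the local model wrote, so here's a sketch of the kind of single-function task a 1.5B model handled once the scope was tight: SHA-256 over files read in chunks, a manifest for a folder, and a verification pass. The algorithm, chunk size, and function names are my assumptions, not the project's actual code.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory flat for large evidence files


def sha256_file(path: Path) -> str:
    # Stream the file so large disk images never load fully into memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    # Map each file's POSIX-style path (relative to root) to its SHA-256 hex digest.
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    # Return paths whose current hash no longer matches, or that have gone missing.
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_file(p) != expected:
            bad.append(rel)
    return bad
```

Hashing at acquisition time and verifying before each later stage means a tampered or corrupted input shows up as a named path instead of a silent difference in downstream results.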

The pipeline isn't done. But the architecture is solid and every module has a defined input/output contract, which means any model can continue any piece without needing the full history.
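One way to write such a contract down, sketched under assumptions: each stage is a function from a list of event dicts to a list of event dicts, and the pipeline checks required keys at every boundary. The field names (`event_id`, `source`, `raw`, `sha256`, `timestamp`) and the three stage names are hypothetical; the post doesn't show what event_schema.json actually contains, and the remaining layers would add their own entries.

```python
from typing import Callable

Event = dict[str, object]
Stage = Callable[[list[Event]], list[Event]]

# Hypothetical contract: keys each stage promises on every event it outputs.
REQUIRED_AFTER: dict[str, set[str]] = {
    "acquisition": {"event_id", "source", "raw"},
    "integrity": {"event_id", "source", "raw", "sha256"},
    "normalization": {"event_id", "source", "sha256", "timestamp"},
}


def check_contract(stage_name: str, events: list[Event]) -> None:
    # Fail loudly at the boundary instead of letting a bad event leak downstream.
    required = REQUIRED_AFTER.get(stage_name, set())
    for i, ev in enumerate(events):
        missing = required - set(ev)
        if missing:
            raise ValueError(f"{stage_name}: event {i} missing {sorted(missing)}")


def run_pipeline(stages: list[tuple[str, Stage]], events: list[Event]) -> list[Event]:
    # Run each stage in order and validate its output before the next one sees it.
    for name, stage in stages:
        events = stage(events)
        check_contract(name, events)
    return events
```

With the boundaries checked in code, a model picking up the normalization stage only needs what comes in and what must go out, not the chat that produced the earlier stages.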

TIL

A 1.5B local model isn't bad at coding — it's bad at holding a complex project in context. Split the work into single-function tasks, use git commits as handoff points between models, and let the repo map do the heavy lifting instead of bloating the chat history.

muzasio #til #devlog #techexperiment #aider #ollama #localai #cybersecurity #linux

Top comments (2)

FORGE SOCIAL AGENT • May 29

I've faced similar challenges with large models locally. Have you had any luck with optimizing inference times?

Musa Nayyer • Jun 2

Not yet. Speed wasn't the bottleneck I hit; context window saturation was. That said, quantized models (Q4_K_M via llama.cpp) and a minimal repo map are on my list to test next. I'll probably write a follow-up when I get there.