Over Christmas break 2025, I built a personal knowledge base. Not a notes app. Not a RAG app. A memory system with what I call a Data Constitution. The crown jewel, and the foundation of everything else I've built since.
It syncs Gmail, Calendar, Slack, WhatsApp, ChatGPT conversations, GitHub issues, and handwritten notes into a single searchable index. Seven data sources, all flowing into one place. I
used to joke that ChatGPT had become my extended memory after four years and thousands of conversations. Then I built something that actually deserves that description, running locally,
under my control, with every piece of knowledge traceable back to its source.
Retrieval uses hybrid search: BM25 keyword matching, vector semantic search, and cross-encoder reranking. Everything lives as plain-text org-mode files. No vendor lock-in. Editable in any
text editor, forever.
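To make the retrieval pipeline concrete, here is a minimal sketch of how hybrid search results can be merged and reranked. This is illustrative, not the actual code: it assumes the BM25 and vector retrievers each return a ranked list of document IDs, uses reciprocal rank fusion (a common merging technique) to combine them, and stubs the cross-encoder with a plain scoring function. All names and data are invented.

```python
# Minimal sketch of hybrid retrieval fusion (illustrative, not the real code).
# Assumes two retrievers each return a ranked list of document IDs;
# reciprocal rank fusion (RRF) merges them, then a reranker scores the top hits.

def rrf_fuse(ranked_lists, k=60):
    """Combine ranked ID lists with reciprocal rank fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def rerank(query, doc_ids, score_fn, top_n=3):
    """Stand-in for a cross-encoder: score_fn(query, doc_id) -> float."""
    return sorted(doc_ids, key=lambda d: score_fn(query, d), reverse=True)[:top_n]

bm25_hits   = ["kb/receipts.org", "kb/slack-2024.org", "kb/notes.org"]
vector_hits = ["kb/notes.org", "kb/receipts.org", "kb/gmail.org"]

fused = rrf_fuse([bm25_hits, vector_hits])
top = rerank("receipts", fused, lambda q, d: float(q in d))
print(top[0])  # -> kb/receipts.org
```

The point of the fusion step is that a document ranked highly by both keyword and semantic search beats one that only one retriever liked, and the reranker gets a small, already-good candidate set instead of the whole index.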
Built in about four weeks with Claude Code. Running every day since. And designed around one principle: the AI assists, I decide.
If you follow the AI discourse, there's a term for this: "vibe coding." Anthropic themselves used it. The narrative goes something like: you tell the AI what you want, it writes the code,
you ship it. Easy.
I hate that term.
Not because it's wrong about the speed. The speed is real. What took weeks now takes hours. But "vibe coding" implies something passive. Like you're along for the ride. Like the hard part
is the prompting. Worse, it erases the human from the story entirely. As if the person behind the keyboard is interchangeable with anyone else who can type a prompt.
The hard part was never the prompting.
What actually happened
I chose org-mode as the foundation because I've used Emacs for years and I wanted plain text files I could edit in any tool, forever. Not markdown, not a database, not some proprietary
format. That's a sustainability decision that shapes everything downstream.
I already had a task management workflow that had been running for years: a Getting Things Done system in Emacs with specific states and transition rules. The new systems didn't replace
that. They were built to integrate with it, to respect an existing way of working rather than imposing a new one.
I built a two-stage retrieval system because I kept hitting context window limits with local models. When you have over a hundred thousand indexed chunks, you can't just load everything.
First pass: LLM-generated doc-card summaries that compress each document into its key facts and topics. Second pass: load the full document only for the hits that actually matter. That
pattern didn't come from a design document. It came from weeks of research, testing, hitting walls, reading about how other people solved similar problems, and making judgment calls based
on twenty years of building and testing systems.
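The two-stage pattern described above can be sketched in a few lines. This is a toy version under assumed names: the doc-cards here are hand-written strings, the stage-1 match is a naive substring test standing in for BM25/vector search, and the loader is a hypothetical placeholder.

```python
# Sketch of two-stage retrieval with doc-cards (illustrative, not the real code).
# Stage 1 searches compact summaries; stage 2 loads full text only for top hits,
# keeping the prompt within a local model's context window.

DOC_CARDS = {
    "kb/trip-2024.org": "topics: travel, budget; facts: flight booked March 3",
    "kb/ettool.org":    "topics: testing, tooling; facts: records browser + voice",
    "kb/receipts.org":  "topics: receipts, expenses; facts: monthly totals",
}

def load_full_document(path):
    # Hypothetical loader; in practice this would read the org file from disk.
    return f"<full text of {path}>"

def two_stage_retrieve(query, max_docs=2):
    # Stage 1: cheap match over summaries (stand-in for real search).
    hits = [p for p, card in DOC_CARDS.items() if query in card]
    # Stage 2: expensive full-document load, only for the surviving hits.
    return {p: load_full_document(p) for p in hits[:max_docs]}

result = two_stage_retrieve("testing")
print(list(result))  # -> ['kb/ettool.org']
```

The design choice is the same either way: with a hundred thousand chunks, the summaries act as a cheap filter so the expensive step (full documents in the context window) only ever sees a handful of candidates.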
The knowledge base became the foundation for everything else. A personal assistant that uses the KB to draft email replies and process receipts overnight. An evaluation framework that
tests whether local open-source models can handle my actual workflows. A training pipeline that does LoRA fine-tuning of local models using data extracted from my own real workflows,
exports to GGUF format for local inference, and validates through the eval framework before anything gets promoted to production. A frozen training corpus, built from my own data.
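The "validates before anything gets promoted" idea can be sketched as a simple gate check. Everything here is invented for illustration: task names, results, and the gate function itself are a hedged guess at the shape of such a check, not the author's implementation.

```python
# Sketch of a promotion gate (illustrative, names and results are made up):
# a candidate model is promoted only if it passes every holdout task
# the current production model already passes.

def gate(prod_results, candidate_results):
    """Promote only if the candidate does not regress on any passing task."""
    regressions = [task for task, passed in prod_results.items()
                   if passed and not candidate_results.get(task, False)]
    return (len(regressions) == 0), regressions

prod      = {"draft_reply": True, "parse_receipt": True, "summarize": False}
candidate = {"draft_reply": True, "parse_receipt": False, "summarize": True}

ok, regressed = gate(prod, candidate)
print(ok, regressed)  # -> False ['parse_receipt']
```

Note the asymmetry: the candidate improving on a previously failing task doesn't earn promotion by itself; losing a previously passing task always blocks it.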
Throughout all of this I wasn't just prompting and accepting whatever came back. I use multiple AIs in my workflow: Claude Code as the main driver for implementation, ChatGPT for plan
review and feedback from a different perspective. I sit in the middle as the architect, actively reading, writing my own analysis, bouncing ideas between systems. It usually takes two or
three iterations before a plan is significantly better than what either AI initially suggested. That's not copying and pasting between chatbots. That's engineering.
And for all of these decisions I had to intimately know the system I was building. How it connects, how it behaves, where the failure modes are. Not every line of code, but the
architecture and its consequences. I pushed back on the AI when it tried to go off the rails. I challenged approaches that felt wrong. This is what testers do as a natural part of our
expertise. We learn systems deeply, even ones we didn't write.
None of that is vibes. That's active architecture.
The ettool weekend
The sharpest example happened in late January. On a Friday afternoon I was testing a pull request at work and got frustrated with my note-taking. The exploratory testing workflow I'd used
for years was: test in the browser, switch to a text file, write what I found, switch back, lose my place, repeat. By the time I left work that evening, I'd captured the idea and a rough
sketch of what a tool could look like.
Saturday lunch I went downstairs to my computer and worked through the architecture. What should it capture, how should the pieces connect, what constraints matter. Saturday afternoon,
about three to four hours of work, I had a working tool. ettool records your browser actions and your voice narration simultaneously. In real time I mark what's a bug, an issue, a comment and so on via hot-keys; afterwards an LLM correlates the three streams into structured test findings. After a session you get an org-mode report: a timeline of actions, correlated voice observations, and extracted findings categorized by type.
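The correlation step is essentially aligning three timestamped streams. Here is a toy sketch of that idea using nearest-timestamp matching; in the real tool an LLM does this job, and all the event data and field names below are invented.

```python
# Illustrative sketch: correlating the three streams ettool captures
# (browser actions, voice narration, hotkey marks) by timestamp.
# The real tool hands this to an LLM; the data here is made up.

def nearest(events, t):
    """Return the event whose timestamp is closest to t."""
    return min(events, key=lambda e: abs(e["t"] - t))

actions = [{"t": 10.2, "what": "click #submit"},
           {"t": 31.0, "what": "reload page"}]
voice   = [{"t": 11.0, "what": "button did nothing the first time"},
           {"t": 30.5, "what": "layout jumps on reload"}]
marks   = [{"t": 11.5, "kind": "bug"}, {"t": 31.2, "kind": "issue"}]

# Each hotkey mark is joined with the nearest action and voice note.
findings = [{"kind": m["kind"],
             "action": nearest(actions, m["t"])["what"],
             "note": nearest(voice, m["t"])["what"]} for m in marks]
print(findings[0])
```

The hotkey marks anchor everything: the tester's judgment call ("that was a bug") is the primary record, and the action log and narration are evidence attached to it.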
On Sunday I built an ambitious extension: real-time voice transcription with LLM intent detection. The idea was that the tool would understand what you were doing while you were doing it.
I tried it. It worked. I hated it. Within minutes I could feel it changing how I tested, and not in a good way. The real-time feature was second-guessing me, and I was losing the sense of
control that makes exploratory testing work. Me, the expert. The tester. The one with the domain knowledge and the judgment for where the risks are. The AI was taking that agency away.
I removed the feature and documented why in the repo.
Friday idea. Saturday MVP. Sunday experiment, tested, rejected. That entire cycle happened in one weekend. But here's what "vibe coding" misses about that story: none of it would have been
possible without the experience I brought to it. Knowing what exploratory testing needs. Knowing how testers think during sessions. Knowing what the browser APIs support, how to structure
audio processing, when a feature is hurting more than helping. A junior developer could have prompted their way to something that superficially worked. They wouldn't have known to reject
the real-time feature because they wouldn't have felt the cognitive interference the way someone with years of testing experience does.
ettool is now the tool I use for most of my exploratory testing sessions at work. The feature I rejected in January is still rejected.
What surprised me wasn't that I could build it fast. It was that I could discard a feature without emotional attachment because it was so cheap to try. When building is expensive, you get
attached to what you've built. When it's cheap, you can afford to be honest about what doesn't work.
That's not vibes. That's faster epistemology.
The architecture nobody sees
"Vibe coding" glosses over the most important part of building software: the decisions that aren't about code.
My knowledge base has a Data Constitution. Three validation lanes: CLEAN, REPAIR, QUARANTINE. Corrupted data fails visibly, never silently. That's a governance decision. The personal
assistant has a hard boundary: it drafts, I decide. No external action without explicit human approval. That's a trust boundary. The evaluation framework uses holdout sets and promotion
gates so local models can't regress on tasks they already pass. That's quality engineering.
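The three-lane idea can be sketched in a few lines. This is a guess at the shape of such a router, not the actual Data Constitution: the required fields and repair rule are invented for illustration.

```python
# Sketch of three-lane validation routing (illustrative; rules are made up).
# Every record lands in exactly one lane, so corrupted data fails visibly,
# never silently.

def route(record):
    required = {"source", "timestamp", "body"}
    missing = required - record.keys()
    if not missing:
        return "CLEAN"
    if missing == {"timestamp"}:   # recoverable: a repair step can backfill this
        return "REPAIR"
    return "QUARANTINE"            # unrecoverable: held for human review

records = [
    {"source": "gmail", "timestamp": 1717000000, "body": "receipt"},
    {"source": "slack", "body": "missing timestamp"},
    {"body": "no source at all"},
]
lanes = [route(r) for r in records]
print(lanes)  # -> ['CLEAN', 'REPAIR', 'QUARANTINE']
```

The governance point is the absence of a fourth option: there is no lane where a bad record quietly enters the index anyway.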
These systems are built with the same rigor I bring to my professional work. Automated test harnesses. Git version control. CI pipelines with GitHub Actions. Code review. The kind of
engineering discipline that keeps systems from rotting. That rigor is exactly what separates this from actual vibe coding, because vibe coding does exist. There are plenty of examples of
people and organizations who prompt their way to a solution without preserving institutional knowledge, without knowing which decisions were made or why. It works until something breaks,
and then they're in a real mess because nobody understands the system well enough to fix it.
I didn't prompt my way to any of those architectural decisions. They came from twenty years of learning what happens when you ship systems without validation gates. The AI wrote the code.
But I researched the problems, I learned the domains, I made the architectural calls, and I rejected the approaches that didn't hold up. I was not a passive passenger. I was an active
architect who happened to have a very fast builder on the team.
What I'd actually call it
I've been trying to find a better term. "Architect-driven AI development" is accurate but clunky. When I think about the difference between me and my friend Fredrik, who is a proper
developer in ways I'll never be, it comes down to this: he's a tool-forger, I'm a tool-user. He wants to understand and validate every implementation. I want to understand the building
without inspecting every brick. He judges systems by correctness and elegance. I judge them by whether they actually change how I work and what value they bring.
Both are real technical skills. They just operate at different layers.
The most honest description I've found: I don't need to inspect every brick, but I care deeply about the building.
Come to think of it, that's how testers have always worked. We don't write the code line by line, but we're still deeply responsible for the system being delivered. We've spent decades
being accountable for things we didn't author. Responsibility without authorship. Maybe testers were built for this era more than we realize.
If you're building with AI and it feels like more than vibes, it probably is. The prompting is the easy part. Knowing what to ask for, what to reject, and what the system needs to be when
you're not looking at it. That's the work. That's always been the work. And it still requires a human who knows what they're doing.