<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt Buscher</title>
    <description>The latest articles on DEV Community by Matt Buscher (@mattbuscher).</description>
    <link>https://dev.to/mattbuscher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3858526%2F1305ceba-22fc-43f0-b8f4-813eeda077ee.png</url>
      <title>DEV Community: Matt Buscher</title>
      <link>https://dev.to/mattbuscher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattbuscher"/>
    <language>en</language>
    <item>
      <title>Gemma 4 Just Dropped. It's the Sharpest Tool in the Shed. Do You Have a Plan?</title>
      <dc:creator>Matt Buscher</dc:creator>
      <pubDate>Sat, 04 Apr 2026 13:01:17 +0000</pubDate>
      <link>https://dev.to/mattbuscher/gemma-4-just-dropped-its-the-sharpest-tool-in-the-shed-do-you-have-a-plan-m86</link>
      <guid>https://dev.to/mattbuscher/gemma-4-just-dropped-its-the-sharpest-tool-in-the-shed-do-you-have-a-plan-m86</guid>
      <description>&lt;p&gt;Google released Gemma 4 on April 2nd — four models, Apache 2.0 licensed, running locally on your phone, your Raspberry Pi, your laptop. The 26B MoE model hits 88.3% on AIME 2026 with only 3.8B active parameters. The 31B Dense is the #3 open model in the world right now.&lt;/p&gt;

&lt;p&gt;256K context window. Native vision and audio. 140+ languages. Offline. Near-zero latency.&lt;/p&gt;

&lt;p&gt;This is genuinely incredible. And it's exactly where things get dangerous.&lt;/p&gt;




&lt;h2&gt;The Power Tool Problem&lt;/h2&gt;

&lt;p&gt;I've spent 35+ years managing people, processes, and teams. Founded multiple companies. Spent the last four years deploying AI internally and for other organizations.&lt;/p&gt;

&lt;p&gt;Here's what I've learned: the more powerful the tool, the more critical the plan.&lt;/p&gt;

&lt;p&gt;Gemma 4 running locally is like handing someone an industrial multi-tool with every attachment — cutting, drilling, grinding, polishing — and saying "go build something." The capability is real. But without a structured approach, you're going to spend more time recovering from mistakes than building anything useful.&lt;/p&gt;

&lt;p&gt;I see this pattern constantly. A developer spins up a local model, starts a project, has an incredible first session. Comes back the next day and the model has no idea what happened. Re-explains everything. Burns 30 minutes rebuilding context. Makes a decision that contradicts yesterday's because nobody wrote it down.&lt;/p&gt;

&lt;p&gt;Three sessions in, the project is a mess of conflicting directions and duplicated work.&lt;/p&gt;

&lt;p&gt;The model didn't fail. The management did.&lt;/p&gt;




&lt;h2&gt;What Changes When the Model Runs Locally&lt;/h2&gt;

&lt;p&gt;Cloud-hosted models have guardrails baked into the platform. Rate limits force you to slow down. API costs make you think before you prompt. The friction is annoying, but it's also protective.&lt;/p&gt;

&lt;p&gt;Local models remove all of that friction. Gemma 4 on your machine means unlimited conversations, zero cost per query, no waiting. That sounds like freedom. It is freedom. But freedom without structure is just chaos with better hardware.&lt;/p&gt;

&lt;p&gt;When you're running locally, you need to bring your own structure. The model won't remember your last session. It won't track your decisions. It won't warn you when you're contradicting something you said yesterday. That's your job.&lt;/p&gt;




&lt;h2&gt;The Framework Approach&lt;/h2&gt;

&lt;p&gt;This is why I built &lt;a href="https://aipromptpacks.io" rel="noopener noreferrer"&gt;PromptPack&lt;/a&gt; — a methodology for managing AI collaboration the same way you'd manage a team. The core idea is dead simple: &lt;strong&gt;if it matters, move it from chat into markdown.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chat shapes your thinking. Markdown preserves your thinking. The AI executes your thinking. But only if you give it structured context instead of hoping it remembers a conversation from three sessions ago.&lt;/p&gt;

&lt;p&gt;For a local Gemma 4 setup, three things matter most:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session continuity.&lt;/strong&gt; A PICKUP.md that tells the model exactly where you left off — what was decided, what changed, what's next, what to watch out for. Every session starts by reading this file. Every session ends by updating it. No cold starts. No re-explaining.&lt;/p&gt;
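&lt;p&gt;As a sketch, a minimal PICKUP.md might look like this. The project details below are invented for illustration; the exact sections are whatever your project actually needs:&lt;/p&gt;

```markdown
# PICKUP.md

## Where we left off
- Decided: keep all project context in plain markdown, no database
- Changed: split SPEC.md into SPEC.md and CONSTRAINTS.md

## Next
- Implement the session-end update step for this file

## Watch out for
- The model keeps suggesting OAuth; CONSTRAINTS.md says email login only
```

&lt;p&gt;The discipline matters more than the format: read it at session start, rewrite it at session end.&lt;/p&gt;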

&lt;p&gt;&lt;strong&gt;Progressive disclosure.&lt;/strong&gt; You don't dump your entire project into a 256K context window just because you can. You load what the current task needs. Orientation layer for simple questions. Core working context for implementation. Deep context only when resolving ambiguity. This isn't just organization — it's how you keep a local model focused instead of drowning in its own context.&lt;/p&gt;
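&lt;p&gt;As a rough sketch, progressive disclosure can be as simple as a loader that concatenates only the files a given tier needs. The tier names and file names here are my own illustration, not part of any Gemma tooling:&lt;/p&gt;

```python
from pathlib import Path

# Illustrative context tiers: load only the layer the current task needs,
# instead of dumping the whole project into a 256K window.
TIERS = {
    "orientation": ["README.md"],                      # simple questions
    "core": ["README.md", "SPEC.md", "PICKUP.md"],     # implementation work
    "deep": ["README.md", "SPEC.md", "PICKUP.md",
             "DECISIONS.md", "CONSTRAINTS.md"],        # resolving ambiguity
}

def build_context(task_kind, root="."):
    """Concatenate just the files the chosen tier calls for."""
    parts = []
    for name in TIERS[task_kind]:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

&lt;p&gt;The point of the sketch: the tier is a deliberate choice you make per task, not a default of "everything, always."&lt;/p&gt;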

&lt;p&gt;&lt;strong&gt;Guardrails before building.&lt;/strong&gt; Before you write a single line of code, you define what the system must never do. Constraints, boundaries, non-negotiables. With a model as capable as Gemma 4, the temptation to just start building is enormous. Resist it. Ten minutes of guardrail definition saves ten hours of unwinding bad decisions.&lt;/p&gt;




&lt;h2&gt;Why This Matters More for Local Models&lt;/h2&gt;

&lt;p&gt;With cloud models, you have a support structure. The platform manages sessions. The API has rate limits. There's documentation and community patterns.&lt;/p&gt;

&lt;p&gt;With a local Gemma 4, you're on your own. You're the platform. You're the session manager. You're the one deciding how much context to load, and the one who has to stop and define constraints before building.&lt;/p&gt;

&lt;p&gt;That's not a weakness of local AI. It's the whole point. You get full control. But full control means full responsibility.&lt;/p&gt;

&lt;p&gt;The developers who will get the most out of Gemma 4 running locally aren't the ones with the best hardware. They're the ones with the best project structure. A markdown folder with a README, a spec, constraints, and a handoff file will outperform a $5,000 GPU setup with no plan every single time.&lt;/p&gt;




&lt;h2&gt;The Sharpest Tool in the Shed&lt;/h2&gt;

&lt;p&gt;Gemma 4 is real. The benchmarks are real. Running a model this capable on a Raspberry Pi would have been science fiction two years ago.&lt;/p&gt;

&lt;p&gt;But here's the thing I keep coming back to after 35 years of managing people and 4 years of managing AI: &lt;strong&gt;power without structure is just expensive chaos.&lt;/strong&gt; It was true for teams of people. It's true for teams of agents. It'll be true for whatever comes next.&lt;/p&gt;

&lt;p&gt;The sharpest tool in the shed still needs someone who knows what they're building.&lt;/p&gt;

&lt;p&gt;Bring a plan.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Matt Buscher — 40+ years in engineering and executive management, 4 years deploying AI. I spent a career teaching people how to manage. Now I'm teaching AI how to be managed. More at &lt;a href="https://aipromptpacks.io" rel="noopener noreferrer"&gt;aipromptpacks.io&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemma4</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>The 8 Mistakes That Kill AI Projects (And the System That Fixes All of Them)</title>
      <dc:creator>Matt Buscher</dc:creator>
      <pubDate>Fri, 03 Apr 2026 01:36:41 +0000</pubDate>
      <link>https://dev.to/mattbuscher/the-8-mistakes-that-kill-ai-projects-and-the-system-that-fixes-all-of-them-3m21</link>
      <guid>https://dev.to/mattbuscher/the-8-mistakes-that-kill-ai-projects-and-the-system-that-fixes-all-of-them-3m21</guid>
      <description>&lt;p&gt;&lt;em&gt;Most AI projects don't fail because of bad prompts. They fail because there's no structure around the work.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been deploying AI workflows across startups, public libraries, and my own software products for about four years now. Different teams, different tools, different industries — but the same patterns keep showing up.&lt;/p&gt;

&lt;p&gt;The prompts are fine. The AI is capable. But the project still falls apart after session two because nobody built the operating system the AI needs to do consistent, compounding work.&lt;/p&gt;

&lt;p&gt;I started calling these mistakes the &lt;strong&gt;Chaos Loop&lt;/strong&gt;, because once you hit one, each mistake feeds the next.&lt;/p&gt;

&lt;h2&gt;The Chaos Loop: Why AI Projects Spiral&lt;/h2&gt;

&lt;p&gt;These eight mistakes don't happen in isolation. They cascade. You hit the Chat Trap, which causes a Premature Build, which leads to Scope Bloat, which creates Context Overload — and suddenly you're three weeks in with nothing shippable.&lt;/p&gt;

&lt;h3&gt;1. The Chat Trap&lt;/h3&gt;

&lt;p&gt;This is the foundational mistake, and nearly everyone makes it. You treat the AI chat window as your project workspace. All your decisions, architecture notes, constraints, and progress live inside a conversation that will evaporate the moment you close the tab.&lt;/p&gt;

&lt;p&gt;The Chat Trap feels productive because you're getting immediate responses. But you're building on sand. Every piece of knowledge you generate is ephemeral. Close the session, and you start over. The fix isn't to have longer chats — it's to preserve your thinking outside the chat in structured, persistent files your AI can reference later.&lt;/p&gt;

&lt;h3&gt;2. The Premature Build&lt;/h3&gt;

&lt;p&gt;You have an idea, you open a chat, and you immediately ask the AI to start building. No spec. No constraints. No success criteria. The AI happily generates code, content, or architecture — and it's wrong, because it was never told what "right" looks like.&lt;/p&gt;

&lt;p&gt;I once watched a team go through three complete rebuilds of a feature because nobody wrote a one-page spec first. Three rounds of generating, reviewing, discarding, and re-prompting — when ten minutes of upfront specification would have nailed it on the first pass. If you don't define what you want before you start building, you'll define it through expensive iteration after.&lt;/p&gt;

&lt;h3&gt;3. Scope Bloat&lt;/h3&gt;

&lt;p&gt;AI makes it easy to add things. Too easy. You ask for a login page and the AI offers OAuth, SSO, passwordless auth, and social login. You ask for a dashboard and suddenly there are twelve widgets, three chart libraries, and a notification system nobody requested.&lt;/p&gt;

&lt;p&gt;Without explicit constraints and a defined feature boundary, AI will expand scope until the project becomes impossible to ship. The solution is a constraints file — a document that tells your AI not just what to build, but what &lt;em&gt;not&lt;/em&gt; to build. Guardrails aren't limitations; they're how you actually finish things.&lt;/p&gt;
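&lt;p&gt;A constraints file can be tiny and still do its job. Everything below is a made-up example (the stack, the feature list, the file names), but it shows the shape: what to build, what not to build, and what is never allowed:&lt;/p&gt;

```markdown
# CONSTRAINTS.md

## Build
- Login page: email and password only
- No OAuth, no SSO, no social login (revisit after v1 ships)

## Stack
- FastAPI and SQLite; no new dependencies without a note in DECISIONS.md

## Never
- Never add a feature that is not in SPEC.md
- Never change the database schema without updating the spec first
```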

&lt;h3&gt;4. Context Overload&lt;/h3&gt;

&lt;p&gt;This is the opposite of having too little context — and it's just as destructive. You dump your entire codebase, all your docs, every conversation transcript into the prompt, thinking more context equals better results. Instead, you get 50,000 tokens of noise and an AI that can't find the signal.&lt;/p&gt;

&lt;p&gt;Large context windows aren't the solution. They're a trap of their own. What you need is &lt;em&gt;progressive disclosure&lt;/em&gt; — feeding the AI the right context for the current task, not all context for every task.&lt;/p&gt;

&lt;h3&gt;5. Missing Guardrails&lt;/h3&gt;

&lt;p&gt;You ask the AI to build a feature without telling it your tech stack, your code conventions, your deployment environment, or your performance constraints. So it picks its own. It chooses a framework you don't use. It writes patterns that conflict with your codebase.&lt;/p&gt;

&lt;p&gt;Missing guardrails don't just create bad output — they create slow output. You spend more time correcting the AI's assumptions than you would have spent defining them upfront. A constraints document that lists your stack, your conventions, and your non-negotiables pays for itself in the first session.&lt;/p&gt;

&lt;h3&gt;6. Skill Duplication&lt;/h3&gt;

&lt;p&gt;You write the same kind of prompt over and over. Every project starts with the same boilerplate instructions: "You are a senior developer. Follow these conventions. Use this stack." You're re-teaching the AI things it already knew — in the last session, with a different project.&lt;/p&gt;

&lt;p&gt;Reusable skill files solve this. Instead of re-prompting every time, you build a library of instructions your AI can load on demand. Write them once, reuse them across every project. Your AI's capabilities should compound across projects, not reset every time.&lt;/p&gt;
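&lt;p&gt;One way to sketch this (the directory layout and skill names are hypothetical, not a fixed convention): keep each skill as a standalone markdown file and compose the ones a project needs into a single system prompt.&lt;/p&gt;

```python
from pathlib import Path

def load_skills(skill_dir, names):
    """Compose reusable skill files into one system prompt.

    Each skill is a markdown file written once and shared across
    projects, e.g. "senior-dev.md" or "code-conventions.md".
    """
    sections = []
    for name in names:
        path = Path(skill_dir) / f"{name}.md"
        sections.append(path.read_text())
    return "\n\n---\n\n".join(sections)

# Usage sketch: same skills, any project, no re-typed boilerplate.
# system_prompt = load_skills("skills", ["senior-dev", "code-conventions"])
```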

&lt;h3&gt;7. The Cold Start&lt;/h3&gt;

&lt;p&gt;Monday morning. You open your AI tool. Where were you? What did you decide last Thursday? What's the current state of the database migration? The AI doesn't know. You don't remember. So you spend 20 minutes re-explaining the project from scratch.&lt;/p&gt;

&lt;p&gt;The Cold Start isn't just an annoyance; it's a compounding tax. Every session that starts with re-explanation wastes tokens, wastes time, and introduces inconsistency. A structured handoff document — updated at the end of every session — eliminates cold starts entirely. Two minutes of writing saves twenty minutes of re-explaining.&lt;/p&gt;

&lt;h3&gt;8. The Stale Cascade&lt;/h3&gt;

&lt;p&gt;Your project evolves, but your documentation doesn't. The spec from week one no longer reflects reality. The architecture doc describes a system you already changed. Now your AI is working from outdated instructions, generating output that conflicts with the current state of the project.&lt;/p&gt;

&lt;p&gt;This is the quiet killer. Everything looks organized — you have docs, specs, a task list — but the content is stale. The AI follows the stale docs faithfully, which means it's faithfully building the wrong thing. Version-controlled project files with regular updates are the only defense.&lt;/p&gt;

&lt;h2&gt;The Pattern Behind the Pattern&lt;/h2&gt;

&lt;p&gt;Look at all eight mistakes together and a single root cause emerges: &lt;strong&gt;there's no operating system for the project.&lt;/strong&gt; The AI has no persistent structure to work within. Each session is improvised. Each prompt is ad hoc. There's no canonical source of truth, no defined workflow, no mechanism for knowledge to compound across sessions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The AI isn't the problem. The absence of structure around the AI is the problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What the Fix Looks Like&lt;/h2&gt;

&lt;p&gt;The shift is simple but fundamental: stop treating AI as a chat partner and start treating it as a managed team member. Give it the same things you'd give a new developer joining your project — a project brief, a spec, constraints, conventions, and a status update on where things stand.&lt;/p&gt;

&lt;p&gt;When you add that structure — a project definition, a spec, constraints, reusable skills, a handoff document, and version-controlled files — every one of these eight mistakes disappears. Not because you're prompting better, but because you've built the infrastructure that makes good output inevitable.&lt;/p&gt;
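&lt;p&gt;Concretely, such a project folder might look something like this (the file names are illustrative; any consistent naming works):&lt;/p&gt;

```text
project/
├── README.md        # project definition: what and why
├── SPEC.md          # what "done" looks like
├── CONSTRAINTS.md   # stack, conventions, what not to build
├── PICKUP.md        # end-of-session handoff, read first next session
└── skills/          # reusable instruction files shared across projects
```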

&lt;p&gt;Your thinking lives in markdown files, not chat bubbles. Your AI reads structured context before it starts working. Updates flow into persistent documents, not ephemeral conversations. Every session compounds on the last one instead of starting over.&lt;/p&gt;

&lt;p&gt;This works with any AI — ChatGPT, Claude, Gemini, Copilot — because it's built on markdown files, not platform-specific features. If your AI can read text, it can use this system.&lt;/p&gt;




&lt;h2&gt;Hear It as a Conversation&lt;/h2&gt;

&lt;p&gt;If you prefer audio, here's a quick podcast-style breakdown of these same patterns:&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://www.youtube.com/embed/RfjB3d3NkMs"&gt;&lt;/iframe&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I wrote a more detailed version of this with visual diagrams &lt;a href="https://aipromptpacks.io/resources/8-mistakes-that-kill-ai-projects" rel="noopener noreferrer"&gt;on my site&lt;/a&gt;. I also built a set of templates called &lt;a href="https://aipromptpacks.io/" rel="noopener noreferrer"&gt;PromptPack&lt;/a&gt; that implements this system if you want to skip the setup.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's the mistake that bites you the most? I'm curious whether others have seen the same patterns.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>workflow</category>
    </item>
  </channel>
</rss>
