There is a question spreading quietly through the software industry right now. Hiring managers are asking it. Hackathon judges are asking it. Open source maintainers are asking it.
"Did you actually build this?"
Nobody has a good answer yet. I tried to build one — and Gemma 4 is the reason it worked.
The Problem Nobody Is Talking About
AI-assisted development has gone mainstream fast. Cursor, Copilot, Lovable, Bolt — developers are shipping real products with significant AI assistance, and that is genuinely fine. The tools exist; the skill is in using them well.
But a trust gap is forming. When you submit a project to a hackathon, post a repo on GitHub, or show work in a job interview, reviewers are increasingly skeptical. The portfolio that used to signal skill now also signals a question mark.
The current answer to "did you build this?" is essentially: trust me.
That is not good enough. And trying to detect AI-generated code is an arms race nobody will win — models improve, detection fails, repeat.
I wanted a different approach: instead of detecting AI, document the human.
What I Built: VibeSafe
VibeSafe is a browser-based code auditor that takes your project files, sends them to Gemma 4 31B in a single prompt, and returns a Proof of Authorship certificate — a structured document that separates your human architectural decisions from AI-assisted patterns.
The output looks like this:
HUMAN ARCHITECTURAL DECISIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Separation of database connection into factory function
Evidence: get_db() pattern used consistently across modules
2. Deliberate stateless token design
Evidence: login() returns user ID directly — intentional tradeoff
3. Privacy-first architecture: no backend, direct browser API calls
Evidence: all external calls made from frontend, no server layer
These are the things only I would have decided. The boilerplate React hooks and Tailwind utility classes? Gemma 4 flags those as AI-assisted. The product decisions, the tradeoffs, the specific ways the pieces connect? Those are mine.
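Getting the project files in the first place is just the browser's File API. A minimal sketch of the intake step — the `readProjectFiles` helper and the extension allowlist are my illustration, not necessarily VibeSafe's exact code:

```javascript
// Read user-selected files (e.g. from <input type="file" multiple>)
// into { name, content } pairs, skipping anything that isn't source text.
const TEXT_EXTENSIONS = ['.js', '.jsx', '.ts', '.tsx', '.css', '.html', '.json', '.md'];

async function readProjectFiles(fileList) {
  const files = [];
  for (const file of fileList) {
    // Ignore binaries: images, fonts, lockfile noise, etc.
    if (!TEXT_EXTENSIONS.some(ext => file.name.endsWith(ext))) continue;
    files.push({ name: file.name, content: await file.text() });
  }
  return files;
}
```

Because everything stays in the browser, the source never touches a server you control — which is exactly the privacy-first tradeoff the certificate later calls out.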
Why Gemma 4 Specifically
I tried this concept with smaller models first. It did not work.
The problem is that distinguishing intent from output requires holding the entire codebase in mind simultaneously. A model that has only seen half your files cannot tell you whether your architectural choices are consistent across the project. It cannot spot that you made the same deliberate tradeoff in three different places — which is actually the strongest signal of human authorship.
The 262K Context Window Changes the Analysis
Gemma 4's 262K context window means I send everything in one shot:
const combined = files
  .map(f => `\n\n=== FILE: ${f.name} ===\n${f.content}`)
  .join('')

// One prompt. Entire project. Gemma 4 sees everything at once.
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}` // the user's free OpenRouter key
  },
  body: JSON.stringify({
    model: 'google/gemma-4-31b-it:free',
    messages: [{ role: 'user', content: buildPrompt(combined) }]
  })
})
No chunking. No lost context. No missed cross-file patterns. The model sees the whole picture before making any judgment — the same way a senior engineer would read a codebase before commenting on it.
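262K tokens is a lot, but not infinite. A rough pre-flight check keeps a huge repo from silently overflowing the window — the 4-characters-per-token heuristic and both function names here are my sketch, not VibeSafe's actual code:

```javascript
// Rough pre-flight budget check. ~4 characters per token is a common
// heuristic for English-heavy source code; it is not exact, but it is
// enough to warn the user before sending a request that cannot fit.
const CONTEXT_TOKENS = 262_000;
const RESERVED_FOR_OUTPUT = 8_000; // leave headroom for the certificate

function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsContext(combined) {
  return estimateTokens(combined) <= CONTEXT_TOKENS - RESERVED_FOR_OUTPUT;
}
```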
31B Dense vs the MoE Model
I specifically chose the 31B Dense model over the 26B MoE for this use case.
The MoE model activates ~3.8B parameters per token — it is faster and more efficient, ideal for high-throughput applications. But security analysis needs consistent reasoning quality on every single token. Missing one vulnerability because a parameter set was not activated is worse than slower inference. For a tool that is auditing your code for real risks, I wanted the full model engaged on every decision.
Reasoning Mode for Authorship Detection
The part that surprised me most was how well Gemma 4 handles the authorship question when prompted correctly. Generic "review my code" prompts produce generic answers. But when you ask specifically about intent:
Look for architectural decisions that reflect product thinking.
Look for specific tradeoffs that reveal human judgment.
Distinguish these from patterns that are generic and could be AI-generated.
Gemma 4 produces genuinely insightful distinctions. It identified that my choice to put authorship as the hero card — not security — was a human product decision. It noticed the privacy-first architecture (no backend) as a deliberate tradeoff, not a default. It caught that I reused the same terminal aesthetic across components as a consistent design language.
That is not code review. That is architectural reasoning.
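For reference, a prompt builder along those lines can be very small. The section markers and exact wording below are my sketch of the idea, not VibeSafe's verbatim prompt:

```javascript
// Sketch of a buildPrompt() that frames the authorship question.
// Asking for fixed section headers makes the response easy to parse.
function buildPrompt(combined) {
  return [
    'You are auditing a codebase for proof of authorship.',
    'Look for architectural decisions that reflect product thinking.',
    'Look for specific tradeoffs that reveal human judgment.',
    'Distinguish these from patterns that are generic and could be AI-generated.',
    'Answer with three sections: HUMAN ARCHITECTURAL DECISIONS,',
    'AI-ASSISTED PATTERNS, and Originality score: N/100.',
    '',
    'Codebase:',
    combined,
  ].join('\n');
}
```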
What Running VibeSafe on Itself Taught Me
I ran VibeSafe on its own source code. The results were honest in a way I did not expect.
Human decisions Gemma 4 identified:
- Authorship card as hero feature (product decision, not default)
- Direct browser-to-API architecture (privacy tradeoff)
- Terminal aesthetic as unified design language
- Certificate export as plain text (accessibility over PDF complexity)
AI-assisted patterns it flagged:
- Standard Tailwind utility class combinations
- Boilerplate useState/useEffect patterns
- Generic error boundary structure
Originality score: 74/100
That feels right. A good chunk of the implementation is standard React patterns. But the product decisions — what to build, how to frame it, what matters to the user — those are mine. 74 out of 100 captures that honestly.
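Turning the model's text back into that structure is straightforward if the prompt pins down the section headers. A minimal parser — this assumes the three headers shown in the certificate above, and a real implementation would need more defensive handling:

```javascript
// Minimal parser for a response shaped like the certificate above.
function parseCertificate(text) {
  const score = text.match(/Originality score:\s*(\d+)\s*\/\s*100/i);
  // Extract the text between one section header and the next.
  const section = (header, next) => {
    const start = text.indexOf(header);
    if (start === -1) return '';
    const rest = text.slice(start + header.length);
    const end = next ? rest.indexOf(next) : -1;
    return (end === -1 ? rest : rest.slice(0, end)).trim();
  };
  return {
    humanDecisions: section('HUMAN ARCHITECTURAL DECISIONS', 'AI-ASSISTED PATTERNS'),
    aiPatterns: section('AI-ASSISTED PATTERNS', 'Originality score'),
    originality: score ? Number(score[1]) : null,
  };
}
```

Keeping the certificate as structured text rather than JSON also makes the plain-text export trivial: the model's output is already the document.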
What This Means for Developers Right Now
Open models at Gemma 4's capability level, running on free infrastructure, change what individual developers can build.
Six months ago, this analysis would have required:
- A paid API with expensive per-token costs
- A backend to handle large context requests
- Chunking logic to split codebases into pieces
- Multiple round-trips losing context between calls
Now it is a single fetch() call from a React component. Free. 262K tokens. Full model. No backend.
The barrier between "idea" and "working product" for AI-powered developer tools has dropped significantly. VibeSafe went from concept to working demo in a weekend — not because the engineering is simple, but because Gemma 4 handles the hard part.
The Bigger Picture
The "did you build this?" problem is not going away. It is going to intensify as models improve and AI-assisted development becomes more capable.
But I think the framing of the question is wrong. The interesting question is not "how much did AI write?" — it is "what did the human decide?"
Architecture. Product instincts. Tradeoffs. The specific shape of a solution. These things are still fundamentally human, even when the implementation is AI-assisted. They are also what actually matters in a developer.
VibeSafe is a first attempt at making those decisions visible and documentable. Gemma 4's reasoning capability and context window are what made it possible to build something that actually captures them.
Try It
🔗 Live app: https://vibe-proof-code.lovable.app/
🔗 GitHub: github.com/SimranShaikh20/vibesafe
You need a free OpenRouter API key from openrouter.ai/keys — no credit card, takes 30 seconds.
Run it on your own project. See what Gemma 4 says about what you built.
VibeSafe · Powered by Gemma 4 31B · OpenRouter free tier · Built for the vibe coding era