This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
My app did not need another chatbot.
It needed an investigator.
Debugging rarely starts with a clean stack trace. It usually starts with a screenshot, a vague error, a half-copied log, and the feeling that something obvious is hiding in plain sight.
So I built BugTheatre AI — a local-first debugging investigator powered by Gemma 4 through Ollama.
BugTheatre AI turns messy bug evidence into a structured debugging case file. Instead of returning one generic answer, it builds a complete investigation board with:
- a prime suspect
- evidence found
- missing evidence
- confidence score
- affected layer
- false leads to avoid
- safe patch plan
- validation steps
- rollback guidance
- postmortem-ready export
The goal is simple:
Do not build an AI that merely answers bugs. Build an AI that investigates bugs.
Demo
The walkthrough follows this flow:
New Case -> Investigate Bug -> Investigation Board -> Patch Room -> Postmortem -> Export
BugTheatre starts with messy debugging evidence, lets local Gemma analyze it, and then turns the result into a structured case board, safe patch plan, and postmortem-ready export.
The most important screen is the Investigation Board, because it shows the main transformation: a vague bug report becomes an organized debugging case.
Code
Repository:
https://github.com/AkshatUniyal/bugtheatre-ai
Current POC stack:
- Streamlit frontend
- Python application logic
- Ollama local model server
- Gemma 4 E2B as the default local model
- optional Gemma 4 E4B support
- strict JSON investigation schema
- local case history in data/cases.json
- Markdown and JSON exports
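The local case history can be pictured as a small JSON file store. This is a minimal sketch under assumptions: the function names and the exact shape of a case record are illustrative, not the repo's actual code.

```python
import json
from pathlib import Path

CASES_PATH = Path("data/cases.json")

def load_cases(path: Path = CASES_PATH) -> list[dict]:
    """Return all saved cases, or an empty list if no history exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))

def save_case(case: dict, path: Path = CASES_PATH) -> None:
    """Append one investigation case to the local history file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    cases = load_cases(path)
    cases.append(case)
    path.write_text(json.dumps(cases, indent=2), encoding="utf-8")
```

A flat JSON file is enough for a POC; anything multi-user would want SQLite or a real database.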
How I Used Gemma 4
BugTheatre AI uses Gemma 4 E2B locally through Ollama as the default demo model.
I chose local Gemma because debugging evidence often contains sensitive information:
- private URLs
- internal file paths
- customer IDs
- stack traces
- screenshots of proprietary systems
- configuration details
- environment names
- API routes
- session and auth clues
So local inference was not just a technical preference.
It was a product requirement.
Gemma 4 is used as the core reasoning layer for:
- reading mixed debugging evidence
- extracting visible facts
- separating evidence from inference
- identifying the prime suspect
- listing missing information
- avoiding likely false leads
- proposing a safe patch path
- generating validation steps
- producing a postmortem-ready summary
The important design choice was to avoid using Gemma as a generic chat interface.
I wanted Gemma to act like an investigation engine.
The app asks Gemma for structured JSON, then the UI turns that into debugging artifacts that are easier to review, share, and act on.
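One way to ask a local model for a structured case file is Ollama's JSON mode: the request sets `format: "json"` and the prompt spells out the required keys. The sketch below is illustrative, not the app's actual prompt; the field list and the model tag are assumptions (use whatever Gemma tag your Ollama install serves).

```python
import json
import urllib.request

REQUIRED_FIELDS = [
    "case_title", "summary", "severity", "confidence", "affected_layer",
    "prime_suspect", "evidence_found", "missing_evidence", "false_leads",
    "fix_plan", "validation_steps", "rollback_plan", "postmortem",
]

def build_payload(evidence: str, model: str = "gemma3n:e2b") -> dict:
    """Build an Ollama /api/generate request asking for strict JSON output."""
    # The model tag above is a placeholder, not necessarily the app's default.
    prompt = (
        "You are a debugging investigator. Analyze the evidence below.\n"
        "Separate observed evidence from inference. Respond ONLY with a JSON "
        "object containing exactly these keys: "
        + ", ".join(REQUIRED_FIELDS)
        + "\n\nEVIDENCE:\n" + evidence
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}

def investigate(evidence: str, host: str = "http://localhost:11434") -> dict:
    """Send the request to a local Ollama server and parse the JSON case file."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(build_payload(evidence)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])  # the model's JSON case file
```

Because everything goes to `localhost:11434`, no evidence ever leaves the machine.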
Gemma 4 E4B is also supported as a higher-quality local option. For the challenge demo, E2B gives a practical balance between privacy, local performance, and useful reasoning on developer hardware.
Why I Built It
Most debugging tools assume the developer already knows where the problem is.
But real bugs are rarely that clean.
A developer may have:
- a screenshot from staging
- a browser console error
- a partial stack trace
- a failing network request
- a config snippet
- a vague user report
- an environment detail that may or may not matter
The problem is not just finding a possible fix.
The real problem is knowing:
- what evidence matters
- what is probably noise
- what the safest first action is
- what missing information would raise confidence
- what change could accidentally make things worse
- how to explain the issue to the team afterward
That is where BugTheatre AI fits.
It treats a bug like a case file.
Example Case: Laravel 419 / CSRF Failure
For one demo, I used a PHP/Laravel admin bug.
The user is on an order details page and tries to update an order status from Processing to Shipped.
The UI shows:
Failed to update order status.
Page expired. Please refresh and try again.
DevTools shows:
POST /admin/orders/1024/status -> 419 Page Expired
CSRF token mismatch
TokenMismatchException
VerifyCsrfToken.php
Session driver: redis
A generic chatbot might say:
Try refreshing the page.
BugTheatre AI builds a deeper investigation:
- Prime suspect: expired or stale Laravel CSRF/session token during admin inactivity
- Evidence: HTTP 419, CSRF mismatch, page idle time, failed status update request
- Missing evidence: Laravel logs, VerifyCsrfToken.php, SESSION_DRIVER, cookie/session settings, Redis health
- False leads to avoid: database issue, payment issue, order calculation bug
- Safe patch path: refresh/regenerate CSRF token before submission, handle 419 gracefully, preserve unsaved intent, verify session/cookie/CSRF middleware
- Validation: reproduce idle session, submit form, verify re-authentication or token refresh path
- Rollback: revert token-refresh handling if it blocks valid admin submissions
This is the kind of output I wanted: not just a fix, but a safe debugging path.
How It Works
BugTheatre AI follows a structured investigation pipeline:
User evidence
-> privacy guard and secret redaction
-> screenshot encoding, if provided
-> strict investigation prompt
-> local Gemma 4 through Ollama
-> structured JSON case file
-> rendered debugging views
-> Markdown / JSON export
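The privacy guard at the top of that pipeline can be sketched as a regex pass that masks common secret shapes before any text reaches the model. The patterns below are illustrative only; the app's actual redaction rules may differ.

```python
import re

# Illustrative patterns only; a real guard would cover many more shapes.
REDACTION_RULES = [
    (re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"), "Bearer [REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[=:]\s*\S+"),
     r"\1=[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Mask likely secrets in raw debugging evidence before model inference."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Redacting before inference means even the local model never sees raw credentials, which keeps exports safe to share as well.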
The model is not used as a free-form chat box.
Instead, Gemma is asked to return a structured case file with fields like:
- case title
- summary
- severity
- confidence
- affected layer
- prime suspect
- evidence found
- missing evidence
- false leads
- fix plan
- validation steps
- rollback plan
- postmortem
That JSON is then rendered into dedicated product views.
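Rendering depends on every field existing, so a thin normalization layer can coerce whatever the model returns into a complete case file. This is a sketch: the field names mirror the list above, but the exact schema and defaults are assumptions.

```python
import json

LIST_FIELDS = ["evidence_found", "missing_evidence", "false_leads",
               "fix_plan", "validation_steps"]
TEXT_FIELDS = ["case_title", "summary", "severity", "affected_layer",
               "prime_suspect", "rollback_plan", "postmortem"]

def normalize_case(raw: str) -> dict:
    """Parse model output and guarantee every UI field exists with a sane type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    case = {}
    for field in TEXT_FIELDS:
        case[field] = str(data.get(field) or "unknown")
    for field in LIST_FIELDS:
        value = data.get(field)
        case[field] = value if isinstance(value, list) else []
    # Confidence is clamped so the UI can always render a 0-100 score.
    try:
        case["confidence"] = max(0, min(100, int(data.get("confidence", 0))))
    except (TypeError, ValueError):
        case["confidence"] = 0
    return case
```

The point is that the UI never crashes on a partial answer: missing sections just render as empty or "unknown", which itself signals low evidence quality.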
Product Screens
BugTheatre AI has separate screens for separate debugging jobs.
Dashboard
The dashboard acts as the command center.
It shows:
- current active investigation
- saved cases
- export readiness
- local AI mode
- demo story
- next best action
This helps judges or teammates understand the workflow quickly.
New Case
The New Case screen captures messy evidence:
- case title
- language/framework
- environment
- short description
- expected behavior
- actual behavior
- logs / stack trace
- code or config snippet
- screenshot evidence
Only the title is required, but more evidence improves the investigation.
Sample Cases
This is the demo gallery.
It includes prebuilt examples such as:
- React hydration mismatch
- FastAPI dependency mismatch
- Docker port mismatch
This makes the project easy to test without inventing a bug first.
Investigation Board
This is the main screen.
It shows:
- prime suspect
- confidence
- severity
- affected layer
- fastest fix
- executive debug summary
- evidence found
- missing evidence
- case snapshot
This is where the product stops feeling like a chatbot and starts feeling like a real debugging assistant.
Patch Room
Patch Room converts diagnosis into action.
It includes:
- quick patch
- clean fix
- prevention
- validation
- safe patch review
- suggested commands
- action plan
- rollback plan
For risky areas such as auth, session handling, dependency changes, deployment, payment, or data paths, BugTheatre is intentionally conservative. It avoids unsafe recommendations and pushes the user toward validation before change.
Postmortem
The postmortem view turns the debugging session into a workplace artifact.
It includes:
- technical artifacts
- summary
- impact
- root cause
- detection
- resolution
- prevention
- follow-up actions
Exports are available as Markdown and JSON.
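The Markdown export can be sketched as straightforward templating over the case dict. Section names follow the list above; the actual export format in the repo is an assumption.

```python
def export_markdown(case: dict) -> str:
    """Render a postmortem-ready Markdown report from a case file dict."""
    sections = [
        ("Summary", "summary"), ("Impact", "impact"),
        ("Root Cause", "root_cause"), ("Detection", "detection"),
        ("Resolution", "resolution"), ("Prevention", "prevention"),
        ("Follow-up Actions", "follow_up_actions"),
    ]
    lines = [f"# Postmortem: {case.get('case_title', 'Untitled case')}", ""]
    for heading, key in sections:
        lines.append(f"## {heading}")
        value = case.get(key, "Not captured.")
        if isinstance(value, list):
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(str(value))
        lines.append("")
    return "\n".join(lines)
```

Markdown drops straight into a wiki or pull request; the JSON export keeps the same data machine-readable for integrations.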
Why This Is Not Just a Chatbot
A chatbot usually gives a prose answer.
BugTheatre AI turns one messy bug into a reusable investigation artifact.
That difference matters.
A bug investigation needs structure:
- What is the strongest hypothesis?
- What evidence supports it?
- What evidence is missing?
- What should we avoid changing first?
- What is the safest patch path?
- How do we validate the fix?
- What do we tell the team afterward?
BugTheatre AI gives each of those questions a dedicated place in the workflow.
The result feels less like asking AI for help and more like opening a debugging case file.
Output Quality and Safety Guardrails
One thing I learned quickly: debugging recommendations can be dangerous if they are too confident or too generic.
So BugTheatre AI is designed to be conservative around risky areas.
For example:
- Auth/session bugs should not blindly recommend shorter session timeouts.
- Dependency bugs should tie recommendations to observed versions and lockfiles.
- Deployment bugs should validate runtime facts before changing app code.
- Payment or data-loss paths should recommend verification before modification.
In the Laravel 419 case, the safer recommendation is not “reduce the session timeout.”
The safer path is:
- refresh or regenerate the CSRF token before submission
- warn the user before session expiry
- handle 419 gracefully with re-authentication or refresh
- preserve unsaved intent where possible
- verify Laravel session, cookie, CSRF middleware, and Redis/session configuration
This is the kind of caution I wanted the product to show.
Limitations
BugTheatre AI is a proof of concept, not an autonomous production fixer.
It does not modify code directly.
It does not deploy fixes.
It does not replace developer judgment.
Gemma helps structure the evidence and suggest likely patch paths, but developers should still validate fixes with logs, tests, and environment-specific checks.
Some limitations:
- local model output can vary
- screenshot-only analysis depends on readable text
- strict JSON generation needs validation and repair loops
- production use would need stronger permissions, audit trails, and integrations
- a production UI would likely benefit from a React/Next.js frontend instead of Streamlit
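The validation-and-repair loop mentioned above can be sketched as a bounded retry around the model call, feeding the failure reason back into the prompt. The callable and the repair wording here are hypothetical, not the app's exact implementation.

```python
import json

def generate_with_repair(call_model, prompt: str, required: list[str],
                         max_attempts: int = 3) -> dict:
    """Retry a model call until it returns JSON containing every required key."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"\n\nYour last reply was not valid JSON ({exc}). Return only JSON."
            continue
        missing = [key for key in required if key not in data]
        if not missing:
            return data
        feedback = f"\n\nYour last reply was missing keys: {missing}. Return full JSON."
    raise ValueError("model did not produce a valid case file")
```

Bounding the attempts matters: a local model that keeps failing should surface an error rather than loop forever on developer hardware.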
What Worked Well
A few things worked better than expected:
- Gemma 4 could reason over mixed debugging evidence.
- Local inference made the privacy story much stronger.
- Screenshot input made the demo feel realistic.
- The investigation-board format made the output easier to trust.
- Patch Room helped convert diagnosis into action.
- Postmortem export turned a debugging session into something a team could actually use.
The biggest win was this:
BugTheatre did not just explain the bug.
It organized the debugging process.
Challenges
The hardest part was not building the UI.
The hardest part was controlling the output.
For a debugging product, the model needs to be useful without being reckless.
That meant spending time on:
- structured prompts
- strict JSON response shape
- evidence vs inference separation
- missing evidence fields
- safer patch language
- conservative recommendations for auth/session/deployment cases
Another challenge was screenshot analysis. If the screenshot is readable, Gemma can extract useful signals. But if the screenshot is noisy or cropped, the model needs to admit uncertainty instead of pretending it knows everything.
That is why “missing evidence” is a first-class section in the product.
Future Improvements
If I continue the project, I would like to add:
- React/Next.js frontend
- FastAPI backend
- SQLite case history
- JSON schema validation and retry/repair loop
- GitHub issue export
- Slack summary export
- PDF reports
- saved case detail pages
- team workspaces
- project-specific debugging playbooks
- stronger secret redaction
- automatic evidence quality scoring
A more advanced version could also support repository-level investigation, where BugTheatre reads the relevant files and connects the screenshot/log evidence back to actual code paths.
Closing
BugTheatre AI is built around a simple belief:
Debugging is not just about getting an answer.
It is about building a path from symptom to evidence to action.
That is why I built BugTheatre AI as an investigator, not a chatbot.
Gemma 4 made that practical locally by combining screenshot understanding, evidence extraction, and structured reasoning into a private debugging workflow.
Do not build an AI that merely answers bugs. Build an AI that investigates bugs.