This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
My app did not need another chatbot.
It needed an investigator.
Debugging rarely starts with a clean stack trace. It usually starts with a screenshot, a vague error, a half-copied log, and the feeling that something obvious is hiding in plain sight.
So I built BugTheatre AI — a local-first debugging investigator powered by Gemma 4 through Ollama.
BugTheatre AI turns messy bug evidence into a structured debugging case file. Instead of returning one generic answer, it builds a complete investigation board with:
- a prime suspect
- evidence found
- missing evidence
- confidence score
- affected layer
- false leads to avoid
- safe patch plan
- validation steps
- rollback guidance
- postmortem-ready export
The goal is simple:
Do not build an AI that merely answers bugs. Build an AI that investigates bugs.
Demo
The walkthrough follows this flow:
New Case -> Investigate Bug -> Investigation Board -> Patch Room -> Postmortem -> Export
BugTheatre starts with messy debugging evidence, lets local Gemma analyze it, and then turns the result into a structured case board, safe patch plan, and postmortem-ready export.
The most important screen is the Investigation Board, because it shows the main transformation: a vague bug report becomes an organized debugging case.
Code
Repository:
https://github.com/AkshatUniyal/bugtheatre-ai
Current POC stack:
- Streamlit frontend
- Python application logic
- Ollama local model server
- Gemma 4 E2B as the default local model
- optional Gemma 4 E4B support
- strict JSON investigation schema
- local case history in data/cases.json
- Markdown and JSON exports
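The local case history can be pictured as a small JSON file store. This is a minimal sketch under assumptions: the function names and the exact shape of a case record are illustrative, not the repo's actual code.

```python
import json
from pathlib import Path

CASES_PATH = Path("data/cases.json")

def load_cases(path: Path = CASES_PATH) -> list[dict]:
    """Return all saved cases, or an empty list if no history exists yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))

def save_case(case: dict, path: Path = CASES_PATH) -> None:
    """Append one investigation case to the local history file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    cases = load_cases(path)
    cases.append(case)
    path.write_text(json.dumps(cases, indent=2), encoding="utf-8")
```

A flat JSON file is enough for a POC; anything multi-user would want SQLite or a real database.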
How I Used Gemma 4
BugTheatre AI uses Gemma 4 E2B locally through Ollama as the default demo model.
I chose local Gemma because debugging evidence often contains sensitive information:
- private URLs
- internal file paths
- customer IDs
- stack traces
- screenshots of proprietary systems
- configuration details
- environment names
- API routes
- session and auth clues
So local inference was not just a technical preference.
It was a product requirement.
Gemma 4 is used as the core reasoning layer for:
- reading mixed debugging evidence
- extracting visible facts
- separating evidence from inference
- identifying the prime suspect
- listing missing information
- avoiding likely false leads
- proposing a safe patch path
- generating validation steps
- producing a postmortem-ready summary
The important design choice was to avoid using Gemma as a generic chat interface.
I wanted Gemma to act like an investigation engine.
The app asks Gemma for structured JSON, then the UI turns that into debugging artifacts that are easier to review, share, and act on.
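One way to ask a local model for a structured case file is Ollama's JSON mode: the request sets `format: "json"` and the prompt spells out the required keys. The sketch below is illustrative, not the app's actual prompt; the field list and the model tag are assumptions (use whatever Gemma tag your Ollama install serves).

```python
import json
import urllib.request

REQUIRED_FIELDS = [
    "case_title", "summary", "severity", "confidence", "affected_layer",
    "prime_suspect", "evidence_found", "missing_evidence", "false_leads",
    "fix_plan", "validation_steps", "rollback_plan", "postmortem",
]

def build_payload(evidence: str, model: str = "gemma3n:e2b") -> dict:
    """Build an Ollama /api/generate request asking for strict JSON output."""
    # The model tag above is a placeholder, not necessarily the app's default.
    prompt = (
        "You are a debugging investigator. Analyze the evidence below.\n"
        "Separate observed evidence from inference. Respond ONLY with a JSON "
        "object containing exactly these keys: "
        + ", ".join(REQUIRED_FIELDS)
        + "\n\nEVIDENCE:\n" + evidence
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}

def investigate(evidence: str, host: str = "http://localhost:11434") -> dict:
    """Send the request to a local Ollama server and parse the JSON case file."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(build_payload(evidence)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])  # the model's JSON case file
```

Because everything goes to `localhost:11434`, no evidence ever leaves the machine.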
Gemma 4 E4B is also supported as a higher-quality local option. For the challenge demo, E2B gives a practical balance between privacy, local performance, and useful reasoning on developer hardware.
Why I Built It
Most debugging tools assume the developer already knows where the problem is.
But real bugs are rarely that clean.
A developer may have:
- a screenshot from staging
- a browser console error
- a partial stack trace
- a failing network request
- a config snippet
- a vague user report
- an environment detail that may or may not matter
The problem is not just finding a possible fix.
The real problem is knowing:
- what evidence matters
- what is probably noise
- what the safest first action is
- what missing information would raise confidence
- what change could accidentally make things worse
- how to explain the issue to the team afterward
That is where BugTheatre AI fits.
It treats a bug like a case file.
Example Case: Laravel 419 / CSRF Failure
For one demo, I used a PHP/Laravel admin bug.
The user is on an order details page and tries to update an order status from Processing to Shipped.
The UI shows:
Failed to update order status.
Page expired. Please refresh and try again.
DevTools shows:
POST /admin/orders/1024/status -> 419 Page Expired
CSRF token mismatch
TokenMismatchException
VerifyCsrfToken.php
Session driver: redis
A generic chatbot might say:
Try refreshing the page.
BugTheatre AI builds a deeper investigation:
- Prime suspect: expired or stale Laravel CSRF/session token during admin inactivity
- Evidence: HTTP 419, CSRF mismatch, page idle time, failed status update request
- Missing evidence: Laravel logs, VerifyCsrfToken.php, SESSION_DRIVER, cookie/session settings, Redis health
- False leads to avoid: database issue, payment issue, order calculation bug
- Safe patch path: refresh/regenerate CSRF token before submission, handle 419 gracefully, preserve unsaved intent, verify session/cookie/CSRF middleware
- Validation: reproduce idle session, submit form, verify re-authentication or token refresh path
- Rollback: revert token-refresh handling if it blocks valid admin submissions
This is the kind of output I wanted: not just a fix, but a safe debugging path.
How It Works
BugTheatre AI follows a structured investigation pipeline:
User evidence
-> privacy guard and secret redaction
-> screenshot encoding, if provided
-> strict investigation prompt
-> local Gemma 4 through Ollama
-> structured JSON case file
-> rendered debugging views
-> Markdown / JSON export
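The privacy guard at the top of that pipeline can be sketched as a regex pass that masks common secret shapes before any text reaches the model. The patterns below are illustrative only; the app's actual redaction rules may differ.

```python
import re

# Illustrative patterns only; a real guard would cover many more shapes.
REDACTION_RULES = [
    (re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"), "Bearer [REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[=:]\s*\S+"),
     r"\1=[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Mask likely secrets in raw debugging evidence before model inference."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Redacting before inference means even the local model never sees raw credentials, which keeps exports safe to share as well.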
The model is not used as a free-form chat box.
Instead, Gemma is asked to return a structured case file with fields like:
- case title
- summary
- severity
- confidence
- affected layer
- prime suspect
- evidence found
- missing evidence
- false leads
- fix plan
- validation steps
- rollback plan
- postmortem
That JSON is then rendered into dedicated product views.
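Rendering depends on every field existing, so a thin normalization layer can coerce whatever the model returns into a complete case file. This is a sketch: the field names mirror the list above, but the exact schema and defaults are assumptions.

```python
import json

LIST_FIELDS = ["evidence_found", "missing_evidence", "false_leads",
               "fix_plan", "validation_steps"]
TEXT_FIELDS = ["case_title", "summary", "severity", "affected_layer",
               "prime_suspect", "rollback_plan", "postmortem"]

def normalize_case(raw: str) -> dict:
    """Parse model output and guarantee every UI field exists with a sane type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    case = {}
    for field in TEXT_FIELDS:
        case[field] = str(data.get(field) or "unknown")
    for field in LIST_FIELDS:
        value = data.get(field)
        case[field] = value if isinstance(value, list) else []
    # Confidence is clamped so the UI can always render a 0-100 score.
    try:
        case["confidence"] = max(0, min(100, int(data.get("confidence", 0))))
    except (TypeError, ValueError):
        case["confidence"] = 0
    return case
```

The point is that the UI never crashes on a partial answer: missing sections just render as empty or "unknown", which itself signals low evidence quality.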
Product Screens
BugTheatre AI has separate screens for separate debugging jobs.
Dashboard
The dashboard acts as the command center.
It shows:
- current active investigation
- saved cases
- export readiness
- local AI mode
- demo story
- next best action
This helps judges or teammates understand the workflow quickly.
New Case
The New Case screen captures messy evidence:
- case title
- language/framework
- environment
- short description
- expected behavior
- actual behavior
- logs / stack trace
- code or config snippet
- screenshot evidence
Only the title is required, but more evidence improves the investigation.
Sample Cases
This is the demo gallery.
It includes prebuilt examples such as:
- React hydration mismatch
- FastAPI dependency mismatch
- Docker port mismatch
This makes the project easy to test without inventing a bug first.
Investigation Board
This is the main screen.
It shows:
- prime suspect
- confidence
- severity
- affected layer
- fastest fix
- executive debug summary
- evidence found
- missing evidence
- case snapshot
This is where the product stops feeling like a chatbot and starts feeling like a real debugging assistant.
Patch Room
Patch Room converts diagnosis into action.
It includes:
- quick patch
- clean fix
- prevention
- validation
- safe patch review
- suggested commands
- action plan
- rollback plan
For risky areas such as auth, session handling, dependency changes, deployment, payment, or data paths, BugTheatre is intentionally conservative. It avoids unsafe recommendations and pushes the user toward validation before change.
Postmortem
The postmortem view turns the debugging session into a workplace artifact.
It includes:
- technical artifacts
- summary
- impact
- root cause
- detection
- resolution
- prevention
- follow-up actions
Exports are available as Markdown and JSON.
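The Markdown export can be sketched as straightforward templating over the case dict. Section names follow the list above; the actual export format in the repo is an assumption.

```python
def export_markdown(case: dict) -> str:
    """Render a postmortem-ready Markdown report from a case file dict."""
    sections = [
        ("Summary", "summary"), ("Impact", "impact"),
        ("Root Cause", "root_cause"), ("Detection", "detection"),
        ("Resolution", "resolution"), ("Prevention", "prevention"),
        ("Follow-up Actions", "follow_up_actions"),
    ]
    lines = [f"# Postmortem: {case.get('case_title', 'Untitled case')}", ""]
    for heading, key in sections:
        lines.append(f"## {heading}")
        value = case.get(key, "Not captured.")
        if isinstance(value, list):
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(str(value))
        lines.append("")
    return "\n".join(lines)
```

Markdown drops straight into a wiki or pull request; the JSON export keeps the same data machine-readable for integrations.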
Why This Is Not Just a Chatbot
A chatbot usually gives a prose answer.
BugTheatre AI turns one messy bug into a reusable investigation artifact.
That difference matters.
A bug investigation needs structure:
- What is the strongest hypothesis?
- What evidence supports it?
- What evidence is missing?
- What should we avoid changing first?
- What is the safest patch path?
- How do we validate the fix?
- What do we tell the team afterward?
BugTheatre AI gives each of those questions a dedicated place in the workflow.
The result feels less like asking AI for help and more like opening a debugging case file.
Output Quality and Safety Guardrails
One thing I learned quickly: debugging recommendations can be dangerous if they are too confident or too generic.
So BugTheatre AI is designed to be conservative around risky areas.
For example:
- Auth/session bugs should not blindly recommend shorter session timeouts.
- Dependency bugs should tie recommendations to observed versions and lockfiles.
- Deployment bugs should validate runtime facts before changing app code.
- Payment or data-loss paths should recommend verification before modification.
In the Laravel 419 case, the safer recommendation is not “reduce the session timeout.”
The safer path is:
- refresh or regenerate the CSRF token before submission
- warn the user before session expiry
- handle 419 gracefully with re-authentication or refresh
- preserve unsaved intent where possible
- verify Laravel session, cookie, CSRF middleware, and Redis/session configuration
This is the kind of caution I wanted the product to show.
Limitations
BugTheatre AI is a proof of concept, not an autonomous production fixer.
It does not modify code directly.
It does not deploy fixes.
It does not replace developer judgment.
Gemma helps structure the evidence and suggest likely patch paths, but developers should still validate fixes with logs, tests, and environment-specific checks.
Some limitations:
- local model output can vary
- screenshot-only analysis depends on readable text
- strict JSON generation needs validation and repair loops
- production use would need stronger permissions, audit trails, and integrations
- a production UI would likely benefit from a React/Next.js frontend instead of Streamlit
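The validation-and-repair loop mentioned above can be sketched as a bounded retry around the model call, feeding the failure reason back into the prompt. The callable and the repair wording here are hypothetical, not the app's exact implementation.

```python
import json

def generate_with_repair(call_model, prompt: str, required: list[str],
                         max_attempts: int = 3) -> dict:
    """Retry a model call until it returns JSON containing every required key."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"\n\nYour last reply was not valid JSON ({exc}). Return only JSON."
            continue
        missing = [key for key in required if key not in data]
        if not missing:
            return data
        feedback = f"\n\nYour last reply was missing keys: {missing}. Return full JSON."
    raise ValueError("model did not produce a valid case file")
```

Bounding the attempts matters: a local model that keeps failing should surface an error rather than loop forever on developer hardware.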
What Worked Well
A few things worked better than expected:
- Gemma 4 could reason over mixed debugging evidence.
- Local inference made the privacy story much stronger.
- Screenshot input made the demo feel realistic.
- The investigation-board format made the output easier to trust.
- Patch Room helped convert diagnosis into action.
- Postmortem export turned a debugging session into something a team could actually use.
The biggest win was this:
BugTheatre did not just explain the bug.
It organized the debugging process.
Challenges
The hardest part was not building the UI.
The hardest part was controlling the output.
For a debugging product, the model needs to be useful without being reckless.
That meant spending time on:
- structured prompts
- strict JSON response shape
- evidence vs inference separation
- missing evidence fields
- safer patch language
- conservative recommendations for auth/session/deployment cases
Another challenge was screenshot analysis. If the screenshot is readable, Gemma can extract useful signals. But if the screenshot is noisy or cropped, the model needs to admit uncertainty instead of pretending it knows everything.
That is why “missing evidence” is a first-class section in the product.
Future Improvements
If I continue the project, I would like to add:
- React/Next.js frontend
- FastAPI backend
- SQLite case history
- JSON schema validation and retry/repair loop
- GitHub issue export
- Slack summary export
- PDF reports
- saved case detail pages
- team workspaces
- project-specific debugging playbooks
- stronger secret redaction
- automatic evidence quality scoring
A more advanced version could also support repository-level investigation, where BugTheatre reads the relevant files and connects the screenshot/log evidence back to actual code paths.
Closing
BugTheatre AI is built around a simple belief:
Debugging is not just about getting an answer.
It is about building a path from symptom to evidence to action.
That is why I built BugTheatre AI as an investigator, not a chatbot.
Gemma 4 made that practical locally by combining screenshot understanding, evidence extraction, and structured reasoning into a private debugging workflow.
Do not build an AI that merely answers bugs. Build an AI that investigates bugs.