DEV Community: dueprincipati

Gemma 4 Multimodal Reasoner: Local Visual CoT for Charts, Math, and UI Analytics

dueprincipati — Sun, 17 May 2026 08:57:29 +0000

*This is a submission for the Gemma 4 Challenge: Build with Gemma 4*

What I Built

Gemma 4 Multimodal Reasoner is a production-ready, highly modular multimodal reasoning engine designed to unlock advanced visual and textual analysis. While traditional vision-language pipelines often treat perception and text generation as detached layers, this library deeply integrates Gemma 4's native chain-of-thought capabilities into a unified Python API and a seamless Command Line Interface (CLI).

The engine abstracts the complexity of configuring diverse inference providers and handles advanced visual token budget allocations out of the box by providing four specialized analytical tools:

Document Analysis (DocumentParser): Tailored prompt structures to read and convert complex data tables into Markdown, extract specific line items from invoices or receipts, parse filled form fields, analyze workflow diagrams, and transcribe handwritten annotations.
Chart & Graph Interpretation (ChartAnalyzer): Automatically maps axis boundaries, intervals, legends, and general trends from complex graphs, isolating numerical values and outliers into structured CSV or Markdown formats.
Visual Math Solver (MathSolver): Capitalizes on the model's strong logical capacity to break down complex mathematical formulas or geometric problems directly from an image, outputting comprehensive, step-by-step breakdowns.
Screen & UI Understanding (ScreenAnalyzer): Built specifically for agentic workflows, it maps interactable interface elements, estimates relative component coordinates, and performs automated accessibility (WCAG) contrast and alt-text compliance reviews.

Demo

The application features a built-in interactive CLI that makes analyzing images or running multi-turn conversations completely immediate from the terminal:

# Start an interactive multi-turn chat session with a visual context
gemma4 chat --image ./data/app_screenshot.png

Additionally, a complete Jupyter Notebook walkthrough is available in the repository (notebooks/demo.ipynb), making it fully optimized for deployment on cloud-hosted environments like Google Colab.

Code

The full open-source package, architecture layers, standalone container deployment profiles, and automated test suites are hosted publicly here:

👉 GitHub Repository: dueprincipati/gemma4-reasoner

How I Used Gemma 4

When building a multimodal reasoning engine meant to scale seamlessly—from edge installations to massive cloud-hosted processing systems—model selection requires a careful balance between VRAM overhead, context windows, and absolute logical performance.

While our repository abstractly supports the full Gemma 4 family (E2B up to 31B Dense), we selected Gemma 4 E4B as our primary flagship local model.

The Core Case for Gemma 4 E4B

High Cognitive Density: Traditionally, small-footprint "edge" models fail at advanced visual reasoning tasks. Gemma 4 E4B completely shatters this stereotype, scoring an incredible 42.5% on the highly challenging AIME 2026 visual mathematics benchmark. This enabled our MathSolver and ChartAnalyzer tools to work locally with extreme precision without forcing a hard dependency on an expensive third-party cloud API.
Optimal VRAM Footprint: For an engine to be truly production-ready, it must execute on consumer hardware. In native BF16 precision, E4B takes up 15.0 GB of VRAM, fitting perfectly inside mid-tier consumer graphics cards and standard developer cloud nodes. Quantized down to q4_0, it drops to a meager 5.0 GB, turning regular laptops into air-gapped document parsing servers via local Ollama wrappers.
Massive Multi-Image Context Window: Taking advantage of the model's native 128K context window (131,072 tokens), the engine easily manages simultaneous multi-image comparison tasks (compare_images) without running into abrupt truncation issues or losing conversation state.

Technical Capabilities Unlocked by Gemma 4's Architecture:

1. Isolated Local Chain-of-Thought (CoT) Streams

Gemma 4 introduces native system control tokens. By injecting the <|think|> block directly into our system prompt constructor, we trigger internal chain-of-thought routing. Our abstract backend layer custom-parses the model's output streams, separating the internal <|channel>thought string from the final <|channel>analysis answer. This allows us to display a dedicated, beautifully formatted reasoning block in our terminal CLI while returning clean markdown answers to the user without verbose conversational overhead.

2. Dynamic Variable Token Allocation

Gemma 4's vision encoder allows flexible image resolutions through fine-grained token budgeting. Our ImageProcessor implements this natively to match specific computational needs:

MIN (70 visual tokens) allows minimal compute overhead for rapid structural framing or sequential video frame categorizations.
HIGH/MAX (560 to 1120 tokens) forces sub-segment upsampling (Pan & Scan). This completely unlocked high-fidelity OCR, making dense table cell data extractions, invoice reading, and raw handwriting transcription extremely robust.

3. Clean History Multi-Turn Scaling

Per Gemma 4's official specifications, passing historical reasoning channels inside active conversational arrays degrades subsequent generation quality. Our multi-turn state manager automatically strips historical thinking output sequences before packaging the next chat context turn. This ensures our interactive chatbot tracks conversational context flawlessly over long multi-turn lengths without performance drops.

4. Architecture Agnostic Scalability

The structural decoupling of our processing tools from the inference backend ensures developers are never locked into a single ecosystem. Thanks to multi-backend runtime mapping (BACKEND=ollama | huggingface | openai_compat), if a production workflow requires moving to the max-tier Gemma 4 31B Dense model (boosting AIME performance to an unmatched 89.2%), swapping backends is as simple as updating a single environment configuration string.

Hermes Sentry-Core: The Autonomous On-Call Engineer

dueprincipati — Sat, 16 May 2026 13:36:48 +0000

What I Built
I built Hermes Sentry-Core, an autonomous, self-healing Site Reliability Engineering (SRE) and infrastructure optimization engine.

Traditional monitoring relies on passive alerting—when something breaks, a system pings a human on-call who then has to wake up, pull logs, diagnose the issue, and manually restart services. Hermes Sentry-Core flips this paradigm by acting as an automated "first-responder." Operating as a headless background daemon, it continuously monitors system health. When a failure occurs, it intercepts container error states, uses an LLM to cognitively parse and diagnose the root cause, executes targeted, low-risk remediation scripts (like cycling Docker containers), and finally broadcasts a concise, cryptographic-grade operational triage report straight to a Telegram channel.

It solves the problem of alert fatigue and reduces mean-time-to-recovery (MTTR) for known or transient infrastructure failures.

Demo

Code
https://github.com/dueprincipati/hermes-sentry-core

My Tech Stack
Agent Framework: Hermes Agent Framework
LLM Core: Anthropic Claude 3.5 Sonnet (Fallback: Claude 3 Haiku)
Infrastructure Management: Docker, Kubernetes (via kubectl)
Communication Gateway: Telegram Bot API
System Tools: standard POSIX utilities (grep, awk, cat, systemctl)

How I Used Hermes Agent
Hermes Agent was foundational in designing the secure, autonomous loop of this project. I leaned heavily on the following agentic capabilities:

Natural Language Cron Trigger Subsystem: I utilized the Hermes abstraction layer to create continuous, intelligent monitoring loops that watch system wellness vectors without writing complex bash watchdogs.
Deterministic Guardrails & Security: This is where Hermes truly shines for an SRE tool. I configured the agent with strict conservative_auto boundaries in config.json. By strictly allowing only non-destructive commands (docker, kubectl, grep, systemctl) and locking scope to parsing only the last 200 lines of logs, Hermes Agent allowed me to safely grant an LLM terminal access without fear of it executing dangerous operations (like rm or database wipes).
Headless Gateways: I used the Hermes gateway system to bypass traditional web UIs entirely, routing the agent's diagnostic Markdown reports directly into a Telegram control feed, perfectly mimicking how SRE teams already communicate. Hermes provided the exact balance of AI autonomy and rigid security boundaries necessary to build a self-healing infrastructure tool.

node.js

dueprincipati — Sat, 17 Jan 2026 05:17:30 +0000

Submission for the Neon Open Source Starter Kit Challenge: Ultimate Starter Kit

dueprincipati — Tue, 27 Aug 2024 17:19:29 +0000

This is a submission for the Neon Open Source Starter Kit Challenge : Ultimate Starter Kit

My Kit

Introducing NeonStack, the ultimate open source starter kit for building modern, scalable web applications. NeonStack combines the power of Next.js, TypeScript, Tailwind CSS, Prisma, and Postgres on Neon to provide a robust foundation for your next project.
Key features of NeonStack include:

Next.js for server-side rendering and API routes
TypeScript for type-safe code
Tailwind CSS for rapid UI development
Prisma as the ORM for database operations
Postgres on Neon for a scalable and efficient database solution
NextAuth.js for authentication
tRPC for end-to-end typesafe APIs
Zod for schema validation
Jest and React Testing Library for testing
ESLint and Prettier for code quality and formatting

NeonStack is designed to help developers quickly bootstrap their projects with a modern, maintainable architecture. Whether you're building a small side project or a large-scale application, NeonStack provides the tools and structure you need to succeed.

Link to Kit

You can find the NeonStack starter kit on GitHub: https://github.com/dueprincipati/neonstack
Our repository includes a comprehensive README with detailed instructions on how to set up and use the starter kit.

My Journey

When designing NeonStack, our goal was to create a starter kit that would empower developers to build modern, scalable applications with ease. We chose this particular stack for several reasons:

Next.js: We selected Next.js for its excellent developer experience, built-in API routes, and server-side rendering capabilities.
TypeScript: TypeScript adds type safety to our JavaScript code, catching errors early and improving overall code quality and maintainability.
Tailwind CSS: We chose Tailwind for its utility-first approach, which allows for rapid UI development and easy customization.
Prisma: Prisma's type-safe database access and migrations make it an excellent choice for working with databases in TypeScript.
Postgres on Neon: Neon's serverless Postgres offering is a game-changer for developers. It provides the power and reliability of Postgres with the scalability and ease of use of a cloud-native solution.
tRPC: We included tRPC to enable end-to-end typesafe APIs. This ensures that our frontend and backend are always in sync, reducing errors and improving developer productivity.

Throughout the process of building NeonStack, we learned several valuable lessons:

Integration is key: Ensuring that all these technologies work well together required careful consideration and testing.
Documentation matters: Clear and comprehensive documentation is crucial for an open source project.
Flexibility is important: While we've made opinionated choices in our stack, we've also ensured that developers can easily modify or extend the starter kit to suit their specific needs.
Performance considerations: We learned the importance of optimizing for performance from the start.
Community feedback is invaluable: We've set up the project to welcome contributions and feedback from the community.