DEV Community

Jordan Henderson

Everbench: A document management system with Local Intelligence

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Everbench

For more background, here's a link to an extended blog post.

Everbench is a low-cost, efficient document research platform for those concerned about privacy.

I've been working on a project I call Everknown, intended as an open-source Evernote replacement. Lately I've stalled on it, having discovered a commercial service that had most of what I wanted. But I have the bones of the app developed, and when I saw this challenge I decided to put together a small version of Everknown for link/bookmark management and page summarization, two of the things Everknown was going to do for me.

Thus the odd name Everbench: it's a "workbench" for Everknown. It has a very simple architecture, but that's good! It's small, composable software that can be easily modified to fit your needs.

Everbench captures web pages, converts them to Markdown for storage in an Obsidian Vault, and creates a summary and tags for categorization. It uses an efficient HTML->MD converter written in C, with a Gemma 4 quality gate that checks whether the conversion succeeded. Some pages can't be converted (paywalls, login walls, empty SPA shells, mostly-navigation pages, etc.), but Gemma 4 can quickly detect that and characterize the failure. I've found that if the conversion fails, the page has serious problems.
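
To make the storage step concrete, here is a minimal Python sketch of writing a captured page into a vault folder as a Markdown note with YAML frontmatter. It is an illustration, not Everbench's actual code: the function names (`slugify`, `write_note`) and the frontmatter fields (`title`, `source`, `summary`) are hypothetical, and the only thing assumed about Obsidian is that a vault is a folder of Markdown files whose frontmatter can carry a `tags` list.

```python
import json
import re
from pathlib import Path


def slugify(title):
    """Turn a page title into a filesystem-friendly file stem (hypothetical helper)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return slug or "untitled"


def write_note(vault_dir, title, url, summary, tags, body):
    """Write one note into the vault folder and return its path.

    Field names are assumptions for illustration. json.dumps is used to
    quote values because a JSON string is also a valid YAML double-quoted scalar.
    """
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(url)}",
        f"summary: {json.dumps(summary)}",
    ]
    if tags:
        lines.append("tags:")
        lines += [f"  - {json.dumps(tag)}" for tag in tags]
    else:
        lines.append("tags: []")
    lines += ["---", "", body.rstrip(), ""]

    path = Path(vault_dir) / f"{slugify(title)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Keeping the summary and tags in frontmatter rather than in the body means the converted Markdown stays untouched below the second `---`.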

Using a deterministic C parser isn't just about extraction quality; it's also a small security boundary. The parser strips <script>, <style>, <noscript>, and CSS-hidden content before anything reaches Gemma 4, so the model never sees what the page is hiding from the user. Feeding raw HTML to an LLM is an open invitation for prompt injection via hidden divs, alt text, JavaScript-emitted content, or whatever the next clever trick happens to be. Prompt injection can be a significant challenge, but this stage of the pipeline gives me a place to insert heuristics that actively guard against it. Gumbo, the C HTML parser, lets me reason about what crosses that boundary.
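
The stripping idea can be sketched in a few lines. The example below is a Python illustration using the standard library's `html.parser`, not the Gumbo-based C code Everbench actually uses. The class and function names are hypothetical, and "CSS-hidden" is approximated here as the `hidden` attribute or an inline `display:none` / `visibility:hidden` style, which is an assumption about what the real parser checks.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}
# Void elements never get an end tag, so they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _is_hidden(attrs):
    """True for the `hidden` attribute or an inline display:none / visibility:hidden."""
    attr_map = dict(attrs)
    if "hidden" in attr_map:
        return True
    style = (attr_map.get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, dropping everything inside skipped or hidden subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a subtree that must not be emitted
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1  # nested element inside a skipped subtree
        elif tag in SKIP_TAGS or _is_hidden(attrs):
            self.skip_depth = 1  # entering a skipped subtree

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only text nodes are kept; attribute values such as alt text never get here.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

This sketch counts open and close tags, so unbalanced markup can throw the depth off. A tree-building parser such as Gumbo avoids that by repairing the document before anything is walked.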

Demo

Here's a link to the video walkthrough

Code

Everbench

How I Used Gemma 4

Gemma 4 is used for document summarization and categorization, but its novel use here is as a quality gate for the output of the Gumbo C HTML parser.

The prompt given to Gemma 4 is currently:

You are evaluating whether a web page was successfully extracted into readable Markdown.

URL: <captured url>
Title: <captured title>

Extracted Markdown (first 2000 chars):
<extracted markdown>

Decide if this extraction is GOOD or BAD.

GOOD means: the main article content is present and readable.
BAD means: the content is mostly navigation, advertising, login walls, JavaScript placeholders, or otherwise unusable as a reference.

Respond in exactly this format:
VERDICT: GOOD|BAD
REASON: <one short sentence>

The model is not the processing pipeline; it is the judge inside the pipeline.
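
As a rough illustration of how a pipeline might wrap that judge, here is a Python sketch that fills in the prompt above and parses the two-line reply. The function names, the `Verdict` type, and the choice to treat an unparseable reply as BAD are all assumptions made for the example. The call to the model itself is left out because it depends on how Gemma 4 is being served.

```python
import re
from typing import NamedTuple

MAX_CHARS = 2000  # matches "first 2000 chars" in the prompt

PROMPT_TEMPLATE = """You are evaluating whether a web page was successfully extracted into readable Markdown.

URL: {url}
Title: {title}

Extracted Markdown (first 2000 chars):
{markdown}

Decide if this extraction is GOOD or BAD.

GOOD means: the main article content is present and readable.
BAD means: the content is mostly navigation, advertising, login walls, JavaScript placeholders, or otherwise unusable as a reference.

Respond in exactly this format:
VERDICT: GOOD|BAD
REASON: <one short sentence>
"""


class Verdict(NamedTuple):
    good: bool
    reason: str


def build_gate_prompt(url, title, markdown):
    """Fill the template, truncating the Markdown to the first MAX_CHARS characters."""
    return PROMPT_TEMPLATE.format(url=url, title=title, markdown=markdown[:MAX_CHARS])


_VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(GOOD|BAD)\b", re.IGNORECASE | re.MULTILINE)
_REASON_RE = re.compile(r"^\s*REASON:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_verdict(response):
    """Parse the model's reply; anything that does not match the format counts as BAD."""
    verdict = _VERDICT_RE.search(response)
    if verdict is None:
        return Verdict(False, "unparseable model response")
    reason = _REASON_RE.search(response)
    return Verdict(verdict.group(1).upper() == "GOOD",
                   reason.group(1).strip() if reason else "")
```

Failing closed on a malformed reply keeps a confused model from waving a bad extraction through. The pipeline only stores a page when `parse_verdict(...).good` is true.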

In Everknown, I had intended to use local and cloud models for LLM work, interchangeably, configured where it made sense. In Everbench, I just needed an LLM that could categorize (via tags) and summarize documents well. I found Gemma-4-26B-E4B to be excellent at that. The smaller models didn't do a very good job at some of the things I needed an LLM for, and 31B was too slow and not notably better.

Locally, I can only run one model at a time, and I'm hoping that Gemma-4-26B-E4B, with its MoE architecture, will work out as a good general-purpose local model, one I might also be able to use for agentic, tool-using work as I expand my projects.
