Everbench: A document management system with Local Intelligence

Jordan Henderson — Mon, 25 May 2026 05:38:48 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Everbench

For more background, here's a link to an extended blog post.

Everbench is a low-cost, efficient document research platform for those concerned about privacy.

I had been working on a project I called Everknown, intended as an open-source Evernote replacement. I stalled on it after discovering a commercial service that had most of what I wanted, but I had the bones of the app developed. When I saw this challenge, I decided to put together a small version of Everknown covering link/bookmark management and page summarization, two of the things Everknown was going to do for me.

Thus, the odd name Everbench. It's a "workbench" for Everknown. It has a very simple architecture, but that's good! Small, composable software that can be modified easily to fit needs.

Everbench captures web pages, converts them to Markdown for storage in an Obsidian vault, and creates a summary and tags for categorization. It uses an efficient HTML-to-Markdown converter written in C, with a Gemma 4 quality gate that checks whether the conversion succeeded. Some pages can't be converted (paywalls, login walls, empty SPA shells, mostly-navigation pages, etc.), but Gemma 4 can quickly detect that and characterize the failure. I've found that when the conversion fails, the page itself has serious problems.

Using a deterministic C parser isn't just about extraction quality; it's also a small security boundary. The parser strips <script>, <style>, <noscript>, and CSS-hidden content before anything reaches Gemma 4, so the model never sees what the page is hiding from the user. Feeding raw HTML to an LLM is an open invitation for prompt injection via hidden divs, alt text, JavaScript-emitted content, or whatever the next clever trick happens to be. Prompt injection can be a significant challenge, but this stage of processing gives me a place to insert heuristics that actively guard against it. Gumbo lets me reason about what crosses that boundary.
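
To make the idea of that boundary concrete, here is a minimal sketch in Python rather than the C and Gumbo the project actually uses. It is not Everbench's parser: the names `visible_text`, `VisibleText`, and `is_hidden` are hypothetical, and the hidden-content check only looks at the `hidden` attribute and inline styles, so content hidden through a stylesheet class would slip past it.

```python
from html.parser import HTMLParser

# Elements whose content is never shown to the reader.
DROP_TAGS = {"script", "style", "noscript"}
# Void elements have no closing tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_hidden(attrs):
    """Heuristic only: `hidden` attribute or inline display:none / visibility:hidden."""
    attrs = dict(attrs)
    if "hidden" in attrs:
        return True
    style = (attrs.get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class VisibleText(HTMLParser):
    """Collects text that sits outside dropped or hidden subtrees."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # open tags inside a dropped/hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.hidden_stack or tag in DROP_TAGS or is_hidden(attrs):
            self.hidden_stack.append(tag)

    def handle_endtag(self, tag):
        # Stray end tags are ignored; unclosed children are popped with their parent.
        if tag in self.hidden_stack:
            while self.hidden_stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.hidden_stack and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return only the text a reader could plausibly see, joined by spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Only the output of a step like this would be passed on to the model, so an instruction tucked into a `display: none` div never reaches the prompt.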

Demo

Here's a link to the video walkthrough

Code

Everbench

How I Used Gemma 4

Gemma 4 is used for document summarization and categorization, but its novel use here is as a quality gate on the output of the Gumbo-based C HTML parser.

The prompt given to Gemma 4 is currently:

You are evaluating whether a web page was successfully extracted into readable Markdown.

URL: <captured url>
Title: <captured title>

Extracted Markdown (first 2000 chars):
<extracted markdown>

Decide if this extraction is GOOD or BAD.

GOOD means: the main article content is present and readable.
BAD means: the content is mostly navigation, advertising, login walls, JavaScript placeholders, or otherwise unusable as a reference.

Respond in exactly this format:
VERDICT: GOOD|BAD
REASON: <one short sentence>

The model is not the processing pipeline; it is the judge inside the pipeline.
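
A minimal sketch of how a pipeline might fill in that prompt and read the two-line reply, written in Python rather than the project's Scheme. The template text is the prompt quoted above; the function names `build_prompt` and `parse_verdict`, and the choice to treat an unparseable reply as BAD, are my assumptions and not necessarily what Everbench does.

```python
import re

PROMPT_TEMPLATE = """You are evaluating whether a web page was successfully extracted into readable Markdown.

URL: {url}
Title: {title}

Extracted Markdown (first 2000 chars):
{markdown}

Decide if this extraction is GOOD or BAD.

GOOD means: the main article content is present and readable.
BAD means: the content is mostly navigation, advertising, login walls, JavaScript placeholders, or otherwise unusable as a reference.

Respond in exactly this format:
VERDICT: GOOD|BAD
REASON: <one short sentence>
"""

VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(GOOD|BAD)\b", re.IGNORECASE | re.MULTILINE)
REASON_RE = re.compile(r"^\s*REASON:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def build_prompt(url, title, markdown, limit=2000):
    """Fill the template, sending only the first `limit` characters of Markdown."""
    return PROMPT_TEMPLATE.format(url=url, title=title, markdown=markdown[:limit])


def parse_verdict(reply):
    """Return (is_good, reason). A reply without a verdict line counts as BAD (assumption)."""
    verdict = VERDICT_RE.search(reply)
    if verdict is None:
        return False, "unparseable model reply"
    reason = REASON_RE.search(reply)
    return verdict.group(1).upper() == "GOOD", (reason.group(1).strip() if reason else "")
```

Failing closed on a malformed reply keeps the judge's mistakes visible in the failed list instead of letting a bad extraction through.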

In Everknown, I had intended to use local and cloud models for LLM work, interchangeably, configured where it made sense. In Everbench, I just needed an LLM that could categorize (via tags) and summarize documents well. I found Gemma-4-26B-E4B to be excellent at that. The smaller models didn't do a very good job at some of the things I needed an LLM for, and 31B was too slow and not notably better.

Locally, I can only run one model at a time and I'm hoping that Gemma-4-26B-E4B with its MoE architecture will work out as a good general-purpose local model that I might be able to get some agentic tool-using work out of as I expand projects.

An Introduction, and my Gemma 4 Challenge Submission

Jordan Henderson — Sun, 24 May 2026 15:16:21 +0000

Introduction

This is not my Gemma 4 challenge submission. This is my first blog post and a pointer to my Gemma 4 challenge submission.

I wanted to rant freely about background and philosophy and not tire the contest judges.

Submission Link
Video Link
Repo Link

About Me

First, a little about me. I’m an old programmer. I learned programming in Pascal, Fortran 77, and Lisp back in the late 70s. I got involved in C and Unix early and then spent years doing VMS (later named OpenVMS) and Unix in control-system software.

In the intervening years, I've used SNOBOL4, C++, Java, JS, Perl, Tcl, Lua, and probably a number of other languages that don't immediately come to mind.

I've seen a lot of fads and trends pass.

Churn, Objects, and Interfaces

One thing that has disturbed me is all the churn in the software landscape. Just when a language or technology starts to get traction, when the tooling gets really good and it seems like progress can be made, it gets supplanted by the latest thing that offers not much more than novelty.

I think software vendors and academia are somewhat to blame here. Vendors make more money on new systems and academics can't publish on established technologies.

Open source has helped counteract this. Commercial software often has to sell the next migration. Academic work often has to emphasize novelty. Open source can preserve useful work long after the fashion cycle has moved on.

A good open-source library can be improved, audited, ported, wrapped, and reused for decades. That matters to me. Some of the most valuable software in the world is not glamorous. It is boring, portable, well-tested code that sits underneath everything else.

That is one reason I keep coming back to C libraries. A good C library is often boring in the best sense. It does one job, has been used for years, and can be wrapped without dragging in a whole new world.

I'll say it out loud: object-oriented programming as the only paradigm is oversold. Everything does not need to be modeled as an object. Object-oriented languages may treat numbers, strings, and functions as objects, and that can make for pleasant syntax. But pleasant syntax is not the same thing as a good system model.

Object-oriented languages have their place. They grew partly out of simulation work, and simulations are a natural home for them.

I won't deny some good things have come from the obsession with object orientation: It has led to a deep commitment to abstraction and encapsulation. It has also led to massive brittle hierarchies of method invocations and factories that obscure solutions. A deep commitment to abstraction is what we need. Deep object hierarchies that are only very narrowly useful are not what we need.

I prefer abstract data types (ADTs) over classical object hierarchies. I like the software design principles in Rob Pike's "Notes on Programming in C" (https://www.lysator.liu.se/c/pikestyle.html), where we read:

"I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The OO languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra."

That preference also fits Tony Hoare’s maxim:

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."

APIs are a good direction in software. They generally implement ADTs in one way or another, so that's a good thing.
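
The function-pointer idea in the Pike quote above can be illustrated in a few lines. This is a sketch in Python, where plain functions stored in a table stand in for C function pointers; the shapes and the names `CIRCLE_OPS`, `RECT_OPS`, and `call` are invented purely for illustration.

```python
import math

# One table of operations per data type; the functions play the role of
# C function pointers.
CIRCLE_OPS = {
    "area": lambda s: math.pi * s["r"] ** 2,
    "name": lambda s: "circle",
}
RECT_OPS = {
    "area": lambda s: s["w"] * s["h"],
    "name": lambda s: "rect",
}


def make_circle(r):
    """A value is just its data plus a reference to its type's operation table."""
    return {"ops": CIRCLE_OPS, "r": r}


def make_rect(w, h):
    return {"ops": RECT_OPS, "w": w, "h": h}


def call(obj, op):
    """Dispatch through the value's own table; this is the whole 'method call'."""
    return obj["ops"][op](obj)
```

Here `call(make_rect(2, 3), "area")` and `call(make_circle(1), "area")` run different code through the same interface, with no class hierarchy involved.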

Why Scheme and C

Lately, I've been working with agents in hobbyist programming, mostly Claude, but I've also used Codex, Gemini, and even OpenCode with various LLMs backing it.

I've always been attracted to the simplicity of Lisp-like languages and Scheme. They are excellent for programming with ADTs.

Legacy Software is a Great Good.

I know "legacy" is often used as a pejorative. But software that has survived real use, real bugs, real ports, real users, and real abuse has earned a kind of trust that new software has not earned yet. That is one reason I keep coming back to proven Open Source C libraries.

Putting this all together, I’ve been building a platform of Scheme with C FFI (Foreign Function Interfaces) into simple composable software. I chose Chibi-Scheme for the high-level language and settled on libraries that are compatible with C11 as a minimum standard there.

The Opportunity with Agent-Assisted Programming

I'm impressed by the abilities of LLMs. They code well enough and handle so many of the chores of programming that often bog down development. I think I make great progress toward goals using agent assistants.

The Chrome extension in this project is a small concrete example: MV3, ~150 lines of JavaScript, hotkey capture, and a JSON POST to the local server. I one-shotted it with Claude. It took longer to test than to build. That kind of leverage on a small, well-defined component is where I think these tools shine. Not "build me a system," but "build me this sharp tool that does one thing." Everbench is several of those stitched together.

I'm not a believer in the imminent arrival of the Singularity or that LLMs will replace developers. LLMs don't want anything and goals are a bad substitute for desires. I believe humans will be in the loop creating with these new tools. If they aren't, we're obsolete anyway. Either way, we may as well act like humans still matter.

As I see it, the opportunity with these tools is not to create grander and more elaborate software, but to simplify: to create powerful tools for specific purposes, run endless tests and learning loops, find what works best, and modify code to improve it along every desired axis.

I hope there's a hint of that in my submission.

Everknown and Everbench

Mostly, I've just tooled around, first trying this thing and then that. I started working on a project I called Everknown, intended as an open-source Evernote replacement. I stalled on it after discovering a commercial service that had most of what I wanted, but I had the bones of the app developed. When I saw this challenge, I decided to put together a small version of Everknown covering link/bookmark management and page summarization, two of the things Everknown was going to do for me.

Thus, the odd name Everbench. It's a "workbench" for Everknown. It has a very simple architecture, but that's good! Small, composable software that can be modified easily to fit needs.

Gemma 4

In Everknown, I had intended to use local and cloud models for LLM work, interchangeably, configured where it made sense. Here, I just needed an LLM that could categorize (via tags) and summarize documents well. I found Gemma-4-26B-E4B to be excellent at that. The smaller models didn't do a very good job at some of the things I needed an LLM for, and 31B was too slow and not notably better.

Everbench Architecture

Below is a simple overview of the Everbench architecture.

It's all very simple, composed of small pieces I've built on top of well-established C libraries.

   Chrome extension          Local HTTP server        SQLite queue
   +--------------+          +--------------+         +--------------+
   | Ctrl+Shift+E | -------> |  /capture    | ------> |   queue      |
   | grabs HTML   |  POST    |  /export/:id |         |  (pending /  |
   +--------------+          |  /retry/:id  |         |   done /     |
                             |  review UI   | <------ |   failed)    |
                             +--------------+  reads  +--------------+
                                                            |
                                                            | drains
                                                            v
                              +-----------------------------------------+
                              |             Worker daemon               |
                              |                                         |
                              |   Gumbo_MD -> Markdown                  |
                              |   Gemma 4   - quality gate (GOOD / BAD) |
                              |   Gemma 4   - summarize                 |
                              |   Gemma 4   - tag                       |
                              +-----------------------------------------+
                                                            |
                                                            | approved
                                                            v
                                                  +------------------+
                                                  |  Obsidian vault  |
                                                  |  *.md + YAML     |
                                                  +------------------+
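
The capture-queue-worker flow in the diagram can be sketched with nothing but SQLite. This is an illustration in Python, not the project's Scheme code: the table layout, the column names, and the functions `open_queue`, `enqueue`, and `drain` are assumptions; only the three states (pending, done, failed) and the {url, title, html} payload come from the design above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id     INTEGER PRIMARY KEY,
    url    TEXT NOT NULL,
    title  TEXT NOT NULL,
    html   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
           CHECK (status IN ('pending', 'done', 'failed')),
    reason TEXT
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the queue database."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def enqueue(db, url, title, html):
    """What a /capture handler might do with the POSTed {url, title, html}."""
    cur = db.execute(
        "INSERT INTO queue (url, title, html) VALUES (?, ?, ?)",
        (url, title, html),
    )
    db.commit()
    return cur.lastrowid


def drain(db, process):
    """Run process(url, title, html) -> (ok, reason) on every pending row."""
    rows = db.execute(
        "SELECT id, url, title, html FROM queue "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for row_id, url, title, html in rows:
        try:
            ok, reason = process(url, title, html)
        except Exception as exc:  # one crashing page should not stall the queue
            ok, reason = False, f"worker error: {exc}"
        db.execute(
            "UPDATE queue SET status = ?, reason = ? WHERE id = ?",
            ("done" if ok else "failed", reason, row_id),
        )
        db.commit()
    return len(rows)
```

Keeping the reason alongside the status is what would let a review UI show why an entry failed.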

Chrome Extension

MV3, ~150 lines of JavaScript. Ctrl+Shift+E grabs the active tab's HTML and POSTs {url, title, html} to the local server. One-shotted with Claude (see above).

Bloschi

Bloschi is my port of Blosxom, a very old, very simple blogging platform built on files, not databases, but supporting page skins. For this project, I added some POST endpoints that the Scheme processing loop of Bloschi handles.

SQLite interfaces

SQLite is a prime example of the legacy of powerful C code that is very well tested and reliable. By its developers' account, it is the most widely deployed database engine in the world, embedded in a vast number of desktop and mobile applications.

Note that in the LLM agent programming world, if you find you need a different RDBMS, it is typically a few hours' work with an agent to drop in PostgreSQL or something else.

Gumbo

I believe Gumbo is one of the best C libraries for HTML parsing. I've created a Scheme FFI to Gumbo and also modified the C core to perform robust HTML-to-Markdown conversion. I call this piece Gumbo_MD.

In developing this app, I reviewed many web pages and refined Gumbo_MD to perform better extractions. I believe that a goal-directed loop could be set up with agentic programming to refine this further, but at this point, Gumbo_MD appears to do a great job.

Quality Gate

This is an important use of Gemma 4 in this project. I could have had Gemma 4 produce the Markdown, but would it have been any good, and would I have had to employ a more powerful model to judge it? By producing Markdown deterministically with Gumbo_MD and only calling on the model to judge the result, I believe I've produced a sharp tool in Gumbo_MD and saved work.

Summarize and Tagging

Gemma-4-26B-E4B does an excellent job summarizing articles and generating tags. I mentioned this above, but it bears repeating: Gemma 4 31B did a good job here, but not notably better, and it was much slower.

Even if you can't run Gemma-4-26B-E4B on your own hardware, models in this class should be inexpensive to use through inference providers. For the document conversions I'm doing, the API cost should be a fraction of a penny per article. These models will probably keep getting cheaper and more capable.

If you need local inference for privacy and control, the hardware to support this is increasingly in reach.

The three Gemma 4 calls per page are:

Quality gate: Is this actually an article, or is it ads, a login wall, an empty SPA shell, or otherwise unusable? It returns GOOD or BAD with a specific reason.
Summarize: Distill the page into a short paragraph.
Tag: Generate three to five specific topical tags.
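
A minimal sketch of how those three calls might be ordered, in Python rather than the project's Scheme. The callable `ask_model(task, text)` is a hypothetical stand-in for whatever actually talks to Gemma 4, and skipping the summarize and tag calls for a BAD page is my assumption about the flow, not something the project confirms.

```python
def process_page(markdown, ask_model):
    """Gate first; only pages judged GOOD go on to the summarize and tag calls."""
    verdict = ask_model("gate", markdown[:2000]).strip()
    if not verdict.upper().startswith("VERDICT: GOOD"):
        # Keep the model's full reply so the rejection reason can be shown later.
        return {"status": "failed", "reason": verdict}

    summary = ask_model("summarize", markdown).strip()

    # Assume the tag call returns a comma-separated list; cap it at five tags.
    raw_tags = ask_model("tag", markdown)
    tags = [t.strip().lower() for t in raw_tags.split(",") if t.strip()][:5]

    return {"status": "done", "summary": summary, "tags": tags}
```

Passing the model in as a function keeps the ordering logic testable without a running model.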

Approved entries get a button to export to an Obsidian vault as Markdown with YAML frontmatter. Failed entries show why they were rejected, so I can see when the gate is working and when the extractor needs tuning.
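
For readers who haven't seen the format, here is a sketch of what writing a note with YAML frontmatter could look like, in Python rather than the project's Scheme. The field names (`title`, `source`, `summary`, `tags`) and the function `render_note` are assumptions, not Everbench's actual layout. Values are quoted with `json.dumps`, relying on JSON strings also being valid YAML double-quoted scalars.

```python
import json


def yaml_str(value):
    """Quote a string so that colons, quotes, and '#' cannot break the YAML."""
    return json.dumps(value, ensure_ascii=False)


def render_note(title, url, summary, tags, body):
    """Return Markdown text with a YAML frontmatter block at the top."""
    tag_lines = [f"  - {yaml_str(tag)}" for tag in tags]
    lines = [
        "---",
        f"title: {yaml_str(title)}",
        f"source: {yaml_str(url)}",
        f"summary: {yaml_str(summary)}",
        "tags:" if tag_lines else "tags: []",
        *tag_lines,
        "---",
        "",
        body.rstrip() + "\n",
    ]
    return "\n".join(lines)
```

The returned string can be written to any `.md` file inside the vault folder; nothing about it is specific to one application.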

Obsidian was a natural target because the notes are plain Markdown files. I wanted Everbench to produce something I could keep, search, edit, and move around without trapping the result inside another application.

There is interesting work going on around hooking Obsidian into agentic workflows, and I hope to take advantage of that later.

DEV Community: Jordan Henderson
