I built Agentis Lux to enter the H0 Hackathon (Vercel + AWS Databases). #H0Hackathon See Agentis Lux's Devpost.com entry.
It started with a comment at a hackathon.
A You.com employee said the thing out loud: the web has a second audience now. When you ask ChatGPT or Perplexity a question, a retrieval agent fetches a page and reads its HTML to answer you. Not the laid-out site with the buttons and the hero image. The markup underneath. These agents arrive by the million, and many of them rely on the raw or minimally rendered HTML rather than running your JavaScript, so they often see far less of your page than a person does.
That comment sent me to build. My first answer to it was Hermes Clew, for the GitLab Duo Agent Platform Challenge. Hermes lived inside GitLab Duo Chat, no frontend, no database: a Python engine that scanned the HTML, JSX, and TSX files in a repo, scored them across six categories, and let an LLM reason over the findings. It proved the core idea. It also told developers how to fix things, lived inside one vendor's chat, and only worked on files in a repo.
Agentis Lux is what happened when I took that idea to the open web and rebuilt it with a different stance. Any live URL, not just repo files. Its own product on a real cloud architecture, not a chat window. And no fix suggestions, on purpose, where Hermes used to hand them out. Same six-category bones, a new body, a sharper philosophy. It scans your site and shows you what that second audience experiences when it tries to read it.
What it does
You paste a URL into Agentis Lux. You get a report. The report is written from the agent's point of view.
Not "this is broken." More like: "an agent landing on this page can't tell which element starts checkout, because it's a styled div and not a button."
It reports findings. It does not suggest fixes, and that is on purpose. I know what the agent sees, not what you should change. That is the whole value: visibility, and you decide what to do with it. Awareness, not judgment.
Six deterministic checks score the frontend out of 100: semantic HTML, form accessibility, ARIA, structured data, content in the HTML, and link and navigation. A parallel set of six API checks runs on the backend.
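A minimal sketch of what one deterministic check could look like, written in Python for brevity (the backend itself is JavaScript). The rule, the finding id, and the penalty value are illustrative assumptions, not the engine's actual ones. The point is only that parsing plus arithmetic gives the same score for the same input.

```python
from html.parser import HTMLParser


class ClickableDivCheck(HTMLParser):
    """Hypothetical check: flag <div>/<span> elements that act as buttons."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        acts_like_button = "onclick" in attr_map
        declares_role = attr_map.get("role") == "button"
        if tag in ("div", "span") and acts_like_button and not declares_role:
            line, _col = self.getpos()
            self.findings.append(
                {"id": "semantic.clickable-div", "tag": tag, "line": line}
            )


def score_category(findings, max_points=100, penalty=10):
    """Pure arithmetic: same findings in, same score out (penalty is assumed)."""
    return max(0, max_points - penalty * len(findings))


def run_check(html):
    """Parse the markup once and return (findings, score)."""
    check = ClickableDivCheck()
    check.feed(html)
    check.close()
    return check.findings, score_category(check.findings)


page = '<main><div onclick="checkout()">Buy</div><button>Help</button></main>'
findings, score = run_check(page)
print(findings, score)  # one finding, so the score is 90
```

No model is involved anywhere in this path, which is what lets the number be reproduced.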
The one idea the architecture is built on
Structure is deterministic. Flavor is AI.
The checks and the scoring are pattern matching. No model touches the number. Same input, same score, every time. I only spend AI in two places where a regex can't help: a Bedrock call writes the one-line plain-language verdict, and a second Bedrock layer runs an agent simulation, reasoning about what a retrieval agent would experience on the page and reporting what it could and could not accomplish. Not an autonomous agent clicking around. A simulation of the experience.
The AI is constrained, not creative. Low temperature, capped tokens, and a system prompt that encodes the product's own rules: no fixes, no judgment words, no em dashes. The simulation returns structured JSON, and any finding it references is filtered against the deterministic findings, so the model can't invent something the math didn't catch. If the model's output fails validation, the report falls back to a template. Math for trust, and the AI is fenced into exactly the two jobs where judgment helps.
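The filter-then-fallback idea can be sketched roughly like this. It is Python for illustration, and the JSON shape (`summary`, `finding_ids`), the template wording, and the function names are all assumptions of mine rather than the engine's real schema.

```python
import json

# Hypothetical template text used when the model output is unusable.
FALLBACK_TEMPLATE = "An agent reading this page encountered {count} finding(s)."


def validate_simulation(raw_model_output, deterministic_findings):
    """Return a cleaned simulation dict, or None if the output is unusable."""
    try:
        data = json.loads(raw_model_output)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict) or not isinstance(data.get("summary"), str):
        return None
    referenced = data.get("finding_ids", [])
    if not isinstance(referenced, list):
        return None
    known_ids = {f["id"] for f in deterministic_findings}
    # Keep only ids the deterministic checks actually reported.
    data["finding_ids"] = [
        fid for fid in referenced if isinstance(fid, str) and fid in known_ids
    ]
    return data


def simulation_or_template(raw_model_output, deterministic_findings):
    """Use the model's simulation when valid, otherwise a deterministic template."""
    result = validate_simulation(raw_model_output, deterministic_findings)
    if result is None:
        return {
            "summary": FALLBACK_TEMPLATE.format(count=len(deterministic_findings)),
            "finding_ids": [f["id"] for f in deterministic_findings],
            "source": "template",
        }
    result["source"] = "model"
    return result
```

The model can describe findings, but under this shape it cannot add one: an id that is not in the deterministic set is dropped before anything is rendered.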
Math stays math, so you can trust the number. Language and judgment are where AI earns its place.
This sounds like a philosophy choice. It ended up being an economics choice that fell out of the architecture. The deterministic core runs at any scale for almost nothing, so the free tier can stay free. I only pay for model tokens on the sentence and the simulation, the two places a human reads. I didn't design that in a spreadsheet. It just dropped out of keeping the math and the AI in separate boxes.
Why DynamoDB, and how I used it
The hackathon stack is Vercel on the frontend and AWS on the back, with DynamoDB as the data layer. I wanted to use DynamoDB as a deliberate data model, not a key-value afterthought, because every access pattern in this product is a single key lookup. That is exactly what it is built for.
Five tables, each with one job:
- ScanCache, 15-minute TTL, keyed by a hash of the URL, dedupes repeat fetches and keeps Bedrock cost down.
- ScanResults, 24-hour TTL, keyed by an opaque id, anonymous, results that expire on their own.
- BenchmarkScans, the 50-site dataset, with a GSI on vertical, rewritten monthly by an EventBridge refresh.
- ScanCounters, server-side counts, no PII. Reserved for the team tier.
- Users, reserved for signed-in history. A stub.
Two of those are live on every scan, one holds the benchmark, and two are reserved stubs for later. Two TTLs, two lifetimes, two reasons. Per-vertical rollups use the GSI, not a second database. No joins, no migrations, no idle server.
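A rough sketch of how the two TTL'd items might be shaped, in Python for illustration. DynamoDB's TTL feature reads a numeric attribute holding a Unix epoch time in seconds; everything else here (the attribute names `urlHash`, `scanId`, `expiresAt`, the URL normalization, the id length) is an assumption, not the actual table schema.

```python
import hashlib
import secrets
import time
from urllib.parse import urlsplit, urlunsplit

CACHE_TTL_SECONDS = 15 * 60        # ScanCache lifetime
RESULT_TTL_SECONDS = 24 * 60 * 60  # ScanResults lifetime


def cache_key(url):
    """Hash a lightly normalized URL so equivalent URLs share one cache entry."""
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cache_item(url, report, now=None):
    """Item for the cache table, expiring 15 minutes after it is written."""
    now = int(time.time()) if now is None else now
    return {
        "urlHash": cache_key(url),              # hypothetical partition key
        "report": report,
        "expiresAt": now + CACHE_TTL_SECONDS,   # hypothetical TTL attribute
    }


def result_item(report, now=None):
    """Item for the results table, keyed by an opaque id with no user data."""
    now = int(time.time()) if now is None else now
    return {
        "scanId": secrets.token_urlsafe(16),    # hypothetical partition key
        "report": report,
        "expiresAt": now + RESULT_TTL_SECONDS,
    }
```

Both reads are then a single get by partition key, which is the access pattern the section above describes.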
The write on a live scan is fail-soft and async. The scan returns to you whether or not the write lands, and a failed write goes to CloudWatch instead of your screen. The scan result is the product. Persistence is a side effect.
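One way to picture the fail-soft write, sketched in Python. `write_fn` stands in for whatever actually puts the item in the table, and the logger is assumed to be wired to CloudWatch; both are placeholders. In a serverless function the runtime may pause background work once the response is sent, which this sketch does not handle.

```python
import logging
import threading

logger = logging.getLogger("scan.persistence")


def persist_fail_soft(write_fn, item):
    """Run write_fn(item) in the background; log failures, never raise."""

    def _worker():
        try:
            write_fn(item)
        except Exception:
            # The failure goes to the log stream, not to the caller.
            logger.exception("scan result write failed; response unaffected")

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread


def handle_scan(url, run_scan, write_fn):
    """Return the report whether or not the write succeeds."""
    report = run_scan(url)
    persist_fail_soft(write_fn, {"url": url, "report": report})
    return report
```

The caller only ever waits on `run_scan`. The write can be slow or broken without changing what the user gets back.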
(The product is Agentis Lux. The engine is Perseus Clew, part of my Clew suite, which is why the AWS tables carry the PerseusClew prefix.)
The bet I made in public
Before the engine scanned anything, I wrote down what I expected it to find across 50 sites and committed it to the repo with a timestamp. Predictions first, data later, so I couldn't move the goalposts.
Then I scanned ten sites each across e-commerce, SaaS, content and media, US government, and indie builder projects.
Indie builders won. Mean score 77 out of 100, ahead of government, SaaS, and e-commerce. The single highest score in the whole run was a personal developer portfolio at 91. Scores ran from 34 to 91. Four sites blocked the scan at the door, including OpenAI.
I missed three of my six predictions. That is the point of pre-registering them. If I had gone six for six you should distrust me, because it would mean I only predicted what I already knew. The misses are where I learned something: that craft beats compliance, that the API is the real blind spot, and that a hand-built personal site reads cleaner to an agent than most of the web's biggest companies.
The full dataset, including the sites that blocked me, is in the repo.
The gaps
Fetching arbitrary user-supplied URLs on a public endpoint is a security problem before it is a feature. The backend does full DNS resolution, blocks private and reserved IPs, validates every redirect hop, forces HTTPS, and caps size and time. That hardening took as long as some of the checks did.
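A simplified sketch of the address check at the center of that hardening, in Python using only the standard library. The function names and the injectable `resolver` are assumptions for illustration. The same check would need to run again on every redirect target, and the size and time caps are not shown.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(ip_text):
    """True for addresses a public scanner should never fetch."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_all(host):
    """Return every address the host resolves to, not just the first."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_url(url, resolver=resolve_all):
    """Raise ValueError unless the URL is HTTPS and resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("only https URLs with a host are allowed")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    for address in addresses:
        if is_blocked_ip(address):
            raise ValueError(f"blocked address: {address}")
    return addresses
```

Checking every resolved address matters because a hostname can return a public and a private record side by side. A real fetcher would also want to connect to the address it validated, so the answer cannot change between the check and the request.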
Bedrock had to be allowed to fail. If the model is slow or errors, the report still renders, because the AI verdict has a deterministic template under it as a floor. The hero line never breaks, because the score under it was never AI in the first place.
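The floor can be sketched like this, in Python. `call_model` is a placeholder for the Bedrock call, and the template wording and timeout value are assumptions. What matters is that every failure path ends at a sentence built from the score alone.

```python
from concurrent.futures import ThreadPoolExecutor


def template_verdict(score):
    """Deterministic floor: built from the score and nothing else."""
    return f"This page scored {score} out of 100 for agent readability."


def verdict_with_floor(score, call_model, timeout_seconds=3.0):
    """Try the model; on error, timeout, or empty text, use the template."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(call_model, score)
        text = future.result(timeout=timeout_seconds)
        if isinstance(text, str) and text.strip():
            return text.strip()
    except Exception:
        # Covers timeouts and model errors alike; the report still renders.
        pass
    finally:
        # Do not block the response on a slow model call.
        executor.shutdown(wait=False)
    return template_verdict(score)
```

The score is passed in already computed, so nothing the model does or fails to do can change it.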
Also, this is a solo build on a deadline. The backend is JavaScript, not TypeScript. The benchmark page serves a published snapshot instead of querying DynamoDB live. The results view still has heading-hierarchy work. All of it is written down in KNOWN-LIMITATIONS.md, as choices, with reasons. On a product whose whole thesis is readability, hiding the gaps would be the one move I could not make.
Where this sits next to the other tools
Scrunch, recently acquired by Sitecore, works on AI search visibility: whether your brand gets cited when someone asks an AI a question. That is about being found. Agentis Lux is about whether an agent can read and use what it finds. Readability, not visibility.
Google's experimental Agentic Browsing audit in Lighthouse (May 2026) checks the agent-as-actor surface: WebMCP and whether a browser-driving agent can operate your page. Agentis Lux goes deeper on the agent-as-reader surface, the raw HTML a retrieval agent forms an impression from before it ever acts. Different door.
The agentic web is new enough that Google only added experimental, unscored checks two months ago. That does not make this project unoriginal. It is evidence the lane is open.
What the tool says about itself
Agents are not one reader. They are a spectrum, from the retrieval crawler that never runs your JavaScript to the browser-driving agent that does. The interesting output is the gap between them, and that is where this goes next: live benchmark querying, score history, and a render mode that shows the delta between what a non-JS agent sees and what a JS-capable one sees.
The tool scans its own site and publishes the result. It went from 70 to 96 after I fixed what it found, with one finding still open and shown anyway. Because if I scrubbed my own site to a perfect 100, you would have every reason not to trust the number on yours.
Try it on your own site: agentislux.io. The code, the methodology, and the raw benchmark data are in the repo.
For your second audience.
Links
- Live: agentislux.io
- Demo video (2:57): youtube.com/watch?v=bv56_XB1E_c
- Code (Perseus Clew engine): github.com/earlgreyhot1701D/perseus-clew
- The earlier proof of concept, Hermes Clew: github.com/earlgreyhot1701D/hermes-clew
- H0 Hackathon: h01.devpost.com
- More from Clew Labs: earlgreyhot1701d.github.io/Clew-Labs
AI assisted. Human approved. Powered by NLP.



