L. Cordero

Posted on Jun 28 • Edited on Jul 3

Can retrieval agents like ChatGPT and Perplexity read your website? Agentis Lux sees what they see.

#ai #aws #showdev #webdev

I built Agentis Lux to enter the H0 Hackathon (Vercel + AWS Databases). #H0Hackathon See Agentis Lux's Devpost.com entry.

It started with a comment at a hackathon.

A You.com employee said the thing out loud: the web has a second audience now. When you ask ChatGPT or Perplexity a question, a retrieval agent fetches a page and reads its HTML to answer you. Not the laid-out site with the buttons and the hero image. The markup underneath. These agents arrive by the million, and many of them rely on the raw or minimally rendered HTML rather than running your JavaScript, so they often see far less of your page than a person does.

That comment sent me to build. My first answer to it was Hermes Clew, for the GitLab Duo Agent Platform Challenge. Hermes lived inside GitLab Duo Chat, no frontend, no database: a Python engine that scanned the HTML, JSX, and TSX files in a repo, scored them across six categories, and let an LLM reason over the findings. It proved the core idea. It also told developers how to fix things, lived inside one vendor's chat, and only worked on files in a repo.

Agentis Lux is what happened when I took that idea to the open web and rebuilt it with a different stance. Any live URL, not just repo files. Its own product on a real cloud architecture, not a chat window. And no fix suggestions, on purpose, where Hermes used to hand them out. Same six-category bones, a new body, a sharper philosophy. It scans your site and shows you what that second audience experiences when it tries to read it.

What it does

You paste a URL into Agentis Lux. You get a report. The report is written from the agent's point of view.

Not "this is broken." More like: "an agent landing on this page can't tell which element starts checkout, because it's a styled div and not a button."

It reports findings. It does not suggest fixes, and that is on purpose. I know what the agent sees, not what you should change. That is the whole value: visibility, and you decide what to do with it. Awareness, not judgment.

Six deterministic checks score the frontend out of 100: semantic HTML, form accessibility, ARIA, structured data, content in the HTML, and link and navigation. A parallel set of six API checks runs on the backend.
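To make "deterministic check" concrete, here is a minimal Python sketch of one rule in the semantic HTML category. It is not the Perseus Clew engine's code, which the post does not show: the class name, the finding shape, and the per-finding penalty are all assumptions for illustration. It only sees handlers written inline in the markup, which is the same limit a non-rendering reader has.

```python
from html.parser import HTMLParser


class ClickableDivCheck(HTMLParser):
    """Flags generic elements that act as controls without saying so."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # An inline click handler on a <div>/<span> with no role gives a
        # markup-only reader nothing that identifies it as a control.
        if tag in ("div", "span") and "onclick" in attr and "role" not in attr:
            line, _col = self.getpos()
            self.findings.append(
                {"check": "semantic_html", "tag": tag, "line": line}
            )


def run_check(html: str) -> list:
    parser = ClickableDivCheck()
    parser.feed(html)
    parser.close()
    return parser.findings


def category_score(findings: list, max_points: int = 20, penalty: int = 5) -> int:
    # Hypothetical scoring rule: a fixed penalty per finding, floored at zero.
    return max(0, max_points - penalty * len(findings))
```

Because the check is plain pattern matching, feeding it the same HTML twice returns the same findings and the same score, which is the property the number depends on.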

The one idea the architecture is built on

Structure is deterministic. Flavor is AI.

The checks and the scoring are pattern matching. No model touches the number. Same input, same score, every time. I only spend AI in two places where a regex can't help: a Bedrock call writes the one-line plain-language verdict, and a second Bedrock layer runs an agent simulation, reasoning about what a retrieval agent would experience on the page and reporting what it could and could not accomplish. Not an autonomous agent clicking around. A simulation of the experience.

Vercel runs the entire frontend and the edge layer. The Next.js App Router app deploys to Vercel with the /api/scan route as a serverless proxy in front of the AWS backend, so the browser never talks to Lambda directly. Preview deployments on every push meant I could see each change live before it merged, which is most of how a solo builder keeps quality up without a QA team. The custom domain, HTTPS, and CDN were Vercel defaults I didn't have to think about, which kept my attention on the scan engine.

The AI is constrained, not creative. Low temperature, capped tokens, and a system prompt that encodes the product's own rules: no fixes, no judgment words, no em dashes. The simulation returns structured JSON, and any finding it references is filtered against the deterministic findings, so the model can't invent something the math didn't catch. If it fails validation, it falls back to a template. Math for trust, and the AI is fenced into exactly the two jobs where judgment helps.
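The fencing described above could look roughly like the following Python sketch. The JSON field names (`summary`, `findings`), the finding ids, and the function name are assumptions, not the product's actual schema. The idea is only that model output is parsed, checked against the house rules, and stripped of any finding the deterministic pass did not produce.

```python
import json

EM_DASH = "\u2014"  # one of the style rules the post lists for model output


def validate_simulation(raw: str, deterministic_ids: set, template: dict) -> dict:
    """Return vetted model output, or the template if anything looks off."""
    try:
        data = json.loads(raw)
        summary = data["summary"]
        referenced = data["findings"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return template
    if not isinstance(summary, str) or not isinstance(referenced, list):
        return template
    if EM_DASH in summary:
        return template
    # Keep only findings the deterministic checks actually reported.
    kept = [f for f in referenced if isinstance(f, str) and f in deterministic_ids]
    return {"summary": summary, "findings": kept}
```

A finding id the model made up never reaches the report, and malformed output degrades to the template instead of an error.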

Math stays math, so you can trust the number. Language and judgment are where AI earns its place.

This sounds like a philosophy choice. It ended up being an economics choice that fell out of the architecture. The deterministic core runs at any scale for almost nothing, so the free tier can stay free. I only pay for model tokens on the sentence and the simulation, the two places a human reads. I didn't design that in a spreadsheet. It just dropped out of keeping the math and the AI in separate boxes.

Why DynamoDB, and how I used it

The hackathon stack is Vercel on the frontend and AWS on the back, with DynamoDB as the data layer. I wanted to use DynamoDB as a deliberate data model, not a key-value afterthought, because every access pattern in this product is a single key lookup. That is exactly what it is built for.

Five tables, each with one job:

ScanCache, 15-minute TTL, keyed by a hash of the URL, dedupes repeat fetches and keeps Bedrock cost down.
ScanResults, 24-hour TTL, keyed by an opaque id, anonymous, results that expire on their own.
BenchmarkScans, the 50-site dataset, with a GSI on vertical, refreshed monthly by a scheduled EventBridge run.
ScanCounters, server-side counts, no PII. Reserved for the team tier.
Users, reserved for signed-in history. A stub.

Two of those are live on every scan, one holds the benchmark, and two are reserved stubs for later. Two TTLs, two lifetimes, two reasons. Per-vertical rollups use the GSI, not a second database. No joins, no migrations, no idle server.
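As a rough illustration of the data model, the Python sketch below builds a cache item with a TTL and the parameters for a per-vertical GSI query. The attribute names, the index name, and the exact table name are hypothetical (the post only says the tables carry a PerseusClew prefix). What is not an assumption is that DynamoDB's TTL attribute must be a Number holding a Unix epoch time in seconds.

```python
import hashlib
import time
from urllib.parse import urlsplit, urlunsplit

CACHE_TTL_SECONDS = 15 * 60          # ScanCache lifetime from the list above
RESULT_TTL_SECONDS = 24 * 60 * 60    # ScanResults lifetime from the list above


def cache_key(url: str) -> str:
    # Light normalization (an assumption) so equivalent URLs share one key.
    parts = urlsplit(url.strip())
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cache_item(url: str, report: dict, now=None) -> dict:
    now = int(time.time()) if now is None else now
    return {
        "url_hash": cache_key(url),             # hypothetical partition key name
        "report": report,
        "expires_at": now + CACHE_TTL_SECONDS,  # epoch seconds, read by DynamoDB TTL
    }


def vertical_query(vertical: str) -> dict:
    # Parameters in the shape the boto3 DynamoDB client's query() accepts.
    return {
        "TableName": "PerseusClew-BenchmarkScans",  # hypothetical name
        "IndexName": "vertical-index",              # hypothetical GSI name
        "KeyConditionExpression": "#v = :v",
        "ExpressionAttributeNames": {"#v": "vertical"},
        "ExpressionAttributeValues": {":v": {"S": vertical}},
    }
```

With boto3 installed and credentials configured, the second dict could be passed as `client.query(**vertical_query("saas"))`. Nothing here touches AWS on its own.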

The write on a live scan is fail-soft and async. The scan returns to you whether or not the write lands, and a failed write goes to CloudWatch instead of your screen. The scan result is the product. Persistence is a side effect.

(The product is Agentis Lux. The engine is Perseus Clew, part of my Clew suite, which is why the AWS tables carry the PerseusClew prefix.)

The bet I made in public

Before the engine scanned anything, I wrote down what I expected it to find across 50 sites and committed it to the repo with a timestamp. Predictions first, data later, so I couldn't move the goalposts.

Then I scanned ten sites each across e-commerce, SaaS, content and media, US government, and indie builder projects.

Indie builders won. Mean score 77 out of 100, ahead of government, SaaS, and e-commerce. The single highest score in the whole run was a personal developer portfolio at 91. Scores ran from 34 to 91. Four sites blocked the scan at the door, including OpenAI.

I missed three of my six predictions. That is the point of pre-registering them. If I had gone six for six you should distrust me, because it would mean I only predicted what I already knew. The misses are where I learned something: that craft beats compliance, that the API is the real blind spot, and that a hand-built personal site reads cleaner to an agent than most of the web's biggest companies.

The full dataset, including the sites that blocked me, is in the repo.

The gaps

Fetching arbitrary user-supplied URLs on a public endpoint is a security problem before it is a feature. The backend does full DNS resolution, blocks private and reserved IPs, validates every redirect hop, forces HTTPS, and caps size and time. That hardening took as long as some of the checks did.
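The core of that hardening, resolving the host and refusing private or reserved addresses, can be sketched in a few lines of Python using only the standard library. This is an illustration of the technique, not the backend's code. The function names are hypothetical, and size caps, time caps, and the actual fetch are left out.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(value: str) -> bool:
    ip = ipaddress.ip_address(value)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def check_url(url: str) -> list:
    """Return the vetted addresses for url, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("only https URLs with a host are allowed")
    # Resolve every address the name maps to; one bad record rejects the URL.
    infos = socket.getaddrinfo(parts.hostname, parts.port or 443, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    for addr in addresses:
        if is_blocked_ip(addr):
            raise ValueError(f"{parts.hostname} resolves to a blocked address")
    return addresses
```

Two details matter in practice. The same check has to run again on every redirect hop, since a public host can redirect to an internal one. And the fetch should connect to the addresses that were vetted rather than resolving the name a second time, otherwise a DNS answer that changes between check and fetch slips through.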

Bedrock had to be allowed to fail. If the model is slow or errors, the report still renders, because the AI verdict has a deterministic template under it as a floor. The hero line never breaks, because the score under it was never AI in the first place.
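The "template as a floor" idea is small enough to show in full. In this Python sketch, `call_model` is a hypothetical stand-in for the Bedrock call (with its own timeout), and the template wording is invented for illustration.

```python
def verdict_line(score: int, call_model,
                 template: str = "This page scored {score} out of 100.") -> str:
    """Return the model's one-line verdict, or the template if the model fails."""
    try:
        text = call_model(score)
    except Exception:
        # Slow, throttled, or erroring model: fall through to the template.
        text = ""
    if not isinstance(text, str) or not text.strip():
        return template.format(score=score)
    return text.strip()
```

The score is passed in, never produced by the model, so the fallback sentence is always accurate.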

Also: this is a solo build on a deadline. The backend is JavaScript, not TypeScript. The benchmark page serves a published snapshot instead of querying DynamoDB live. The results view still has heading-hierarchy work. All of it is written down in KNOWN-LIMITATIONS.md, as choices, with reasons. On a product whose whole thesis is readability, hiding the gaps would be the one move I could not make.

Where this sits next to the other tools

Scrunch, recently acquired by Sitecore, works on AI search visibility: whether your brand gets cited when someone asks an AI a question. That is about being found. Agentis Lux is about whether an agent can read and use what it finds. Being found versus being readable.

Google's experimental Agentic Browsing audit in Lighthouse (May 2026) checks the agent-as-actor surface: WebMCP and whether a browser-driving agent can operate your page. Agentis Lux goes deeper on the agent-as-reader surface, the raw HTML a retrieval agent forms an impression from before it ever acts. Different door.

The agentic web is new enough that Google only added experimental, unscored checks two months ago. That does not make this unoriginal. It is evidence the lane is open.

What the tool says about itself

Agents are not one reader. They are a spectrum, from the retrieval crawler that never runs your JavaScript to the browser-driving agent that does. The interesting output is the gap between them, and that is where this goes next: live benchmark querying, score history, and a render mode that shows the delta between what a non-JS agent sees and what a JS-capable one sees.

The tool scans its own site and publishes the result. It went from 70 to 96 after I fixed what it found, with one finding still open and shown anyway. Because if I scrubbed my own site to a perfect 100, you would have every reason not to trust the number on yours.

Try it on your own site: agentislux.io. The code, the methodology, and the raw benchmark data are in the repo.

For your second audience.

Links

Live: agentislux.io
Demo video (2:57): youtube.com/watch?v=bv56_XB1E_c
Code (Perseus Clew engine): github.com/earlgreyhot1701D/perseus-clew
The earlier proof of concept, Hermes Clew: github.com/earlgreyhot1701D/hermes-clew
H0 Hackathon: h01.devpost.com

More from Clew Labs: earlgreyhot1701d.github.io/Clew-Labs

AI assisted. Human approved. Powered by NLP.

Top comments (6)

Hossein Yazdi • Jun 30 • Edited

This is an interesting angle. I think a lot of people still optimize mainly for human visitors, but with AI search becoming more common, it's worth asking what retrieval agents actually see.

There are quite a few AI SEO & web analysis tools, but this one is looking at a different part of the problem, which makes it stand out. Curious to see how the benchmark evolves over time.

L. Cordero • Jul 1

Thanks, that's the distinction I care about too. The benchmark runs on a scheduled monthly refresh (EventBridge), and the table keys each scan by site and timestamp, so scores accumulate rather than overwrite. The data's building a time series. What I haven't built yet is the layer that reads that history and shows the trend, whether a vertical is getting more or less agent-readable as sites react to AI traffic. That's the next thing, and your comment helped surface how much the tracking matters, not just the refresh. Appreciate that. 💡👩🏽‍💻

Julian Neagu • Jul 1

this is the part most people miss, agents don’t see your UI, they see your DOM.
we’ve hit this same thing building AI scraping flows, JS-heavy sites just fall apart.
semantic HTML is slowly becoming a runtime requirement again, not just “good practice”.

L. Cordero • Jul 2

Exactly this. Back to basics. Glad it landed with someone who's seen it firsthand.

Mudassir Khan • Jul 5

the 'agent as reader' versus 'agent as actor' framing is exactly the thing i keep trying to explain to people who only know Lighthouse. two entirely different problem statements.

the indie builder result makes sense — hand crafted HTML tends to be purpose structured, not wrapper stacked. we've seen it on Next.js App Router: server rendered content shows up clean to retrieval agents, but anything in a client component that only exists post hydration is invisible to most crawlers. cold fetch is the real test.

did you find any correlation between structured data presence and how well the Bedrock simulation scored the page?

L. Cordero • Jul 5

Good question. I didn't cross-analyze those two. I have both scores per site but never correlated structured data against the simulation results. So I can't give you a real answer, only a guess, and I'd rather run it than guess. Adding it to the list!