# Runtime Snapshots #7 — Inside SiFR: The Schema That Makes LLMs See Web UIs

Raw HTML is noise. Screenshots burn tokens. Accessibility trees lose visual context.

So we built SiFR — a structured format that gives LLMs usable runtime UI context.

This post explains what's inside.


## What is SiFR?

SiFR stands for Semantic Information for Representation.

(And yes — it's also meant to sound like "see far".)

SiFR is a JSON schema that captures the runtime state of a web page in a way that's:

  • Token-efficient (often 10–50× smaller than raw HTML on complex pages)
  • Semantically structured (models can reason over it without reconstructing the UI from markup)
  • Visually aware (preserves layout relationships without pixels)

It's not a scraper. It's not an accessibility tree.

It's a preprocessing layer that sits between the DOM and your AI — turning "what the browser rendered" into "what the model can reason about".


## Why not just send HTML?

Let's use a real-world example: large e-commerce pages.

Raw HTML commonly contains:

  • deeply nested layout wrappers
  • duplicated markup for responsive layouts
  • client-side frameworks with non-semantic containers
  • hidden / disabled / off-screen elements that still exist in the DOM

So when you send HTML to an LLM, you're asking it to do two jobs:

  1. reconstruct runtime UI state
  2. then solve the task

That's where most failures happen.

Here's what a typical "find the button" path looks like in raw markup:

```
div > div > div > div > div > div > ... > button
```

With SiFR, the same interface becomes "structure first, then the important elements".

For example:

```json
{
  "id": "btn042",
  "text": "Add to Cart",
  "actions": ["clickable"],
  "salience": "high",
  "cluster": "product-actions"
}
```

The LLM sees what it is, how it behaves, and which part of the page it belongs to — without reverse-engineering UI meaning from markup.


## Anatomy of a SiFR Document

Every SiFR snapshot has five sections:

### 1) METADATA

Page-level context: URL, viewport size, capture timestamp, and capture stats.

```json
{
  "url": "https://www.costco.com/...",
  "viewport": { "width": 1920, "height": 1080 },
  "stats": {
    "totalNodes": 2847,
    "salienceCounts": { "high": 12, "med": 89, "low": 2746 }
  }
}
```

This is the "frame" the model needs before it reads anything else.


### 2) NODES

The structural skeleton — hierarchy without heavy details.

Think of it as the page's table of contents: what regions exist, what contains what, and what the high-level UI shape is.
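
The other sections below each come with a concrete example, so here is a hypothetical one for NODES too. The field names are illustrative (the actual schema may differ); the point is a bare parent/child hierarchy with no payload:

```typescript
// Hypothetical NODES section: ids and containment only, with no text,
// styles, or selectors. Field names are illustrative, not the v2 spec.
const nodes = {
  root:   { children: ["header01", "sidebar01", "main01"] },
  main01: { children: ["breadcrumbs01", "grid01", "pager01"] },
  grid01: { children: ["card-product-123" /* ... */] },
};
```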


### 3) SUMMARY

High-level layout blocks. This is where SiFR becomes "structure-first".

```json
{
  "layoutBlocks": [
    { "role": "header", "contains": ["logo", "nav", "search"] },
    { "role": "sidebar", "contains": ["filters", "categories"] },
    { "role": "main", "contains": ["product-grid"] }
  ]
}
```

Before the model sees thousands of elements, it already has the page skeleton:
header at top, sidebar on the side, main content in the center.


### 4) DETAILS

Element-specific data: selectors, text, runtime visibility, interaction state, and relevant computed info.

```json
{
  "btn042": {
    "selector": "button.add-to-cart",
    "text": "Add to Cart",
    "actions": ["clickable"],
    "styles": { "visible": true, "disabled": false }
  }
}
```

This is where "runtime truth" matters: visible vs hidden, enabled vs disabled, actual text content, etc.


### 5) RELATIONS

Spatial relationships between important elements.

Not pixel coordinates — semantic positioning.

```json
{
  "btn042": {
    "inside": "card-product-123",
    "below": "price-display",
    "rightOf": "quantity-selector"
  }
}
```

The model can reason: "the Add to Cart button is inside the product card, below the price" — without seeing a single pixel.
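
Putting the five sections together, the overall snapshot looks roughly like this. This is a TypeScript sketch inferred from the examples above; any field not shown in this post is an assumption, not the v2 spec:

```typescript
// Rough overall shape of a SiFR snapshot, reconstructed from the
// examples in this post. Fields not shown above are assumptions.
interface SiFRSnapshot {
  metadata: {
    url: string;
    viewport: { width: number; height: number };
    stats: {
      totalNodes: number;
      salienceCounts: { high: number; med: number; low: number };
    };
  };
  // Structural skeleton: ids and containment only.
  nodes: Record<string, { children: string[] }>;
  summary: {
    layoutBlocks: { role: string; contains: string[] }[];
  };
  details: Record<
    string,
    {
      selector: string;
      text?: string;
      actions?: string[];
      styles?: { visible: boolean; disabled: boolean };
    }
  >;
  // Semantic positioning, e.g. { "inside": "card-product-123", "below": "price-display" }
  relations: Record<string, Record<string, string>>;
}
```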


## Key Concepts

### Visual Salience

Not all nodes matter equally.

SiFR assigns salience so the model focuses on signal:

  • High: primary actions, main content, user inputs
  • Medium: secondary nav, supporting info
  • Low: wrappers, containers, decorative elements

This is one of the biggest reasons SiFR stays usable on very large pages.
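
To make that concrete, here is one way a consumer could exploit salience: spend a fixed token budget on high-salience entries first. This is a sketch, not the extension's implementation; `estimateTokens` stands in for whatever tokenizer you use.

```typescript
// Sketch: keep the most salient DETAILS entries within a token budget.
// Not the extension's implementation; estimateTokens is a placeholder.
type Salience = "high" | "med" | "low";
const rank: Record<Salience, number> = { high: 0, med: 1, low: 2 };

function trimDetails<T extends { salience: Salience }>(
  details: Record<string, T>,
  budget: number,
  estimateTokens: (entry: T) => number,
): Record<string, T> {
  const kept: Record<string, T> = {};
  let used = 0;
  // Highest salience first; stop once the budget is spent.
  const ids = Object.keys(details).sort(
    (a, b) => rank[details[a].salience] - rank[details[b].salience],
  );
  for (const id of ids) {
    const cost = estimateTokens(details[id]);
    if (used + cost > budget) break;
    kept[id] = details[id];
    used += cost;
  }
  return kept;
}
```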


### Layout Block Summarization

Instead of listing 3000 elements immediately, SiFR begins with a map:

```
PAGE STRUCTURE:
├── Header (logo, nav, search, cart)
├── Sidebar (filters)
└── Main
    ├── Breadcrumbs
    ├── Product Grid (24 items)
    └── Pagination
```

Models don't "scan HTML". They build mental structure.
This gives them the structure up front.


### Adaptive Complexity

A simple blog post doesn't need the same capture density as a complex dashboard.

SiFR adjusts automatically — more detail where it matters, less where it doesn't.

The goal is stable signal-to-noise, not maximal completeness.
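
As a mental model only (these option names are invented for the sketch, not the extension's real settings), you can picture adaptive complexity as a capture profile picked from page density:

```typescript
// Hypothetical capture profiles illustrating adaptive density.
// Option names are invented for this sketch, not a real API.
const profiles = {
  article:   { salienceFloor: "low" as const,  detailBudget: 4_000 },  // keep nearly everything
  dashboard: { salienceFloor: "med" as const,  detailBudget: 24_000 }, // drop low-salience noise
};

// More nodes on the page means stricter filtering with a bigger budget.
function pickProfile(totalNodes: number) {
  return totalNodes > 1_000 ? profiles.dashboard : profiles.article;
}
```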


## Real Numbers

Here are representative examples from our internal benchmarks (token counts vary by capture options and page state):

| Site   | HTML Tokens | SiFR Tokens | Compression |
| ------ | ----------- | ----------- | ----------- |
| Costco | ~1,280,000  | ~24,000     | ~53×        |
| Amazon | ~600,000    | ~50,000     | ~12×        |

On complex pages, SiFR makes LLM workflows practical where raw HTML often doesn't fit in context.


## Try It Yourself

SiFR is implemented in Element to LLM, a free browser extension.

If you want to stress-test the format, try these two pages:

  1. costco.com — a realistic, framework-heavy enterprise UI
  2. arngren.net — extreme visual density and chaotic layout

Capture a snapshot and share:

  • what compression ratio did you get?
  • could your LLM reason about the structure?
  • did you find a site where SiFR struggles?

If it breaks — that's useful data. Seriously.


## What SiFR Enables

With structured runtime UI context, LLMs can:

  • Debug layouts — paste JSON → spot z-index / visibility / layout issues
  • Generate selectors — Playwright/Cypress tests based on real DOM structure (see the sketch after this list)
  • Navigate autonomously — agents that understand "where to click" without screenshots
  • Recreate components — translate UI structure into React/Tailwind scaffolds
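
For instance, here is a minimal sketch of the selector case: driving Playwright from a DETAILS entry. This is one way to consume the format, not part of the extension; the entry shape follows the DETAILS example above.

```typescript
import { Page, expect } from "@playwright/test";

// Shape of a DETAILS entry, following the example earlier in the post.
interface DetailEntry {
  selector: string;
  text: string;
  actions: string[];
  styles: { visible: boolean; disabled: boolean };
}

// Sketch: act on the page from snapshot data instead of raw HTML.
async function clickFromSnapshot(page: Page, entry: DetailEntry) {
  if (!entry.actions.includes("clickable") || entry.styles.disabled) {
    throw new Error(`"${entry.text}" is not actionable in this snapshot`);
  }
  const locator = page.locator(entry.selector);
  await expect(locator).toHaveText(entry.text); // guard against a stale snapshot
  await locator.click();
}
```

An agent or a generated test would call something like `clickFromSnapshot(page, snapshot.details["btn042"])` after capturing the SiFR document.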

## The Standard Question

We're actively developing SiFR as an open specification. Current version: v2.

The schema is strict and versioned, designed for automation pipelines — not just one-off prompt experiments.

If you're building LLM-powered UI tools, I'd love feedback on the format:

  • What feels missing?
  • What feels redundant?
  • What would make this more useful in your workflow?

## Series Index

Previous posts:

  1. Taking a "fine" signup form and making it work
  2. a11y starts with runtime context
  3. QA That Speaks JSON

Found a site that breaks SiFR? Drop it in the comments. That's the fastest way to improve the spec.
