Jung Sungwoo

Don't Feed HTML to Your Agents

Why Complex SaaS Needs a White Box Protocol for AI

Beyond UI Generation — where humans and AI communicate through meaning, not pixels


TL;DR — Give AI a White Box, Not a Black Box

Most AI agents interact with web apps through Black Box methods: consuming DOM dumps or screenshots, then guessing what to click.

But HTML was never designed for machines. From the AI's perspective, the DOM is mostly noise, with the business logic buried faintly inside it.

This essay argues for a White Box approach:

Instead of making agents reverse-engineer the UI, expose a Semantic State Layer that reveals the application's structure, rules, state, and valid transitions directly.

This is not about replacing UI. It's about giving AI agents a proper interface — what I call an Intelligence Interface (II) — alongside the traditional User Interface.

This post introduces Manifesto, an open-source engine that implements this philosophy with a concrete protocol: @manifesto-io/*.

Manifesto Playground Demo

🎮 Try it yourself → Manifesto Playground


1. Black Box: The Current State of AI + Web Apps

Here's how most teams "add AI" to their web apps today:

  • Use LangChain, AutoGPT, or browser automation
  • Drive Playwright or Puppeteer
  • Dump the DOM or screenshot into the model
  • Hope it figures out what to click

This is the Black Box approach. The agent sees only the rendered surface and must infer everything else.

What's Wrong with DOM Dumps?

Consider a typical Material UI form field:

<div class="MuiFormControl-root css-1u3bzj6">
  <label class="MuiInputLabel-root">Product Name</label>
  <div class="MuiInputBase-root">
    <input aria-invalid="false" type="text" class="MuiInputBase-input" value="" />
  </div>
  <p class="MuiFormHelperText-root">This field is required.</p>
</div>

From an agent's perspective:

| Problem | Impact |
|---|---|
| Token waste | 90% of tokens are class names and wrappers |
| Missing constraints | Is it required? What's the max length? |
| No dependencies | Does this field depend on others? |
| No causality | Submit is disabled, but why? |

The agent is forced to guess. A CSS refactor breaks everything. A layout change confuses the model. The logic was never exposed — only its visual projection.

Signal < 10%. Noise > 90%.


2. White Box: Exposing the Application's Brain

The alternative is a White Box protocol.

Instead of showing HTML, the engine exposes a Semantic Snapshot — a structured representation of the application's internal state that agents can read directly.

{
  "topology": {
    "viewId": "product-create",
    "mode": "create",
    "sections": [
      { "id": "basic", "title": "Basic Info", "fields": ["name", "productType"] },
      { "id": "shipping", "title": "Shipping", "fields": ["shippingWeight"] }
    ]
  },
  "state": {
    "form": { "isValid": false, "isDirty": false },
    "fields": {
      "name": {
        "value": "",
        "meta": { "valid": false, "hidden": false, "disabled": false, "errors": ["Required"] }
      },
      "productType": {
        "value": "PHYSICAL",
        "meta": { "valid": true, "hidden": false, "disabled": false, "errors": [] }
      },
      "shippingWeight": {
        "value": null,
        "meta": { "valid": true, "hidden": false, "disabled": false, "errors": [] }
      }
    }
  },
  "constraints": {
    "name": { "required": true, "minLength": 2, "maxLength": 100 },
    "shippingWeight": { "min": 0, "max": 2000, "dependsOn": ["productType"] }
  },
  "interactions": [
    { "id": "updateField:name", "intent": "updateField", "target": "name", "available": true },
    { "id": "updateField:productType", "intent": "updateField", "target": "productType", "available": true },
    { "id": "submit", "intent": "submit", "available": false, "reason": "Name is required" }
  ]
}

Now the agent has:

  • Topology: Screen structure, sections, field hierarchy
  • State: Current values, validity, visibility, errors — per field
  • Constraints: Required, min/max, dependencies
  • Interactions: What actions are available, and why some are blocked

No guessing. No inference. The agent reads the application's brain directly.
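
To make that shape concrete, here is a minimal TypeScript sketch of how agent-side code might type a snapshot like the one above and list blocked interactions with their reasons. The type names and the `blockedInteractions` helper are assumptions inferred from the JSON example, not Manifesto's published types.

```typescript
// Hypothetical types inferred from the JSON example above; not Manifesto's actual definitions.
interface FieldMeta { valid: boolean; hidden: boolean; disabled: boolean; errors: string[] }
interface FieldState { value: unknown; meta: FieldMeta }
interface Interaction { id: string; intent: string; target?: string; available: boolean; reason?: string }
interface SemanticSnapshot {
  state: { form: { isValid: boolean; isDirty: boolean }; fields: Record<string, FieldState> }
  interactions: Interaction[]
}

// Collect every blocked interaction together with the reason the engine reported.
function blockedInteractions(snapshot: SemanticSnapshot): string[] {
  return snapshot.interactions
    .filter((i) => !i.available)
    .map((i) => `${i.id}: ${i.reason ?? 'no reason given'}`)
}

const exampleSnapshot: SemanticSnapshot = {
  state: {
    form: { isValid: false, isDirty: false },
    fields: {
      name: { value: '', meta: { valid: false, hidden: false, disabled: false, errors: ['Required'] } },
    },
  },
  interactions: [
    { id: 'updateField:name', intent: 'updateField', target: 'name', available: true },
    { id: 'submit', intent: 'submit', available: false, reason: 'Name is required' },
  ],
}

const blocked = blockedInteractions(exampleSnapshot)
console.log(blocked)
```

The helper never touches markup: the reason for the disabled submit comes straight from the snapshot.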


3. A Real Use Case: "Where Do I Select the Week?"

🎮 See it in action: Manifesto Playground. Try changing field values and watch the semantic state update in real time.

Here's a scenario from a complex SaaS scheduling interface:

User: "I see a date picker, but where do I select which week?"

AI Chatbot: "The week selector only appears when you set frequency to 'Weekly'. Right now it's set to 'Daily'. Should I change it for you?"

For this to work, the AI needs to know:

  1. A field called weekSelector exists
  2. It's currently hidden
  3. It becomes visible when frequency === 'WEEKLY'
  4. The current value of frequency is 'DAILY'

No amount of DOM parsing gives you this reliably. But a Semantic Snapshot does:

{
  "fields": {
    "frequency": {
      "value": "DAILY",
      "meta": { "hidden": false }
    },
    "weekSelector": {
      "value": null,
      "meta": { "hidden": true },
      "visibleWhen": "frequency === 'WEEKLY'"
    }
  }
}

The AI reads this and knows — without inference — exactly why the field is hidden and what would make it appear.
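
As a rough illustration, a chatbot backend could turn that snapshot into the answer from the dialogue with a helper like the one below. The `explainHidden` function and the `SnapshotField` type are hypothetical. The parser only understands the simple `field === 'VALUE'` form shown in the example; a real engine would more likely expose a structured expression than a string.

```typescript
// Hypothetical field shape based on the snapshot excerpt above.
interface SnapshotField {
  value: unknown
  meta: { hidden: boolean }
  visibleWhen?: string // e.g. "frequency === 'WEEKLY'"
}

// Explain why a field is hidden, using only data found in the snapshot.
function explainHidden(fields: Record<string, SnapshotField>, fieldId: string): string {
  const field = fields[fieldId]
  if (!field) return `Unknown field: ${fieldId}`
  if (!field.meta.hidden) return `${fieldId} is visible`
  // Assumption: rules look like  <fieldId> === '<VALUE>'
  const match = field.visibleWhen?.match(/^(\w+) === '([^']*)'$/)
  if (!match) return `${fieldId} is hidden (no readable rule)`
  const dependency = match[1] ?? ''
  const required = match[2] ?? ''
  const current = String(fields[dependency]?.value)
  return `${fieldId} appears when ${dependency} is '${required}'; it is currently '${current}'`
}

const scheduleFields: Record<string, SnapshotField> = {
  frequency: { value: 'DAILY', meta: { hidden: false } },
  weekSelector: { value: null, meta: { hidden: true }, visibleWhen: "frequency === 'WEEKLY'" },
}

const explanation = explainHidden(scheduleFields, 'weekSelector')
console.log(explanation)
```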


4. The Protocol Loop

Manifesto implements a continuous feedback loop between the engine and AI agents:

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  [Context Injection] → [Reasoning] → [Action Dispatch] → [Delta]    │
│          ▲                                                  │       │
│          └─────────────── Continuous Snapshots ─────────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Step by Step:

  1. Context Injection: Engine exports a Semantic Snapshot
    • Topology (sections, fields, hierarchy)
    • State (values, validity, visibility)
    • Constraints (required, min/max, dependencies)
    • Interactions (available intents, with reasons when blocked)
  2. Reasoning: Agent plans the next action based on the snapshot
  3. Action Dispatch: Agent calls abstract intents, not DOM events
    • updateField, submit, reset, validate
  4. Delta Feedback: Engine returns what changed
    • Not just "success" but the actual state diff
    • Agent learns causality: "I changed X, and Y became hidden"
  5. The loop continues with the updated snapshot

This is fundamentally different from "click and hope." The agent operates on structured meaning with predictable feedback.
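
The loop can be sketched end to end with a toy engine and a scripted stand-in for the model. Everything below (`createToyEngine`, `runLoop`, the `Intent` and `Delta` types) is an illustrative assumption rather than the Manifesto API; the aim is only to show the snapshot, reason, dispatch, delta cycle.

```typescript
// All names here are illustrative; this is not the Manifesto API.
type Intent = { type: 'updateField'; fieldId: string; value: unknown } | { type: 'submit' }
interface Snapshot { values: Record<string, unknown>; canSubmit: boolean }
type Delta = Partial<Snapshot>
type Agent = (snapshot: Snapshot) => Intent | null

// Toy engine: submit becomes available once `name` is a non-empty string.
function createToyEngine() {
  const values: Record<string, unknown> = { name: '' }
  let submitted = false
  const snapshot = (): Snapshot => {
    const current = values['name']
    return { values: { ...values }, canSubmit: typeof current === 'string' && current.length > 0 }
  }
  const dispatch = (intent: Intent): Delta => {
    const before = snapshot()
    const delta: Delta = {}
    if (intent.type === 'updateField') {
      values[intent.fieldId] = intent.value
      delta.values = { [intent.fieldId]: intent.value }
    }
    if (intent.type === 'submit' && before.canSubmit) submitted = true
    const after = snapshot()
    // Report derived changes too, so the agent can see cause and effect.
    if (after.canSubmit !== before.canSubmit) delta.canSubmit = after.canSubmit
    return delta
  }
  return { snapshot, dispatch, isSubmitted: () => submitted }
}

// Scripted stand-in for an LLM: fill in the name, then submit.
const scriptedAgent: Agent = (s) =>
  s.canSubmit ? { type: 'submit' } : { type: 'updateField', fieldId: 'name', value: 'Notebook' }

function runLoop(agent: Agent, maxSteps = 5): Delta[] {
  const engine = createToyEngine()
  const deltas: Delta[] = []
  for (let step = 0; step < maxSteps && !engine.isSubmitted(); step++) {
    const intent = agent(engine.snapshot()) // context injection + reasoning
    if (intent === null) break
    deltas.push(engine.dispatch(intent))    // action dispatch + delta feedback
  }
  return deltas
}

const loopDeltas = runLoop(scriptedAgent)
console.log(JSON.stringify(loopDeltas))
```

The first delta reports both the direct change (`name`) and the derived one (`canSubmit` flipping to true), which is the causal signal the agent plans on.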


5. The API: Exploration and Execution

Manifesto exposes this protocol through @manifesto-io/ai:

Exploration Mode: "What can I do here?"

import { createInteroperabilitySession } from '@manifesto-io/ai'

const session = createInteroperabilitySession({
  runtime,       // FormRuntime instance
  viewSchema,    // View definition
  entitySchema,  // Entity definition
})

// Get current semantic snapshot
const snapshot = session.snapshot()

// snapshot.interactions tells the agent:
// - submit: available=false, reason="Name is required"
// - updateField:name: available=true
// - updateField:productType: available=true

The agent now knows the current state and exactly what actions are valid.

Execution Mode: "Change it to digital"

const result = session.dispatch({
  type: 'updateField',
  fieldId: 'productType',
  value: 'DIGITAL',
})

if (result._tag === 'Ok') {
  const { snapshot, delta } = result.value

  // delta shows exactly what changed:
  // {
  //   fields: {
  //     productType: { value: 'DIGITAL' },
  //     shippingWeight: { hidden: true },
  //     fulfillmentType: { hidden: true }
  //   },
  //   interactions: {
  //     'updateField:shippingWeight': { available: false, reason: 'Field is hidden' }
  //   }
  // }
}

The agent doesn't just get "success." It gets a delta showing the causal chain: changing productType to DIGITAL caused shippingWeight to become hidden.
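
One plausible way to produce such a delta is to diff the field state before and after the dispatch. The sketch below assumes a simplified per-field shape (`value` and `hidden` only), and `diffFields` is a hypothetical helper; the engine's actual delta format may differ.

```typescript
// Hypothetical per-field shape; the real delta format may differ.
interface FieldView { value: unknown; hidden: boolean }
type FieldsState = Record<string, FieldView>
type FieldsDelta = Record<string, Partial<FieldView>>

// Return only the properties that differ between two field maps.
function diffFields(before: FieldsState, after: FieldsState): FieldsDelta {
  const delta: FieldsDelta = {}
  for (const [id, next] of Object.entries(after)) {
    const prev = before[id]
    const changed: Partial<FieldView> = {}
    if (!prev || prev.value !== next.value) changed.value = next.value
    if (!prev || prev.hidden !== next.hidden) changed.hidden = next.hidden
    if (Object.keys(changed).length > 0) delta[id] = changed
  }
  return delta
}

const beforeFields: FieldsState = {
  productType: { value: 'PHYSICAL', hidden: false },
  shippingWeight: { value: null, hidden: false },
}
const afterFields: FieldsState = {
  productType: { value: 'DIGITAL', hidden: false },
  shippingWeight: { value: null, hidden: true },
}

const fieldsDelta = diffFields(beforeFields, afterFields)
console.log(JSON.stringify(fieldsDelta))
// {"productType":{"value":"DIGITAL"},"shippingWeight":{"hidden":true}}
```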

LLM Tool Export

Convert the snapshot into OpenAI/Claude-compatible function definitions:

import { toToolDefinitions } from '@manifesto-io/ai'

const tools = toToolDefinitions(snapshot, { omitUnavailable: true })

// Returns JSON-Schema tool definitions:
// - updateField (with enum of available fields)
// - submit (if form is valid)
// - reset
// - validate

This enables agents to interact with forms through standard function-calling interfaces.
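
To show what such a conversion might involve, the sketch below maps snapshot interactions to a neutral `{ name, description, parameters }` shape with JSON-Schema parameters. The `interactionsToTools` helper and its output shape are assumptions for illustration; the real `toToolDefinitions` output may look different, and each provider wraps tool definitions in its own envelope.

```typescript
// Hypothetical shapes; the real toToolDefinitions output may differ.
interface ToolInteraction { id: string; intent: string; target?: string; available: boolean; reason?: string }
interface ToolDefinition {
  name: string
  description: string
  parameters: { type: 'object'; properties: Record<string, unknown>; required: string[] }
}

// Build one tool per intent, exposing only what the snapshot marks as available.
function interactionsToTools(interactions: ToolInteraction[]): ToolDefinition[] {
  const usable = interactions.filter((i) => i.available)
  const tools: ToolDefinition[] = []

  // Fold all editable fields into a single updateField tool with an enum of targets.
  const targets: string[] = []
  for (const i of usable) {
    if (i.intent === 'updateField' && i.target !== undefined) targets.push(i.target)
  }
  if (targets.length > 0) {
    tools.push({
      name: 'updateField',
      description: 'Set the value of one currently editable field.',
      parameters: {
        type: 'object',
        properties: {
          fieldId: { type: 'string', enum: targets },
          value: { description: 'New value for the field' },
        },
        required: ['fieldId', 'value'],
      },
    })
  }

  // Every other available intent becomes a parameterless tool.
  for (const i of usable) {
    if (i.intent !== 'updateField') {
      tools.push({
        name: i.intent,
        description: `Run the ${i.intent} intent.`,
        parameters: { type: 'object', properties: {}, required: [] },
      })
    }
  }
  return tools
}

const exampleTools = interactionsToTools([
  { id: 'updateField:name', intent: 'updateField', target: 'name', available: true },
  { id: 'updateField:productType', intent: 'updateField', target: 'productType', available: true },
  { id: 'submit', intent: 'submit', available: false, reason: 'Name is required' },
])
console.log(JSON.stringify(exampleTools, null, 2))
```

Because `submit` is unavailable in this snapshot, it is left out of the tool list, so the model is not offered an action that would be rejected.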


6. Safety Rails: The Hallucination Firewall

The Problem with Black Box

When an agent manipulates DOM directly:

  • It can click anything, including elements it shouldn't
  • It can input invalid values
  • It can trigger actions outside its permission
  • Failures are silent or cryptic

Manifesto's Safety Rails

Hallucination Firewall: Every agent action is validated before execution.

const result = session.dispatch({
  type: 'updateField',
  fieldId: 'nonexistent',  // ❌ Unknown field
  value: 'test',
})

// result._tag === 'Err'
// result.error === 'Field not found: nonexistent'
// State unchanged — no side effects

What gets rejected:

  • Unknown fields → Err
  • Type mismatches (string to number field) → Err
  • Hidden field updates → Err
  • Disabled field updates → Err
  • Unauthorized actions → Err

Atomic Rollback: On any failure, the previous snapshot remains intact. No partial mutations.

Deterministic Contracts: Same input + same state = same output. Agents can plan reliably.

This is capability-based access control for AI. The agent only sees and can only act on what's explicitly permitted.
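
A minimal sketch of the validate-before-execute idea follows, assuming a simplified field model. `guardedUpdate`, `GuardedField`, and the error strings other than "Field not found" are hypothetical. The update returns a new state object instead of mutating the old one, which is one simple way to get the "no partial mutations" property.

```typescript
// Result type mirroring the _tag convention used in the examples above.
type Result<T, E> = { _tag: 'Ok'; value: T } | { _tag: 'Err'; error: E }

// Hypothetical, simplified field model.
interface GuardedField { value: unknown; type: 'string' | 'number'; hidden: boolean; disabled: boolean }
type FormState = Record<string, GuardedField>

// Validate first; only build a new state if every check passes.
function guardedUpdate(state: FormState, fieldId: string, value: unknown): Result<FormState, string> {
  const field = state[fieldId]
  if (!field) return { _tag: 'Err', error: `Field not found: ${fieldId}` }
  if (field.hidden) return { _tag: 'Err', error: `Field is hidden: ${fieldId}` }
  if (field.disabled) return { _tag: 'Err', error: `Field is disabled: ${fieldId}` }
  if (typeof value !== field.type) {
    return { _tag: 'Err', error: `Type mismatch for ${fieldId}: expected ${field.type}` }
  }
  // The input state is never mutated, so a rejected action leaves it intact.
  return { _tag: 'Ok', value: { ...state, [fieldId]: { ...field, value } } }
}

const formState: FormState = {
  name: { value: '', type: 'string', hidden: false, disabled: false },
  shippingWeight: { value: null, type: 'number', hidden: true, disabled: false },
}

const rejected = guardedUpdate(formState, 'nonexistent', 'test')
const accepted = guardedUpdate(formState, 'name', 'Notebook')
console.log(rejected, accepted._tag)
```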


7. The Schema Layer

The Semantic Snapshot is derived from a declarative schema. Here's how it looks:

Entity Schema (Domain truth)

import { entity, field, enumValue } from '@manifesto-io/schema'

export const productTypes = [
  enumValue('PHYSICAL', 'Physical Product'),
  enumValue('DIGITAL', 'Digital Product'),
] as const

export const productEntity = entity('product', 'Product', '1.0.0')
  .fields(
    field.string('name', 'Product Name')
      .required('Product name is required')
      .min(2).max(100)
      .build(),

    field.enum('productType', 'Product Type', productTypes)
      .required()
      .defaultValue('PHYSICAL')
      .build(),

    field.number('shippingWeight', 'Shipping Weight (kg)')
      .min(0).max(2000)
      .build(),
  )
  .build()

View Schema (UI behavior)

import {
  view, section, layout, viewField,
  on, actions, $, fieldEquals,
} from '@manifesto-io/schema'

export const productCreateView = view('product-create', 'Create Product', '1.0.0')
  .entityRef('product')
  .mode('create')
  .sections(
    section('basic')
      .title('Basic Information')
      .layout(layout.grid(2, '1rem'))
      .fields(
        viewField.textInput('name', 'name')
          .label('Product Name')
          .build(),

        viewField.select('productType', 'productType')
          .label('Product Type')
          .reaction(
            on.change()
              .when(fieldEquals('productType', 'DIGITAL'))
              .do(
                actions.updateProp('shippingWeight', 'hidden', true)
              )
          )
          .reaction(
            on.change()
              .when(['!=', $.state('productType'), 'DIGITAL'])
              .do(
                actions.updateProp('shippingWeight', 'hidden', false)
              )
          )
          .build(),

        viewField.numberInput('shippingWeight', 'shippingWeight')
          .label('Shipping Weight (kg)')
          .dependsOn('productType')
          .props({ min: 0, max: 2000 })
          .build(),
      )
      .build(),
  )
  .build()

The schema captures:

  • Dependencies: .dependsOn('productType')
  • Reactions: on.change().when(...).do(...)
  • Business rules: DIGITAL hides shipping fields

All of this is introspectable. The engine reads the schema, builds a DAG of dependencies, and exports the current state as a Semantic Snapshot.
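
For readers unfamiliar with the DAG step, here is a small sketch of how an evaluation order could be derived from `dependsOn` declarations using a depth-first topological sort. The `evaluationOrder` function and its input shape are assumptions for illustration; the post does not describe how the engine builds or walks its graph.

```typescript
// Input: field id -> ids of the fields it depends on (hypothetical shape).
// Output: an order in which every field comes after its dependencies.
function evaluationOrder(dependsOn: Record<string, string[]>): string[] {
  const order: string[] = []
  const visitState = new Map<string, 'visiting' | 'done'>()

  const visit = (id: string, path: string[]): void => {
    const seen = visitState.get(id)
    if (seen === 'done') return
    if (seen === 'visiting') {
      // A field reached again while still on the stack means the graph is not a DAG.
      throw new Error(`Dependency cycle: ${[...path, id].join(' -> ')}`)
    }
    visitState.set(id, 'visiting')
    for (const dep of dependsOn[id] ?? []) visit(dep, [...path, id])
    visitState.set(id, 'done')
    order.push(id)
  }

  for (const id of Object.keys(dependsOn)) visit(id, [])
  return order
}

const fieldOrder = evaluationOrder({
  name: [],
  shippingWeight: ['productType'],
  productType: [],
})
console.log(fieldOrder)
// [ 'name', 'productType', 'shippingWeight' ]
```

Evaluating fields in this order means `shippingWeight` is recomputed only after `productType` has settled, and a cycle is reported as an error instead of looping forever.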


8. Why Not Existing Tools?

| Tool | Strength | Gap |
|---|---|---|
| XState | State machines | No UI semantics, no agent protocol |
| Zod | Validation | No field dependencies, no visibility rules |
| React Hook Form | Form state | Business logic buried in components |
| MCP | Tool invocation | No UI domain logic, no snapshot protocol |

The missing piece is a layer that captures:

  • Why a field is hidden (not just that it is)
  • What conditions enable an action
  • How fields relate to each other
  • What changed after an action (delta feedback)

This is UI domain logic. None of the above expose it in a machine-readable protocol.

Manifesto fills that gap.


9. UI for Humans, II for Agents

For decades we've built User Interfaces that:

  • Look good on screen
  • Feel responsive
  • Work across devices

That still matters. But it's no longer enough.

Software now needs both a UI for humans and an II — Intelligence Interface — for agents.

| Layer | Consumer | Content |
|---|---|---|
| UI | Humans | Pixels, clicks, visual feedback |
| II | Agents | Semantic Snapshot, intent dispatch, delta feedback |

Manifesto's architecture:

┌─────────────────────────────────────────────────────────────┐
│                    Schema Layer                             │
│  ┌─────────────┬─────────────┬─────────────────────────┐    │
│  │   Entity    │    View     │   Reactions & Rules     │    │
│  └─────────────┴─────────────┴─────────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                    Engine (DAG Runtime)                     │
├───────────────────────┬─────────────────────────────────────┤
│      UI Renderer      │    AI Protocol (@manifesto-io/ai)   │
│   (React/Vue/etc)     │    Snapshot + Dispatch + Delta      │
└───────────────────────┴─────────────────────────────────────┘
        ↓                              ↓
      Humans                        Agents

Define the schema once. Generate both UI and II from it.


10. What Is an "AI-Native Application"?

To me, an AI-native application has these properties:

  1. White Box, not Black Box — The engine exposes semantic state, not just rendered output

  2. UI is a projection — A visual representation of state, not the source of truth

  3. Agents interact with meaning — Through structured snapshots and intent dispatch

  4. Protocol over DOM — Actions are validated, deterministic, and return deltas

  5. Safety by design — Hallucination firewall, atomic rollback, capability-based access

This doesn't mean abandoning UI. It means recognizing that UI alone is insufficient when your users include both humans and machines.


The Road Ahead

We're at an inflection point.

For decades, software was built for human consumption. UI was the interface, and it was enough.

Now, AI agents are becoming first-class users. They don't need pixels and click events. They need:

  • Structured state
  • Explicit constraints
  • Causal feedback
  • Safe execution boundaries

The teams that build for this will have AI integrations that are:

  • Interpretable: Agents understand intent, not just surface
  • Deterministic: Same input, same output
  • Debuggable: Trace exactly what changed and why
  • Safe: Hallucinations rejected, not silently executed

The teams that don't will find their AI integrations perpetually fragile — dependent on screenshot parsing and prompt hacks that break with every redesign.


Conclusion

HTML is a great language for humans.

For AI, it's a noisy encoding of things it shouldn't have to reverse-engineer.

AI doesn't need your pixels. It needs your meaning.

That meaning should be exposed as a Semantic State Layer — a White Box protocol where agents can read state, dispatch intents, and receive causal feedback.

Manifesto is my attempt to build that layer.


GitHub: github.com/eggplantiny/manifesto-io

Playground: manifesto-io-playground.vercel.app

Package: @manifesto-io/* — The interoperability protocol for agents


Don't feed HTML to your agents.

Give them a White Box: state, intent, and semantics.
