Jung Sungwoo

Posted on Dec 4

Don't Feed HTML to Your Agents

#agents #saas #ai #architecture

Why Complex SaaS Needs a White Box Protocol for AI

Beyond UI Generation — where humans and AI communicate through meaning, not pixels

TL;DR — Give AI a White Box, Not a Black Box

Most AI agents interact with web apps through Black Box methods: consuming DOM dumps or screenshots, then guessing what to click.

But HTML was never designed for machines. From the AI's perspective, the DOM is mostly noise, with the business logic faintly buried inside it.

This essay argues for a White Box approach:

Instead of making agents reverse-engineer the UI, expose a Semantic State Layer that reveals the application's structure, rules, state, and valid transitions directly.

This is not about replacing UI. It's about giving AI agents a proper interface — what I call an Intelligence Interface (II) — alongside the traditional User Interface.

This post introduces Manifesto, an open-source engine that implements this philosophy with a concrete protocol: @manifesto-io/*.

🎮 Try it yourself → Manifesto Playground

1. Black Box: The Current State of AI + Web Apps

Here's how most teams "add AI" to their web apps today:

Use LangChain, AutoGPT, or browser automation
Drive Playwright or Puppeteer
Dump the DOM or screenshot into the model
Hope it figures out what to click

This is the Black Box approach. The agent sees only the rendered surface and must infer everything else.

What's Wrong with DOM Dumps?

Consider a typical Material UI form field:

<div class="MuiFormControl-root css-1u3bzj6">
  <label class="MuiInputLabel-root">Product Name</label>
  <div class="MuiInputBase-root">
    <input aria-invalid="false" type="text" class="MuiInputBase-input" value="" />
  </div>
  <p class="MuiFormHelperText-root">This field is required.</p>
</div>

From an agent's perspective:

Problem	Impact
Token waste	90% of tokens are class names and wrappers
Missing constraints	Is it required? What's the max length?
No dependencies	Does this field depend on others?
No causality	Submit is disabled — but why?

The agent is forced to guess. A CSS refactor can break every selector it relied on, and a layout change can confuse the model. The logic was never exposed, only its visual projection.

Signal < 10%. Noise > 90%.
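A rough way to check the ratio for the snippet above is to strip the tags and compare what remains with the raw markup. This is a sketch with two stated assumptions: characters stand in for tokens (real tokenizers split text differently), and only visible text counts as signal, so attributes such as aria-invalid are scored as noise even though a few of them carry some meaning.

```typescript
// Rough signal-to-noise check on the Material UI snippet above.
// Assumption: characters are a proxy for tokens; real tokenizers differ.
const muiField = `<div class="MuiFormControl-root css-1u3bzj6">
  <label class="MuiInputLabel-root">Product Name</label>
  <div class="MuiInputBase-root">
    <input aria-invalid="false" type="text" class="MuiInputBase-input" value="" />
  </div>
  <p class="MuiFormHelperText-root">This field is required.</p>
</div>`;

// Replace every tag with a space, then collapse whitespace,
// leaving only the text a human would actually read.
function visibleText(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Fraction of the (whitespace-collapsed) markup that is visible text.
function signalRatio(html: string): number {
  const compact = html.replace(/\s+/g, " ").trim();
  return visibleText(html).length / compact.length;
}

console.log(visibleText(muiField)); // "Product Name This field is required."
console.log(signalRatio(muiField).toFixed(2));
```

Even this generous measure leaves most of the characters as wrappers and class names, and none of the remaining text says what the max length is or what the field depends on.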

2. White Box: Exposing the Application's Brain

The alternative is a White Box protocol.

Instead of showing HTML, the engine exposes a Semantic Snapshot — a structured representation of the application's internal state that agents can read directly.

{
  "topology": {
    "viewId": "product-create",
    "mode": "create",
    "sections": [
      { "id": "basic", "title": "Basic Info", "fields": ["name", "productType"] },
      { "id": "shipping", "title": "Shipping", "fields": ["shippingWeight"] }
    ]
  },
  "state": {
    "form": { "isValid": false, "isDirty": false },
    "fields": {
      "name": {
        "value": "",
        "meta": { "valid": false, "hidden": false, "disabled": false, "errors": ["Required"] }
      },
      "productType": {
        "value": "PHYSICAL",
        "meta": { "valid": true, "hidden": false, "disabled": false, "errors": [] }
      },
      "shippingWeight": {
        "value": null,
        "meta": { "valid": true, "hidden": false, "disabled": false, "errors": [] }
      }
    }
  },
  "constraints": {
    "name": { "required": true, "minLength": 2, "maxLength": 100 },
    "shippingWeight": { "min": 0, "max": 2000, "dependsOn": ["productType"] }
  },
  "interactions": [
    { "id": "updateField:name", "intent": "updateField", "target": "name", "available": true },
    { "id": "updateField:productType", "intent": "updateField", "target": "productType", "available": true },
    { "id": "submit", "intent": "submit", "available": false, "reason": "Name is required" }
  ]
}
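For readers wiring this into TypeScript, here is one way to type the snapshot. The interfaces are inferred from the JSON above and are assumptions, not the published @manifesto-io/ai types; blockedActions and fieldErrors are hypothetical helpers showing that "what is blocked and why" becomes a filter over data rather than a guess over markup.

```typescript
// Hypothetical types inferred from the JSON example; the real
// @manifesto-io/ai type names and fields may differ.
interface FieldMeta {
  valid: boolean;
  hidden: boolean;
  disabled: boolean;
  errors: string[];
}

interface Interaction {
  id: string;
  intent: string;
  target?: string;
  available: boolean;
  reason?: string; // present when available === false
}

interface SemanticSnapshot {
  topology: {
    viewId: string;
    mode: string;
    sections: { id: string; title: string; fields: string[] }[];
  };
  state: {
    form: { isValid: boolean; isDirty: boolean };
    fields: Record<string, { value: unknown; meta: FieldMeta }>;
  };
  constraints: Record<string, Record<string, unknown>>;
  interactions: Interaction[];
}

// List every blocked interaction together with the engine's stated reason.
function blockedActions(s: SemanticSnapshot): string[] {
  return s.interactions
    .filter((i) => !i.available)
    .map((i) => `${i.id}: ${i.reason ?? "no reason given"}`);
}

// Collect per-field validation errors, skipping fields that have none.
function fieldErrors(s: SemanticSnapshot): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const id of Object.keys(s.state.fields)) {
    const errors = s.state.fields[id].meta.errors;
    if (errors.length > 0) out[id] = errors;
  }
  return out;
}

// A trimmed copy of the snapshot above (shipping section omitted).
const example: SemanticSnapshot = {
  topology: {
    viewId: "product-create",
    mode: "create",
    sections: [{ id: "basic", title: "Basic Info", fields: ["name", "productType"] }],
  },
  state: {
    form: { isValid: false, isDirty: false },
    fields: {
      name: { value: "", meta: { valid: false, hidden: false, disabled: false, errors: ["Required"] } },
      productType: { value: "PHYSICAL", meta: { valid: true, hidden: false, disabled: false, errors: [] } },
    },
  },
  constraints: { name: { required: true, minLength: 2, maxLength: 100 } },
  interactions: [
    { id: "updateField:name", intent: "updateField", target: "name", available: true },
    { id: "submit", intent: "submit", available: false, reason: "Name is required" },
  ],
};

console.log(blockedActions(example)); // one entry: "submit: Name is required"
console.log(fieldErrors(example));    // only the name field has errors
```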

Now the agent has:

Topology: Screen structure, sections, field hierarchy
State: Current values, validity, visibility, errors — per field
Constraints: Required, min/max, dependencies
Interactions: What actions are available, and why some are blocked

No guessing. No inference. The agent reads the application's brain directly.

3. A Real Use Case: "Where Do I Select the Week?"

🎮 See it in action: Manifesto Playground — try changing field values and watch the semantic state update in real-time.

Here's a scenario from a complex SaaS scheduling interface:

User: "I see a date picker, but where do I select which week?"

AI Chatbot: "The week selector only appears when you set frequency to 'Weekly'. Right now it's set to 'Daily'. Should I change it for you?"

For this to work, the AI needs to know:

A field called weekSelector exists
It's currently hidden
It becomes visible when frequency === 'WEEKLY'
The current value of frequency is 'DAILY'

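No amount of DOM parsing gives you this reliably: a hidden field is often not rendered at all, and the rule that reveals it lives in component code rather than in the markup. A Semantic Snapshot exposes it directly: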

{
  "fields": {
    "frequency": {
      "value": "DAILY",
      "meta": { "hidden": false }
    },
    "weekSelector": {
      "value": null,
      "meta": { "hidden": true },
      "visibleWhen": "frequency === 'WEEKLY'"
    }
  }
}

The AI reads this and knows — without inference — exactly why the field is hidden and what would make it appear.
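The reasoning behind the chatbot's answer can be sketched in a few lines. This assumes a hypothetical structured form of the rule ({ field, equals }) instead of the string "frequency === 'WEEKLY'", because a structured rule can be read without evaluating code; explainHidden is an illustrative helper, not part of Manifesto.

```typescript
// Hypothetical structured version of "visibleWhen": "frequency === 'WEEKLY'".
interface EqualsRule {
  field: string;
  equals: unknown;
}

interface FieldState {
  value: unknown;
  meta: { hidden: boolean };
  visibleWhen?: EqualsRule;
}

interface Explanation {
  reason: string;
  fix?: { fieldId: string; value: unknown }; // the update that would reveal the field
}

const fields: Record<string, FieldState> = {
  frequency: { value: "DAILY", meta: { hidden: false } },
  weekSelector: {
    value: null,
    meta: { hidden: true },
    visibleWhen: { field: "frequency", equals: "WEEKLY" },
  },
};

// Explain why a field is hidden and which single update would make it visible.
function explainHidden(id: string, all: Record<string, FieldState>): Explanation {
  const f = all[id];
  if (!f) return { reason: `Unknown field: ${id}` };
  if (!f.meta.hidden) return { reason: `${id} is already visible` };
  const rule = f.visibleWhen;
  if (!rule) return { reason: `${id} is hidden with no declared rule` };
  const current = all[rule.field]?.value;
  return {
    reason: `${id} appears when ${rule.field} is ${JSON.stringify(rule.equals)}; it is currently ${JSON.stringify(current)}`,
    fix: { fieldId: rule.field, value: rule.equals },
  };
}

const answer = explainHidden("weekSelector", fields);
console.log(answer.reason);
// weekSelector appears when frequency is "WEEKLY"; it is currently "DAILY"
```

The fix field maps straight onto an updateField intent, which is what lets the chatbot offer "Should I change it for you?" instead of just describing the problem.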

4. The Protocol Loop

Manifesto implements a continuous feedback loop between the engine and AI agents:

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  [Context Injection] → [Reasoning] → [Action Dispatch] → [Delta]    │
│          ▲                                                  │       │
│          └─────────────── Continuous Snapshots ─────────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Step by Step:

1. Context Injection: Engine exports a Semantic Snapshot
   - Topology (sections, fields, hierarchy)
   - State (values, validity, visibility)
   - Constraints (what's blocked and why)
   - Interactions (available intents with reasons)
2. Reasoning: Agent plans next action based on snapshot
3. Action Dispatch: Agent calls abstract intents, not DOM events
   - updateField, submit, reset, validate
4. Delta Feedback: Engine returns what changed
   - Not just "success", but the actual state diff
   - Agent learns causality: "I changed X, and Y became hidden"
5. Loop continues with updated snapshot

This is fundamentally different from "click and hope." The agent operates on structured meaning with predictable feedback.
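The loop can be modeled with a toy, in-memory session. MockSession and planNext are hypothetical stand-ins: the real engine is replaced by a single-field form, and the LLM's reasoning step by a rule-based planner, so the control flow (snapshot, plan, dispatch, delta) is visible without any model call.

```typescript
// Toy model of the protocol loop. Not the @manifesto-io/ai API.
type Values = Record<string, string>;
type Intent =
  | { type: "updateField"; fieldId: string; value: string }
  | { type: "submit" };
type Delta = Record<string, { from: string | undefined; to: string }>;

interface Snapshot {
  values: Values;
  interactions: { id: string; available: boolean; reason?: string }[];
}

class MockSession {
  private values: Values = { name: "" };
  submitted = false;

  // Context injection: export the current semantic state.
  snapshot(): Snapshot {
    const nameOk = this.values.name.length >= 2;
    return {
      values: { ...this.values },
      interactions: [
        { id: "updateField:name", available: true },
        { id: "submit", available: nameOk, reason: nameOk ? undefined : "Name is required" },
      ],
    };
  }

  // Action dispatch: apply an abstract intent and report what changed.
  dispatch(intent: Intent): Delta {
    if (intent.type === "submit") {
      if (this.values.name.length >= 2) this.submitted = true; // refuse invalid submits
      return {};
    }
    const from = this.values[intent.fieldId];
    this.values[intent.fieldId] = intent.value;
    return { [intent.fieldId]: { from, to: intent.value } };
  }
}

// Reasoning: a rule-based stand-in for the LLM, driven only by the snapshot.
function planNext(snap: Snapshot): Intent | null {
  const submit = snap.interactions.find((i) => i.id === "submit");
  if (submit?.available) return { type: "submit" };
  if (submit?.reason === "Name is required") {
    return { type: "updateField", fieldId: "name", value: "Widget" };
  }
  return null;
}

const session = new MockSession();
for (let step = 0; step < 5 && !session.submitted; step++) {
  const snap = session.snapshot();        // 1. context injection
  const intent = planNext(snap);          // 2. reasoning
  if (!intent) break;
  const delta = session.dispatch(intent); // 3. action dispatch
  console.log(intent.type, delta);        // 4. delta feedback, then loop
}
```

The planner never inspects markup: it reads the blocked submit and its reason, fixes the cause, and submits on the next iteration once the snapshot says it can.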

5. The API: Exploration and Execution

Manifesto exposes this protocol through @manifesto-io/ai:

Exploration Mode: "What can I do here?"

import { createInteroperabilitySession } from '@manifesto-io/ai'

const session = createInteroperabilitySession({
  runtime,       // FormRuntime instance
  viewSchema,    // View definition
  entitySchema,  // Entity definition
})

// Get current semantic snapshot
const snapshot = session.snapshot()

// snapshot.interactions tells the agent:
// - submit: available=false, reason="Name is required"
// - updateField:name: available=true
// - updateField:productType: available=true

The agent now knows the current state and exactly what actions are valid.

Execution Mode: "Change it to digital"

const result = session.dispatch({
  type: 'updateField',
  fieldId: 'productType',
  value: 'DIGITAL',
})

if (result._tag === 'Ok') {
  const { snapshot, delta } = result.value

  // delta shows exactly what changed:
  // {
  //   fields: {
  //     productType: { value: 'DIGITAL' },
  //     shippingWeight: { hidden: true },
  //     fulfillmentType: { hidden: true }
  //   },
  //   interactions: {
  //     'updateField:shippingWeight': { available: false, reason: 'Field is hidden' }
  //   }
  // }
}

The agent doesn't just get "success." It gets a delta showing the causal chain: changing productType to DIGITAL caused shippingWeight to become hidden.
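One way a delta like this could be produced is by diffing the field states before and after an action and keeping only what changed. This is an assumption for illustration: the post does not say how Manifesto computes deltas (its DAG runtime may track changes directly), and diffFields and the flattened FieldView shape are hypothetical.

```typescript
// Hypothetical, flattened per-field view (value plus the meta flags that matter here).
type FieldView = { value: unknown; hidden: boolean; disabled: boolean };
type FieldMap = Record<string, FieldView>;
type FieldDelta = Record<string, Partial<FieldView>>;

// Report only the properties that differ between two snapshots.
function diffFields(before: FieldMap, after: FieldMap): FieldDelta {
  const delta: FieldDelta = {};
  for (const id of Object.keys(after)) {
    const prev = before[id]; // may be undefined if the field is new
    const next = after[id];
    const changed: Partial<FieldView> = {};
    if (!prev || prev.value !== next.value) changed.value = next.value;
    if (!prev || prev.hidden !== next.hidden) changed.hidden = next.hidden;
    if (!prev || prev.disabled !== next.disabled) changed.disabled = next.disabled;
    if (Object.keys(changed).length > 0) delta[id] = changed;
  }
  return delta;
}

const before: FieldMap = {
  name: { value: "", hidden: false, disabled: false },
  productType: { value: "PHYSICAL", hidden: false, disabled: false },
  shippingWeight: { value: null, hidden: false, disabled: false },
};

// State after dispatching updateField(productType, 'DIGITAL') and running the reaction.
const after: FieldMap = {
  ...before,
  productType: { ...before.productType, value: "DIGITAL" },
  shippingWeight: { ...before.shippingWeight, hidden: true },
};

// Only productType.value and shippingWeight.hidden are reported; name is omitted.
console.log(diffFields(before, after));
```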

LLM Tool Export

Convert the snapshot into OpenAI/Claude-compatible function definitions:

import { toToolDefinitions } from '@manifesto-io/ai'

const tools = toToolDefinitions(snapshot, { omitUnavailable: true })

// Returns JSON-Schema tool definitions:
// - updateField (with enum of available fields)
// - submit (if form is valid)
// - reset
// - validate

This enables agents to interact with forms through standard function-calling interfaces.
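The sketch below shows roughly what such a conversion involves. toTools is a hypothetical re-implementation, not the library's toToolDefinitions: it folds the updateField:* interactions into one tool whose fieldId parameter is an enum of available targets, and drops blocked intents when omitUnavailable is true. The output uses a provider-neutral { name, description, parameters } shape; OpenAI's Chat Completions API expects each entry wrapped as { type: "function", function: { ... } }, and Anthropic's Messages API names the schema field input_schema.

```typescript
// Hypothetical conversion from snapshot interactions to JSON-Schema tool definitions.
interface Interaction {
  id: string;
  intent: string;
  target?: string;
  available: boolean;
  reason?: string;
}

interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // a JSON Schema object
}

function toTools(interactions: Interaction[], omitUnavailable = true): ToolDef[] {
  const usable = omitUnavailable ? interactions.filter((i) => i.available) : interactions;
  const tools: ToolDef[] = [];

  // Fold every updateField:<id> interaction into one tool with an enum of targets.
  const targets = usable
    .filter((i) => i.intent === "updateField" && i.target !== undefined)
    .map((i) => i.target as string);
  if (targets.length > 0) {
    tools.push({
      name: "updateField",
      description: "Set the value of one editable form field.",
      parameters: {
        type: "object",
        properties: {
          fieldId: { type: "string", enum: targets },
          value: { description: "New value for the field" },
        },
        required: ["fieldId", "value"],
      },
    });
  }

  // Every other intent becomes a parameterless tool; blocked ones carry their reason.
  for (const i of usable) {
    if (i.intent === "updateField") continue;
    tools.push({
      name: i.intent,
      description: i.available ? `Perform ${i.intent}` : `Currently unavailable: ${i.reason ?? "unknown"}`,
      parameters: { type: "object", properties: {} },
    });
  }
  return tools;
}

const interactions: Interaction[] = [
  { id: "updateField:name", intent: "updateField", target: "name", available: true },
  { id: "updateField:productType", intent: "updateField", target: "productType", available: true },
  { id: "submit", intent: "submit", available: false, reason: "Name is required" },
];

const tools = toTools(interactions);
console.log(tools.map((t) => t.name)); // updateField only; submit is omitted while blocked
console.log(JSON.stringify(tools[0].parameters));
```

Putting the field ids in an enum means the model's function-calling layer can only name fields that exist and are currently editable, which is the first line of defense before any server-side validation.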

6. Safety Rails: The Hallucination Firewall

The Problem with Black Box

When an agent manipulates the DOM directly:

It can click anything, including elements it shouldn't
It can input invalid values
It can trigger actions outside its permission
Failures are silent or cryptic

Manifesto's Safety Rails

Hallucination Firewall: Every agent action is validated before execution.

const result = session.dispatch({
  type: 'updateField',
  fieldId: 'nonexistent',  // ❌ Unknown field
  value: 'test',
})

// result._tag === 'Err'
// result.error === 'Field not found: nonexistent'
// State unchanged — no side effects

What gets rejected:

Unknown fields → Err
Type mismatches (string to number field) → Err
Hidden field updates → Err
Disabled field updates → Err
Unauthorized actions → Err

Atomic Rollback: On any failure, the previous snapshot remains intact. No partial mutations.

Deterministic Contracts: Same input + same state = same output. Agents can plan reliably.

This is capability-based access control for AI. The agent only sees and can only act on what's explicitly permitted.
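A firewall of this kind can be sketched as a pure validation function over a simple in-memory field model. specs, guardedUpdate, and the error strings are hypothetical; the point is the shape: every check runs before anything is written, a successful update returns a new state object, and the caller commits it only on Ok, which gives the "state unchanged, no partial mutation" behavior described above.

```typescript
// Hypothetical firewall with atomic commit; not Manifesto's implementation.
type Result<T> = { _tag: "Ok"; value: T } | { _tag: "Err"; error: string };

interface FieldSpec {
  type: "string" | "number";
  hidden: boolean;
  disabled: boolean;
}

const specs: Record<string, FieldSpec> = {
  name: { type: "string", hidden: false, disabled: false },
  shippingWeight: { type: "number", hidden: true, disabled: false }, // hidden, e.g. for DIGITAL
};

// Validate first; never mutate the input state.
function guardedUpdate(
  state: Readonly<Record<string, unknown>>,
  fieldId: string,
  value: unknown,
): Result<Record<string, unknown>> {
  // hasOwnProperty guard so prototype keys like "toString" are not treated as fields
  const spec = Object.prototype.hasOwnProperty.call(specs, fieldId) ? specs[fieldId] : undefined;
  if (!spec) return { _tag: "Err", error: `Field not found: ${fieldId}` };
  if (typeof value !== spec.type) return { _tag: "Err", error: `Type mismatch: ${fieldId} expects ${spec.type}` };
  if (spec.hidden) return { _tag: "Err", error: `Field is hidden: ${fieldId}` };
  if (spec.disabled) return { _tag: "Err", error: `Field is disabled: ${fieldId}` };
  return { _tag: "Ok", value: { ...state, [fieldId]: value } };
}

let formState: Record<string, unknown> = { name: "", shippingWeight: null };
const attempts: [string, unknown][] = [
  ["nonexistent", "test"], // unknown field
  ["name", 42],            // type mismatch
  ["shippingWeight", 5],   // hidden field
  ["name", "Widget"],      // valid
];

for (const [fieldId, value] of attempts) {
  const r = guardedUpdate(formState, fieldId, value);
  if (r._tag === "Ok") formState = r.value; // commit only on success
  else console.log(r.error);                // state untouched on failure
}
console.log(formState); // name is now "Widget"; shippingWeight is still null
```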

7. The Schema Layer

The Semantic Snapshot is derived from a declarative schema. Here's how it looks:

Entity Schema (Domain truth)

import { entity, field, enumValue } from '@manifesto-io/schema'

export const productTypes = [
  enumValue('PHYSICAL', 'Physical Product'),
  enumValue('DIGITAL', 'Digital Product'),
] as const

export const productEntity = entity('product', 'Product', '1.0.0')
  .fields(
    field.string('name', 'Product Name')
      .required('Product name is required')
      .min(2).max(100)
      .build(),

    field.enum('productType', 'Product Type', productTypes)
      .required()
      .defaultValue('PHYSICAL')
      .build(),

    field.number('shippingWeight', 'Shipping Weight (kg)')
      .min(0).max(2000)
      .build(),
  )
  .build()

View Schema (UI behavior)

import {
  view, section, layout, viewField,
  on, actions, $, fieldEquals,
} from '@manifesto-io/schema'

export const productCreateView = view('product-create', 'Create Product', '1.0.0')
  .entityRef('product')
  .mode('create')
  .sections(
    section('basic')
      .title('Basic Information')
      .layout(layout.grid(2, '1rem'))
      .fields(
        viewField.textInput('name', 'name')
          .label('Product Name')
          .build(),

        viewField.select('productType', 'productType')
          .label('Product Type')
          .reaction(
            on.change()
              .when(fieldEquals('productType', 'DIGITAL'))
              .do(
                actions.updateProp('shippingWeight', 'hidden', true)
              )
          )
          .reaction(
            on.change()
              .when(['!=', $.state('productType'), 'DIGITAL'])
              .do(
                actions.updateProp('shippingWeight', 'hidden', false)
              )
          )
          .build(),

        viewField.numberInput('shippingWeight', 'shippingWeight')
          .label('Shipping Weight (kg)')
          .dependsOn('productType')
          .props({ min: 0, max: 2000 })
          .build(),
      )
      .build(),
  )
  .build()
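The second reaction's condition, ['!=', $.state('productType'), 'DIGITAL'], reads like a prefix expression tree, which is what makes it introspectable: an engine or an agent can walk it without executing component code. The evaluator below is a sketch under an assumption: the post does not show what $.state() returns, so a state reference is modeled here as a hypothetical { $state: 'fieldId' } object, and only a handful of operators are supported.

```typescript
// Hypothetical model of array-form conditions; the real $.state() shape may differ.
type StateRef = { $state: string };
type Expr =
  | string
  | number
  | boolean
  | null
  | StateRef
  | ["==" | "!=", Expr, Expr]
  | ["and" | "or", ...Expr[]]
  | ["!", Expr];

const stateRef = (id: string): StateRef => ({ $state: id });

// Recursively evaluate an expression against current field values.
// Note: arguments are evaluated eagerly (no short-circuiting for and/or).
function evaluate(expr: Expr, state: Record<string, unknown>): unknown {
  if (Array.isArray(expr)) {
    const [op, ...args] = expr as [string, ...Expr[]];
    const vals = args.map((a) => evaluate(a, state));
    switch (op) {
      case "==": return vals[0] === vals[1];
      case "!=": return vals[0] !== vals[1];
      case "and": return vals.every(Boolean);
      case "or": return vals.some(Boolean);
      case "!": return !vals[0];
      default: throw new Error(`Unknown operator: ${op}`);
    }
  }
  if (expr !== null && typeof expr === "object") return state[expr.$state]; // StateRef
  return expr; // literal
}

// Equivalent of ['!=', $.state('productType'), 'DIGITAL'] from the reaction above.
const showShipping: Expr = ["!=", stateRef("productType"), "DIGITAL"];
console.log(evaluate(showShipping, { productType: "PHYSICAL" })); // true
console.log(evaluate(showShipping, { productType: "DIGITAL" }));  // false
```

Because the condition is data, the same tree that drives the reaction can be copied into the snapshot to explain why a field is hidden.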

The schema captures:

Dependencies: .dependsOn('productType')
Reactions: on.change().when(...).do(...)
Business rules: DIGITAL hides shipping fields

All of this is introspectable. The engine reads the schema, builds a DAG of dependencies, and exports the current state as a Semantic Snapshot.
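Building that DAG and using it can be sketched as follows. The flat dependsOn map is a hypothetical simplification of the schema (fulfillmentType is borrowed from the delta example earlier and is not in the schema shown); topoOrder uses Kahn's algorithm to get a valid evaluation order and to reject cycles, and affectedBy collects everything downstream of a changed field.

```typescript
// Hypothetical flattened dependency declarations: field -> fields it depends on.
type Deps = Record<string, string[]>;

const dependsOn: Deps = {
  name: [],
  productType: [],
  shippingWeight: ["productType"],
  fulfillmentType: ["productType"], // assumed field, taken from the delta example
};

// Invert the edges: for each field, which fields depend on it.
function dependents(deps: Deps): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const field of Object.keys(deps)) out.set(field, []);
  for (const field of Object.keys(deps)) {
    for (const dep of deps[field]) {
      if (!out.has(dep)) out.set(dep, []);
      out.get(dep)!.push(field);
    }
  }
  return out;
}

// Kahn's algorithm: a topological order, or an error if there is a cycle.
function topoOrder(deps: Deps): string[] {
  const rev = dependents(deps);
  const nodes = Array.from(rev.keys());
  const indegree = new Map<string, number>();
  for (const f of nodes) indegree.set(f, (deps[f] ?? []).length);
  const queue = nodes.filter((f) => indegree.get(f) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const f = queue.shift()!;
    order.push(f);
    for (const d of rev.get(f) ?? []) {
      const n = indegree.get(d)! - 1;
      indegree.set(d, n);
      if (n === 0) queue.push(d);
    }
  }
  if (order.length !== nodes.length) throw new Error("Dependency cycle detected");
  return order;
}

// Every field downstream of `changed`, returned in evaluation order.
function affectedBy(changed: string, deps: Deps): string[] {
  const rev = dependents(deps);
  const seen = new Set<string>();
  const stack = [changed];
  while (stack.length > 0) {
    const f = stack.pop()!;
    for (const d of rev.get(f) ?? []) {
      if (!seen.has(d)) {
        seen.add(d);
        stack.push(d);
      }
    }
  }
  return topoOrder(deps).filter((f) => seen.has(f));
}

console.log(affectedBy("productType", dependsOn)); // shippingWeight, fulfillmentType
```

The same traversal that tells the engine what to recompute also tells it which fields can appear in the delta after an action.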

8. Why Not Existing Tools?

Tool	Strength	Gap
XState	State machines	No UI semantics, no agent protocol
Zod	Validation	No field dependencies, no visibility rules
React Hook Form	Form state	Business logic buried in components
MCP	Tool invocation	No UI domain logic, no snapshot protocol

The missing piece is a layer that captures:

Why a field is hidden (not just that it is)
What conditions enable an action
How fields relate to each other
What changed after an action (delta feedback)

This is UI domain logic. None of these tools exposes it through a machine-readable protocol.

Manifesto fills that gap.

9. UI for Humans, II for Agents

For decades we've built User Interfaces:

Look good on screen
Feel responsive
Work across devices

That still matters. But it's no longer enough.

Software now needs both a UI for humans and an II — Intelligence Interface — for agents.

Layer	Consumer	Content
UI	Humans	Pixels, clicks, visual feedback
II	Agents	Semantic Snapshot, intent dispatch, delta feedback

Manifesto's architecture:

┌─────────────────────────────────────────────────────────────┐
│                    Schema Layer                             │
│  ┌─────────────┬─────────────┬─────────────────────────┐    │
│  │   Entity    │    View     │   Reactions & Rules     │    │
│  └─────────────┴─────────────┴─────────────────────────┘    │
├─────────────────────────────────────────────────────────────┤
│                    Engine (DAG Runtime)                     │
├───────────────────────┬─────────────────────────────────────┤
│      UI Renderer      │    AI Protocol (@manifesto-io/ai)   │
│   (React/Vue/etc)     │    Snapshot + Dispatch + Delta      │
└───────────────────────┴─────────────────────────────────────┘
        ↓                              ↓
      Humans                        Agents

Define the schema once. Generate both UI and II from it.
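As a rough illustration of "define once, project twice", here is one hypothetical field definition rendered both ways. FieldDef, toHtml, and toSemantic are illustrative names, not Manifesto APIs. Note what happens when the field is hidden: the HTML projection becomes an empty string, while the semantic projection still reports the field, its hidden flag, and its constraints, which is exactly the information a DOM-reading agent would lose.

```typescript
// Illustrative only: one field definition, two projections.
interface FieldDef {
  id: string;
  label: string;
  required: boolean;
  maxLength?: number;
}

interface RuntimeState {
  value: string;
  hidden: boolean;
}

// Minimal escaping so labels and values cannot break the generated markup.
const esc = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// UI projection for humans: a hidden field simply renders nothing.
function toHtml(def: FieldDef, st: RuntimeState): string {
  if (st.hidden) return "";
  const req = def.required ? " required" : "";
  const max = def.maxLength !== undefined ? ` maxlength="${def.maxLength}"` : "";
  return `<label>${esc(def.label)}<input name="${esc(def.id)}" value="${esc(st.value)}"${req}${max} /></label>`;
}

// II projection for agents: the same facts, with state and rules kept explicit.
function toSemantic(def: FieldDef, st: RuntimeState) {
  const errors: string[] = def.required && st.value === "" ? ["Required"] : [];
  return {
    id: def.id,
    value: st.value,
    meta: { hidden: st.hidden, valid: errors.length === 0, errors },
    constraints: { required: def.required, maxLength: def.maxLength },
  };
}

const nameDef: FieldDef = { id: "name", label: "Product Name", required: true, maxLength: 100 };
const nameState: RuntimeState = { value: "", hidden: false };

console.log(toHtml(nameDef, nameState));
console.log(JSON.stringify(toSemantic(nameDef, nameState)));
```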

10. What Is an "AI-Native Application"?

To me, an AI-native application has these properties:

White Box, not Black Box — The engine exposes semantic state, not just rendered output
UI is a projection — A visual representation of state, not the source of truth
Agents interact with meaning — Through structured snapshots and intent dispatch
Protocol over DOM — Actions are validated, deterministic, and return deltas
Safety by design — Hallucination firewall, atomic rollback, capability-based access

This doesn't mean abandoning UI. It means recognizing that UI alone is insufficient when your users include both humans and machines.

The Road Ahead

We're at an inflection point.

For decades, software was built for human consumption. UI was the interface, and it was enough.

Now, AI agents are becoming first-class users. They don't need pixels and click events. They need:

Structured state
Explicit constraints
Causal feedback
Safe execution boundaries

The teams that build for this will have AI integrations that are:

Interpretable: Agents understand intent, not just surface
Deterministic: Same input, same output
Debuggable: Trace exactly what changed and why
Safe: Hallucinations rejected, not silently executed

The teams that don't will find their AI integrations perpetually fragile — dependent on screenshot parsing and prompt hacks that break with every redesign.

Conclusion

HTML is a great language for humans.

For AI, it's a noisy encoding of things it shouldn't have to reverse-engineer.

AI doesn't need your pixels. It needs your meaning.

That meaning should be exposed as a Semantic State Layer — a White Box protocol where agents can read state, dispatch intents, and receive causal feedback.

Manifesto is my attempt to build that layer.

GitHub: github.com/eggplantiny/manifesto-io

Playground: manifesto-io-playground.vercel.app

Package: @manifesto-io/* — The interoperability protocol for agents

Don't feed HTML to your agents.

Give them a White Box: state, intent, and semantics.
