DEV Community: Christian Bromann

Build Better Agent UX: Streaming Progress, Status, and File Ops with LangChain

Christian Bromann — Thu, 15 Jan 2026 18:38:38 +0000

If you’ve built an agent UI before, you know the uncomfortable truth: most “progress” indicators are vibes.

A spinner means something is happening… probably.

But users don’t need theater — they need truth: what’s running right now, how far along it is, and what the agent is doing to their filesystem.

In this short video, I show how to stream custom, TypeScript-safe events from a LangGraph tool call directly into a React UI.

🎥 Video: https://youtu.be/3daSUNpWErQ

What you’ll build

A simple pattern:

1) Your tool emits events (progress/status/file ops) while it runs

2) The frontend subscribes and renders those events immediately

3) Type guards keep the UI logic safe and predictable

No polling loops. No guessing. No “thinking…” placeholders.

1) Emit typed custom events from a tool call

Inside the tool call, write custom events as the work progresses:

config.writer?.({
  type: "progress",
  id: analysisId,          // stable id => update in place
  step: steps[i].step,
  message: steps[i].message,
  progress: Math.round(((i + 1) / steps.length) * 100),
  totalSteps: steps.length,
  currentStep: i + 1,
  toolCall: config.toolCall,
} satisfies ProgressData);

This is the key shift: tools aren’t just functions — they’re event producers.
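
To make the emitting side concrete, here is a minimal sketch that runs without LangGraph. The field names follow the snippet above, but the field types, the `StepInfo` and `Writer` types, and the `progressEvent` and `runWithProgress` helpers are assumptions made for illustration (the `toolCall` field is left out), and `Writer` only stands in for `config.writer`.

```typescript
// Hypothetical event shape: field names follow the snippet above,
// the types are assumptions.
interface ProgressData {
  type: "progress";
  id: string;
  step: string;
  message: string;
  progress: number; // percentage, 0 to 100
  totalSteps: number;
  currentStep: number;
}

interface StepInfo {
  step: string;
  message: string;
}

// Stand-in for config.writer: any callback that accepts an event.
type Writer = (event: ProgressData) => void;

// Builds the event for the step at `index` (0-based).
function progressEvent(
  id: string,
  steps: readonly StepInfo[],
  index: number,
): ProgressData {
  const current = steps[index];
  if (current === undefined) {
    throw new RangeError(`no step at index ${index}`);
  }
  return {
    type: "progress",
    id, // same id for every event, so the UI can update one entry in place
    step: current.step,
    message: current.message,
    progress: Math.round(((index + 1) / steps.length) * 100),
    totalSteps: steps.length,
    currentStep: index + 1,
  };
}

// Runs the work for each step, then emits one event per completed step.
async function runWithProgress(
  id: string,
  steps: readonly StepInfo[],
  work: (step: StepInfo) => Promise<void>,
  writer?: Writer,
): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    const current = steps[i];
    if (current === undefined) continue;
    await work(current);
    writer?.(progressEvent(id, steps, i));
  }
}
```

Keeping the event construction in a pure function makes the percentage math easy to check separately from the asynchronous work.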

2) Receive those events in React

In the UI, pass a handler into the stream hook:

onCustomEvent: handleCustomEvent,

Now every event emitted by your tool arrives in the client as it happens.

3) Narrow event types and update UI state predictably

Treat incoming events as unknown, then narrow with type guards and update state maps keyed by id:

if (isProgressData(data)) { /* update progress */ }
else if (isStatusData(data)) { /* update status */ }
else if (isFileStatusData(data)) { /* update file ops */ }

This keeps the frontend stable:

- progress updates in place
- minimal re-renders
- no stringly-typed event spaghetti
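
The narrowing step can be sketched in plain TypeScript, without React. The event shapes below are simplified assumptions, and `isRecord`, `ProgressState`, and `applyProgress` are hypothetical helpers, not part of any SDK. Only the guard names `isProgressData` and `isStatusData` come from the snippet above.

```typescript
// Hypothetical event shapes, reduced to the fields this sketch needs.
interface ProgressData {
  type: "progress";
  id: string;
  message: string;
  progress: number;
}

interface StatusData {
  type: "status";
  id: string;
  status: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isProgressData(data: unknown): data is ProgressData {
  return (
    isRecord(data) &&
    data["type"] === "progress" &&
    typeof data["id"] === "string" &&
    typeof data["message"] === "string" &&
    typeof data["progress"] === "number"
  );
}

function isStatusData(data: unknown): data is StatusData {
  return (
    isRecord(data) &&
    data["type"] === "status" &&
    typeof data["id"] === "string" &&
    typeof data["status"] === "string"
  );
}

type ProgressState = ReadonlyMap<string, ProgressData>;

// Returns a new map when a valid progress event arrives and the same map
// otherwise, so unrelated or malformed events do not change state.
function applyProgress(state: ProgressState, data: unknown): ProgressState {
  if (!isProgressData(data)) return state;
  const next = new Map(state);
  next.set(data.id, data); // same id replaces the previous entry
  return next;
}
```

Because the map is keyed by `id`, a second event with the same id overwrites the first instead of adding a new row, which is what lets a progress bar update in place.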

Why it matters (beyond “nice UI”)

When the UI reflects real execution:

- users trust the agent more
- debugging becomes dramatically easier
- failures are understandable without digging through logs
- you can build better UX: step indicators, timelines, file operation feeds, etc.

If you build something with this pattern, I’d love to see it — share a screenshot or link in the comments.

Giving a Chat App Operational Access to My Cloudflare Account with MCP

Christian Bromann — Thu, 18 Dec 2025 16:02:24 +0000

I recently built a small demo that changed how I think about agent tooling and platform integrations.

In short, I gave a chat-based application operational access to my entire Cloudflare account by connecting Cloudflare’s managed MCP servers to a LangChain agent powered by Claude.

No custom APIs.
No giant tool schemas.
No hand-written adapters.

Just MCP.

The idea

Most agent setups today treat tools as static. You define them upfront, wire them into prompts, and hope the model uses them correctly.

That works for small demos. It starts breaking down once you point an agent at a real platform like Cloudflare, which exposes hundreds of possible operations across many services.

MCP changes that model.

Instead of shipping a massive list of tools, the agent connects to MCP servers that describe available capabilities. The model can then discover tools dynamically at runtime and load only what it actually needs.
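
As a rough illustration of "load only what it needs", the sketch below filters a catalog of tool descriptors by a query. It is not the MCP SDK and not Anthropic's tool search: the `ToolDescriptor` shape, the `searchTools` function, and the keyword scoring are all assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical descriptor, loosely modeled on what a server might
// advertise for each tool: a name and a human-readable description.
interface ToolDescriptor {
  name: string;
  description: string;
}

// Returns up to `limit` tools whose name or description mentions the
// query terms, best match first. Tools with no matching term are dropped.
function searchTools(
  catalog: readonly ToolDescriptor[],
  query: string,
  limit = 3,
): ToolDescriptor[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);
  return catalog
    .map((tool) => {
      const haystack = `${tool.name} ${tool.description}`.toLowerCase();
      const score = terms.filter((term) => haystack.includes(term)).length;
      return { tool, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.tool);
}
```

The point of the sketch is the shape of the flow: the full catalog stays outside the prompt, and only the few descriptors that match the current task are handed to the model.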

What I built

In the demo, I connect a LangChainJS agent to multiple Cloudflare MCP servers, including:

- the Browser MCP to fetch and convert live web pages
- the Docs MCP for up-to-date Cloudflare documentation
- Cloudflare’s GraphQL MCP for analytics queries

Using Anthropic’s native MCP toolset and tool search, Claude explores the available tools, decides what to use, and executes them on demand.

From the outside, it just looks like a chat interface. Under the hood, that chat app has real operational access to my Cloudflare account.

Why this felt different

What stood out to me while building this is how little glue code was needed.

I didn’t have to:

- model every API endpoint as a tool
- maintain schemas as the platform evolves
- preload hundreds of tools into the prompt

Instead, the agent discovers capabilities as it goes. Tools feel less like rigid contracts and more like something the agent can explore.

That’s a big shift if you’re thinking about agents interacting with real infrastructure, not just answering questions.

Why MCP matters

MCP is interesting not because it’s another protocol, but because it enables a different mental model:

- platforms expose capabilities, not static tools
- agents discover and load functionality dynamically
- providers like Cloudflare can safely expose large surfaces without overwhelming the model

This feels much closer to how we’ll want agents to interact with production systems going forward.

The code and the video

I walk through the full setup, including the LangChain agent, MCP configuration, and dynamic tool discovery, in this video:

🎥 https://www.youtube.com/watch?v=n-Hw_K_GsOg
👉 https://github.com/christian-bromann/langchat

If you’re experimenting with MCP or thinking about giving agents access to real platforms, I’d love to hear what you’re building.

Automating a Browser with Anthropic’s Computer Use to Play Tic-Tac-Toe

Christian Bromann — Tue, 16 Dec 2025 18:29:45 +0000

For years, the “agent” story was mostly text → API calls → text. That works when software exposes clean endpoints, but the real world is full of:

- Legacy UIs with no API
- SaaS products where the API is incomplete or locked down
- Workflows that span apps (browser + spreadsheet + admin UI)
- Tasks where the UI is the source of truth (what’s visible, what’s enabled, what error banners appear)

Provider-native computer use tools are a response to that gap: they let a model operate software the same way a human does—by seeing the screen and performing input actions.

OpenAI frames this as a “Computer-Using Agent” capability aimed at controlling real interfaces and measuring progress on benchmarks like OSWorld (a sign they’re treating UI control as a first-class modality, not a hack) (OpenAI: Computer-Using Agent). Anthropic positions “computer use” as enabling Claude to interact with existing interfaces directly while highlighting operational safety concerns (e.g., isolate execution in a dedicated environment) (Anthropic computer use docs, Anthropic announcement).

Under the hood, the important idea is standardization:

- Providers define a tool schema (action types, fields, image formats).
- They train (and safety-tune) models to reliably emit that schema.
- They enforce constraints (environment type, context handling) that make the loop workable in production.

That’s why these tools matter: you’re not just “running Selenium with an LLM”—you’re using a model/tool pair designed together as a control system.

What “computer use” enables at a technical level

Provider computer-use is basically a minimal OS/UI control API with three properties:

1) A perception channel grounded in pixels

The model can request a screenshot and interpret UI state: text, layout, icons, highlights, banners, disabled buttons, etc. This is the “state observation” step in a control loop.

2) A constrained action vocabulary

Instead of arbitrary code execution, the model emits actions like:

- click / move / drag
- type / keypress
- scroll
- wait
- screenshot (again)

This constraint is good: fewer degrees of freedom means fewer unsafe/irreversible actions and more predictable orchestration.
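
One way to picture a constrained action vocabulary is a discriminated union plus a parser that rejects everything outside it. The action names mirror the list above, but the field names (`x`, `y`, `deltaY`, `ms`) and the `parseAction` helper are illustrative assumptions and do not match either provider's actual schema.

```typescript
// Hypothetical action vocabulary; field names are illustrative only.
type ComputerAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; x: number; y: number; deltaY: number }
  | { type: "wait"; ms: number }
  | { type: "screenshot" };

function isNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

// Accepts only well-formed actions from the vocabulary; anything else,
// including unknown action types, is rejected with null.
function parseAction(raw: unknown): ComputerAction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { type, x, y, text, deltaY, ms } = raw as Record<string, unknown>;
  switch (type) {
    case "click":
      return isNumber(x) && isNumber(y) ? { type: "click", x, y } : null;
    case "type":
      return typeof text === "string" ? { type: "type", text } : null;
    case "scroll":
      return isNumber(x) && isNumber(y) && isNumber(deltaY)
        ? { type: "scroll", x, y, deltaY }
        : null;
    case "wait":
      return isNumber(ms) && ms >= 0 ? { type: "wait", ms } : null;
    case "screenshot":
      return { type: "screenshot" };
    default:
      return null;
  }
}
```

Because the orchestrator only ever executes values of type `ComputerAction`, a request such as "run this shell command" has no representation and is dropped at the boundary.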

3) Closed-loop autonomy

The model can iterate: observe → act → observe, handling uncertainty and recovery:

- “Did my click land?”
- “Did the UI change?”
- “Do I need to wait for the next state?”

This is what makes “computer use” different from one-shot vision: it’s not just recognition; it’s interactive control.

How your Tic-Tac-Toe project leverages these tools (and what it demonstrates)

This demo is valuable because it isolates the core computer-use loop without lots of app complexity—and still exposes the hard parts.

1) The UI becomes the “API surface”

Your agent does not get a structured board array. It must infer the board from screenshots and interact via clicks. That’s the entire point of computer-use: operate systems where the UI is the interface.

To make that reliable, the project adds an important “agent affordance”: cell labels (TOP-LEFT, CENTER, …). This is a general pattern: if you want robust UI control, you design UI elements that are easy for vision models to anchor on (stable text, consistent placement, clear state cues).

2) You turn the model into a controller, not a narrator

The implementation forces an explicit loop:

- Take screenshot
- Choose a move
- Click
- Take screenshot to verify
- Wait for opponent
- Repeat

That “verify after action” step is the difference between a demo that “usually works” and one that can recover from inevitable UI mistakes.
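
The "verify after action" step can be sketched as a small retry loop. Everything here is a simplification: the `UiDriver` interface and `clickAndVerify` function are hypothetical, the driver is synchronous, and a screenshot is represented as a string rather than an image, so the example stays runnable without a browser.

```typescript
// Hypothetical driver. In a real setup both calls would be asynchronous
// and screenshot() would return image data rather than a string.
interface UiDriver {
  screenshot(): string;
  click(target: string): void;
}

// Clicks `target` and checks that the screen changed as expected.
// Retries up to `maxAttempts` times and reports whether it succeeded.
function clickAndVerify(
  ui: UiDriver,
  target: string,
  changed: (before: string, after: string) => boolean,
  maxAttempts = 3,
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const before = ui.screenshot(); // observe
    ui.click(target); // act
    const after = ui.screenshot(); // observe again
    if (changed(before, after)) return true; // verified
  }
  return false; // caller decides how to recover
}
```

Returning `false` instead of throwing leaves recovery to the caller, for example re-reading the board before choosing a different cell.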

3) You anchor termination to UI truth (critical for reliability)

Both prompts insist the agent must only end the game when it sees the on-screen banner (“Player X wins!”, “It’s a draw!”), not when it believes it has three in a row.

This is a broadly applicable safety/reliability pattern for computer-use:

- Never end (or submit, pay, delete, send) based on internal inference alone
- Require screen evidence for critical transitions

It reduces hallucinated “success” and makes runs auditable.
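
A minimal sketch of this rule, assuming the screen content is already available as text. The banner strings "Player X wins!" and "It's a draw!" come from the description above, while "Player O wins!" is assumed by analogy. `detectOutcome` and `decideTermination` are hypothetical helper names.

```typescript
type Outcome = "x-wins" | "o-wins" | "draw";

// Looks for an end-of-game banner in text read from the screen.
function detectOutcome(screenText: string): Outcome | null {
  if (/Player X wins!/.test(screenText)) return "x-wins";
  if (/Player O wins!/.test(screenText)) return "o-wins"; // assumed banner
  if (/It['’]s a draw!/.test(screenText)) return "draw";
  return null;
}

interface TerminationDecision {
  end: boolean;
  outcome: Outcome | null;
  needsRecheck: boolean;
}

// Only screen evidence can end the game. The model's belief is used
// solely to flag a mismatch that deserves another look at the screen.
function decideTermination(
  modelBelievesGameOver: boolean,
  screenText: string,
): TerminationDecision {
  const outcome = detectOutcome(screenText);
  return {
    end: outcome !== null,
    outcome,
    needsRecheck: modelBelievesGameOver && outcome === null,
  };
}
```

The same gate generalizes to other critical transitions: replace the banner check with whatever on-screen confirmation the workflow provides.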

4) You surface real provider constraints (OpenAI truncation, Anthropic context bloat)

Provider-native tools come with operational requirements that show up immediately in multi-step UI loops:

OpenAI: your agent sets truncation: "auto" because OpenAI’s computer-use flow expects automatic truncation to keep long interactive sessions viable (OpenAI computer use guide). This is a concrete example of “provider tool != generic LLM call”; there are mode-specific runtime contracts.
Anthropic: your agent uses middleware to clear old tool uses (screenshots). That’s essentially context garbage collection—and it’s not optional in screenshot-heavy loops. Without pruning, you hit context limits or degrade performance as stale observations pile up.

This is one of the biggest “why computer-use is hard” lessons: the environment is unstructured, and the data (images) is heavy.
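
The pruning idea can be shown on a plain message array. This is not the middleware used in the project and not Anthropic's context-editing feature: the `Message` shape, the `pruneScreenshots` function, and the placeholder text are assumptions for illustration.

```typescript
// Hypothetical, simplified message shape.
interface Message {
  kind: "text" | "screenshot";
  content: string;
}

// Keeps the `keepLast` most recent screenshots and replaces older ones
// with a short placeholder. Text messages are kept unchanged, and the
// input array is not mutated.
function pruneScreenshots(
  messages: readonly Message[],
  keepLast: number,
): Message[] {
  let remaining = keepLast;
  const result: Message[] = [];
  // Walk from newest to oldest so the most recent screenshots survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message === undefined) continue;
    if (message.kind !== "screenshot") {
      result.push(message);
    } else if (remaining > 0) {
      remaining -= 1;
      result.push(message);
    } else {
      result.push({ ...message, content: "[screenshot cleared]" });
    }
  }
  return result.reverse();
}
```

Leaving a placeholder rather than deleting the message keeps the turn structure intact, so the model can still see that an observation happened at that point.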

5) You demonstrate why providers add more than computer control: persistent memory

The Anthropic player adds a native memory tool and stores learnings as markdown (strategy, opponent patterns, mistakes). In practice, this turns a single-session controller into something that can:

- review prior outcomes before starting
- encode opponent-specific openings
- avoid repeating mistakes across games

The demo’s memory files show exactly the value proposition: the agent loses once due to a missed threat, then blocks the same pattern next game. That’s a minimal but real example of “agent improvement” that’s hard to get from prompts alone.

Why this matters beyond Tic-Tac-Toe

This project is a good representation of where computer-use shines and where it bites:

Shines when you need to automate UI-only workflows quickly, without building bespoke integrations.
Bites because reliability depends on:
- UI stability and “readability”
- verification loops
- context management
- isolation/sandboxing (providers explicitly recommend this for safety) (Anthropic computer use docs)

In other words: computer-use is best understood as a systems discipline—a control loop combining model behavior, tool constraints, UI design, and runtime safeguards.

Thanks for reading!

Keep Your Apps Accessible and Your e2e Tests Stable With WebdriverIO's New Accessibility Selector

Christian Bromann — Mon, 05 Sep 2022 17:35:55 +0000

Fetching elements within e2e tests can sometimes be very hard. Complex CSS paths or arbitrary test ids make them either less readable or prone to failures. The disappointment we experience when our tests fail does not compare to the bad experience people have when they need to use assistive devices like screen readers on applications built without accessibility in mind.

With the accessibility selector introduced in version v7.24.0, WebdriverIO now provides a powerful way to fetch elements that have a certain accessibility name. Rather than applying arbitrary data-testId properties to elements, which won't be recognised by assistive devices, developers or QA engineers can now either apply a correct accessibility name to the element themselves or ask the development team to improve the accessibility so that writing tests becomes easier.

WebdriverIO internally uses a chain of XPath selector conditions to fetch the correct element. Because the framework has no access to the accessibility tree of the browser, it can only guess the correct name here. As accessibility names are computed based on author-supplied names and content names, WebdriverIO fetches an element in a certain order:

First we try to find an element that has an aria-labelledBy or aria-describedBy property pointing to an element containing a valid id, e.g.:

   <h2 id="social">Social Media</h2>
   <nav aria-labelledBy="social">...</nav>

So we can fetch a certain link within our navigation via:

   await $('aria/Social Media').$('a=API').click()

Then we look for elements with a certain aria-label, e.g.:

   <button aria-label="close button">X</button>

Rather than using X to fetch the element or applying a test id property we can just do:

   await $('aria/close button').click()

Well-defined HTML forms provide a label for every input element, e.g.:

   <label for="username">Username</label>
   <input id="username" type="text" />

Setting the value of the input can now be done via:

   await $('aria/Username').setValue('foobar')

Less ideal but still working are placeholder or aria-placeholder properties:

   <input placeholder="Your Username" type="text" />

Which can now be used to fetch elements as well:

   await $('aria/Your Username').setValue('foobar')

Furthermore if an image tag provides a certain alternative text, this can be used to query that element as well, e.g.:

   <img alt="A warm summer night" src="..." />

Such an image can now be fetched via:

   await $('aria/A warm summer night').getTagName() // outputs "img"

Lastly, if no proper accessibility name can be derived, the name is computed from the element's accumulated text, e.g.:

   <h1>Welcome!</h1>

Such a heading tag can now be fetched via:

   await $('aria/Welcome!').getTagName() // outputs "h1"
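
The lookup order described above can be summarized as a first-match-wins chain. The sketch below is a simplified model of that order only: it is not WebdriverIO's XPath implementation and not the full accessible-name computation that browsers perform. The `ElementInfo` shape and `guessAccessibleName` function are hypothetical.

```typescript
// Hypothetical, flattened view of the information relevant to one element.
interface ElementInfo {
  referencedText?: string; // text of the element referenced by aria-labelledby or aria-describedby
  ariaLabel?: string; // aria-label attribute
  labelText?: string; // text of an associated <label for="...">
  placeholder?: string; // placeholder or aria-placeholder
  alt?: string; // alt text of an image
  textContent?: string; // accumulated text content
}

// Returns the first non-empty candidate, in the order listed in the post.
function guessAccessibleName(el: ElementInfo): string | undefined {
  const candidates = [
    el.referencedText,
    el.ariaLabel,
    el.labelText,
    el.placeholder,
    el.alt,
    el.textContent,
  ];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}
```

For the close button example, `guessAccessibleName({ ariaLabel: "close button", textContent: "X" })` yields "close button", which is why the selector matches on the label rather than on the visible "X".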

As you can see, there are a variety of ways to define the accessibility name of an element. Many of the browser debugging tools provide handy accessibility features that help you find the proper name of the element.

For more information check out the Chrome DevTools or Firefox Accessibility Inspector docs.

Accessibility is not only a powerful tool to create an inclusive web, it can also help you write stable and readable tests. While you should not go ahead and give every element an aria-label, this new selector can help you build web applications with accessibility in mind so that writing e2e tests for them later on will become much easier.

Thanks for reading!