vishalmysore

Posted on Jun 22

RecursiveMAS WebLLM: A Browser-Native Runtime for Latent-State Multi-Agent Reasoning

#ai #llm #webdev #agents

Recursive Multi-Agent Systems (RecursiveMAS) reframes multi-agent collaboration as a unified latent-space recursive computation, where heterogeneous agents exchange hidden states through lightweight RecursiveLink modules instead of text-only prompts. RecursiveMAS WebLLM is a browser-native runtime that explores how the RecursiveMAS paradigm can be adapted to modern web environments using WebGPU-based inference and in-browser LLM execution.

Existing browser LLM runtimes such as WebLLM are optimized for local inference and hardware acceleration, but they primarily expose token-level outputs rather than a direct latent-state communication path between agents. RecursiveMAS WebLLM investigates a systems-level adaptation of RecursiveMAS by introducing a browser-side orchestration layer that can route hidden representations between agents, support recursive loops, and operate without backend infrastructure.

The goal of this work is not to propose RecursiveMAS itself, but to explore how a RecursiveMAS-style architecture can be implemented in the browser for privacy-preserving, local-first, and decentralized AI experimentation.

Demo: https://vishalmysore.github.io/recursiveMASDemo
Code: https://github.com/vishalmysore/recursiveMASDemo
Model Code: https://github.com/vishalmysore/recursiveMASWebLLM
Model Weights: https://huggingface.co/VishalMysore/RecursiveMAS-0.5B-MLC/

1. Introduction

Large language models are increasingly used as building blocks in multi-agent systems, where multiple specialized agents collaborate to solve complex tasks. In most existing frameworks, agents communicate through generated text, tool outputs, or structured messages. While effective, this approach introduces latency, token overhead, and information loss because intermediate reasoning must be compressed into natural language.

RecursiveMAS proposes a different view: instead of passing text between agents, the system treats collaboration as a latent-space recursive process. Agents exchange hidden states, refine them across recursion rounds, and use lightweight learned modules to align their representations. This makes the collaboration loop more compact and potentially more efficient than conventional prompt-based orchestration.

At the same time, browser-native inference has matured significantly. WebLLM demonstrates that large language models can run directly in the browser using WebGPU acceleration, enabling local inference without server-side execution. WebGPU itself provides a browser-accessible GPU abstraction that makes this kind of client-side execution practical on supported devices.

This creates an interesting systems question: can RecursiveMAS-style latent collaboration be brought into the browser?

RecursiveMAS WebLLM explores that question by designing a browser-native runtime for recursive multi-agent reasoning. The system focuses on:

hidden-state routing between agents,
browser-side orchestration of recursive loops,
local-first execution with no backend dependency.

2. Background

2.1 RecursiveMAS

The RecursiveMAS paper introduces a multi-agent framework that extends recursion from single-model reasoning to the agent collaboration level. Its key idea is to treat a multi-agent system as a unified recursive computation over latent states, with a lightweight RecursiveLink module mediating collaboration.

According to the paper, this architecture improves efficiency over standard text-based multi-agent systems: the authors report gains in accuracy and speed, along with reduced token usage.

2.2 Browser-Native LLM Inference

WebLLM is a high-performance in-browser inference engine that uses WebGPU for hardware acceleration and supports local execution of language models directly in the browser. WebGPU is the web standard that exposes GPU access through browser APIs such as navigator.gpu and GPUDevice, making it possible to perform compute-heavy workloads on the client side.
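As a minimal sketch of the WebGPU entry point mentioned above, the snippet below checks for navigator.gpu and then requests an adapter and a device. The helper names hasWebGPU and acquireDevice are illustrative and not part of WebLLM; the navigator object is passed in as a parameter so the check can also be exercised outside a browser.

```javascript
// Returns true when the given navigator-like object exposes the WebGPU entry point.
function hasWebGPU(nav) {
  return Boolean(nav && nav.gpu && typeof nav.gpu.requestAdapter === "function");
}

// Requests an adapter and then a device; resolves to null when WebGPU is unavailable.
async function acquireDevice(nav) {
  if (!hasWebGPU(nav)) return null;
  const adapter = await nav.gpu.requestAdapter(); // may resolve to null on unsupported hardware
  if (!adapter) return null;
  return adapter.requestDevice(); // resolves to a GPUDevice
}

// In a browser, usage would look like:
// acquireDevice(navigator).then((device) => { /* start inference or show a fallback */ });
```

Checking for a null adapter matters in practice because a browser can expose navigator.gpu and still fail to find a usable GPU.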

Browser-native inference offers several benefits:

lower deployment friction,
stronger privacy,
reduced backend cost,
fully local execution.

However, most browser LLM runtimes still expose the model primarily as a token generator. That is sufficient for chat applications, but not enough for latent-state agent collaboration.

2.3 Why Latent States Matter

Text is a compressed interface. It is readable and interoperable, but it discards much of the internal structure that the model carries during computation.

Hidden states preserve richer intermediate representations, including semantic abstractions and contextual structure. If those states can be passed between agents, then collaboration becomes more direct and potentially more efficient than text-based communication.

That is the core motivation behind this work. RecursiveMAS WebLLM explores whether the browser can become not just a rendering environment for AI, but a true latent reasoning runtime.

3. Problem Statement

Current browser-based LLM runtimes are optimized for:

prompt input,
token generation,
client-side inference.

They are not designed for:

direct hidden-state extraction,
latent-state injection,
agent-to-agent communication in latent space.

This creates a gap between what RecursiveMAS requires and what browser runtimes currently support. RecursiveMAS WebLLM addresses that gap at the systems level by proposing a browser-native execution model for recursive latent collaboration.

4. System Overview

RecursiveMAS WebLLM is organized into three major components:

4.1 WebLLM Runtime Layer

This layer provides the base browser inference engine. It is responsible for:

loading the model,
executing WebGPU-backed inference,
exposing runtime hooks for latent-state access.

4.2 RecursiveLink Adapter

RecursiveLink is the latent transformation layer between agents. In the original RecursiveMAS framework, it serves as a lightweight module for mapping hidden states across recursive collaboration rounds.

In this browser-native adaptation, RecursiveLink acts as the bridge between agent representations inside the JavaScript orchestration layer.

4.3 Browser Orchestration Layer

This layer manages:

agent scheduling,
recursive execution,
hidden-state routing,
loop control.

All of this runs entirely inside the browser, which removes the need for a server, cloud GPU, or backend inference service.
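The responsibilities listed above can be sketched as a small scheduling loop. This is an illustration under assumptions, not the project's actual orchestrator: the agent.step method, the round-robin schedule, and the epsilon-based stopping rule are all hypothetical choices made for the example.

```javascript
// Euclidean distance between two equal-length latent vectors.
function l2Distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Runs the agents round-robin, routing each agent's output state to the next agent.
// Loop control: stop after maxRounds, or earlier when a full round changes the
// state by less than epsilon.
function runRecursiveLoop(agents, initialState, { maxRounds = 4, epsilon = 1e-3 } = {}) {
  let state = initialState;
  let rounds = 0;
  while (rounds < maxRounds) {
    const before = state;
    for (const agent of agents) {
      state = agent.step(state); // hypothetical interface: latent in, latent out
    }
    rounds++;
    if (l2Distance(before, state) < epsilon) break;
  }
  return { state, rounds };
}
```

A hard cap on rounds is a conservative default in a browser, where a runaway loop would block the tab's compute budget.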

5. Architecture

The architecture treats the browser as a recursive execution environment. Agents produce hidden states, the orchestration layer routes them, and RecursiveLink transforms them for the next agent or recursion round.

A browser-native architecture of this kind emphasizes:

hidden-state routing,
low-latency recursive flow control,
browser-local tensor transformation,
final decode only at output time.

6. Latent-State Interface

A browser-native RecursiveMAS implementation needs two core capabilities:

Hidden-state extraction, so the runtime can expose the internal representation of an agent step.
Hidden-state injection, so another agent can receive a transformed latent representation instead of text.

A conceptual API might look like this:

```javascript
const hA = await agentA.getHiddenState();
const hMapped = recursiveLink.forward(hA);
await agentB.injectHiddenState(hMapped);
const output = await agentB.generate();
```

This is the key difference from prompt-based multi-agent orchestration. Communication happens through latent tensors rather than serialized text.
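To make the latent interface a little more concrete, here is a hedged sketch of what a latent container and a shape check on injection could look like. makeLatent and MockAgent are hypothetical names introduced for illustration; they mirror the conceptual API above with synchronous stubs and do not call any real WebLLM method.

```javascript
// A minimal latent container: a flat Float32Array plus its [tokens, hiddenDim] shape.
function makeLatent(data, shape) {
  const [tokens, hiddenDim] = shape;
  if (data.length !== tokens * hiddenDim) {
    throw new Error(`expected ${tokens * hiddenDim} values, got ${data.length}`);
  }
  return { data: Float32Array.from(data), shape: [tokens, hiddenDim] };
}

// Hypothetical agent stub that mirrors the conceptual API.
class MockAgent {
  constructor(hiddenDim) {
    this.hiddenDim = hiddenDim;
    this.latent = null;
  }

  // Accepts a latent only if its width matches this agent's hidden size.
  injectHiddenState(latent) {
    if (latent.shape[1] !== this.hiddenDim) {
      throw new Error(`hidden size mismatch: ${latent.shape[1]} vs ${this.hiddenDim}`);
    }
    this.latent = latent;
  }

  // Returns the most recently injected latent, or null if none was injected.
  getHiddenState() {
    return this.latent;
  }
}
```

Validating the hidden size at the injection boundary is one way to catch mismatched heterogeneous agents early, before a malformed tensor reaches the model.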

7. RecursiveLink in the Browser

RecursiveLink is the component that makes latent collaboration workable. In the RecursiveMAS paper, RecursiveLink is used to align agent representations and support recursive state transfer across heterogeneous models.

In a browser-native setting, the same idea becomes a practical adapter that can stabilize the transfer of hidden states between in-browser agents.

A browser-friendly RecursiveLink should aim to:

normalize latent distributions,
reduce instability across recursion rounds,
preserve enough semantic structure for downstream reasoning.
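One common way to approach the first two goals is a layer-norm style rescaling of each hidden vector before it is handed to the next agent. The sketch below is an assumption about how such normalization could be done, not a description of the trained RecursiveLink; the function name normalizeLatent and the eps default are illustrative.

```javascript
// Rescales a latent vector to zero mean and (approximately) unit variance.
// This is layer-norm style normalization without learned gain or bias.
function normalizeLatent(h, eps = 1e-5) {
  const n = h.length;
  let mean = 0;
  for (let i = 0; i < n; i++) mean += h[i];
  mean /= n;

  let variance = 0;
  for (let i = 0; i < n; i++) variance += (h[i] - mean) ** 2;
  variance /= n;

  const scale = 1 / Math.sqrt(variance + eps); // eps guards against division by zero
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) out[i] = (h[i] - mean) * scale;
  return out;
}
```

Applying this once per recursion round keeps the magnitude of the state bounded, which is the kind of drift the list above is concerned with.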

A simple formulation can be:

h' = W3 σ(W2 σ(W1 h))

where:

h is the source hidden state,
h' is the transformed state,
W1, W2, W3 are learned projection matrices,
σ is a nonlinear activation.

This is a practical abstraction, not a claim that the exact same transformation must be used in every implementation.
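The formulation above can be written out directly in plain JavaScript. This is a minimal CPU sketch that assumes tanh for σ and row-major nested arrays for the weights; a real browser runtime would more likely run the projection on the GPU, and the names matVec and recursiveLinkForward are illustrative.

```javascript
// Dense matrix-vector product; W is an array of rows, x is a vector.
function matVec(W, x) {
  return W.map((row) => row.reduce((acc, w, j) => acc + w * x[j], 0));
}

// Elementwise nonlinearity σ. tanh is an assumption made for this example.
function sigma(v) {
  return v.map((x) => Math.tanh(x));
}

// Computes h' = W3 σ(W2 σ(W1 h)).
function recursiveLinkForward(h, { W1, W2, W3 }) {
  const z1 = sigma(matVec(W1, h));
  const z2 = sigma(matVec(W2, z1));
  return matVec(W3, z2); // no activation on the output projection
}

// Usage with small hand-written weights (source dim 2, inner dim 3, target dim 4).
const weights = {
  W1: [[1, 0], [0, 1], [1, 1]],
  W2: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
  W3: [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]],
};
const hPrime = recursiveLinkForward([0.5, -0.5], weights);
```

Because W1 and W3 need not be square, the same structure can map between agents with different hidden sizes, which is what the heterogeneous setting requires.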

8. Browser Runtime Flow

A typical recursive reasoning loop may look like this:

Agent A processes the input and emits a hidden state.
RecursiveLink transforms that hidden state into a compatible latent format.
Agent B receives the transformed state and continues reasoning.
The loop repeats for one or more recursion rounds.
A final decode step produces the visible text output.

This flow keeps the intermediate reasoning inside the browser and only surfaces the final answer when needed.
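The five steps above can be traced with toy stand-ins. Everything in this sketch is hypothetical: the agents, the link, and the decode function are simple arithmetic placeholders chosen so the flow is easy to follow, not real model calls.

```javascript
// Toy stand-ins: each agent maps a latent vector to a new latent vector.
const agentA = (h) => h.map((x) => x + 1);
const agentB = (h) => h.map((x) => x * 2);
const link = (h) => h.map((x) => x / 2); // placeholder for RecursiveLink
const decode = (h) => `answer(${Array.from(h).join(", ")})`;

function runFlow(input, rounds) {
  let h = Float32Array.from(input);
  for (let r = 0; r < rounds; r++) {
    const hA = agentA(h);      // step 1: Agent A emits a hidden state
    const hMapped = link(hA);  // step 2: the link maps it to a compatible format
    h = agentB(hMapped);       // step 3: Agent B continues from the mapped state
  }                            // step 4: repeat for the requested number of rounds
  return decode(h);            // step 5: text is produced only once, at the end
}

const result = runFlow([0, 1], 2); // "answer(2, 3)"
```

With these placeholders each round adds 1 to every component, so two rounds turn [0, 1] into [2, 3]. The point of the sketch is the control flow: no text is produced between agents, only at the final decode.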

9. Why This Matters

The main value of this work is not simply that it runs locally. It is that it brings a richer coordination mechanism into a browser-native environment.

That matters for several reasons:

Privacy: data stays on-device.
Deployment simplicity: no backend orchestration is required.
Portability: users can run the system from a browser.
Research value: latent collaboration can be studied in a lightweight environment.
Decentralization: browser clients can potentially participate in distributed AI workflows.

RecursiveMAS WebLLM therefore sits at the intersection of browser AI, agent systems, and latent computation.

10. Limitations

This browser-native adaptation also has clear constraints:

Hidden-state manipulation is technically complex.
Browser memory and compute budgets are limited.
WebGPU performance varies by device and browser support.
Latent transfer can become unstable without careful normalization.
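To give a rough sense of the memory constraint, the arithmetic below estimates the size of one hidden-state tensor and of a three-matrix RecursiveLink. The dimensions in the usage lines are illustrative assumptions, not measurements of the published model, and the function names are hypothetical.

```javascript
// Bytes needed to hold one hidden-state tensor of shape [tokens, hiddenDim].
function latentBytes(tokens, hiddenDim, bytesPerValue = 4) {
  return tokens * hiddenDim * bytesPerValue;
}

// Bytes needed for W1 (dHidden x dIn), W2 (dHidden x dHidden), and W3 (dOut x dHidden).
function linkWeightBytes(dIn, dHidden, dOut, bytesPerValue = 4) {
  return (dIn * dHidden + dHidden * dHidden + dHidden * dOut) * bytesPerValue;
}

const MiB = 1024 * 1024;
// Illustrative sizes: 512 tokens, hidden size 1024, 32-bit floats.
const latentMiB = latentBytes(512, 1024) / MiB;          // 2
const linkMiB = linkWeightBytes(1024, 1024, 1024) / MiB; // 12
```

Under these assumed sizes the latent traffic itself is small next to the model weights, so the budget pressure comes mainly from holding multiple agents in memory at once.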

The system is a prototype and should not be treated as a full replacement for server-side training or large-scale agent orchestration.

These limitations are important to acknowledge because they define the realistic scope of the project.

11. Future Work

Several extensions are worth exploring next:

browser-to-browser latent communication,
dynamic agent graphs,
stronger RecursiveLink training strategies,
recursive memory modules,
evaluation across multiple browser/device classes.

A particularly interesting direction is to test whether browser-native latent recursion can preserve some of the efficiency benefits reported in the original RecursiveMAS paper when run on consumer hardware.

12. Project Context

This repository serves as a build pipeline for a latent-transfer-capable WebLLM model. It demonstrates how a compiled WebGPU model can expose last-layer hidden states and how a trained RecursiveLink can be assembled and consumed by a browser application.

Key implementation artifacts in this repo include:

expose_hidden.py — automated patcher for exposing hidden states in an MLC model definition.
build.sh — pipeline script for converting weights, generating config, and compiling a WebGPU runtime.
train_recursivelink.py — optional training script for RecursiveLink projection weights.

13. Conclusion

RecursiveMAS WebLLM is a browser-native exploration of RecursiveMAS-style latent collaboration. My work builds on RecursiveMAS (https://arxiv.org/abs/2604.25917) as its core idea and adapts it into a WebGPU-backed runtime that runs entirely inside the browser.

The central idea is simple: if multi-agent reasoning can be expressed as latent-state recursion, then the browser may be able to host that process locally, privately, and without backend infrastructure. That makes the browser not just a user interface, but a viable execution layer for advanced agent research.

References

Recursive Multi-Agent Systems, arXiv:2604.25917, https://recursivemas.github.io/
Demo: https://vishalmysore.github.io/recursiveMASDemo
