vishalmysore

Posted on Jun 22

Bringing Recursive Multi-Agent Systems to the Browser with WebLLM and WebGPU

#agents #ai #llm #showdev

Most multi-agent AI systems have a hidden inefficiency.

Every time agents collaborate, they typically communicate by generating text, passing that text to another agent, and then having that agent re-process it. While this works, it's expensive, slow, and burns through tokens quickly.

What if agents could communicate without generating text at all?

That's the idea behind RecursiveMAS, a recent research framework that allows AI agents to collaborate directly through their internal latent representations instead of exchanging natural language.

Inspired by this research, I built recursiveMASWebLLM — a build pipeline that brings RecursiveMAS-style latent collaboration directly into the browser using WebLLM, MLC-LLM, and WebGPU.

The result is a fully client-side experimental platform for running recursive multi-agent systems on consumer hardware without requiring cloud GPUs.

The Problem with Traditional Multi-Agent Systems

Most agent frameworks operate like this:

Agent A → generates text
         ↓
Agent B → reads text and generates more text
         ↓
Agent C → reads text and generates final answer

Every handoff requires:

Token generation
Token transmission
Token re-processing

Each additional agent adds another round of generation and re-processing, so the overhead compounds as the system grows.

A significant portion of the computation is spent translating thoughts into text and then converting that text back into internal representations.

This works, but it's not how neural networks naturally communicate.
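To make the overhead concrete, here is a toy cost model in Python. It is only a sketch: the function name and the 200-tokens-per-agent figure are invented for illustration, and it assumes every agent re-reads all text produced upstream, which real frameworks may or may not do.

```python
def text_pipeline_tokens(num_agents: int, tokens_per_agent: int) -> dict:
    """Toy model: each agent generates `tokens_per_agent` tokens and
    re-reads everything produced by the agents before it (an assumption)."""
    generated = num_agents * tokens_per_agent
    # Agent k (0-based) re-processes the output of the k agents before it.
    reprocessed = sum(k * tokens_per_agent for k in range(num_agents))
    return {"generated": generated, "reprocessed": reprocessed}


for n in (2, 3, 5):
    print(n, text_pipeline_tokens(n, tokens_per_agent=200))
```

Under these assumptions, generated tokens grow linearly with the number of agents while re-processed tokens grow roughly quadratically.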

What Is RecursiveMAS?

RecursiveMAS takes a different approach.

Instead of exchanging generated text, agents exchange their last-layer hidden states (latent representations).

Think of hidden states as the model's internal reasoning space before words are produced.

Agent A Hidden State
         ↓
RecursiveLink
         ↓
Agent B Hidden State
         ↓
RecursiveLink
         ↓
Agent C Hidden State

The entire multi-agent system becomes a recursive computation graph operating in latent space.

The original research introduces a lightweight component called RecursiveLink, which acts as a bridge between agents.

Rather than training or fine-tuning the underlying LLMs, only these small link modules are trained while the base models remain frozen.

This allows multiple agents to collaboratively refine reasoning before any text is generated.
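The "frozen base, trainable link" idea can be illustrated with a deliberately tiny example. This is a sketch under loud assumptions: the agent is a fixed scalar function, the link has a single parameter `a`, and the samples are made up. It is not the training procedure from the paper or from this repository.

```python
def frozen_agent(x: float) -> float:
    """Stand-in for a frozen base model: its 'weight' (2.0) is never updated."""
    return 2.0 * x


def train_link(samples, lr: float = 0.05, epochs: int = 200) -> float:
    """Fit the only trainable parameter `a` of a residual link: link(h) = h + a * h."""
    a = 0.0
    for _ in range(epochs):
        for x, target in samples:
            h = frozen_agent(x)               # forward pass through the frozen model
            pred = h + a * h                  # residual link output
            grad = 2.0 * (pred - target) * h  # d/da of (pred - target) ** 2
            a -= lr * grad                    # only the link parameter changes
    return a


# Hypothetical targets equal to 3 * x, so the ideal link parameter is 0.5.
samples = [(0.5, 1.5), (1.0, 3.0), (-1.0, -3.0)]
a = train_link(samples)
print(round(a, 4))
```

Gradient updates touch only `a`; `frozen_agent` stays exactly as it was, which mirrors the split between frozen LLMs and small trained links.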

Core Concepts

RecursiveLink

A lightweight residual network that transforms and transfers latent representations between agents.

Instead of passing:

"What is the answer?"

agents pass:

[hidden_state_vector]

Because no tokens are generated or re-encoded at the handoff, communication overhead drops sharply.
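A minimal sketch of what a "lightweight residual network" over a hidden-state vector could look like. The bottleneck shape, the GELU activation, and the zero initialisation are assumptions chosen for illustration. The post does not specify the actual RecursiveLink architecture, and `ResidualLink` is a hypothetical name.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def gelu(x):
    """Tanh approximation of GELU; the activation choice is an assumption."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class ResidualLink:
    """Hypothetical residual adapter: out = h + W_up @ gelu(W_down @ h)."""

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
                       for _ in range(bottleneck_dim)]
        # Zero-initialised up-projection: the untrained link is the identity map.
        self.w_up = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]

    def __call__(self, hidden):
        update = matvec(self.w_up, [gelu(z) for z in matvec(self.w_down, hidden)])
        return [h + u for h, u in zip(hidden, update)]


link = ResidualLink(hidden_dim=8, bottleneck_dim=4)
state = [0.1 * i for i in range(8)]
print(link(state) == state)  # True: before training, the residual update is zero
```

The residual form means an untrained link passes the hidden state through unchanged, so training only has to learn a correction.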

Inner Link

Allows an agent to recursively refine its own latent reasoning.

Agent
   ↓
Hidden State
   ↓
RecursiveLink
   ↓
Back Into Agent

This creates iterative self-improvement loops before decoding text.
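The loop in the diagram above can be sketched as follows. `toy_agent` and `toy_link` are hypothetical stand-ins so the example runs without a model. A real setup would use a frozen LLM forward pass and a trained RecursiveLink.

```python
import math


def toy_agent(latent):
    """Stand-in for a frozen model's forward pass: latent in, hidden state out."""
    return [math.tanh(x) for x in latent]


def toy_link(hidden):
    """Stand-in for a trained link, in residual form: h plus a small update."""
    return [h + 0.1 * h for h in hidden]


def inner_recursion(agent, link, latent, steps):
    """Feed the agent's own linked hidden state back into it `steps` times."""
    hidden = agent(latent)
    for _ in range(steps):
        hidden = agent(link(hidden))
    return hidden


refined = inner_recursion(toy_agent, toy_link, [0.2, -0.4, 1.0], steps=3)
print(len(refined))  # 3: the shape is unchanged, and no text was decoded
```

With `steps=0` the function reduces to a single ordinary forward pass, which makes the recursion depth easy to compare against a baseline.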

Outer Link

Enables latent communication between different agents.

Agent A
   ↓
RecursiveLink
   ↓
Agent B

The research demonstrates that even heterogeneous models can participate in these recursive workflows.
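When two agents have different hidden sizes, the link has to change the vector width as well as its content. Here is one way that could work, shown as a plain linear projection. This is an assumption for illustration, not the design described in the research, and the sizes 6 and 4 are invented.

```python
import random


def make_projection(src_dim, dst_dim, seed=0):
    """Random (untrained) projection matrix with dst_dim rows and src_dim columns."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(src_dim)] for _ in range(dst_dim)]


def outer_link(projection, hidden_a):
    """Map Agent A's hidden state into Agent B's hidden width."""
    return [sum(w * x for w, x in zip(row, hidden_a)) for row in projection]


hidden_a = [0.5] * 6          # pretend Agent A has hidden size 6
proj = make_projection(6, 4)  # pretend Agent B has hidden size 4
latent_for_b = outer_link(proj, hidden_a)
print(len(latent_for_b))  # 4
```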

System-Level Recursion

The entire multi-agent system can execute multiple refinement passes.

Pass 1
   ↓
Pass 2
   ↓
Pass 3
   ↓
Final Decode

Instead of generating intermediate text after every step, the system performs latent collaboration first and produces text only at the end.
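The multi-pass flow can be sketched as a nested loop with a single decode at the end. The agents, links, and `decode` function here are toy stand-ins operating on integers, and `run_system` is a hypothetical name rather than an API from the project.

```python
def run_system(agents, links, latent, passes, decode):
    """Run `passes` full rounds in latent space, then decode exactly once."""
    for _ in range(passes):
        for agent, link in zip(agents, links):
            latent = link(agent(latent))
    return decode(latent)


# Toy stand-ins: integer "latents" keep the arithmetic easy to follow.
agents = [lambda v: [x + 1 for x in v], lambda v: [x * 2 for x in v]]
links = [lambda v: v, lambda v: v]  # identity links for the demo

answer = run_system(agents, links, [0], passes=3, decode=lambda v: f"answer={v[0]}")
print(answer)  # answer=14
```

Each pass computes (x + 1) * 2, so starting from 0 the latent goes 2, 6, 14, and text is produced only from the final value.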

Why This Matters

According to the RecursiveMAS research, latent-space collaboration delivers:

Higher benchmark accuracy
Reduced token consumption
Faster end-to-end inference
Better scalability across multiple agents

Reported results include:

Up to 75% reduction in token usage
1.2×–2.4× faster inference
Average accuracy improvements across reasoning, coding, science, and medical benchmarks

The key insight is that agents can collaborate more efficiently when communication occurs inside the neural representation space rather than through natural language.

The Challenge: Running RecursiveMAS in the Browser

The original RecursiveMAS implementation targets server environments and GPU inference stacks such as vLLM.

Browser-based AI introduces a major limitation:

WebLLM models do not normally expose internal hidden states.

Without access to hidden states, latent recursion is impossible.

That became the motivation for this project.

Introducing recursiveMASWebLLM

recursiveMASWebLLM is a specialized build pipeline for creating WebLLM models capable of latent-state transfer.

It extends the browser AI stack to expose the information required for RecursiveMAS-style recursion.

The goal is simple:

Research Paper
      ↓
Server GPU Implementation
      ↓
Browser-Compatible Runtime
      ↓
Accessible to Everyone

What This Project Adds

Hidden State Extraction

MLC-LLM is patched to expose:

get_last_hidden()

This allows browser applications to access last-layer hidden states directly during inference.

Without this capability, RecursiveMAS cannot function.
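To show how calling code might consume such a hook, here is a hypothetical interface with an in-memory stub. Only the name `get_last_hidden` comes from this post. The `prefill` method, the return type, and the `StubEngine` class are assumptions, and the real call happens inside the patched MLC-LLM/WebLLM runtime in the browser, where the signature may differ.

```python
from typing import List, Protocol


class HiddenStateEngine(Protocol):
    """Hypothetical interface; only the name get_last_hidden comes from the post."""

    def prefill(self, prompt: str) -> None: ...

    def get_last_hidden(self) -> List[float]: ...


class StubEngine:
    """In-memory stand-in so the sketch runs without a model or WebGPU."""

    def __init__(self, hidden_dim: int = 4) -> None:
        self._hidden = [0.0] * hidden_dim

    def prefill(self, prompt: str) -> None:
        # Fake "hidden state": deterministic numbers derived from the prompt bytes.
        data = prompt.encode("utf-8")
        dim = len(self._hidden)
        self._hidden = [sum(data[i::dim]) / 255.0 for i in range(dim)]

    def get_last_hidden(self) -> List[float]:
        return list(self._hidden)


def read_latent(engine: HiddenStateEngine, prompt: str) -> List[float]:
    """Run the prompt through the engine and return its last hidden state."""
    engine.prefill(prompt)
    return engine.get_last_hidden()


print(len(read_latent(StubEngine(), "What is 2 + 2?")))  # 4
```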

RecursiveLink Training Pipeline

The repository includes tooling to train and package RecursiveLinks.

train_recursivelink.py

Generated links are exported as:

recursivelink.json

These lightweight modules can then be loaded by browser-based agent systems.
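The post does not describe the layout of recursivelink.json, so the field names below (`hidden_dim`, `bottleneck_dim`, `w_down`, `w_up`) are invented for illustration. The sketch only shows the general pattern of serialising link weights to JSON and shape-checking them on load.

```python
import json


def dump_link(w_down, w_up):
    """Serialise link weights; this field layout is hypothetical."""
    return json.dumps({
        "hidden_dim": len(w_up),
        "bottleneck_dim": len(w_down),
        "w_down": w_down,
        "w_up": w_up,
    })


def load_link(text):
    """Parse and shape-check a serialised link before using it."""
    data = json.loads(text)
    hidden, bottleneck = data["hidden_dim"], data["bottleneck_dim"]
    if len(data["w_down"]) != bottleneck or any(len(r) != hidden for r in data["w_down"]):
        raise ValueError("w_down must be bottleneck_dim x hidden_dim")
    if len(data["w_up"]) != hidden or any(len(r) != bottleneck for r in data["w_up"]):
        raise ValueError("w_up must be hidden_dim x bottleneck_dim")
    return data


payload = dump_link(w_down=[[0.1, 0.2, 0.3]], w_up=[[0.0], [0.0], [0.0]])
link = load_link(payload)
print(link["hidden_dim"], link["bottleneck_dim"])  # 3 1
```

Validating shapes at load time gives a clear error if a link file is paired with a model of a different hidden size.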

Automated Browser Model Builds

The build pipeline supports:

Model conversion
Quantization
WebGPU compilation
WASM generation
Release packaging

Small models can be built entirely through GitHub Actions, without requiring local GPUs.

Browser Deployment

Outputs include:

.wasm
weights
recursivelink.json

These artifacts can be hosted on:

GitHub Releases
Hugging Face
Static web hosting

and loaded directly into browser applications.

Project Architecture

recursiveMASWebLLM
        │
        ▼
Build Pipeline
        │
        ▼
.wasm + weights + recursivelink.json
        │
        ▼
Hosted Artifacts
        │
        ▼
RecursiveMAS Playground
        │
        ▼
Browser-Based Recursive Agents

The builder generates everything needed for latent recursive collaboration in WebLLM-powered applications.

Why Browser-Based Recursive Agents Are Interesting

1. Democratizing Advanced AI Research

Researchers and developers can experiment with RecursiveMAS techniques without expensive cloud infrastructure.

If a device supports WebGPU, it can participate.

2. Interactive Experimentation

Developers can modify:

Recursion depth
Agent roles
Collaboration patterns
Prompt strategies

and immediately observe how latent collaboration affects outcomes.

3. Education

RecursiveMAS introduces a fundamentally different way of thinking about multi-agent systems.

Running it locally in a browser makes it easier to understand and teach.

4. Lower Latency

Reducing intermediate token generation is especially valuable in browser environments where responsiveness matters.

5. Future Extensions

Exposing hidden states opens the door to:

Latent planning systems
Browser-side distillation
Neural memory systems
Hybrid cloud/browser agents
Experimental reasoning architectures

RecursiveMAS is just one possible application.

Getting Started

Repository:

https://github.com/vishalmysore/recursiveMASWebLLM

The project includes:

Local build instructions
GitHub Actions workflows
RecursiveLink training utilities
Model packaging tools
Integration guidance for the RecursiveMAS playground

Looking Ahead

This project is still early, but it establishes the foundation for browser-native latent multi-agent systems.

Future work includes:

Larger model support
Improved model sharding
Additional collaboration patterns
Better WebGPU optimizations
Community-created RecursiveLinks
Integration with other browser AI frameworks

As browser AI continues to mature, I believe we'll see more experimentation move from cloud infrastructure to client-side environments.

RecursiveMAS demonstrates that some of the most interesting ideas in AI may not require massive server clusters—they may eventually run directly in the browser.

What do you think?

Could latent-space multi-agent systems become the next evolution of browser AI experimentation?

https://github.com/vishalmysore/recursiveMASWebLLM
https://recursivemas.github.io/
https://huggingface.co/VishalMysore/RecursiveMAS-0.5B-MLC
