Most multi-agent AI systems have a hidden inefficiency.
Every time agents collaborate, they typically communicate by generating text, passing that text to another agent, and then re-processing it again. While this works, it's expensive, slow, and burns through tokens quickly.
What if agents could communicate without generating text at all?
That's the idea behind RecursiveMAS, a recent research framework that allows AI agents to collaborate directly through their internal latent representations instead of exchanging natural language.
Inspired by this research, I built recursiveMASWebLLM — a build pipeline that brings RecursiveMAS-style latent collaboration directly into the browser using WebLLM, MLC-LLM, and WebGPU.
The result is a fully client-side experimental platform for running recursive multi-agent systems on consumer hardware without requiring cloud GPUs.
The Problem with Traditional Multi-Agent Systems
Most agent frameworks operate like this:
Agent A → generates text
↓
Agent B → reads text and generates more text
↓
Agent C → reads text and generates final answer
Every handoff requires:
- Token generation
- Token transmission
- Token re-processing
As the number of agents increases, the overhead grows rapidly.
A significant portion of the computation is spent translating thoughts into text and then converting that text back into internal representations.
This works, but it's not how neural networks naturally communicate.
What Is RecursiveMAS?
RecursiveMAS takes a different approach.
Instead of exchanging generated text, agents exchange their last-layer hidden states (latent representations).
Think of hidden states as the model's internal reasoning space before words are produced.
Agent A Hidden State
↓
RecursiveLink
↓
Agent B Hidden State
↓
RecursiveLink
↓
Agent C Hidden State
The entire multi-agent system becomes a recursive computation graph operating in latent space.
The original research introduces a lightweight component called RecursiveLink, which acts as a bridge between agents.
Rather than training or fine-tuning the underlying LLMs, only these small link modules are trained while the base models remain frozen.
This allows multiple agents to collaboratively refine reasoning before any text is generated.
Core Concepts
RecursiveLink
A lightweight residual network that transforms and transfers latent representations between agents.
Instead of passing:
"What is the answer?"
agents pass:
[hidden_state_vector]
This dramatically reduces communication overhead.
Inner Link
Allows an agent to recursively refine its own latent reasoning.
Agent
↓
Hidden State
↓
RecursiveLink
↓
Back Into Agent
This creates iterative self-improvement loops before decoding text.
Outer Link
Enables latent communication between different agents.
Agent A
↓
RecursiveLink
↓
Agent B
The research demonstrates that even heterogeneous models can participate in these recursive workflows.
System-Level Recursion
The entire multi-agent system can execute multiple refinement passes.
Pass 1
↓
Pass 2
↓
Pass 3
↓
Final Decode
Instead of generating intermediate text after every step, the system performs latent collaboration first and produces text only at the end.
Why This Matters
According to the RecursiveMAS research, latent-space collaboration delivers:
- Higher benchmark accuracy
- Reduced token consumption
- Faster end-to-end inference
- Better scalability across multiple agents
Reported results include:
- Up to 75% reduction in token usage
- 1.2×–2.4× faster inference
- Average accuracy improvements across reasoning, coding, science, and medical benchmarks
The key insight is that agents can collaborate more efficiently when communication occurs inside the neural representation space rather than through natural language.
The Challenge: Running RecursiveMAS in the Browser
The original RecursiveMAS implementation targets server environments and GPU inference stacks such as vLLM.
Browser-based AI introduces a major limitation:
WebLLM models do not normally expose internal hidden states.
Without access to hidden states, latent recursion is impossible.
That became the motivation for this project.
Introducing recursiveMASWebLLM
recursiveMASWebLLM is a specialized build pipeline for creating WebLLM models capable of latent-state transfer.
It extends the browser AI stack to expose the information required for RecursiveMAS-style recursion.
The goal is simple:
Research Paper
↓
Server GPU Implementation
↓
Browser-Compatible Runtime
↓
Accessible to Everyone
What This Project Adds
Hidden State Extraction
MLC-LLM is patched to expose:
get_last_hidden()
This allows browser applications to access last-layer hidden states directly during inference.
Without this capability, RecursiveMAS cannot function.
RecursiveLink Training Pipeline
The repository includes tooling to train and package RecursiveLinks.
train_recursivelink.py
Generated links are exported as:
recursivelink.json
These lightweight modules can then be loaded by browser-based agent systems.
Automated Browser Model Builds
The build pipeline supports:
- Model conversion
- Quantization
- WebGPU compilation
- WASM generation
- Release packaging
Even small models can be built entirely through GitHub Actions without requiring local GPUs.
Browser Deployment
Outputs include:
.wasm
weights
recursivelink.json
These artifacts can be hosted on:
- GitHub Releases
- Hugging Face
- Static web hosting
and loaded directly into browser applications.
Project Architecture
recursiveMASWebLLM
│
▼
Build Pipeline
│
▼
.wasm + weights + recursivelink.json
│
▼
Hosted Artifacts
│
▼
RecursiveMAS Playground
│
▼
Browser-Based Recursive Agents
The builder generates everything needed for latent recursive collaboration in WebLLM-powered applications.
Why Browser-Based Recursive Agents Are Interesting
1. Democratizing Advanced AI Research
Researchers and developers can experiment with RecursiveMAS techniques without expensive cloud infrastructure.
If a device supports WebGPU, it can participate.
2. Interactive Experimentation
Developers can modify:
- Recursion depth
- Agent roles
- Collaboration patterns
- Prompt strategies
and immediately observe how latent collaboration affects outcomes.
3. Education
RecursiveMAS introduces a fundamentally different way of thinking about multi-agent systems.
Running it locally in a browser makes it easier to understand and teach.
4. Lower Latency
Reducing intermediate token generation is especially valuable in browser environments where responsiveness matters.
5. Future Extensions
Exposing hidden states opens the door to:
- Latent planning systems
- Browser-side distillation
- Neural memory systems
- Hybrid cloud/browser agents
- Experimental reasoning architectures
RecursiveMAS is just one possible application.
Getting Started
Repository:
https://github.com/vishalmysore/recursiveMASWebLLM
The project includes:
- Local build instructions
- GitHub Actions workflows
- RecursiveLink training utilities
- Model packaging tools
- Integration guidance for the RecursiveMAS playground
Looking Ahead
This project is still early, but it establishes the foundation for browser-native latent multi-agent systems.
Future work includes:
- Larger model support
- Improved model sharding
- Additional collaboration patterns
- Better WebGPU optimizations
- Community-created RecursiveLinks
- Integration with other browser AI frameworks
As browser AI continues to mature, I believe we'll see more experimentation move from cloud infrastructure to client-side environments.
RecursiveMAS demonstrates that some of the most interesting ideas in AI may not require massive server clusters—they may eventually run directly in the browser.
What do you think?
Could latent-space multi-agent systems become the next evolution of browser AI experimentation?
https://github.com/vishalmysore/recursiveMASWebLLM
https://recursivemas.github.io/
https://huggingface.co/VishalMysore/RecursiveMAS-0.5B-MLC
Top comments (0)