<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prassanna Ravishankar</title>
    <description>The latest articles on DEV Community by Prassanna Ravishankar (@prassannaravishankar).</description>
    <link>https://dev.to/prassannaravishankar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3399317%2Fb175ec76-f84f-4bfe-8f44-93eb149714bd.jpeg</url>
      <title>DEV Community: Prassanna Ravishankar</title>
      <link>https://dev.to/prassannaravishankar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prassannaravishankar"/>
    <language>en</language>
    <item>
      <title>Repowire: A Mesh Network for AI Coding Agents</title>
      <dc:creator>Prassanna Ravishankar</dc:creator>
      <pubDate>Sun, 29 Mar 2026 11:09:25 +0000</pubDate>
      <link>https://dev.to/prassannaravishankar/repowire-a-mesh-network-for-ai-coding-agents-3h2b</link>
      <guid>https://dev.to/prassannaravishankar/repowire-a-mesh-network-for-ai-coding-agents-3h2b</guid>
      <description>&lt;p&gt;AI coding agents are good at understanding one repository. Give Claude Code, Codex, or Gemini CLI a codebase and a task, and they produce useful work. The problem starts when your work spans more than one repo.&lt;/p&gt;

&lt;p&gt;A typical task might touch a frontend, a backend, shared types, and infrastructure config. Each repo gets its own agent session. Those sessions cannot talk to each other. When the frontend agent needs to know what API shape the backend exposes, or when the infrastructure agent needs to know whether the app uses SSE or WebSockets, the question routes through you. You become the message bus: copying context from one terminal, pasting it into another, hoping you did not lose a flag or version number in transit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/prassanna-ravishankar/repowire" rel="noopener noreferrer"&gt;Repowire&lt;/a&gt; fixes this. It creates a mesh network where AI coding agents communicate directly, in real-time, about the code they are actually looking at.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;

&lt;p&gt;You are working in your frontend repo. You need to know what endpoints the backend exposes. Instead of switching terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Ask backend what API endpoints they expose"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent calls &lt;code&gt;ask_peer&lt;/code&gt;, the query routes to the agent session in the backend repo, that agent reads the actual code and responds, and the answer comes back to your session. No copy-paste. No stale documentation. The context is live because it comes from an agent currently looking at the source of truth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9eew2n5z5djxu20vs3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9eew2n5z5djxu20vs3a.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This works across Claude Code, OpenAI Codex, Google Gemini CLI, and OpenCode in any combination. The agents do not need to be the same runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-liner (detects uv/pipx/pip, runs interactive setup)&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSf&lt;/span&gt; https://raw.githubusercontent.com/prassanna-ravishankar/repowire/main/install.sh | sh

&lt;span class="c"&gt;# Or install manually&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;repowire    &lt;span class="c"&gt;# or: pipx install repowire / pip install repowire&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setup auto-detects which agent CLIs you have installed (Claude Code, Codex, Gemini CLI, OpenCode) and configures hooks and MCP for each:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowire setup
repowire status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open agent sessions in different repos. You can use tmux directly or the CLI helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option A: manual tmux&lt;/span&gt;
tmux new-session &lt;span class="nt"&gt;-s&lt;/span&gt; dev &lt;span class="nt"&gt;-n&lt;/span&gt; frontend
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude
&lt;span class="c"&gt;# (new tmux window)&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/backend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; codex

&lt;span class="c"&gt;# Option B: CLI helper&lt;/span&gt;
repowire peer new ~/projects/frontend
repowire peer new ~/projects/backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sessions auto-register as peers and discover each other through the daemon. Each one loads its own project context and can reach out to others when it needs information from elsewhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tools
&lt;/h2&gt;

&lt;p&gt;Repowire exposes MCP tools that agents use to communicate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ask_peer&lt;/code&gt;&lt;/strong&gt; sends a question to another agent and waits for the response. This is the core interaction: synchronous, pull-based, live context from the source of truth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Ask the infra peer whether the proxy is configured for WebSocket passthrough"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;notify_peer&lt;/code&gt;&lt;/strong&gt; sends a fire-and-forget message. Useful for status updates, alerts, or triggering work without waiting for a response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Notify the frontend peer that the API schema changed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;broadcast&lt;/code&gt;&lt;/strong&gt; sends a message to all online peers. The orchestrator pattern (below) uses this to redirect work across the entire mesh simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Broadcast to all peers: stop optimizing test coverage, focus on shipping features"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;list_peers&lt;/code&gt;&lt;/strong&gt; shows all registered peers with their status, project path, and current task description.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;spawn_peer&lt;/code&gt;&lt;/strong&gt; launches a new agent session in a tmux window, registers it with the daemon, and makes it immediately addressable by other peers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;set_description&lt;/code&gt;&lt;/strong&gt; updates the calling peer's task description, visible to all other peers via &lt;code&gt;list_peers&lt;/code&gt;. This is how an orchestrator tracks what each peer is working on.&lt;/p&gt;
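&lt;p&gt;To make the semantics concrete, here is a minimal in-memory sketch of the tool surface. This is illustrative only: the real daemon is a separate process and the tools are MCP calls, so the &lt;code&gt;Peer&lt;/code&gt; and &lt;code&gt;Mesh&lt;/code&gt; classes below are stand-ins, not repowire's API.&lt;/p&gt;

```python
# Illustrative sketch only: models the semantics of the MCP tools,
# not repowire's actual implementation or wire protocol.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Peer:
    name: str
    project: str
    description: str = ""
    # Stand-in for "an agent that answers from its own repo".
    answer: Callable[[str], str] = lambda q: ""
    inbox: List[str] = field(default_factory=list)

class Mesh:
    def __init__(self):
        self.peers: Dict[str, Peer] = {}

    def register(self, peer: Peer):
        self.peers[peer.name] = peer

    def ask_peer(self, target: str, question: str) -> str:
        # Synchronous, pull-based: the caller waits for the target's answer.
        return self.peers[target].answer(question)

    def notify_peer(self, target: str, message: str):
        # Fire-and-forget: deliver the message, do not wait for a reply.
        self.peers[target].inbox.append(message)

    def broadcast(self, sender: str, message: str):
        # Reaches every online peer except the sender.
        for name, peer in self.peers.items():
            if name != sender:
                peer.inbox.append(message)

    def list_peers(self) -> List[str]:
        return [f"{p.name} ({p.project}): {p.description}" for p in self.peers.values()]
```

&lt;p&gt;The distinction to notice is that &lt;code&gt;ask_peer&lt;/code&gt; blocks for an answer, while &lt;code&gt;notify_peer&lt;/code&gt; and &lt;code&gt;broadcast&lt;/code&gt; just deliver and move on.&lt;/p&gt;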

&lt;h2&gt;
  
  
  Patterns
&lt;/h2&gt;

&lt;p&gt;The MCP tools enable several coordination patterns that emerge naturally from agents being able to talk to each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestrator
&lt;/h3&gt;

&lt;p&gt;The pattern that makes 10+ agents manageable. An orchestrator is just a peer with a broader view. There is no special orchestrator mode. It is a regular agent session that happens to manage others rather than write code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"You are the orchestrator. Your peers are working on fastharness,
modalkit, phlow, clusterkit, a2a-registry, repowire, and the website.
Explore each project, find bugs, improve test coverage, fix what you
find. Use list_peers to see who is available. Use ask_peer to check
progress. Use broadcast to redirect work."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator uses &lt;code&gt;list_peers&lt;/code&gt; to monitor all sessions, &lt;code&gt;ask_peer&lt;/code&gt; to check progress or request information, &lt;code&gt;notify_peer&lt;/code&gt; to assign tasks, &lt;code&gt;spawn_peer&lt;/code&gt; to launch new sessions on demand, and &lt;code&gt;broadcast&lt;/code&gt; to redirect all peers at once. It maintains context across the entire mesh, catches quality issues that individual peers would miss (like mocked tests pretending to be real validation), and translates high-level directives into repo-specific instructions.&lt;/p&gt;
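&lt;p&gt;That loop can be sketched as a single supervision pass. This is a simplified model, not repowire code: the four tools are injected as plain callables, and the "idle" status convention is an assumption made for illustration.&lt;/p&gt;

```python
# Illustrative sketch: the shape of an orchestrator's supervision loop.
# The tool arguments are simplified stand-ins for the MCP tools.

def orchestrate(list_peers, ask_peer, notify_peer, broadcast, backlog):
    """One pass: check every peer, hand out backlog work, redirect when done."""
    report = {}
    for peer in list_peers():
        status = ask_peer(peer, "What are you working on, and are you blocked?")
        report[peer] = status
        if status == "idle" and backlog:
            notify_peer(peer, f"New task: {backlog.pop(0)}")
    if not backlog and all(s == "idle" for s in report.values()):
        broadcast("All assigned work is done; run a review pass on a sibling repo.")
    return report
```

&lt;p&gt;A real orchestrator does this continuously, in natural language, and with judgment about what each answer means; the value of the mesh is that the loop never needs a human to ferry the messages.&lt;/p&gt;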

&lt;p&gt;In a &lt;a href="https://prassanna.io/blog/overnight-agents/" rel="noopener noreferrer"&gt;recent session&lt;/a&gt;, an orchestrator managed seven repositories simultaneously, producing 130+ commits while catching a SQL injection, a 9x logging cost bug, and silent worker failures that had survived human code review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-repo coordination
&lt;/h3&gt;

&lt;p&gt;The simplest pattern: agents in different repos ask each other questions in real time. The frontend agent needs the backend's API shape? The infra agent needs to know if the app uses SSE? These become &lt;code&gt;ask_peer&lt;/code&gt; calls instead of terminal-switching and copy-pasting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-agent review
&lt;/h3&gt;

&lt;p&gt;Have a different agent review work. Peer A builds a feature, peer B runs a review pass (code quality, security, simplification). This works especially well with different runtimes reviewing each other's output, since they catch different classes of issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  Worktree isolation
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;spawn_peer&lt;/code&gt; to launch peers on git worktrees for parallel, isolated work. Each peer works on a branch, creates a PR, another peer reviews. Clean separation with no merge conflicts during development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure-as-peer
&lt;/h3&gt;

&lt;p&gt;A dedicated peer for infrastructure (Kubernetes, DNS, cloud config) that other project peers coordinate with directly. Need a namespace created? &lt;code&gt;ask_peer("infra", "create staging namespace for torale")&lt;/code&gt;. Need to know the current proxy config? Ask instead of guessing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overnight autonomy
&lt;/h3&gt;

&lt;p&gt;Give peers tasks and disconnect. They work autonomously, report back via Telegram or the dashboard when you return. Long-running tasks (migrations, refactors, test suites) complete while you sleep. Circles scope the work so peers in one circle do not interfere with peers in another.&lt;/p&gt;

&lt;h2&gt;
  
  
  Manage from your phone
&lt;/h2&gt;

&lt;p&gt;Repowire peers are not limited to terminal sessions. A Telegram bot registers as a peer in the mesh, which means you can monitor and direct your agents from your phone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowire telegram start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notifications from agents appear in your Telegram chat. Messages you send route to peers. Sticky routing lets you select a specific peer and have subsequent messages go directly to it. A Slack bot works the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowire slack start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how the overnight orchestration session described above actually worked: the orchestrator ran on a home machine in London while being guided from a phone on a flight from London to San Francisco.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-machine communication
&lt;/h2&gt;

&lt;p&gt;By default, repowire's daemon runs on localhost. The remote relay extends the mesh across machines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowire setup &lt;span class="nt"&gt;--relay&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This connects the local daemon to a relay at &lt;code&gt;repowire.io&lt;/code&gt; via an outbound WebSocket. Daemons on different machines (or behind NATs) can then communicate through the relay. The relay also provides a remote dashboard for monitoring peer status and communication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4kq1f1u080vcgzswlx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq4kq1f1u080vcgzswlx.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Channel transport (experimental)
&lt;/h2&gt;

&lt;p&gt;For Claude Code v2.1.80+, repowire supports a channel transport that uses native MCP messaging instead of tmux injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;repowire setup &lt;span class="nt"&gt;--experimental-channels&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Messages arrive as &lt;code&gt;&amp;lt;channel&amp;gt;&lt;/code&gt; tags and Claude responds using a &lt;code&gt;reply&lt;/code&gt; tool, eliminating the transcript scraping that the tmux-based transport relies on. This is cleaner and more reliable, but requires a &lt;code&gt;claude.ai&lt;/code&gt; login and the &lt;code&gt;bun&lt;/code&gt; runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runtime support
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hooks + MCP&lt;/td&gt;
&lt;td&gt;Default, production-ready&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Codex&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hooks + MCP&lt;/td&gt;
&lt;td&gt;Same hook pattern (auto-enabled)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Gemini CLI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hooks + MCP&lt;/td&gt;
&lt;td&gt;Uses &lt;code&gt;BeforeAgent&lt;/code&gt;/&lt;code&gt;AfterAgent&lt;/code&gt; events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenCode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plugin + WebSocket&lt;/td&gt;
&lt;td&gt;TypeScript plugin with persistent connection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All four runtimes are first-class. You can mix them in the same mesh: a Claude Code session in one repo can &lt;code&gt;ask_peer&lt;/code&gt; a Codex session in another. The daemon routes messages regardless of which runtime the peer uses.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Three components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Daemon&lt;/strong&gt; runs as a system service on localhost, maintaining a registry of active sessions and routing messages between them. It knows which repos agents are in, what tmux panes they are running in, and whether they are busy or available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hooks&lt;/strong&gt; integrate with each agent CLI's extension points. When a session starts, a hook registers it with the daemon. When the agent finishes responding, another hook captures the response and sends it back to whoever asked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP server&lt;/strong&gt; gives agents the tools to communicate: &lt;code&gt;ask_peer&lt;/code&gt;, &lt;code&gt;notify_peer&lt;/code&gt;, &lt;code&gt;broadcast&lt;/code&gt;, &lt;code&gt;list_peers&lt;/code&gt;, &lt;code&gt;spawn_peer&lt;/code&gt;, &lt;code&gt;kill_peer&lt;/code&gt;, &lt;code&gt;set_description&lt;/code&gt;, &lt;code&gt;whoami&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The result is that agent sessions become peers in a mesh. Each one remains specialized in its own repo while being able to reach out to others when it needs context that lives elsewhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use it
&lt;/h2&gt;

&lt;p&gt;Repowire is useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work spans multiple repositories and agents need to share context&lt;/li&gt;
&lt;li&gt;You want an orchestrator that coordinates multiple agents without manual copy-paste&lt;/li&gt;
&lt;li&gt;You need to manage agents remotely (Telegram, Slack, or across machines)&lt;/li&gt;
&lt;li&gt;You are mixing agent runtimes (Claude + Codex + Gemini) and need them to communicate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It complements rather than replaces other approaches. Memory banks are still useful for persistent project knowledge. Documentation still matters for onboarding. Repowire adds a live, pull-based layer: when you need the current state of another repo's code, you ask an agent that is looking at it right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/prassanna-ravishankar/repowire" rel="noopener noreferrer"&gt;github.com/prassanna-ravishankar/repowire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/repowire/" rel="noopener noreferrer"&gt;pypi.org/project/repowire&lt;/a&gt; (3,634 monthly downloads)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard&lt;/strong&gt;: &lt;a href="https://repowire.io" rel="noopener noreferrer"&gt;repowire.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep dive&lt;/strong&gt;: &lt;a href="https://prassanna.io/blog/vibe-bottleneck/" rel="noopener noreferrer"&gt;The Vibe Bottleneck&lt;/a&gt; (the problem) and &lt;a href="https://prassanna.io/blog/repowire/" rel="noopener noreferrer"&gt;Repowire&lt;/a&gt; (the solution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Case study&lt;/strong&gt;: &lt;a href="https://prassanna.io/blog/overnight-agents/" rel="noopener noreferrer"&gt;Overnight Agents&lt;/a&gt; (130+ commits across 7 repos while sleeping)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;repowire
repowire setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open two agent sessions in different repos. Ask one about the other. That is the whole idea.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why Your ML Infrastructure Choices Create (or Kill) Momentum</title>
      <dc:creator>Prassanna Ravishankar</dc:creator>
      <pubDate>Wed, 30 Jul 2025 07:53:44 +0000</pubDate>
      <link>https://dev.to/prassannaravishankar/why-your-ml-infrastructure-choices-create-or-kill-momentum-1bh6</link>
      <guid>https://dev.to/prassannaravishankar/why-your-ml-infrastructure-choices-create-or-kill-momentum-1bh6</guid>
      <description>&lt;p&gt;&lt;em&gt;How early architectural decisions create a flywheel effect that accelerates rather than hinders your path to production&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgh3h8o0dw4yphxhfeqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgh3h8o0dw4yphxhfeqr.png" alt="Synchronise your ML infrastructure with your growth" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Here's a story I hear constantly: An ML team builds an impressive prototype that gets everyone excited. The model works, the metrics look good, and leadership gives the green light to scale. But then, six months later, they're still struggling to get it into production. The prototype was built for speed, not scale, and now they're paying the price.&lt;/p&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;The traditional advice is "move fast and break things": optimize for velocity in the early stages and worry about infrastructure later. But what if I told you this creates a false choice? That the right architectural decisions from day one can actually &lt;em&gt;accelerate&lt;/em&gt; your initial iteration while setting you up for seamless scaling?&lt;/p&gt;

&lt;p&gt;This is what I call the &lt;strong&gt;Nimble Flywheel&lt;/strong&gt;, and it is the difference between teams that smoothly transition from prototype to production and those that get stuck rebuilding everything from scratch. In my work helping &lt;a href="https://prassanna.io/blog/invest-mlops-startup/" rel="noopener noreferrer"&gt;startups navigate their MLOps investment decisions&lt;/a&gt;, I've seen this pattern repeatedly: the teams that make thoughtful architectural choices early are the ones that scale successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Nimbleness Paradox
&lt;/h2&gt;

&lt;p&gt;Most teams think nimbleness means using the simplest possible setup: &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter notebooks&lt;/a&gt;, manual tracking, local files. But here's the thing: &lt;a href="https://medium.com/exobase/your-cloud-infrastructure-scales-but-is-it-nimble-6b2fcfee0923" rel="noopener noreferrer"&gt;nimbleness is an architectural choice, not a hardware choice&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can be trapped by technical debt even with infinite cloud resources if your code is monolithic and your infrastructure is configured manually. Conversely, a team that adopts foundational practices on a single local machine is architecturally more agile and far better prepared to scale.&lt;/p&gt;

&lt;p&gt;The real insight? &lt;strong&gt;The practices that make you nimble also make you scalable.&lt;/strong&gt; This isn't just theory; it's backed by &lt;a href="https://research.aimultiple.com/mlops-case-study/" rel="noopener noreferrer"&gt;industry research showing that teams with strong MLOps foundations&lt;/a&gt; consistently outperform those that prioritize speed over structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your North Star: From Artifacts to Factories
&lt;/h2&gt;

&lt;p&gt;Before diving into tactics, let's establish the north star for ML infrastructure decisions. The goal isn't to optimize for any single metric; it is to fundamentally shift your output from creating &lt;strong&gt;artifacts&lt;/strong&gt; (a model.pkl file and a notebook) to building &lt;strong&gt;factories&lt;/strong&gt; (reproducible systems that can create those artifacts on demand).&lt;/p&gt;

&lt;p&gt;This concept, popularized by the &lt;a href="https://ml-ops.org/" rel="noopener noreferrer"&gt;MLOps community&lt;/a&gt;, transforms how you think about ML development. Instead of one-off experiments, you're building &lt;a href="https://neptune.ai/blog/best-practices-docker-for-machine-learning" rel="noopener noreferrer"&gt;reproducible pipelines&lt;/a&gt; that can be triggered, scaled, and monitored. I've written extensively about why &lt;a href="https://prassanna.io/blog/experiments-first-class-citizens/" rel="noopener noreferrer"&gt;experiments should be first-class citizens&lt;/a&gt; in your infrastructure, not afterthoughts bolted onto existing systems.&lt;/p&gt;

&lt;p&gt;This factory includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Git commit hash for your code&lt;/li&gt;
&lt;li&gt;The data version hash &lt;/li&gt;
&lt;li&gt;The environment definition (Docker image)&lt;/li&gt;
&lt;li&gt;The infrastructure configuration&lt;/li&gt;
&lt;li&gt;The complete lineage from raw data to prediction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you can recreate any result on demand with a single command, you've achieved true nimbleness.&lt;/p&gt;
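&lt;p&gt;The checklist above can be captured as a small run manifest. A minimal sketch, assuming you pass in the commit hash yourself (e.g. from &lt;code&gt;git rev-parse HEAD&lt;/code&gt;); all file and image names are illustrative:&lt;/p&gt;

```python
# Sketch: the "factory" checklist as a run manifest. Paths and names are
# illustrative; the point is pinning every input behind a result.
import hashlib
import json
from pathlib import Path

def run_manifest(commit: str, data_path: str, image: str, infra: dict) -> str:
    # Pin the exact bytes of the training data, not just its filename.
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    manifest = {
        "code_commit": commit,       # e.g. output of `git rev-parse HEAD`
        "data_sha256": data_hash,    # content hash of the dataset
        "environment": image,        # Docker image reference
        "infrastructure": infra,     # e.g. Terraform variables
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

&lt;p&gt;Checking a manifest like this into version control alongside each result is the cheapest possible version of the factory: anyone can read it and rebuild the exact run.&lt;/p&gt;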

&lt;h2&gt;
  
  
  The Strategic Scaling Framework
&lt;/h2&gt;

&lt;p&gt;The path from prototype to production isn't a binary jump; it is a strategic evolution through four phases. Each phase has a different primary goal and corresponding best practices.&lt;/p&gt;

&lt;p&gt;This mirrors what I call the &lt;a href="https://prassanna.io/blog/full-stack-ml/" rel="noopener noreferrer"&gt;full-stack ML approach&lt;/a&gt;: thinking holistically about the entire system rather than optimizing individual components in isolation. The infrastructure decisions you make at each phase should enable the next phase, not constrain it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Validate Quickly (PoC)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Maximize iteration speed to validate your core hypothesis&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Reality Check:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href="https://www.nucamp.co/blog/solo-ai-tech-entrepreneur-2025-setting-up-a-selfhosted-solo-ai-startup-infrastructure-best-practices" rel="noopener noreferrer"&gt;powerful local machine with a consumer GPU&lt;/a&gt; often outperforms cloud for initial exploration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/compute/docs/gpus/overview" rel="noopener noreferrer"&gt;Managed notebooks&lt;/a&gt; (&lt;a href="https://colab.research.google.com/" rel="noopener noreferrer"&gt;Colab&lt;/a&gt;, &lt;a href="https://aws.amazon.com/sagemaker/" rel="noopener noreferrer"&gt;SageMaker&lt;/a&gt;) eliminate setup friction&lt;/li&gt;
&lt;li&gt;The key is minimizing the time from idea to first result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Metrics That Matter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-to-first-model:&lt;/strong&gt; How quickly can you test a new hypothesis?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment velocity:&lt;/strong&gt; How many approaches can you try per week?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost per experiment:&lt;/strong&gt; Both time and money&lt;/li&gt;
&lt;/ul&gt;
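&lt;p&gt;These three metrics are simple enough to compute from a log of experiment start/end times and costs. A rough sketch, with a made-up record shape of &lt;code&gt;(start_hour, end_hour, cost_usd)&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch: the three PoC metrics from a simple experiment log.
# Each entry is (start_hour, end_hour, cost_usd); shape is illustrative.
def poc_metrics(project_start: float, experiments: list) -> dict:
    first_result = min(end for _, end, _ in experiments)
    total_hours = max(end for _, end, _ in experiments) - project_start
    return {
        "time_to_first_model_h": first_result - project_start,
        "experiments_per_week": len(experiments) / (total_hours / (24 * 7)),
        "cost_per_experiment_usd": sum(c for _, _, c in experiments) / len(experiments),
    }
```

&lt;p&gt;Even this crude version makes trends visible: if experiments-per-week is falling while cost-per-experiment rises, your PoC setup is already fighting you.&lt;/p&gt;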

&lt;h3&gt;
  
  
  Phase 2: Make It Reproducible (Hardened Prototype)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Transform your successful but messy PoC into something others can build upon&lt;/p&gt;

&lt;p&gt;This is where most teams stumble. They think reproducibility will slow them down, but it actually accelerates iteration by reducing debugging time and enabling collaboration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Four Pillars:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Code Modularity:&lt;/strong&gt; &lt;a href="https://medium.com/@kr342803/modular-coding-in-machine-learning-a-best-practice-approach-558f84d471c7" rel="noopener noreferrer"&gt;Refactor notebooks into reusable modules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment Consistency:&lt;/strong&gt; &lt;a href="https://neptune.ai/blog/best-practices-docker-for-machine-learning" rel="noopener noreferrer"&gt;Containerize with Docker&lt;/a&gt; from day one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code:&lt;/strong&gt; Use tools like &lt;a href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/infrastructure-as-code" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; even for single VMs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic Automation:&lt;/strong&gt; Simple &lt;a href="https://github.blog/enterprise-software/ci-cd/build-ci-cd-pipeline-github-actions-four-steps/" rel="noopener noreferrer"&gt;CI pipelines&lt;/a&gt; for testing and validation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Key Tools to Consider:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Experiment Trackers:&lt;/strong&gt; &lt;a href="https://mlflow.org/" rel="noopener noreferrer"&gt;MLflow&lt;/a&gt;, &lt;a href="https://clear.ml/" rel="noopener noreferrer"&gt;ClearML&lt;/a&gt;, &lt;a href="https://wandb.ai/" rel="noopener noreferrer"&gt;Weights &amp;amp; Biases&lt;/a&gt; (I've also built a &lt;a href="https://github.com/prassanna-ravishankar/clearml-mcp" rel="noopener noreferrer"&gt;ClearML MCP Server&lt;/a&gt; that lets you interact with experiments through conversational AI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data &amp;amp; Model Registries:&lt;/strong&gt; &lt;a href="https://dvc.org/" rel="noopener noreferrer"&gt;DVC&lt;/a&gt;, &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face Datasets/Models&lt;/a&gt;, &lt;a href="https://lakefs.io/" rel="noopener noreferrer"&gt;LakeFS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration:&lt;/strong&gt; Start simple with scripts, graduate to &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Airflow&lt;/a&gt; or &lt;a href="https://www.kubeflow.org/docs/components/pipelines/" rel="noopener noreferrer"&gt;Kubeflow Pipelines&lt;/a&gt; as complexity grows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is understanding when to graduate from simple approaches to more sophisticated tooling. I've detailed this progression in my analysis of &lt;a href="https://prassanna.io/blog/ml-workflow/" rel="noopener noreferrer"&gt;effective ML workflows&lt;/a&gt;. The goal is adding complexity only when it solves real problems, not for its own sake.&lt;/p&gt;
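&lt;p&gt;"Start simple with scripts" can be as small as an ordered list of named steps. A sketch of that baseline, before any orchestrator is warranted:&lt;/p&gt;

```python
# Sketch: the simplest orchestration that counts. A pipeline is an ordered
# list of (name, callable) steps; any failure halts the run at that step.
def run_pipeline(steps):
    """Run steps in order; return the names of steps that completed."""
    completed = []
    for name, step in steps:
        print(f"[pipeline] running: {name}")
        step()                      # an exception here stops the pipeline
        completed.append(name)
    return completed
```

&lt;p&gt;When you find yourself bolting on retries, scheduling, or parallel fan-out, that is the signal to graduate to Airflow or Kubeflow Pipelines rather than reinventing them.&lt;/p&gt;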

&lt;h3&gt;
  
  
  Phase 3: Automate and Scale (Pre-Production)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Build reliable, multi-step pipelines that can handle production data volumes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Evolution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move to managed training services or &lt;a href="https://kubernetes.io/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; clusters&lt;/li&gt;
&lt;li&gt;Implement proper orchestration for multi-step workflows&lt;/li&gt;
&lt;li&gt;Add comprehensive &lt;a href="https://www.azilen.com/blog/mlops-best-practices/" rel="noopener noreferrer"&gt;monitoring and alerting&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Metrics Focus Shift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline reliability:&lt;/strong&gt; What's your success rate for end-to-end runs?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource utilization:&lt;/strong&gt; Are you &lt;a href="https://hystax.com/enhancing-cloud-resource-allocation-using-machine-learning/" rel="noopener noreferrer"&gt;efficiently using your compute budget&lt;/a&gt;?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training consistency:&lt;/strong&gt; Can you reproduce the same model quality across runs?&lt;/li&gt;
&lt;/ul&gt;
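&lt;p&gt;Training consistency starts with controlling randomness. A minimal seeding sketch (assuming NumPy; if you use PyTorch, extend it with &lt;code&gt;torch.manual_seed&lt;/code&gt; and deterministic cuDNN settings):&lt;/p&gt;

```python
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of nondeterminism for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    # If you use PyTorch, also seed it and force deterministic kernels:
    #   torch.manual_seed(seed)
    #   torch.backends.cudnn.deterministic = True


# Two runs with the same seed should produce identical "data splits"
seed_everything(7)
first = np.random.permutation(10).tolist()
seed_everything(7)
second = np.random.permutation(10).tolist()
assert first == second
```

&lt;p&gt;Seeding alone does not guarantee identical model quality across hardware, but it removes the cheapest source of run-to-run drift before you debug the expensive ones.&lt;/p&gt;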

&lt;h3&gt;
  
  
  Phase 4: Operate and Govern (Production)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Ensure reliability, performance, and continuous improvement&lt;/p&gt;

&lt;p&gt;This is where the system &lt;em&gt;around&lt;/em&gt; your model becomes more critical than the model itself. &lt;a href="https://arxiv.org/abs/2501.10546" rel="noopener noreferrer"&gt;Academic research shows&lt;/a&gt; that at scale, bottlenecks shift from model computation to data I/O and infrastructure reliability. &lt;a href="https://arxiv.org/abs/2501.10546" rel="noopener noreferrer"&gt;Google's production training infrastructure&lt;/a&gt; achieved 116% performance improvements by optimizing data pipelines, not model architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  How This Maps to the LLM/LLMOps World
&lt;/h2&gt;

&lt;p&gt;The nimble flywheel becomes even more critical in LLMOps because the stakes are higher, both in terms of costs and complexity. Here's how each phase translates:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 1: LLM Prototyping&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with APIs:&lt;/strong&gt; Use &lt;a href="https://openai.com/api/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;, &lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;, or &lt;a href="https://cohere.com/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt; APIs to validate your use case quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Focus on prompts:&lt;/strong&gt; Your "code" is largely prompt engineering and orchestration logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple tracking:&lt;/strong&gt; Log prompts, responses, and costs. &lt;a href="https://smith.langchain.com/" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt; and &lt;a href="https://wandb.ai/" rel="noopener noreferrer"&gt;Weights &amp;amp; Biases&lt;/a&gt; work well here&lt;/li&gt;
&lt;/ul&gt;
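&lt;p&gt;Even before adopting a tracker, an append-only JSONL log captures the essentials. A minimal sketch (the file name and record fields are illustrative, not a standard schema):&lt;/p&gt;

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("llm_calls.jsonl")  # illustrative location


def log_call(prompt: str, response: str, model: str, cost_usd: float) -> dict:
    """Append one prompt/response/cost record to a JSONL log."""
    record = {
        "ts": time.time(),
        "model": model,
        "prompt": prompt,
        "response": response,
        "cost_usd": cost_usd,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record


rec = log_call("Summarize this ticket", "Customer wants a refund.", "gpt-4o", 0.0021)
# A running cost total falls out of the log for free
total = sum(json.loads(line)["cost_usd"] for line in LOG_PATH.read_text().splitlines())
```

&lt;p&gt;When you later move to LangSmith or W&amp;amp;B, this log doubles as a backfill dataset for your first evaluations.&lt;/p&gt;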

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 2: Reproducible LLM Workflows&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt versioning:&lt;/strong&gt; Treat prompts like code with proper version control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation frameworks:&lt;/strong&gt; Implement systematic evaluation using tools like &lt;a href="https://langfuse.com/" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt; or &lt;a href="https://phoenix.arize.com/" rel="noopener noreferrer"&gt;Phoenix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG foundations:&lt;/strong&gt; If you need custom data, start with simple &lt;a href="https://weaviate.io/" rel="noopener noreferrer"&gt;vector databases&lt;/a&gt; and retrieval patterns&lt;/li&gt;
&lt;/ul&gt;
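&lt;p&gt;"Prompts like code" can start as versioned templates pinned by name and version, so every logged call is attributable to an exact prompt. A hypothetical sketch (templates are inlined here; in practice they would live as files in git):&lt;/p&gt;

```python
from string import Template

# In practice these would be files in git (e.g. prompts/summarize_v2.txt);
# inlined here to keep the example self-contained.
PROMPTS = {
    ("summarize", "v1"): Template("Summarize: $text"),
    ("summarize", "v2"): Template("Summarize in one sentence, plainly: $text"),
}


def render_prompt(name: str, version: str, **kwargs) -> str:
    """Render a pinned prompt version so runs stay reproducible."""
    return PROMPTS[(name, version)].substitute(**kwargs)


prompt = render_prompt("summarize", "v2", text="Q3 revenue grew 12%.")
```

&lt;p&gt;Pinning the version string in your call logs is what lets an evaluation framework compare v1 against v2 later.&lt;/p&gt;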

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 3: Production LLM Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model optimization:&lt;/strong&gt; Move from GPT-4 to fine-tuned smaller models (&lt;a href="https://llama.meta.com/" rel="noopener noreferrer"&gt;Llama 3&lt;/a&gt;, &lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving infrastructure:&lt;/strong&gt; Deploy on platforms like &lt;a href="https://www.anyscale.com/" rel="noopener noreferrer"&gt;Anyscale&lt;/a&gt;, &lt;a href="https://www.together.ai/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;, or self-host with &lt;a href="https://vllm.readthedocs.io/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced RAG:&lt;/strong&gt; Implement sophisticated retrieval with &lt;a href="https://www.llamaindex.ai/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt; or &lt;a href="https://langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 4: Scaled LLM Operations&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model routing:&lt;/strong&gt; Smart routing based on query complexity (simple → small model, complex → large model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost monitoring:&lt;/strong&gt; Track costs per user, per feature, per model. LLM costs can explode quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails:&lt;/strong&gt; Implement content filtering, hallucination detection, and safety measures&lt;/li&gt;
&lt;/ul&gt;
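&lt;p&gt;The routing idea can be sketched with a cheap heuristic (the model names and complexity markers below are illustrative; production routers often use a trained classifier or a small LLM call instead):&lt;/p&gt;

```python
def route_model(query: str) -> str:
    """Pick a model tier from a cheap heuristic; names are illustrative.

    The point is the cost structure: most traffic is simple and should
    never touch the expensive model.
    """
    complex_markers = ("why", "compare", "analyze", "step by step")
    is_long = len(query.split()) > 50
    is_complex = any(m in query.lower() for m in complex_markers)
    return "large-model" if (is_long or is_complex) else "small-model"


assert route_model("What is the capital of France?") == "small-model"
assert route_model("Compare these two architectures step by step") == "large-model"
```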

&lt;p&gt;&lt;strong&gt;The LLMOps Economic Reality:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://www.zenml.io/blog/llmops-in-production-457-case-studies-of-what-actually-works" rel="noopener noreferrer"&gt;Case studies show&lt;/a&gt; that successful LLM applications follow a consistent pattern: prototype with expensive APIs, then optimize with fine-tuned open source models. One e-commerce company improved accuracy from 47% to 94% while cutting costs by 94% through strategic model selection.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Right Tool for the Right Job Philosophy
&lt;/h2&gt;

&lt;p&gt;Here's where many teams get stuck: Should you build your own MLOps stack or buy into a single platform?&lt;/p&gt;

&lt;p&gt;I think this is the wrong question. The better approach is &lt;strong&gt;using the right tool for the right job&lt;/strong&gt; rather than committing to a single vendor's vision of how ML should work.&lt;/p&gt;

&lt;p&gt;The ML tooling landscape is incredibly fragmented, a challenge I've explored in depth when analyzing &lt;a href="https://prassanna.io/blog/ml-fragmentation/" rel="noopener noreferrer"&gt;the current state of ML fragmentation&lt;/a&gt;. But this fragmentation is actually a feature, not a bug, if you approach it strategically.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;The Composable Stack Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Training:&lt;/strong&gt; Use &lt;a href="https://skypilot.readthedocs.io/" rel="noopener noreferrer"&gt;SkyPilot&lt;/a&gt; to seamlessly burst across cloud providers and get the best compute prices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference:&lt;/strong&gt; Leverage serverless platforms like &lt;a href="https://modal.com/" rel="noopener noreferrer"&gt;Modal&lt;/a&gt;, &lt;a href="https://replicate.com/" rel="noopener noreferrer"&gt;Replicate&lt;/a&gt;, &lt;a href="https://baseten.co/" rel="noopener noreferrer"&gt;Baseten&lt;/a&gt;, or &lt;a href="https://runpod.io/" rel="noopener noreferrer"&gt;RunPod&lt;/a&gt; that let you pay per second of actual usage and auto-scale to zero&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment Tracking:&lt;/strong&gt; Pick the tracker that fits your workflow (MLflow for simplicity, W&amp;amp;B for collaboration, ClearML for enterprise features)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data:&lt;/strong&gt; Hugging Face Datasets for standardized data handling, or managed storage (S3, GCS) with versioning tools like DVC for custom data patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is particularly powerful for inference workloads. Instead of keeping a GPU instance running 24/7 that might only serve requests 2% of the time, serverless platforms let you pay only for actual compute seconds. For many applications, this can &lt;a href="https://www.thinkingstack.ai/blog/operationalisation-1/scalability-in-mlops-handling-large-scale-machine-learning-models-15" rel="noopener noreferrer"&gt;reduce inference costs by 90%+&lt;/a&gt; compared to traditional always-on deployments.&lt;/p&gt;
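&lt;p&gt;The arithmetic behind that kind of saving is easy to check. With illustrative rates (both prices below are assumptions, not vendor quotes):&lt;/p&gt;

```python
# Illustrative rates: an always-on GPU at $2.00/hr vs a serverless
# platform billing $0.001/s only while a request is actually served.
HOURS_PER_MONTH = 730
always_on_monthly = 2.00 * HOURS_PER_MONTH           # $1,460 whether used or not

utilization = 0.02                                    # serving requests 2% of the time
busy_seconds = utilization * HOURS_PER_MONTH * 3600
serverless_monthly = 0.001 * busy_seconds             # ~$52.56

savings = 1 - serverless_monthly / always_on_monthly
print(f"{savings:.0%}")
```

&lt;p&gt;Even if the serverless per-second rate is several times the amortized always-on rate, low utilization dominates the comparison.&lt;/p&gt;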

&lt;p&gt;&lt;strong&gt;Why This Increases Nimbleness:&lt;/strong&gt;&lt;br&gt;
This approach actually makes you &lt;em&gt;more&lt;/em&gt; nimble, not less. You can optimize each component independently, avoid vendor lock-in, and adapt as your needs evolve. If a new training platform offers better price/performance, you can switch without rebuilding your entire stack.&lt;/p&gt;

&lt;p&gt;As I've detailed in my &lt;a href="https://prassanna.io/blog/invest-mlops-startup/" rel="noopener noreferrer"&gt;MLOps investment strategy guide&lt;/a&gt;, the key is standardizing on &lt;em&gt;interfaces&lt;/em&gt; and &lt;em&gt;data formats&lt;/em&gt;, not specific tools. When you containerize everything and use standard formats (like Hugging Face models), switching between platforms becomes trivial.&lt;/p&gt;

&lt;p&gt;Think of it like building with LEGO blocks rather than welding everything together. Each piece can be swapped out independently while maintaining the overall structure. This is especially powerful for ML, where the tooling landscape evolves rapidly: new serving platforms, better training infrastructure, and more efficient models appear constantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quantitative Reality Check
&lt;/h2&gt;

&lt;p&gt;Let's talk numbers, because infrastructure decisions should be data-driven:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Development Velocity Varies by Orders of Magnitude:&lt;/strong&gt;&lt;br&gt;
A &lt;a href="https://www.nyckel.com/blog/image-classification-benchmark/" rel="noopener noreferrer"&gt;2023 benchmark study&lt;/a&gt; found that lightweight API services could train models in seconds, while enterprise platforms took hours for the same task. During prototyping, this velocity difference compounds exponentially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Structure Evolution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial prototyping: &lt;a href="https://www.datasciencesociety.net/ai-development-costs-in-2025-trends-challenges-smart-budgeting-for-businesses/" rel="noopener noreferrer"&gt;$100-1,000/month&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scaled training: &lt;a href="https://www.datasciencesociety.net/ai-development-costs-in-2025-trends-challenges-smart-budgeting-for-businesses/" rel="noopener noreferrer"&gt;$5,000-50,000/month&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Production serving: Highly variable based on traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Open Source Economics:&lt;/strong&gt;&lt;br&gt;
In LLMOps, teams consistently follow this pattern: prototype with expensive proprietary models (&lt;a href="https://openai.com/gpt-4" rel="noopener noreferrer"&gt;GPT-4&lt;/a&gt;), then move to fine-tuned open source alternatives in production. &lt;a href="https://www.zenml.io/blog/llmops-in-production-457-case-studies-of-what-actually-works" rel="noopener noreferrer"&gt;Case studies show&lt;/a&gt; cost reductions of 90%+ while improving accuracy on domain-specific tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Action Plan: The Nimble Scaffold
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1x2lap73i78tp9c7rrxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1x2lap73i78tp9c7rrxh.png" alt="Your stack needs to align beautifully together" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on my analysis of hundreds of ML teams (both through &lt;a href="https://prassanna.io/blog/invest-mlops-startup/" rel="noopener noreferrer"&gt;direct consulting on MLOps strategy&lt;/a&gt; and industry research), here's the minimal scaffolding that creates maximum future flexibility:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Foundation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up &lt;a href="https://github.com/thatmlopsguy/cookiecutter-ml-project" rel="noopener noreferrer"&gt;modular project structure&lt;/a&gt; (or use my &lt;a href="https://github.com/prassanna-ravishankar/cookiecutter-modern-ml" rel="noopener noreferrer"&gt;Modern ML Cookiecutter&lt;/a&gt; for a batteries-included template with NLP/Speech/Vision support)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/guides/python/containerize/" rel="noopener noreferrer"&gt;Containerize your environment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Start tracking experiments (even with &lt;a href="https://mlflow.org/docs/latest/getting-started/intro-quickstart/" rel="noopener noreferrer"&gt;simple tools&lt;/a&gt; or lightweight options like &lt;a href="https://github.com/prassanna-ravishankar/tracelet" rel="noopener noreferrer"&gt;Tracelet&lt;/a&gt; that auto-captures PyTorch metrics)&lt;/li&gt;
&lt;/ul&gt;
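&lt;p&gt;If standing up a full tracker feels premature in week one, even runs written as JSON files beat untracked experiments, and the habit transfers directly to MLflow later. A minimal file-based sketch (the directory layout and schema are illustrative):&lt;/p&gt;

```python
import json
import time
from pathlib import Path

RUNS_DIR = Path("runs")  # illustrative layout


def log_run(params: dict, metrics: dict) -> Path:
    """Write one experiment run as a timestamped JSON file."""
    RUNS_DIR.mkdir(exist_ok=True)
    run = {"started": time.time(), "params": params, "metrics": metrics}
    path = RUNS_DIR / f"run_{int(time.time() * 1000)}.json"
    path.write_text(json.dumps(run, indent=2))
    return path


path = log_run({"lr": 3e-4, "epochs": 10}, {"val_acc": 0.91})

# Queries like "best run so far" come for free
best = max(RUNS_DIR.glob("run_*.json"),
           key=lambda p: json.loads(p.read_text())["metrics"]["val_acc"])
```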

&lt;p&gt;&lt;strong&gt;Week 2-4: Reproducibility&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement &lt;a href="https://dvc.org/doc/start" rel="noopener noreferrer"&gt;data versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add basic &lt;a href="https://github.com/khuyentran1401/cicd-mlops-demo" rel="noopener noreferrer"&gt;CI/CD pipeline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Document your infrastructure setup with &lt;a href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/infrastructure-as-code" rel="noopener noreferrer"&gt;Infrastructure as Code&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
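&lt;p&gt;The core idea behind data versioning tools like DVC is content-addressing: a dataset's identity is a hash of its bytes, so any change yields a new version you can pin alongside your code. A minimal sketch of the concept (not DVC's actual on-disk format):&lt;/p&gt;

```python
import hashlib
from pathlib import Path


def data_version(path: Path) -> str:
    """Fingerprint a data file by its contents, not its name or mtime."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]


data = Path("train.csv")
data.write_text("id,label\n1,cat\n2,dog\n")
v1 = data_version(data)

data.write_text("id,label\n1,cat\n2,dog\n3,bird\n")  # dataset changed
v2 = data_version(data)

assert v1 != v2  # any edit yields a new, pinnable version id
```

&lt;p&gt;Committing that fingerprint to git next to your training code is what makes "which data trained this model?" answerable months later.&lt;/p&gt;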

&lt;p&gt;&lt;strong&gt;Month 2-3: Scale Preparation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move to &lt;a href="https://www.kubeflow.org/docs/components/pipelines/getting-started/" rel="noopener noreferrer"&gt;orchestrated pipelines&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Implement proper &lt;a href="https://mlflow.org/docs/latest/model-registry/" rel="noopener noreferrer"&gt;model registry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;a href="https://www.evidentlyai.com/" rel="noopener noreferrer"&gt;monitoring and alerting&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Key Insight:&lt;/strong&gt; Each phase builds on the previous one. You're not throwing away work. You're systematically reducing friction.&lt;/p&gt;

&lt;p&gt;To help teams implement this scaffolding quickly, I've created the &lt;a href="https://github.com/prassanna-ravishankar/cookiecutter-modern-ml" rel="noopener noreferrer"&gt;Modern ML Cookiecutter&lt;/a&gt;, a template that includes these best practices by default across NLP, Speech, and Vision modalities. It demonstrates how the right initial structure enables rather than constrains future scaling.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Let me share a pattern I see in successful teams:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgroScout&lt;/strong&gt; started simple but strategic. When they needed to handle a 100x increase in drone imagery data, their early investment in MLOps tooling paid off. They &lt;a href="https://research.aimultiple.com/mlops-case-study/" rel="noopener noreferrer"&gt;scaled their experiments by 50x and cut time-to-production by 50%&lt;/a&gt; without expanding their data team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ASML&lt;/strong&gt; took a different approach: They moved to Google Cloud and saw &lt;a href="https://cloud.google.com/customers/asml" rel="noopener noreferrer"&gt;engineering efficiency improve by 40% and data access time reduce by 25x&lt;/a&gt;. The key was modernizing their data layer first.&lt;/p&gt;

&lt;p&gt;Both succeeded because they made architectural choices that enabled, rather than constrained, their future growth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The nimble flywheel isn't about using the most sophisticated tools from day one. It's about making strategic choices that compound over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with architecture, not infrastructure:&lt;/strong&gt; Good practices matter more than powerful hardware&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimize for iteration speed, but not at the expense of reproducibility&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buy where you can, build where you must:&lt;/strong&gt; Focus your engineering effort on differentiation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure what matters:&lt;/strong&gt; Track velocity in early phases, reliability in later ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The teams that successfully scale from prototype to production aren't the ones that moved fastest initially; they're the ones that built momentum early and maintained it throughout their journey. This is supported by &lt;a href="https://datatron.com/mlops-maturity-model-m3-whats-your-maturity-in-mlops/" rel="noopener noreferrer"&gt;MLOps maturity research&lt;/a&gt; showing that teams with structured approaches consistently outperform those focused purely on speed.&lt;/p&gt;

&lt;p&gt;Your future self will thank you for the extra day you spend setting up proper version control, containerization, and tracking. Because the alternative isn't just technical debt; it's starting over.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is part of my ongoing exploration of practical AI infrastructure patterns. For more tactical insights on &lt;a href="https://prassanna.io/blog/invest-mlops-startup/" rel="noopener noreferrer"&gt;when and how to invest in MLOps&lt;/a&gt;, &lt;a href="https://prassanna.io/blog/ml-workflow/" rel="noopener noreferrer"&gt;building effective ML workflows&lt;/a&gt;, or &lt;a href="https://prassanna.io/blog/experiments-first-class-citizens/" rel="noopener noreferrer"&gt;treating experiments as first-class citizens&lt;/a&gt;, check out my other writing. You can also find me on &lt;a href="https://twitter.com/prassanna_io" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; or &lt;a href="https://linkedin.com/in/prassanna-io" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for ongoing discussions about ML infrastructure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Want to dive deeper into specific implementation details? I've collected &lt;a href="https://github.com/khuyentran1401/cicd-mlops-demo" rel="noopener noreferrer"&gt;battle-tested templates and examples&lt;/a&gt; that can get you started with the nimble scaffold in days, not months.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;I write regularly about ML infrastructure and AI engineering at &lt;a href="https://prassanna.io/blog" rel="noopener noreferrer"&gt;prassanna.io/blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>startup</category>
      <category>mlops</category>
      <category>llmops</category>
    </item>
  </channel>
</rss>
