Harish Kotra (he/him)

Posted on Apr 25

Building Agent Arena: Using Valkey as the Nervous System for Multi-Agent AI

#ai #programming #python #dailybuild2026

Most AI agent demos prove intelligence. Very few prove coordination.

In this project, we built Agent Arena: Fact or Fake, a real-time multiplayer game where four autonomous agents collaborate through one shared substrate: Valkey.

This post walks through the architecture, implementation details, tradeoffs, and developer patterns you can reuse.

Problem Statement

An LLM can generate content. But production-grade multi-agent systems need more:

Shared state across independent workers
Event-driven handoffs without tight coupling
Long-term memory that informs future behavior
Observability and recovery under failure

Without these, agent systems become brittle chains of API calls.

System Overview

Agents in this app:

Researcher: generates factual and misleading claim candidates (Ollama)
Writer: rewrites the claim into a player-facing question (Ollama)
Editor: validates truthfulness and assigns a confidence score (OpenAI)
Game Master: orchestrates timed rounds, scoring, and the leaderboard

Players join over WebSocket and answer FACT or FAKE in real time.

Core Design Principle

No direct agent-to-agent calls.

Every handoff is done through Valkey:

State -> JSON keys
Orchestration -> Pub/Sub events
Long-term recall -> vector index (FT.CREATE / FT.SEARCH)

Architecture Diagram

Players (WS) -> FastAPI -> Valkey (JSON + Pub/Sub + Vector)
                                |      |           |
                                v      v           v
                           Researcher Writer     Editor
                                   \      |      /
                                    \     |     /
                                     -> Game Master

Implementation Breakdown

1) Shared State (Valkey JSON)

Agent outputs and game state are written into namespaced keys:

game:state:{room_id}
agent:researcher:output:{room}
agent:writer:draft:{room}
agent:editor:review:{room}
game:round:{room}:{round}

# backend/services/state_store.py
async def set_game_state(self, room_id: str, state: dict[str, Any]) -> None:
    await self.valkey.set_json(f'game:state:{room_id}', state)

The set_json/get_json layer falls back to plain SET/GET with JSON-serialized strings when the server's JSON module is not loaded, which keeps local demos working on a bare Valkey instance.
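
To make the fallback concrete, here is a minimal sketch of how such a layer could look. It is not the project's actual code: JsonStore, InMemoryClient, and the json_module flag are hypothetical names, and the client is assumed to expose async execute_command, set, and get methods.

```python
import asyncio
import json
from typing import Any, Optional


class JsonStore:
    """Hypothetical wrapper: JSON commands when available, plain strings otherwise."""

    def __init__(self, client: Any, json_module: bool) -> None:
        self.client = client            # assumed: async client with execute_command/set/get
        self.json_module = json_module  # assumed: detected once at startup

    async def set_json(self, key: str, value: dict[str, Any]) -> None:
        payload = json.dumps(value)
        if self.json_module:
            await self.client.execute_command("JSON.SET", key, "$", payload)
        else:
            # Fallback: store the serialized document as an ordinary string.
            await self.client.set(key, payload)

    async def get_json(self, key: str) -> Optional[dict[str, Any]]:
        if self.json_module:
            raw = await self.client.execute_command("JSON.GET", key)
        else:
            raw = await self.client.get(key)
        return None if raw is None else json.loads(raw)


class InMemoryClient:
    """Stand-in for a Valkey client so the sketch runs without a server."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    async def set(self, key: str, value: str) -> None:
        self.data[key] = value

    async def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


async def demo() -> Optional[dict[str, Any]]:
    store = JsonStore(InMemoryClient(), json_module=False)
    await store.set_json("game:state:room-1", {"round": 1, "status": "active"})
    return await store.get_json("game:state:room-1")
```

Callers only ever see set_json/get_json, so the rest of the app does not need to know which storage path is active.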

2) Event-Driven Orchestration (Valkey Pub/Sub)

Every workflow transition publishes an event envelope:

# backend/services/event_bus.py
await self.valkey.publish(channel, envelope.model_dump_json())

Each agent subscribes only to channels it cares about and reacts to events.

This enables:

decoupled scaling
independent process restarts
clean failure boundaries
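
As a rough illustration of the envelope-and-subscribe pattern, here is a stdlib-only sketch. The model_dump_json() call above suggests the real envelope is a Pydantic model; this version uses a dataclass instead, and every field name (event_type, room_id, payload, event_id) as well as the Dispatcher class is an assumption, not the project's schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class EventEnvelope:
    # Field names are assumptions made for illustration.
    event_type: str
    room_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "EventEnvelope":
        return cls(**json.loads(raw))


class Dispatcher:
    """Routes decoded envelopes to the handlers an agent has registered."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[EventEnvelope], None]] = {}

    def on(self, event_type: str, handler: Callable[[EventEnvelope], None]) -> None:
        self.handlers[event_type] = handler

    def dispatch(self, raw: str) -> bool:
        envelope = EventEnvelope.from_json(raw)
        handler = self.handlers.get(envelope.event_type)
        if handler is None:
            return False  # not an event this agent cares about
        handler(envelope)
        return True
```

In a real agent, dispatch would be fed the message bodies received from the subscribed Pub/Sub channels; the agent never needs to know who published them.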

3) Long-Term Memory (Valkey Search vectors)

Questions are embedded and stored as memory documents:

# backend/services/vector_memory.py
await self.valkey.set_json(f'memory:question:{round_id}', {
  'question': question,
  'topic': topic,
  'difficulty': difficulty,
  'player_accuracy': player_accuracy,
  'embedding': emb,
})

Vector search retrieves similar prior questions to reduce repetition and improve topic progression.
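
To show what "similar prior questions" means in practice, here is an in-process sketch of the ranking a KNN vector query performs server-side. This is not how the app queries Valkey; cosine similarity, the 0.9 threshold, and the function names are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    memories: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k stored keys whose embeddings are closest to the query."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in memories.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def is_repeat(
    query: Sequence[float],
    memories: dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed cutoff; tune for your embedding model
) -> bool:
    """True when the closest stored question is at or above the threshold."""
    top = most_similar(query, memories, k=1)
    return bool(top) and top[0][1] >= threshold
```

A Researcher could run a check like is_repeat on a candidate's embedding and ask the model for a different claim when it comes back true.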

Round Lifecycle

The event chain per round:

GAME_START / START_ROUND -> Researcher emits RESEARCH_DONE
Writer reacts -> emits DRAFT_READY
Editor reacts -> emits VALIDATION_COMPLETE
Game Master launches round -> emits NEW_QUESTION
Players answer via WebSocket
Game Master emits ROUND_RESULT + LEADERBOARD_UPDATE + ROUND_COMPLETE
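
The agent-to-agent part of this chain can be written down as a small transition table. This is only our reading of the list above: the TRANSITIONS name, the agent labels, and the assumption that either GAME_START or START_ROUND triggers the Researcher are illustrative.

```python
# Incoming event -> (agent that reacts, event it emits).
TRANSITIONS: dict[str, tuple[str, str]] = {
    "GAME_START": ("researcher", "RESEARCH_DONE"),
    "START_ROUND": ("researcher", "RESEARCH_DONE"),
    "RESEARCH_DONE": ("writer", "DRAFT_READY"),
    "DRAFT_READY": ("editor", "VALIDATION_COMPLETE"),
    "VALIDATION_COMPLETE": ("game_master", "NEW_QUESTION"),
}


def trace_round(start_event: str) -> list[str]:
    """Follow the chain from a starting event and list each hop in order."""
    hops: list[str] = []
    event = start_event
    while event in TRANSITIONS:
        agent, emitted = TRANSITIONS[event]
        hops.append(f"{event} -> {agent} -> {emitted}")
        event = emitted
    return hops
```

Keeping the chain in one table makes it easy to see that no agent references another agent directly, only event names.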

We also added cycle_id propagation to guard against stale or duplicate downstream processing.
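
One way such a guard could work is sketched below. The semantics are assumed, not taken from the repo: cycle_id is treated as an integer that increases with each new round in a room, and CycleGuard is a hypothetical in-process helper.

```python
from typing import Optional


class CycleGuard:
    """Drops events from an older cycle or already handled in the current one."""

    def __init__(self) -> None:
        self.current: dict[str, int] = {}    # room_id -> latest cycle seen
        self.seen: dict[str, set[str]] = {}  # room_id -> event types handled this cycle

    def should_process(self, room_id: str, cycle_id: int, event_type: str) -> bool:
        latest: Optional[int] = self.current.get(room_id)
        if latest is not None and cycle_id < latest:
            return False  # stale: belongs to a round that has already moved on
        if latest is None or cycle_id > latest:
            # First event of a newer cycle: start tracking from scratch.
            self.current[room_id] = cycle_id
            self.seen[room_id] = set()
        if event_type in self.seen[room_id]:
            return False  # duplicate within the same cycle
        self.seen[room_id].add(event_type)
        return True
```

This version keeps its bookkeeping in process memory, so it only protects a single agent process; sharing the guard across processes would mean keeping the same state in Valkey.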

Reliability Improvements We Added

Preserve player scores on reconnect (HSETNX only sets the score field if it does not already exist)
Reset scores when a new game starts in the same room
Event handler safety: an exception in one event handler doesn’t kill the whole agent loop
WebSocket payload validation with explicit error codes (invalid_json, invalid_round_id, round_not_active)
Health endpoint checks Valkey reachability plus JSON and Search capability
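
For the payload validation item, a sketch of the checks might look like this. The three error codes come from the list above; the message shape (a JSON object with a string round_id) and the validate_answer helper are assumptions.

```python
import json
from typing import Any, Optional


def validate_answer(
    raw: str,
    active_round_id: Optional[str],
) -> tuple[Optional[dict[str, Any]], Optional[str]]:
    """Return (message, None) when valid, or (None, error_code) otherwise."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None, "invalid_json"
    if not isinstance(message, dict):
        return None, "invalid_json"  # assumed: only JSON objects are accepted
    round_id = message.get("round_id")
    if not isinstance(round_id, str) or not round_id:
        return None, "invalid_round_id"
    if round_id != active_round_id:
        return None, "round_not_active"
    return message, None
```

Returning an error code instead of raising lets the WebSocket handler send the code back to the client and keep the connection open.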

Operational Checks

Use this before demos:

curl -s http://127.0.0.1:8000/health | jq

The response reports three fields:

reachable
json_module
search_module
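
A sketch of how these three flags could be derived is below. It assumes the endpoint pings the server and inspects the loaded module names (for example from MODULE LIST); the module names "json", "rejson", and "search" and the boolean field values are assumptions to verify against your own server.

```python
from typing import Iterable


def summarize_health(reachable: bool, module_names: Iterable[str]) -> dict[str, bool]:
    """Build the health flags from a ping result and the loaded module names."""
    loaded = {name.lower() for name in module_names}
    return {
        "reachable": reachable,
        # Assumed module names; check what MODULE LIST reports on your server.
        "json_module": reachable and bool(loaded & {"json", "rejson"}),
        "search_module": reachable and "search" in loaded,
    }
```

If json_module is false the app can still run on the string fallback, while a false search_module means long-term memory lookups are unavailable.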

Environment Management (Varlock-first)

The app loads settings from process environment variables, which makes it a good fit for Varlock-managed secrets/config.

Example runtime:

varlock run -- uvicorn main:app --reload --port 8000
varlock run -- python scripts/run_agents.py --agents researcher writer editor game_master

Test Strategy

A minimal integration test is included:

pytest -q tests/test_integration_round.py

It validates: start game -> first round -> leaderboard update.

Why This Pattern Matters

Compared with direct API chaining between agents, this design gives:

Better fault isolation
Better observability
Easier horizontal scaling
Simpler mental model for distributed workflows

What to Improve Next

Move orchestration from Pub/Sub to Valkey Streams (durable delivery)
Add event idempotency store + dead-letter handling
Add OpenTelemetry traces for event lifecycle
Add CI pipeline for contract tests + reliability tests

LLMs provide reasoning, but coordination makes systems reliable.

If you’re building multi-agent workflows, treat Valkey as your shared cognition fabric, not just a cache.

GitHub: https://github.com/harishkotra/neuroloop