Harish Kotra (he/him)

How I Built Swarm DJ: A Multi-Agent AI System Performing Live Electronic Music 🎧

What happens when you give local Large Language Models (LLMs) the keys to a DJ booth?

That was the question that sparked Swarm DJ. I wanted to explore whether autonomous AI agents could collaborate in real time to generate music, argue over creative directions, and actually make crowds dance, all without any human intervention.

The result is a distributed, multi-agent AI system powered by Ollama, MQTT, and real-time DSP audio generation, turning AI agents into a collective, autonomous DJ.


System Architecture

Building an autonomous DJ meant bridging the gap between slow, token-by-token text generation and hard real-time audio constraints (where a missed buffer means an audible click).

To achieve this, the architecture separates the "thinking" from the "playing", using an MQTT broker as the central nervous system.
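To make the separation concrete, here is a minimal sketch of the idea. It is not the project's code: a standard-library `queue.Queue` stands in for the MQTT broker, and the names `param_updates`, `apply_pending_updates`, and `reverb_mix` are made up for illustration. The point is that the playing side only ever does a non-blocking check for new decisions, so a slow LLM can never stall an audio buffer.

```python
import queue

# Stand-in for the MQTT broker (assumption): a thread-safe mailbox
# between the slow "thinking" side and the real-time "playing" side.
param_updates = queue.Queue()


def apply_pending_updates(params, updates):
    """Drain any waiting updates without blocking and return the merged params."""
    merged = dict(params)
    while True:
        try:
            merged.update(updates.get_nowait())
        except queue.Empty:
            return merged


# The "thinking" side posts a decision whenever it finishes deliberating.
param_updates.put({"reverb_mix": 0.6})

# The "playing" side would call this once per audio buffer; it returns
# immediately whether or not an agent has said anything.
params = {"bpm": 128, "reverb_mix": 0.2}
params = apply_pending_updates(params, param_updates)
```

With a real broker, the `put` call would become a publish from an agent process and the drain would read from a subscriber callback's buffer, but the non-blocking shape stays the same.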

The Core Components:

  1. The Audio Engine: Built with pure NumPy for DSP synthesis (generating kicks, acid loops, and pads) and Spotify's pedalboard for real-time effects (reverb, delay, filters). It runs in an isolated, high-priority thread to prevent audio dropouts.
  2. The MIDI Clock: Emits a clock/bar_complete event at the end of every bar, in sync with the BPM. This keeps the LLM voting cycles aligned with bar boundaries.
  3. The AI Agents: Three distinct personas powered by local Llama 3.2 models:
    • The Architect: Cares about structure, manipulating BPM and drop phases.
    • The Ghost: A moody, atmospheric agent controlling Reverb and Lowpass filters.
    • The Prankster: An agent born to disrupt, adding delays and literal vinyl tape-stop chaos.
  4. The Council: A Python orchestrator that runs the voting logic.
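The timing behind the clock is simple arithmetic, sketched below. This assumes 4/4 time and a fixed tempo, and the function names are hypothetical rather than taken from the repo.

```python
def bar_seconds(bpm, beats_per_bar=4):
    """Length of one bar in seconds at a fixed tempo (4/4 assumed by default)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats_per_bar * 60.0 / bpm


def bar_complete_times(bpm, bars, beats_per_bar=4):
    """Offsets in seconds from the start at which each bar would complete."""
    length = bar_seconds(bpm, beats_per_bar)
    return [length * (i + 1) for i in range(bars)]


print(bar_seconds(120))            # 2.0
print(bar_complete_times(120, 4))  # [2.0, 4.0, 6.0, 8.0]
```

At 120 BPM a bar lasts 2 seconds, so a decision cycle tied to 4 bars has an 8-second window. A faster tempo shrinks that window, which is one reason the agents' response time matters.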

The "Dictatorship by Confidence" Protocol

Originally, I built a fully democratic voting system: every 8 bars, the agents would deliberate, propose parameter changes, and vote Yes/No/Abstain on each other's ideas.

The result was democratic gridlock. The Prankster would propose chaos, and the Architect would vote it down.

To make the music evolve dynamically (and cut cycle times down from 15 seconds to 5 seconds), I replaced democracy with a "Dictatorship by Confidence."
Every 4 bars, each agent generates a proposal with a self-assigned confidence score (0.0 to 1.0). The orchestrator collects the proposals, and the one with the highest confidence wins immediately. This ensures fast, opinionated musical shifts over time.
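The selection rule can be sketched in a few lines. The `Proposal` fields, the parameter names, and the tie-breaking behavior (earliest proposal wins a tie) are assumptions for illustration. Dropping out-of-range scores is also an assumption, not something the protocol above specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    agent: str
    parameter: str
    value: float
    confidence: float  # self-assigned, expected in [0.0, 1.0]


def pick_winner(proposals):
    """Return the highest-confidence proposal, or None if there is none."""
    # Ignore scores outside the expected range (assumption).
    valid = [p for p in proposals if 0.0 <= p.confidence <= 1.0]
    if not valid:
        return None
    # max() keeps the first item on ties, so earlier proposals win ties.
    return max(valid, key=lambda p: p.confidence)


proposals = [
    Proposal("architect", "bpm", 132.0, 0.7),
    Proposal("ghost", "reverb_mix", 0.8, 0.9),
    Proposal("prankster", "delay_feedback", 0.95, 0.4),
]
winner = pick_winner(proposals)
print(winner.agent, winner.parameter)  # ghost reverb_mix
```

There is no second round of voting here, which is what removes the gridlock: one pass over the proposals decides the change.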


🤯 Emergency Veto Powers

To make things interesting, each agent is granted one Emergency Veto per session. If an agent feels their vision is being completely ignored, they can bypass the voting cycle entirely:

  • The Architect can unleash a Tempo Lock, freezing the BPM for 32 bars.
  • The Ghost can cast an Ambient Wash, flooding the track with reverb and muting the bass.
  • The Prankster can trigger a Glitch Storm, randomizing audio parameters for a chaotic 8-bar drop.
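Here is one way the one-veto-per-session rule could be enforced. This is a sketch, not the project's implementation: the `VetoTracker` class and the lowercase agent and veto identifiers are hypothetical.

```python
# Each agent's single emergency power, as described above.
VETOES = {
    "architect": "tempo_lock",
    "ghost": "ambient_wash",
    "prankster": "glitch_storm",
}


class VetoTracker:
    """Tracks which agents have already spent their one veto this session."""

    def __init__(self):
        self._used = set()

    def try_veto(self, agent):
        """Return the veto name if the agent may still use it, else None."""
        if agent not in VETOES or agent in self._used:
            return None
        self._used.add(agent)
        return VETOES[agent]


tracker = VetoTracker()
print(tracker.try_veto("prankster"))  # glitch_storm
print(tracker.try_veto("prankster"))  # None
```

The orchestrator would check `try_veto` before the normal confidence comparison, so an accepted veto skips that cycle's vote entirely.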


🚀 What Can You Build With This Paradigm?

The Swarm DJ architecture (Real-time Engine + MQTT + Autonomous Agents) is extremely adaptable. If you take this codebase, you could build:

  1. AI Video Game Directors: Replace the Synthesizer with an Unreal/Unity engine integration. Let agents control enemy spawn rates, weather, and lighting based on player health.
  2. Autonomous Lighting Techs: Connect the MQTT output to DMX lighting fixtures. Have LLMs "listen" to a Spotify stream and argue over the stage light colors and strobe speeds.
  3. AI Stock Trading Ensembles: Replace the MIDI clock with a market data feed. Have conservative, aggressive, and contrarian AI agents debate and allocate portfolio percentages in real time.
  4. Interactive Storytellers: Have agents control smart home IoT devices (Hue lights, speakers, locks) while running a live D&D campaign audio feed in a haunted house attraction.

Building LLM tools that exist purely in chat interfaces is yesterday's news. Swarm DJ shows that we can break agents out of the chatbox and let them drive real-world systems in real time.

Want to run your own AI rave? Feel free to check out the repo and build your own agents!
