I Built a 5-Agent AI Command Crew That Lives on My Ceiling
My desk has five AI agents projected onto it from a ceiling-mounted projector, and each one has a name, a voice, and a job. When I ask about the weather, a British woman named Vela answers. When I need a print job status, Aria picks it up. When something breaks, Ares runs diagnostics. No two of them share a voice, a personality, or a model configuration. And none of them would exist if I hadn't gotten frustrated with how useless a single chatbot felt for actually running a workshop.
This is the HoloDesk / SHARD Command Center. It's been fully operational since March 23rd. Here's how it works.
Why One AI Wasn't Enough
I tried the single-assistant approach for about three months. One voice, one wake word, one model handling everything from "what's the weather" to "is my K1 printer jammed" to "run a system health check." The problem wasn't the AI. The problem was that everything felt the same. Same tone, same cadence, same mental model for wildly different tasks. It flattened the work.
When you're in a real shop environment, different tools feel different. A multimeter isn't a drill press. I wanted my AI layer to work the same way.
So I split the crew:
- SHARD — command hub, handles routing and orchestration, primary voice
- ATLAS — ops and Home Assistant integration, knows my house and my systems
- Ares — diagnostics, monitors hardware, runs checks, talks about problems directly
- Aria — dedicated to my Creality K1 printer, knows the machine's state
- Vela — weather, environmental conditions, British accent
Each agent has its own system prompt, its own TTS voice configuration, and its own domain. SHARD is the only one you talk to directly. The others get delegated to.
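For concreteness, here's a minimal sketch of what one of those agent records might look like. The `Agent` class, prompts, and voice IDs below are illustrative placeholders I've made up, not the project's actual code; the real `respond` would call the model with the agent's prompt and history.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One crew member: its own prompt, voice, and running context."""
    name: str
    system_prompt: str
    voice_id: str          # key into the TTS voice configuration
    history: list = field(default_factory=list)

    def respond(self, query: str) -> str:
        # Placeholder: a real implementation would send
        # system_prompt + history + query to the model.
        self.history.append(query)
        return f"[{self.name}] handling: {query}"

# Hypothetical registry keyed the same way the dispatcher looks agents up
agents = {
    "vela": Agent("Vela", "You report weather conditions.", "en-GB-female"),
    "ares": Agent("Ares", "You run hardware diagnostics.", "en-US-male"),
}
```

The point of the structure is that voice and context travel with the agent object, so the TTS pipeline never has to guess who is speaking.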
The Routing Architecture
The wake word kicks everything to SHARD. From there, SHARD's system prompt includes routing logic that determines which agent should actually handle the request. It's not magic — it's prompt engineering combined with a dispatcher function in Python that checks the intent classification SHARD returns.
A simplified version of how the dispatch looks:
def route_to_agent(intent: str, query: str):
    routing_map = {
        "weather": agents["vela"],
        "print": agents["aria"],
        "diagnostics": agents["ares"],
        "home_assistant": agents["atlas"],
    }
    agent = routing_map.get(intent, agents["shard"])
    return agent.respond(query)
SHARD classifies the intent, returns a routing token along with any direct response, and the dispatcher hands the query to the right agent. Each agent object carries its own context window, voice ID, and system prompt. When Vela responds, the TTS pipeline pulls her voice config. When Ares responds, you hear a different voice entirely.
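One way to implement that routing-token hand-off is to have SHARD prefix its reply with a tag the dispatcher strips off before anything reaches TTS. The `[[route:...]]` convention below is my assumption for illustration; the article doesn't specify the actual token format.

```python
import re

def parse_routing(shard_reply: str):
    """Split SHARD's reply into an intent token and any spoken text.

    Assumes a reply convention like '[[route:weather]] Checking now.'
    -- a hypothetical format, not necessarily the project's real one.
    """
    m = re.match(r"\[\[route:(\w+)\]\]\s*(.*)", shard_reply, re.S)
    if m:
        return m.group(1), m.group(2)
    # No token: SHARD answers the query itself
    return "shard", shard_reply

intent, text = parse_routing("[[route:weather]] Checking the radar.")
# intent == "weather", text == "Checking the radar."
```

Whatever the token format, the key property is that the classification and the spoken response travel in one model call, so routing adds no extra round trip.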
It sounds like a crew because it is one. That distinction matters more than I expected.
Kinect V2 as the Input Layer
The projected surface is interactive. I'm using a Kinect V2 for depth-based touch detection and mid-air gesture recognition. This took longer to get right than the agent routing did.
For fingertip touch on the projected surface, I'm doing blob detection on the depth frame. The Kinect outputs a 512x424 depth image at 30fps. I threshold it to find anything within a narrow depth band just above my desk surface, then run contour detection to find fingertip contact points.
# Simplified depth blob touch detection
import cv2
import numpy as np

depth_frame = kinect.get_last_depth_frame()
depth_img = depth_frame.reshape((424, 512)).astype(np.float32)

# Isolate near-surface zone: anything in a narrow band just above the desk
touch_mask = cv2.inRange(depth_img, DESK_Z - 30, DESK_Z + 15)

contours, _ = cv2.findContours(touch_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > MIN_BLOB_AREA:
        # Blob centroid = fingertip contact point
        M = cv2.moments(c)
        cx = int(M["m10"] / M["m00"])
        cy = int(M["m01"] / M["m00"])
        register_touch(cx, cy)
For gesture input, I'm using the full skeleton tracking that Kinect V2 provides. Swipe left, swipe right, a grab motion, and what I call the scan beam — you raise your arm and sweep it across the surface, which triggers a system scan animation and kicks Ares. These are tracked via joint velocity deltas on the wrist and hand joints.
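A minimal sketch of the velocity-delta idea for swipes, assuming you already have per-frame wrist X positions (in meters) and timestamps from the skeleton stream. The speed threshold is a made-up value you would tune per setup, and a real detector would also debounce and check the other joints.

```python
SWIPE_SPEED = 1.2  # m/s along X -- assumed threshold, tune per setup

class SwipeDetector:
    """Detect left/right swipes from wrist joint velocity deltas."""

    def __init__(self):
        self.prev_x = None
        self.prev_t = None

    def update(self, wrist_x: float, t: float):
        gesture = None
        if self.prev_x is not None:
            dt = t - self.prev_t
            if dt > 0:
                vx = (wrist_x - self.prev_x) / dt
                if vx > SWIPE_SPEED:
                    gesture = "swipe_right"
                elif vx < -SWIPE_SPEED:
                    gesture = "swipe_left"
        self.prev_x, self.prev_t = wrist_x, t
        return gesture
```

The same velocity-window approach extends to the scan beam: a sustained lateral sweep with the arm raised, rather than a single fast delta.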
Calibrating the depth threshold to my actual desk surface took a full evening. The Kinect sees the desk at a slightly different Z depending on where you are in the frame due to lens angle. I ended up building a calibration routine that samples 20 points across the surface and builds a depth correction map. That killed the false positives.
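One simple way to turn sampled surface points into a correction map is a least-squares plane fit, replacing the single `DESK_Z` constant with a per-pixel expected depth. The author's 20-point map may well be denser than a plane; this is a sketch of the idea, with function names of my own invention.

```python
import numpy as np

def build_depth_correction(samples):
    """Fit a plane z = a*x + b*y + c to sampled desk depths.

    `samples` is an (N, 3) array of (pixel_x, pixel_y, measured_z).
    The fitted plane corrects for lens-angle skew across the frame.
    """
    samples = np.asarray(samples, dtype=np.float64)
    A = np.column_stack([samples[:, 0], samples[:, 1],
                         np.ones(len(samples))])
    coeffs, *_ = np.linalg.lstsq(A, samples[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def expected_desk_z(coeffs, x, y):
    """Expected desk depth at pixel (x, y) under the fitted plane."""
    a, b, c = coeffs
    return a * x + b * y + c
```

The touch threshold band then becomes relative to `expected_desk_z(coeffs, cx, cy)` instead of a constant, which is what kills the position-dependent false positives.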
The Neural Viz Sphere
The visual centerpiece is a 1000-node sphere rendered in real time using a Fibonacci sphere distribution. Every node is positioned using the golden angle to get even coverage, and the whole thing rotates continuously using numpy-accelerated matrix math.
import numpy as np

def fibonacci_sphere(n=1000):
    golden = np.pi * (3 - np.sqrt(5))  # golden angle in radians
    indices = np.arange(n)
    y = 1 - (indices / (n - 1)) * 2    # y runs from 1 down to -1
    radius = np.sqrt(1 - y ** 2)       # circle radius at height y
    theta = golden * indices
    x = np.cos(theta) * radius
    z = np.sin(theta) * radius
    return np.stack([x, y, z], axis=1)
The sphere changes behavior based on which agent is active and what state the system is in. SHARD's idle state is a slow blue pulse. Ares activates a red-tinted faster spin with visible arc connections between nodes. Aria goes warm orange. Vela goes light cyan. When a voice response is actively playing, the rotation speed scales with the amplitude of the audio output in real time.
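The amplitude-coupled rotation can be sketched as a per-frame Y-axis rotation whose angle grows with the audio level. `BASE_SPEED` and `AMP_GAIN` are assumed constants for illustration, not values from the project.

```python
import numpy as np

def rotate_y(points, angle):
    """Rotate an (N, 3) point cloud around the Y axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c,    0.0,  s],
                  [0.0,  1.0,  0.0],
                  [-s,   0.0,  c]])
    return points @ R.T

BASE_SPEED = 0.01  # radians per frame at silence -- assumed
AMP_GAIN = 0.05    # extra spin per unit amplitude -- assumed

def step(points, amplitude):
    """Advance the sphere one frame; louder audio spins it faster."""
    return points if len(points) == 0 else rotate_y(
        points, BASE_SPEED + AMP_GAIN * amplitude)
```

Because the rotation is a single matrix multiply over the whole (1000, 3) array, numpy keeps this cheap enough to run every frame alongside the renderer.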
It's not necessary. It's also the first thing anyone who sees this setup reacts to. The viz makes the crew feel present in a way that a text readout never would.
What's Still Rough
The depth calibration drifts slightly when the room temperature changes. In a Pennsylvania winter, that matters. I have a recalibration command I can trigger verbally, but it shouldn't need to be manual.
Agent memory is session-only right now. Each agent context resets when the process restarts. ATLAS has a memory system via MEMORY.md files that I've built separately, but integrating that into the real-time agent dispatch layer cleanly is still on the list.
Aria's printer integration is read-only right now. She can tell me layer count, estimated time, nozzle temp. She cannot yet issue commands back to the K1. That's a firmware API question I haven't fully solved.
The scan beam gesture misfires occasionally if I'm reaching across the desk for something unrelated. Gesture intent classification is a harder problem than I gave it credit for at the start.
What's Next
The Pepper's Ghost display is already in the project notes — a physical holographic companion device for ATLAS built around that optical illusion. Hardware isn't built yet and I'm not starting software until it is. That's a lesson I learned the hard way on other projects.
The AIPI Lite ESP32-S3 voice intercept is also in progress. The goal there is rerouting a physical voice device through my local Claude pipeline instead of the vendor's cloud. DNS intercept is built. There's a stubborn MQTT hardcoded IP that's fighting me. That one will get its own writeup when it's done.
If you're building something in this direction, follow along. I document what I actually finish, not what I'm planning to finish.