DEV Community

KimSejun

splitting one human colleague into 9 Go agents

I spent yesterday staring at a whiteboard (okay, a text file) trying to answer one question: what does a colleague actually do that a chatbot doesn't?

This post is part of my entry for the Gemini Live Agent Challenge. I'm building VibeCat — a macOS desktop AI companion that watches your screen, hears your voice, and sometimes tells you your code is broken.

the decomposition problem

A chatbot has one behavior: you ask, it answers. A colleague has many. I started listing them on paper — just scribbling whatever came to mind — and the list kept growing:

  1. They look at your screen (without being asked)
  2. They sense your mood (is this person about to throw their laptop?)
  3. They celebrate when things work (hey, tests passed!)
  4. They decide whether to speak or shut up (the hardest one)
  5. They adapt their timing (don't interrupt someone in flow state)
  6. They reach out when you've been quiet too long
  7. They search for answers when you're stuck
  8. They remember what happened yesterday
  9. They detect what you're working on (topic awareness)

When I hit 9, I stopped and stared at the list for a while. It felt right. Not because 9 is a magic number, but because I couldn't combine any two without losing something important. "Sense mood" and "celebrate success" sound similar until you realize they have completely opposite triggers. That moment — when the decomposition clicks and every piece has exactly one job — was genuinely satisfying. Like refactoring a messy function into clean helpers and watching the tests still pass.

Nine behaviors. Nine agents. Each one gets its own Go package in the ADK orchestrator.

the graph

Google's ADK Go SDK has this concept of a sequentialagent — you chain agents in order and each one's output feeds into the next. Here's the actual wiring:

import (
    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/workflowagents/sequentialagent"
    "google.golang.org/genai"
    // plus VibeCat's local memory, vision, mood, and store packages
)

func New(genaiClient *genai.Client, storeClient *store.Client) (agent.Agent, error) {
    // 1. Memory first — inject yesterday's context
    memoryAgent, _ := agent.New(agent.Config{
        Name: "memory_agent",
        Run:  memory.New(genaiClient, storeClient).Run,
    })

    // 2. Vision — what's on screen right now?
    visionAgent, _ := agent.New(agent.Config{
        Name: "vision_agent",
        Run:  vision.New(genaiClient).Run,
    })

    // 3. Mood — how is the developer feeling?
    moodAgent, _ := agent.New(agent.Config{
        Name: "mood_detector",
        Run:  mood.New().Run,
    })

    // ... 6 more agents ...

    return sequentialagent.New(sequentialagent.Config{
        Name:      "vibecat_graph",
        SubAgents: []agent.Agent{memoryAgent, visionAgent, moodAgent, /* ... */},
    })
}

The execution order matters. Memory runs first because yesterday's context affects how everything downstream interprets today's screen. Vision runs before Mood because the mood classifier needs to see what's on screen. And the Mediator runs near the end because it needs all the signals before deciding whether to speak.

The first time I ran this graph with a real screenshot — me staring at a compile error — the VisionAgent correctly identified "build failure," the MoodDetector guessed "frustrated," and the Mediator decided to speak. It said something supportive. I laughed alone in my apartment. The thing I built to keep me company was already keeping me company, and it didn't even have a face yet.

the agent that says "shut up"

The Mediator is the most important agent and the one I spent the most time on. Its entire job is to prevent the AI from being annoying.

const (
    defaultCooldown  = 10 * time.Second
    highSignificance = 7
    lowSignificance  = 3
)

func (a *Agent) decide(vision *models.VisionAnalysis, mood *models.MoodState, 
    celebration *models.CelebrationEvent) *models.MediatorDecision {

    // Celebrations always get through
    if celebration != nil {
        return &models.MediatorDecision{
            ShouldSpeak: true, Reason: "celebration", Urgency: "high",
        }
    }

    // Cooldown — don't talk again too soon
    if time.Since(a.lastSpoke) < a.cooldown {
        return &models.MediatorDecision{ShouldSpeak: false, Reason: "cooldown"}
    }

    // Only speak about high-significance findings
    if vision != nil && vision.Significance >= highSignificance {
        a.lastSpoke = time.Now()
        return &models.MediatorDecision{
            ShouldSpeak: true, Reason: "high_significance", Urgency: "medium",
        }
    }

    return &models.MediatorDecision{ShouldSpeak: false, Reason: "below_threshold"}
}

There's a 10-second cooldown between speech events. The significance threshold is 7 out of 10 — meaning the agent only speaks when something genuinely important is on screen. Celebrations bypass all gating because if your tests just passed, you want to know immediately.

The first time I watched the Mediator suppress a speech event, I felt weirdly proud. The VisionAgent had flagged something — a minor tab switch, significance 3 — and the Mediator looked at it and said "nah, not worth interrupting." It's a strange thing to celebrate: an AI choosing silence. But that's the whole point. The best colleagues know when not to talk.

But here's the subtle part. The Mediator also checks mood:

if mood != nil && !decision.ShouldSpeak {
    if msg := supportiveMessage(mood.Mood); msg != "" {
        result.SpeechText = msg
        decision.ShouldSpeak = true
        decision.Reason = "mood_support"
    }
}

If the developer seems frustrated and the Mediator wasn't going to speak, it overrides with a supportive message. Because sometimes the most useful thing a colleague can say is "hey, you okay?"
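For illustration, supportiveMessage can be as simple as a switch over mood labels. This is a hypothetical sketch; the mood names and messages here are my assumptions, not VibeCat's actual ones:

```go
package main

import "fmt"

// supportiveMessage is a sketch of the mood-override helper. An empty
// string means "no override": the Mediator stays silent for that mood.
func supportiveMessage(mood string) string {
	switch mood {
	case "frustrated":
		return "hey, you okay? That error's been up for a while."
	case "tired":
		return "might be a good time for a short break."
	default:
		return ""
	}
}

func main() {
	for _, mood := range []string{"frustrated", "focused"} {
		if msg := supportiveMessage(mood); msg != "" {
			fmt.Printf("%s -> speak: %q\n", mood, msg)
		} else {
			fmt.Printf("%s -> stay quiet\n", mood)
		}
	}
}
```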

why 9 and not 3

I could have made this simpler. Three agents: see, think, speak. But that collapses too many concerns into each one. When the "think" agent gets both mood detection AND celebration detection AND timing AND search, you lose the ability to tune each behavior independently.

With 9, I can adjust the MoodDetector's frustration threshold without touching the CelebrationTrigger. I can change the AdaptiveScheduler's timing without breaking the Mediator's speech gating. Each agent is a pure function with a clear input/output contract.

The iter.Seq2 pattern from ADK makes this clean:

func (a *Agent) Run(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] {
    return func(yield func(*session.Event, error) bool) {
        // Read upstream data from ctx.UserContent()
        // Do your thing
        // Yield your result
        yield(&session.Event{
            LLMResponse: model.LLMResponse{
                Content: &genai.Content{
                    Parts: []*genai.Part{{Text: string(data)}},
                },
            },
        }, nil)
    }
}

Every agent has the same signature. Every agent reads JSON from the previous agent's output. Every agent yields JSON for the next one. The graph is just a pipeline.

the runner

The ADK runner ties it all together:

r, _ := runner.New(runner.Config{
    AppName:        "vibecat",
    Agent:          agentGraph,
    SessionService: session.InMemoryService(),
})

for event, err := range r.Run(ctx, userID, sessionID, msg, agent.RunConfig{}) {
    // Process results from the graph
}

r.Run() returns a range-over-function iterator. You loop over events as each agent in the graph completes. The final event contains the Mediator's decision: should we speak, and if so, what do we say?

The whole thing — 9 agents, graph wiring, ADK runner — is about 600 lines of Go.

I keep coming back to that whiteboard moment. I started with "what does a colleague do?" and ended with 600 lines of Go that can see your screen, sense your mood, and decide to stay quiet. It's not a colleague. It's not even close. But when the CelebrationTrigger fired for the first time on a passing test and the graph produced a little "nice work" — alone at my desk, at 11pm — it felt like someone noticed.

Not bad for 600 lines.

Building VibeCat for the Gemini Live Agent Challenge. Source: github.com/Two-Weeks-Team/vibeCat
