Ricardo Sueiras for AWS

Experimenting with agentic player/coach workflows in Kiro CLI subagents

Over the Christmas period I spent some of my down time catching up on blog posts and open source projects that I had bookmarked to "read later". I have been interested in learning about opinionated workflows that use agentic tools like Kiro CLI to generate better output. One of those projects looked at how you can improve the output from agentic tools through something called player/coach (I think it might be similar to the actor/critic method that I used back in my AWS DeepRacer reinforcement learning days). The basic gist is this: one agent (the player) generates output, which is then assessed and reviewed by another (the coach). The coach feeds that assessment back to the player, and thus a (hopefully) virtuous circle is formed.

I am a firm believer that the journey is more interesting than the destination, and I was sure I would encounter rough edges that would help me learn more about this area. As it happens, I had also been looking for an opportunity to try out Kiro CLI's new subagent capability, which allows Kiro to orchestrate subagents from your Kiro session. This blog post is a write-up of how that went, what I learned, and some of the sharp edges that I am still trying to figure out.

At the end of the post I share the code and resources so you can experiment for yourself. My hope is that some of you will be sufficiently interested to try this out and see what kind of workflows you might want to create for your own use cases.

The plan

So my plan was to create three custom agents in Kiro:

  • the orchestrator - this would be where I start, and from where I fire off the tasks that I want done
  • the player - a subagent called by the orchestrator that does all the work
  • the coach - a subagent called by the player to review the work

Each of these would have its own system prompt and context, optimised for the work it was doing. Specifically:

  • evaluation context - I would provide specific evaluation criteria ONLY to the coach subagent
  • learning context - I would provide a shared file that all the subagents would use, so that accumulated learnings would be available in their shared context
  • models - I would use different models for the player and coach subagents

I started by creating a new project workspace and the custom agents in Kiro CLI, each with their own system prompt.

.kiro
├── agents
│   ├── coach.json
│   ├── orch.json
│   └── player.json
└── shared
    ├── COACH.md
    ├── ORCH.md
    └── PLAYER.md
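
To make this concrete, here is a rough sketch of what the coach's agent definition might look like. The exact schema changes between Kiro CLI releases, so treat the field names below as assumptions rather than a reference; the EVAL.md and LEARNINGS.md file names are also made up for illustration. The shape is the point: evaluation criteria appear only in the coach's resources, the shared learnings file appears in every agent's resources, and each agent can specify its own model.

```json
{
  "name": "coach",
  "description": "Reviews the player's work against the evaluation criteria",
  "prompt": "file://.kiro/shared/COACH.md",
  "resources": [
    "file://.kiro/shared/EVAL.md",
    "file://.kiro/shared/LEARNINGS.md"
  ],
  "model": "<model-id>"
}
```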

Each custom agent had its own system prompt (stored in .kiro/shared), where I tuned the behaviour based on the role needed for this workflow. As I experimented with this approach I did have to make changes. I don't think they are perfect, but I did not want to over-engineer this initial attempt.

Once I had this set up, I started the orchestrator using "kiro-cli --agent orch", which took me to Kiro CLI's prompt. From there I could just ask it to do anything and the workflow would kick in.
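
The whole setup is driven from that single command. The example request below is made up, but it is the kind of thing you can type once the session starts:

```
# start a Kiro CLI session using the orchestrator custom agent
kiro-cli --agent orch

# then describe a task at the prompt, for example:
#   > Create a small CLI tool that converts CSV files to JSON
```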

(Image: overview of the layout)

I spent around 2-3 hours setting this up, and then probably another 2-3 hours testing it out on different requests (all code related). I am going to try this on non-code tasks too, to get a feel for the use cases where this approach works best. No answers or intuition yet.

Mind the Gap

Some of the sharp edges that I saw whilst trying this approach were:

  • "hanging" - frequently the coach or the player would start a server/service, and progress would stall - it was waiting for something that was never coming. I have actually seen this quite a bit in the agentic tools I use, and typically this needs a manual intervention. When this is happening in a subagent, typically a CTRL+C will be enough to break out of the loop. What I did find though is that sometimes it will leave those services running in the background, and so as it then goes to retry it gets port conflicts. It is smart enough to work around this and move ports though. I was able to mitigate and improve this with better steering documents to provide specific guidance

  • creating files in unexpected places - another issue that came up was that occasionally my workflow would not be followed. For example, I asked for all updates to happen in a file within a specific directory, but files would be created in other locations.

  • one-shot execution - on a few occasions the workflow decided that, rather than breaking the request down into a series of tasks, it would do everything as a single one-shot execution.

  • quality of output - I didn't spend enough time improving the coach context files, so aside from it picking up on a few tasks that had not been completed correctly, I am not entirely sure how to evaluate whether the coach was improving the output or not. I have some ideas on how I could do this (for example, creating a baseline subagent that does not use the coach and then comparing the output between the two)

  • cost/efficiency - a single iteration of this workflow, from request to completed task, consumed 4.3 credits in Kiro. When I tried the same request one-shot using standard vibe coding it was less than 1 credit, so I am not sure how cost effective this approach is.

  • subagent configuration - in the early versions of my subagents I did not configure the tool permissions correctly, so I was forever being prompted. Once I resolved this it was fine, but it did lead me to discover a current constraint of subagent tool calling - at the moment only a subset of tools can be used by subagents automatically (which you can read about here). This means that those subagents can't explore MCP tools or make web calls, but I suspect this will change over time (especially as the Kiro CLI team are on fire, releasing updates faster than I can keep up).
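
On the "hanging" point above, here is a hypothetical fragment showing the kind of steering guidance that helped; this is illustrative wording, not the exact contents of my steering documents:

```
## Running services
- Do not start long-running servers in the foreground and then wait on them.
- If a server must be started, run it in the background, record the PID,
  and stop it before reporting the task as complete.
- Check whether a port is already in use before binding to it.
```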

Improving this setup

As I was working through some examples, I did see some material improvements as I made changes. With approaches like this, it does take time and experimentation to see how you can affect the output. Some of the things that had a big impact on the workflow were:

  • Evaluation - I am confident that going deeper with more specific criteria (in both depth and breadth) would generate more meaningful reviews. At the moment the setup feels more like player/reviewer than player/coach. I saw some significant improvements when I added items to the EVAL context, for example "After reviewing the submission from the player, review against Python PEP-8 and suggest the most impactful improvement they should make" (see the sketch after this list).

  • Steering and Context - I had some glitches during my experiments due to either contradictory context or a lack of it. I resolved these by tuning the custom agent resource configurations, which allow you to precisely control what is used as context. This had the biggest impact on changing the behaviour of the player, coach and orchestrator agents.

  • Creating a baseline - as mentioned above, the approach I started with lacks a way to tell whether it generates better output. I have started experimenting with ideas such as creating a baseline output (player, without coach), but I am going to look at other mechanisms for understanding and evaluating the quality of the output.
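
To give a feel for the evaluation point above, here is the sort of fragment I mean for the EVAL context. The first item is the example quoted earlier; the rest are hypothetical criteria you might add:

```
## Evaluation criteria
- After reviewing the submission from the player, review against Python PEP-8
  and suggest the most impactful improvement they should make.
- Confirm that every task in the original request was completed, and list any
  that were not.
- Flag any servers or background processes left running once the task is done.
```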

It is also fair to say that I was just using the Kiro CLI tool "as is", operating within its capabilities. One thing that came out of this journey was the thought that building my own tool using something like Strands Agents might give me more control and flexibility over how this workflow operates. That is something I will be looking into, so keep your eyes peeled for that one.
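
To make that idea concrete, here is a rough, hypothetical sketch of what a player/coach loop might look like in Strands Agents. I am assuming the Python API of Agent objects constructed with a system_prompt and invoked as callables; the prompts, task and iteration count are all made up for illustration, so check the Strands documentation before building on this.

```python
# Hypothetical player/coach loop sketched with Strands Agents.
from strands import Agent

# Two agents with role-specific system prompts (wording is illustrative).
player = Agent(system_prompt="You are the player. Complete the task you are given.")
coach = Agent(
    system_prompt="You are the coach. Review the player's work against the "
    "evaluation criteria and suggest the single most impactful improvement."
)

task = "Write a Python function that validates ISO-8601 dates."
work = player(task)

# A bounded number of review rounds rather than an open-ended loop.
for _ in range(3):
    feedback = coach(f"Task: {task}\n\nPlayer output:\n{work}")
    work = player(f"Task: {task}\n\nCoach feedback:\n{feedback}\n\nRevise your work.")

print(work)
```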

Conclusion

Whether the output generated using a player/coach model is better than a more traditional linear approach is hard for me to tell. I started off trying to explore and understand this model better, and after just a few hours' work I feel that I have some new ideas and approaches that might be useful going forward. I think that is the key thing for me at the moment: while agentic AI is still so new, exploring new ideas and approaches might sometimes lead to great outcomes. I didn't get a major aha moment today, but I still learned something, and I am happy with that.

I have shared the code in this GitHub repo where you can try this out for yourself. I have also put together a short video of this in action which you can see here.

In the video, I ask the player/coach workflow to create a simple application.

Get started today

You can get started with Kiro CLI today for free - download it from this link. If you are new to Kiro CLI, I have created a Kiro CLI workshop that will walk you through getting started with the terminal-based tool, providing a comprehensive overview as well as advanced topics.

Finally, if you did find this post interesting, helpful, or useful, I would love to get your feedback. Please use this 30 second feedback form and I will be forever grateful.

Made with ♥ from DevRel
