DEV Community

I Switched to the Agent Toolkit for AWS. Here's Why.

Rohini Gaonkar on June 12, 2026

I've been using AI coding agents like Kiro and Claude Code with AWS for a while now. To connect them to my AWS account, I was running the community M...
 
Mykola Kondratiuk

zero audit trail is the real trap. lost visibility into what my agents actually did once and triaging anything after that was a nightmare. does the toolkit give per-action logs or just session-level?

Rohini Gaonkar AWS

Great question, and I feel that pain. Triaging after the fact with zero visibility is brutal.

The short answer: per-action.

Every individual API call the agent makes through the managed MCP Server shows up as its own CloudTrail event. Each s3:ListBuckets, each lambda:GetFunction, each one gets its own event record with the full detail you'd expect from CloudTrail (timestamp, parameters, source IP, etc.).

The key fingerprint is invokedBy: aws-mcp.amazonaws.com on every MCP-initiated event, so you can filter CloudTrail specifically for agent actions vs. your own CLI calls. The triage path is to filter by the invokedBy field, which gives you the full ordered sequence of everything the agent touched.
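
A rough sketch of that filter, assuming the events have already been exported from CloudTrail and parsed into Python dicts. I'm also assuming invokedBy sits under userIdentity, which is where CloudTrail normally puts it; the sample records and the agent_events helper are made up for illustration, and real events carry many more fields (and sometimes different event names).

```python
from datetime import datetime

MCP_PRINCIPAL = "aws-mcp.amazonaws.com"  # value quoted in this thread


def parse_time(ts: str) -> datetime:
    # CloudTrail eventTime is ISO 8601 in UTC, e.g. 2026-06-12T14:47:03Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def agent_events(records):
    """Return only MCP-initiated records, oldest first."""
    hits = [
        r for r in records
        # Assumption: invokedBy lives under userIdentity in these events.
        if r.get("userIdentity", {}).get("invokedBy") == MCP_PRINCIPAL
    ]
    return sorted(hits, key=lambda r: parse_time(r["eventTime"]))


# Illustrative records only, not real CloudTrail output.
sample = [
    {"eventTime": "2026-06-12T14:47:09Z", "eventName": "GetFunction",
     "userIdentity": {"invokedBy": MCP_PRINCIPAL}},
    {"eventTime": "2026-06-12T14:47:05Z", "eventName": "ListBuckets",
     "userIdentity": {}},  # a call made directly from the CLI
    {"eventTime": "2026-06-12T14:47:03Z", "eventName": "ListBuckets",
     "userIdentity": {"invokedBy": MCP_PRINCIPAL}},
]

trace = [e["eventName"] for e in agent_events(sample)]
print(trace)
```

The CLI call drops out, and what remains is the agent's calls in time order.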

CloudWatch metrics are there too for aggregate patterns (request counts, errors, latency). But for the "what exactly did my agent do at 2:47pm that broke everything" question, CloudTrail per-action events are what you want.

Mykola Kondratiuk

per-action via CloudTrail is clean if you're AWS-native - you get the full trace without custom instrumentation. the part that breaks down is correlation: knowing s3:ListBuckets and lambda:GetFunction happened in the same agent task means you need to tag session context upstream, otherwise you're still piecing it together manually after the fact. that correlation tagging is what finally made it legible for us.

Rohini Gaonkar AWS

You're right, and thanks for naming the actual gap. Per-action events are there, but correlation across a single task is not built-in.

Here's what I confirmed: the MCP Server does track sessions internally (there are UserSessionCount metrics in CloudWatch under AWS/Usage), but that session context doesn't propagate into the downstream CloudTrail events. So you get individual s3:ListBuckets and lambda:GetFunction events with invokedBy: aws-mcp.amazonaws.com, but nothing that groups them into "this was one logical agent task."

Right now, the best you get out of the box is filtering by IAM identity + invokedBy field + time window. If you only have one agent session running, that's workable. Two sessions in the same account at the same time? You're back to guessing.
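
To make the limitation concrete, here is a minimal sketch of that three-way filter over already-parsed event dicts. The narrow helper, the ARN, and the sample records are hypothetical, and I'm assuming both arn and invokedBy appear under userIdentity.

```python
from datetime import datetime, timezone


def parse_time(ts: str) -> datetime:
    # CloudTrail eventTime is ISO 8601 in UTC, e.g. 2026-06-12T14:47:03Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def narrow(records, principal_arn, start, end, invoked_by="aws-mcp.amazonaws.com"):
    """Keep events matching identity + invokedBy + time window, oldest first."""
    kept = []
    for r in records:
        ident = r.get("userIdentity", {})
        if ident.get("invokedBy") != invoked_by:
            continue
        if ident.get("arn") != principal_arn:
            continue
        if not (start <= parse_time(r["eventTime"]) <= end):
            continue
        kept.append(r)
    return sorted(kept, key=lambda r: parse_time(r["eventTime"]))


ARN = "arn:aws:iam::111122223333:user/dev"  # hypothetical identity
MCP = {"arn": ARN, "invokedBy": "aws-mcp.amazonaws.com"}

# Illustrative records: imagine the first two came from two different
# agent sessions that were running at the same time.
sample = [
    {"eventTime": "2026-06-12T14:47:03Z", "eventName": "ListBuckets", "userIdentity": MCP},
    {"eventTime": "2026-06-12T14:47:04Z", "eventName": "GetFunction", "userIdentity": MCP},
    {"eventTime": "2026-06-12T16:00:00Z", "eventName": "ListBuckets", "userIdentity": MCP},
]

window_start = datetime(2026, 6, 12, 14, 45, tzinfo=timezone.utc)
window_end = datetime(2026, 6, 12, 14, 50, tzinfo=timezone.utc)
matched = narrow(sample, ARN, window_start, window_end)
print([e["eventName"] for e in matched])
```

Both events inside the window come back, and nothing in the records says whether they belong to one session or two. That is the guessing part.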

The Well-Architected Agentic AI Lens actually calls this out explicitly: it recommends implementing your own correlation ID for end-to-end traceability across agent interactions. So it's recognized as a gap you fill upstream, not something the toolkit hands you today. I am going to make sure I pass on this feedback to the Agent Toolkit team.

I'd be curious what your correlation tagging approach looks like. Is it something you're injecting at the role assumption level, or higher up in the orchestration layer?
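
For the role-assumption option, one possible shape is to put a per-task correlation ID into the role session name and group events by it afterwards. This is only a sketch under assumptions: that the agent runs on credentials from a role you assume yourself (the string would be passed as RoleSessionName to sts:AssumeRole), and that the assumed-role ARN shows up in userIdentity.arn on the downstream events. The helper names and sample records are hypothetical.

```python
import re
import uuid
from collections import defaultdict


def session_name_for(task_label: str) -> str:
    """Build a session name carrying a fresh correlation ID for one task."""
    # STS session names allow [\w+=,.@-] and 2 to 64 characters.
    safe = re.sub(r"[^\w+=,.@-]", "-", task_label, flags=re.ASCII)[:24]
    return f"{safe}-{uuid.uuid4().hex[:12]}"


def session_of(record):
    """Extract the session name from an assumed-role ARN, else None."""
    arn = record.get("userIdentity", {}).get("arn", "")
    # Shape: arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    parts = arn.split("/")
    if ":assumed-role/" in arn and len(parts) == 3:
        return parts[2]
    return None


def group_by_task(records):
    """Map session name -> event names in the order the records were given."""
    groups = defaultdict(list)
    for r in records:
        groups[session_of(r)].append(r["eventName"])
    return dict(groups)


def fake_event(name, session):
    # Illustrative record; the account ID and role name are made up.
    arn = f"arn:aws:sts::111122223333:assumed-role/AgentRole/{session}"
    return {"eventName": name, "userIdentity": {"arn": arn}}


task_a = session_name_for("fix bucket policy")
task_b = session_name_for("audit lambdas")
events = [
    fake_event("ListBuckets", task_a),
    fake_event("GetFunction", task_b),
    fake_event("GetBucketPolicy", task_a),
]
grouped = group_by_task(events)
print(grouped[task_a])
```

With one session name per task, two agents running at the same time in the same account land in separate groups instead of one interleaved list.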

Mykola Kondratiuk

yeah that’s the friction point - you get event coverage but lose the task-as-unit view. feels like watching individual frames without the film. the correlation gap is what makes me skeptical of "we have full CloudTrail coverage" as an answer to agent observability.

Mininglamp

The 80/20 split between infrastructure concerns and actual agent logic is painfully accurate for anyone building agents from scratch. Timeouts, retries, state persistence, error recovery, these eat up most of the engineering effort. A dedicated toolkit handles that plumbing so you can focus on what the agent should actually do. One gap most toolkits still have is multi-agent coordination. When two agents need to hand off work to each other mid-task, you're usually back to writing custom glue code. That's where IM-based orchestration starts looking interesting, using existing messaging infrastructure as the coordination layer between agents.

Rohini Gaonkar AWS

The plumbing tax is real. That's exactly why I leaned into the toolkit.

Worth clarifying though: the Agent Toolkit for AWS is specifically the secure access layer between a coding agent and AWS (auth, sandboxing, audit trail). Multi-agent coordination is a different layer. On the AWS side, that's where the Strands Agents SDK comes in: it has built-in patterns for agent handoff (Swarm, Agents-as-Tools, Graph, Workflow). Amazon Bedrock Agents also supports multi-agent collaboration natively. Different tools, different jobs.

What do you mean by "IM-based orchestration"? Is it literal messaging platforms (Slack/Teams as the coordination bus) or messaging infrastructure like queues and event buses?

Alex Shev

The useful promise of an agent toolkit is not just faster scaffolding. It is safer access to the operational surface: accounts, permissions, logs, deployment state, and service-specific constraints. For cloud work, the guardrails and observability matter as much as the generation step.

Rohini Gaonkar AWS

Totally agree!!! Thank you for sharing!

Alex Shev

Appreciate it, Rohini. The part I keep coming back to is that agent tooling is only useful if it makes the review path clearer too. Fast scaffolding is nice, but repeatable setup plus visible decisions is what makes the workflow trustworthy.

Mehmet Can Farsak

Great breakdown of the Agent Toolkit — the "handing house keys to an enthusiastic intern" analogy is perfect. The same principle applies at the behavioral level: agents with no behavioral guardrails will jump to coding during what should be a brainstorming session. I built Brainstorm-Mode (mehmetcanfarsak on GitHub) as a plugin that uses hooks to enforce "ideation mode" — three modes (divergent, actionable, academic) that block tool calls until the thinking phase is done. It's like IAM condition keys but for agent behavior instead of AWS actions.

Rohini Gaonkar AWS • Edited

Have you made the switch yet? Tell me about your experience, wishlists included!