Elizabeth Fuentes L for AWS

Posted on Jun 12 • Edited on Jul 1

AWS Agent Toolkit: Stop Your Coding Agent Hallucinating APIs


Your AI coding agent hallucinates AWS APIs because it's guessing from training data frozen in the past.

The Agent Toolkit for AWS changes the source of truth: it gives any MCP-compatible agent live AWS docs, tested skills, and guardrails. Here's the before/after, and how to install it in one command.

Ask a coding agent to "set up an S3 bucket with sensible security defaults" and watch what happens.

It writes a bucket policy from memory. The policy references an API parameter that was renamed two releases ago. The deploy fails. The agent retries with a slightly different guess. That fails too. Three iterations later you have a bucket that technically exists, a public access block that's only half-configured, and a transcript that burned a few thousand tokens getting there.

AI coding agents don't fail loudly when they touch AWS. They fail plausibly. The code looks right, the service names are real, and the mistake only surfaces at deploy time, or worse, at security-review time.

Why do AI coding assistants hallucinate when writing AWS code?

Because the model is guessing from training data that's frozen in the past. AWS shipped new services and changed API surfaces after that cutoff, so the agent reaches for what it remembers, not what's true today. It doesn't know what it doesn't know, and it has no way to check before it writes.

What is the Agent Toolkit for AWS?

The Agent Toolkit for AWS is an official, AWS-supported toolkit that gives AI coding agents the tools, knowledge, and guardrails they need to build, deploy, and manage applications on AWS. The AWS MCP Server underneath it reached general availability on May 6, 2026. It's open source (Apache-2.0).

It has four components:

AWS MCP Server: a managed Model Context Protocol server. One endpoint with access to 15,000+ AWS API operations (via the call_aws tool, using your IAM credentials), plus sandboxed Python script execution and documentation search that needs no authentication.
Agent skills: curated packages of instructions, scripts, and reference material the agent loads on demand. The agent retrieves only what's relevant to the current task, so it doesn't burn context. Think "the tested procedure for setting up X," not a generic guess.
Plugins: single-install packages for Claude Code and Codex that bundle the MCP Server config plus a curated set of skills. aws-core is the one to start with.
Rules files: project-level config that tells the agent how to work in your project, for example to use the MCP Server, discover skills, and search the docs before acting.
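Under the hood, MCP is JSON-RPC 2.0. An agent invokes a server tool by sending a `tools/call` request that carries the tool name and its arguments.

The sketch below only builds such a request for the `call_aws` tool. The argument name `cli_command` is an assumption borrowed from the open-source AWS API MCP Server, so check the tool's real input schema (returned by `tools/list`) before relying on it. Transport, authentication, and request signing are deliberately left out.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids; any value unique per request works.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: `cli_command` is the argument name call_aws expects.
# Confirm it against the inputSchema that `tools/list` returns.
request = build_tool_call("call_aws", {"cli_command": "aws s3api list-buckets"})
print(request)
```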

Why not just let the agent call AWS directly?

Because "directly" means "from memory." The MCP Server changes the source of truth from the model's training data to AWS's live documentation and APIs.

Two things matter here:

Documentation search needs no credentials. The agent can look up the current way to do something before it writes a line of code. No AWS account required for that part.
Script execution is sandboxed. When the agent runs Python against AWS, it runs isolated from your local filesystem and network, and every call is logged to CloudTrail with metrics in CloudWatch.

That second point is the part teams sleep on. The MCP Server adds two condition keys to every request, aws:ViaAWSMCPService and aws:CalledViaAWSMCP, so your IAM policies can tell an agent action apart from a human one. You can keep an agent read-only even when the underlying role allows writes. The agent gets capability; you keep control.
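Here is a sketch of that read-only pattern. It builds an IAM policy that denies everything except a list of read actions whenever a request arrives through the MCP Server. It rests on two assumptions:

- `aws:ViaAWSMCPService` is evaluated with the `Bool` operator against `"true"`.
- The read-action list is an illustrative example, not a vetted read-only set.

Check the condition key's documented type and test the policy (for example, in the IAM policy simulator) before attaching it.

```python
import json

# Illustrative read actions only; extend for the services your agent touches.
DEFAULT_READ_ACTIONS = [
    "s3:Get*",
    "s3:List*",
    "ec2:Describe*",
    "cloudwatch:Get*",
    "cloudwatch:List*",
]


def agent_read_only_policy(read_actions=None) -> dict:
    """Deny every action except `read_actions` when the call comes via the AWS MCP Server.

    Assumption: aws:ViaAWSMCPService is a boolean condition key that is "true"
    on requests the MCP Server makes on the agent's behalf.
    """
    actions = list(read_actions or DEFAULT_READ_ACTIONS)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AgentReadOnlyViaMCP",
                "Effect": "Deny",
                # Deny + NotAction: block everything that is NOT in the read list.
                "NotAction": actions,
                "Resource": "*",
                "Condition": {"Bool": {"aws:ViaAWSMCPService": "true"}},
            }
        ],
    }


print(json.dumps(agent_read_only_policy(), indent=2))
```

An explicit Deny overrides any Allow, so this statement can sit alongside a role's existing write permissions. A human using the same role keeps write access, because the condition key is absent and the Deny does not match. MCP-originated calls are limited to the listed reads.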

Before and after

Same prompt, same model. The only variable is the Toolkit.

	Agent alone	Agent + Toolkit
Source of truth	Training data (frozen)	Live AWS docs + APIs
Deprecated services	Picks them silently	Skills steer to current ones
Failed deploys	Retry, guess, retry	Validates against real docs first
Audit trail	None	CloudTrail + CloudWatch
Token cost	Burned on retries	Spent once, correctly

AWS frames the payoff as agents that build "with fewer errors, lower token costs, and enterprise-grade security controls." The mechanism behind that is the table above: the agent stops improvising from stale memory and starts acting on current docs and tested procedures.
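Here is a toy version of the "validates against real docs first" check. It compares the parameters an agent proposes against the parameter names the current API accepts, and stops before deploying if any are unknown.

A few things in it are stand-ins, not real Toolkit data:

- `CURRENT_PARAMS` is a hand-written subset, for illustration. In the Toolkit, the equivalent information comes from live documentation search.
- `LegacyPublicFlag` is a made-up parameter playing the role of one remembered from stale training data.

```python
# Hypothetical stand-in for "what the docs say today": operation -> accepted parameters.
# A real agent would get this from live documentation, not a hard-coded dict.
CURRENT_PARAMS = {
    "s3:PutPublicAccessBlock": {
        "Bucket",
        "PublicAccessBlockConfiguration",
        "ExpectedBucketOwner",
    },
}


def unknown_params(operation: str, proposed: dict) -> list:
    """Return the proposed parameter names the current API does not accept, sorted."""
    if operation not in CURRENT_PARAMS:
        raise KeyError(f"No current documentation for {operation}")
    return sorted(set(proposed) - CURRENT_PARAMS[operation])


# "LegacyPublicFlag" is a hypothetical parameter the agent "remembers" from old training data.
proposed_call = {"Bucket": "my-bucket", "LegacyPublicFlag": True}
problems = unknown_params("s3:PutPublicAccessBlock", proposed_call)
if problems:
    print(f"Blocked before deploy; unknown parameters: {problems}")
```

The point is where the failure happens: at planning time, against current reference data, rather than at deploy time after a retry loop.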

Get it running in your agent

You need uv installed (it provides the uvx command used below) and, for anything that actually calls AWS, local AWS credentials. Documentation search and skill discovery work without credentials.
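A quick preflight script can confirm both prerequisites before you wire anything up. This minimal sketch uses only the standard library. It looks for `uvx` on your PATH, and for common credential sources: the `AWS_ACCESS_KEY_ID` or `AWS_PROFILE` environment variables, or a `~/.aws/credentials` or `~/.aws/config` file.

It is a heuristic and does not validate the credentials. `aws sts get-caller-identity` is the usual way to check that they actually work.

```python
import os
import shutil
from pathlib import Path


def preflight(home=None) -> dict:
    """Report whether uvx and a likely AWS credential source are present.

    Heuristic only: finding a credential source does not prove it is valid.
    """
    home = Path(home) if home is not None else Path.home()
    env_creds = any(os.environ.get(var) for var in ("AWS_ACCESS_KEY_ID", "AWS_PROFILE"))
    file_creds = any((home / ".aws" / name).is_file() for name in ("credentials", "config"))
    return {
        "uvx_on_path": shutil.which("uvx") is not None,
        # Only needed for real API calls; docs search and skills work without it.
        "aws_credentials_found": env_creds or file_creds,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```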

Claude Code. The claude-plugins-official marketplace ships by default, so a single command installs it:

/plugin install aws-core

If it says "Plugin not found," refresh the marketplace first with /plugin marketplace update claude-plugins-official, then install with the explicit name aws-core@claude-plugins-official.

There are two more plugins worth knowing: aws-agents (building agents with Bedrock and AgentCore) and aws-data-analytics (S3 Tables, Glue, Athena). Start with aws-core.

Codex:

codex plugin marketplace add aws/agent-toolkit-for-aws

Then launch Codex and run /plugins to install aws-core.

Kiro (or any MCP-compatible agent). Add the server to .kiro/settings/mcp.json. Pin the version for reproducibility and supply-chain safety:

{
  "mcpServers": {
    "aws": {
      "command": "uvx",
      "args": [
        "mcp-proxy-for-aws@1.6.0",
        "https://aws-mcp.us-east-1.api.aws/mcp",
        "--metadata", "AWS_REGION=us-west-2"
      ]
    }
  }
}
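If you manage several projects, you may prefer to script that config rather than hand-edit it. The sketch below merges the `aws` server entry into `.kiro/settings/mcp.json` and keeps any servers already configured there.

The function name `add_aws_server`, and the choice to overwrite an existing `aws` entry, are assumptions made for this sketch. They are not part of the Toolkit.

```python
import json
import tempfile
from pathlib import Path

# Pinned proxy version, endpoint, and region metadata copied from the config above.
AWS_SERVER = {
    "command": "uvx",
    "args": [
        "mcp-proxy-for-aws@1.6.0",
        "https://aws-mcp.us-east-1.api.aws/mcp",
        "--metadata", "AWS_REGION=us-west-2",
    ],
}


def add_aws_server(project_root) -> Path:
    """Merge the `aws` MCP server into <project_root>/.kiro/settings/mcp.json."""
    path = Path(project_root) / ".kiro" / "settings" / "mcp.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    config = json.loads(path.read_text()) if path.exists() else {}
    # Keep other servers; replace only the `aws` entry.
    config.setdefault("mcpServers", {})["aws"] = AWS_SERVER
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


# Demo in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as demo_root:
    existing = Path(demo_root) / ".kiro" / "settings" / "mcp.json"
    existing.parent.mkdir(parents=True)
    existing.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
    merged = json.loads(add_aws_server(demo_root).read_text())

print(sorted(merged["mcpServers"]))
```

In a real project you would call `add_aws_server(".")` from the project root. Bumping the pinned `mcp-proxy-for-aws` version then becomes a one-line change you can review.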

And add the skills:

npx skills add aws/agent-toolkit-for-aws/skills

Cursor: Settings → Plugins → Team Marketplaces → Add Marketplace → Import from Repo, pointing at aws/agent-toolkit-for-aws.

It works with any MCP-compatible agent, and if you're building autonomous agents with frameworks like Strands, LangChain, or Bedrock AgentCore, the same MCP Server is the AWS interface you want underneath them.

Try the S3 prompt again

I installed aws-core and re-ran the exact same prompt. This time the agent searched the current docs, pulled the tested procedure from a skill, and the public access block was configured correctly on the first pass. The deprecated parameter never showed up, because the agent wasn't guessing. It was reading.

That's the whole shift: stop your agent from guessing at AWS, and let it read.
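For reference, a correctly configured public access block has all four settings enabled. The sketch below uses boto3's S3 `put_public_access_block` call.

The helper `lock_down_bucket` and the recording stub are hypothetical and exist for illustration only. The helper takes the client as a parameter, so it runs offline here. This shows the end state a skill should produce, not the skill's actual procedure.

```python
# The four S3 Block Public Access settings, all enabled.
LOCKED_DOWN = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def lock_down_bucket(s3_client, bucket: str) -> dict:
    """Apply all four public access block settings to `bucket`.

    `s3_client` is expected to behave like a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=dict(LOCKED_DOWN),
    )


class _RecordingClient:
    """Offline stand-in that records the call instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_public_access_block(self, **kwargs):
        self.calls.append(kwargs)
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}


stub = _RecordingClient()
lock_down_bucket(stub, "example-bucket")
```

With real credentials, `lock_down_bucket(boto3.client("s3"), "your-bucket")` applies the same configuration. `get_public_access_block(Bucket=...)` reads it back so you can verify it.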

It's available at no additional charge. You only pay for the AWS resources you actually use.

This walkthrough uses the Agent Toolkit for AWS, but the underlying idea (give the agent a live source of truth and tested procedures instead of frozen training data) is a general agent pattern that carries over to other clouds and agent frameworks.

FAQ

What are agent skills in the Agent Toolkit for AWS?
Skills are curated packages of instructions, scripts, and reference material that an agent retrieves on demand. Instead of guessing a procedure, the agent pulls a tested one (for example, the validated steps to lock down an S3 bucket) at the moment it needs it.
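The "on demand" part is what keeps the context window small. As a toy illustration, the sketch below scores each skill's short description against the task by keyword overlap, and loads only the best matches.

This is not how the Toolkit actually ranks skills. The skill names and descriptions are hypothetical.

```python
import re

# Hypothetical skill catalog: name -> one-line description the agent sees up front.
SKILLS = {
    "s3-secure-bucket": "create an s3 bucket with public access block encryption and versioning",
    "lambda-deploy": "package and deploy a lambda function with an execution role",
    "athena-query": "query s3 data with athena and glue catalog tables",
}

# Common words that should not count as a match.
STOPWORDS = {"a", "an", "and", "the", "with", "up", "of", "to"}


def _keywords(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_skills(task: str, top_k: int = 1) -> list:
    """Return up to top_k skill names whose descriptions share the most keywords with the task."""
    task_words = _keywords(task)
    scored = [(len(task_words & _keywords(desc)), name) for name, desc in SKILLS.items()]
    # Highest overlap first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Never load a skill that shares no keywords with the task.
    return [name for score, name in scored[:top_k] if score > 0]


print(select_skills("Set up an S3 bucket with sensible security defaults"))
```

Only the selected skill's full instructions and scripts would then be loaded into the agent's context. The rest stay as one-line descriptions.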

Do I need an AWS account to use it?
Not for everything. Documentation search and skill discovery work with no credentials. You only need local AWS credentials when the agent makes real API calls or runs scripts against your account.

Which coding agents does it support?
Claude Code, Codex, and Cursor install the plugins directly. Kiro and any other MCP-compatible agent can add the AWS MCP Server via config. If you build autonomous agents with frameworks like Strands, LangChain, or Bedrock AgentCore, the same MCP Server is the AWS interface underneath them.

How is this different from letting the agent call the AWS CLI?
The CLI runs whatever the agent guessed. The Toolkit changes the source of truth first: the agent checks live docs and tested skills before acting, runs scripts in a sandbox, and logs every call to CloudTrail with metrics in CloudWatch.

How much does it cost?
The Toolkit is available at no additional charge. You only pay for the AWS resources the agent actually creates or uses.

Which AWS workflow does your coding agent get wrong most often? Tell me in the comments. I want to see if the Toolkit fixes it.


Thanks!


Elizabeth Fuentes L

I help developers build production-ready AI applications through hands-on tutorials and open-source projects.

Top comments

Mehmet Can Farsak • Jun 13

The hallucination problem goes beyond wrong APIs — agents also hallucinate about when to act vs when to think. You'll ask for brainstorming and get a file diff. Built Brainstorm-Mode (mehmetcanfarsak on GitHub) which adds PreToolUse hooks as a guardrail layer: when in brainstorming mode, it intercepts tool calls so the agent actually stays in ideation. Same principle as the MCP Server guardrails, but for the thought-process side of agents.