The Problem: When "Minimal Fix" Means "Complete Refactor"
In my previous blog, I shared how I built a cloud engineer agent using AWS Strands that could automatically detect CloudWatch log errors and raise PRs with potential fixes. While the concept worked, I encountered a critical issue: the agent consistently made drastic changes when simple fixes were needed.
Picture this scenario: Your Lambda function fails because it's missing a single IAM permission. The fix? Add one line to your IAM policy. But instead, your agent decides to refactor your entire infrastructure, reorganise your code structure, and "improve" things you never asked it to touch.
I went through countless iterations of my system prompt, emphasising constraints like these (the agent wiring is sketched after the list):
- Apply ONLY the specific fix needed
- No broader improvements beyond fixing the specific error
- Your role is automated incident response with minimal, targeted fixes only
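For context, the Strands setup looked roughly like this. It's a minimal sketch: the model ID, Lambda name, and prompt wording are illustrative rather than my exact configuration.

```python
from strands import Agent
from strands.models import BedrockModel

# System prompt that repeatedly stressed minimal, targeted fixes.
INCIDENT_PROMPT = """You are an automated incident-response agent.
- Apply ONLY the specific fix needed to resolve the reported error.
- Do NOT make broader improvements, refactors, or style changes.
- Produce the smallest possible diff that restores service."""

agent = Agent(
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-20250514-v1:0"),  # illustrative model ID
    system_prompt=INCIDENT_PROMPT,
)

# The CloudWatch error and relevant repo context go in as the user message.
result = agent(
    "Lambda 'order-processor' failed with AccessDenied publishing to SNS. "
    "Propose the minimal fix."
)
```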
Yet AWS Strands kept making those broad, sweeping changes. The surgical precision I needed just wasn't there.
💡 The Solution: Testing Claude Code SDK with Bedrock AgentCore
Frustrated with this limitation, I decided to test the Claude Code SDK integrated with Bedrock AgentCore. The results were immediately promising – it successfully applied the precise, surgical fixes I was looking for.
But I wanted to be thorough. So I built a comprehensive POC with three different setups to test how Claude Code can work with Bedrock AgentCore:
1. Running the Claude Code command-line interface directly within the Bedrock AgentCore framework.
2. Integrating the Claude Code SDK directly into Bedrock AgentCore for more programmatic control (sketched below).
3. Using the Claude Code SDK as a specialised tool while keeping AWS Strands as the orchestrator agent.
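For the second setup, the integration takes roughly this shape. It's a minimal sketch: the repository path, payload keys, and prompt wording are assumptions, and the runtime needs Claude Code pointed at Bedrock (for example via `CLAUDE_CODE_USE_BEDROCK=1`).

```python
import asyncio

from bedrock_agentcore.runtime import BedrockAgentCoreApp
from claude_code_sdk import ClaudeCodeOptions, query

app = BedrockAgentCoreApp()


async def run_claude_code(error_report: str) -> str:
    # Collect the agent's output while Claude Code edits the checked-out repo.
    chunks = []
    async for message in query(
        prompt=f"Fix only this error, nothing else:\n{error_report}",
        options=ClaudeCodeOptions(
            system_prompt="Automated incident response. Minimal, targeted fixes only.",
            cwd="/workspace/repo",  # hypothetical path to the cloned repository
        ),
    ):
        chunks.append(str(message))
    return "\n".join(chunks)


@app.entrypoint
def handler(payload):
    # payload carries the CloudWatch error details forwarded by the monitoring pipeline.
    return asyncio.run(run_claude_code(payload.get("error_report", "")))


if __name__ == "__main__":
    app.run()
```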
The Architectural Differences That Matter
AWS Strands: The Over-Eager Optimiser
Strands seemed to treat every incident as an opportunity for comprehensive optimisation. Even with explicit constraints in the prompt, it would:
- Refactor code beyond the error scope
- Suggest architectural improvements
- Make style and formatting changes
- Add "helpful" features nobody requested
Claude Code SDK + Bedrock AgentCore: The Surgical Specialist
The Claude Code SDK approach demonstrated much better constraint adherence:
- Identified the exact root cause
- Applied minimal viable fixes
- Respected the existing codebase structure
- Maintained focus on incident response, not improvement
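Part of that adherence can also be enforced mechanically. The SDK's options let you restrict which tools the agent may use and cap how long it can run; a sketch of the kind of configuration I leaned on (the exact tool list and limits here are illustrative):

```python
from claude_code_sdk import ClaudeCodeOptions

options = ClaudeCodeOptions(
    system_prompt="Incident response only: apply the minimal fix for the reported error.",
    allowed_tools=["Read", "Grep", "Edit"],  # read and edit files, no arbitrary shell commands
    permission_mode="acceptEdits",           # auto-accept file edits within the allowed tools
    max_turns=5,                             # hard cap on how far the agent can wander
)
```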
Real-World Example: The IAM Permission Fix
The Scenario: A Lambda function fails with an "Access Denied" error when trying to publish a message to an SNS topic.
The Required Fix: Grant the Lambda function's execution role sns:Publish permission on that topic.
😅 AWS Strands Response:
"Oh, you have a permission issue? Let me fix EVERYTHING!"
Claude Code SDK Response:
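It proposed a single new policy statement and nothing else. Expressed as a boto3 call, the change was roughly equivalent to this (the role, policy, and topic names are placeholders, not the agent's actual output):

```python
import json

import boto3

iam = boto3.client("iam")

# One added statement: allow the Lambda's execution role to publish to the one topic it needs.
iam.put_role_policy(
    RoleName="order-processor-lambda-role",  # placeholder role name
    PolicyName="allow-sns-publish",          # placeholder inline policy name
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:us-east-1:123456789012:order-events",  # placeholder topic ARN
        }],
    }),
)
```

No refactoring, no reorganising, no unrequested "improvements".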
Key Learnings: Why This Matters for Production Systems
1. Blast Radius Control
In production incident response, you want the smallest possible change to restore service. Every additional modification increases risk and potential for new issues.
2. Audit and Compliance
When changes are minimal and targeted, they're easier to:
- Review and approve
- Audit for compliance
- Roll back if needed
- Understand the impact
3. Team Confidence
Teams are more likely to trust and adopt automated incident response when they know it won't make unexpected changes to their carefully crafted systems.
Implementation Insights
What Worked Well with Claude Code SDK
- Better constraint understanding: Claude Code SDK consistently respected boundaries
- Contextual awareness: Better at understanding what NOT to change
- Debugging precision: More accurate root cause identification
Challenges with AWS Strands
- Scope creep tendency: Natural inclination to "improve" beyond requirements
- Prompt instruction drift: Difficulty maintaining a narrow focus
- Context bleeding: Unable to separate "fixing" from "improving"
Conclusion: The Right Tool for the Right Job
While AWS Strands excels at comprehensive analysis and broad improvements, the Claude Code SDK with Bedrock AgentCore proved superior for surgical and incident-response scenarios where precision and constraint adherence are paramount.
The key insight? Not all AI agents are created equal for every task. Sometimes you need a scalpel, not a sledgehammer.
What's your experience with AI agent precision in production environments? Have you found similar challenges with scope creep in automated systems? Share your thoughts in the comments below!
Resources
- GitHub Repository
- Claude Code SDK Documentation
- AWS Bedrock AgentCore