DEV Community

POTHURAJU JAYAKRISHNA YADAV

I Replaced 47 DevOps Scripts With One AI Agent — Here’s What Happened

The Hook: I Was Wrong About Automation

I thought I automated DevOps.

I had 47 deployment scripts.

Then I started replacing them with an AI agent —
and most scripts became unnecessary.

Not by following instructions.
By making decisions.

And the 2 AM debugging stopped.


Note: The code shown here is simplified for clarity.
The GitHub repo contains a more modular implementation.

That's when I realized:

I wasn't automating.
I was hardcoding decisions.


I gave Claude a list of AWS tools and one instruction: "Deploy this app."

No hardcoded logic.
No decision trees.

Just: describe the goal, Claude figures out how.

Deployment time: 3 hours → minutes for most cases.

(Built this while deploying real workloads on AWS: Docker, ECS, EC2, IAM.)

🎯 Is This For You?

This post is for:

  • DevOps engineers with 10+ deployment scripts
  • Platform engineers building internal developer platforms
  • Anyone exploring AI agents beyond chatbots
  • Teams on AWS (Docker, EC2, ECS, IAM)

If manual deployment takes > 30 minutes, read this.


✨ What's Possible

Old way (manual):

aws ecs create-cluster --cluster-name prod
aws ecs register-task-definition --family myapp ...
# [50+ commands, 3 hours, manual debugging]

New way (agent):

agent.run("Deploy FastAPI with PostgreSQL, auto-scale to 20, "
          "minimal IAM perms, CloudWatch monitoring")

# ✅ Done in minutes for most cases, with significantly reduced debugging

No scripts.

Just Claude thinking out loud about your infrastructure.


🚀 How It Actually Works

You describe the goal (natural language):

"Deploy my app on 5 ECS tasks, auto-scale to 20 on high CPU"

Claude gets these tools:

dispatcher = {
    "ecs__create_cluster": ecs.create_cluster,
    "ecs__register_task": ecs.register_task_definition,
    "ecs__create_service": ecs.create_service,
    "iam__create_role": iam.create_role,  # For permissions
}

Claude reasons autonomously:

"User wants ECS deployment.
 I need to:
 1. Check if cluster exists
 2. Register task definition
 3. Create service with 5 tasks
 4. Setup auto-scaling
 5. Verify it's running"

It executes:

  1. Call ecs__create_cluster() → Get cluster ARN
  2. Call ecs__register_task() → Get task definition
  3. Call ecs__create_service() → Get running tasks
  4. Call ecs__setup_autoscaling() → Confirmed
  5. Return: "All 5 tasks running, auto-scaling 5-20"

**Minimal manual intervention in most cases: just reasoning, execution, and feedback loops.**


⚠️ The Problem I Started With

I had:

  • deploy_docker.py — 150 lines of Docker logic
  • deploy_ec2.py — 200 lines of EC2 logic
  • deploy_ecs.py — 300 lines of ECS logic
  • 3 routers trying to chain them together
  • 0 ways to handle "deploy to both EC2 AND ECS"

Every new service = new script. Every new workflow = rewrite everything.


Solution: Reduce rigid scripts — let the agent handle orchestration logic dynamically.

Instead of routing to specific tools, give Claude all available tools and let it decide which ones to use, in which order, adapting as it goes.

# All tools in one place
tools = {
    "docker__run": run_container,
    "ec2__create": create_instance,
    "ecs__deploy": create_service,
    "iam__create_role": create_role,
    # ... more tools
}

# Let Claude orchestrate
agent.run("Deploy with Docker locally, then scale to ECS production")
# Claude figures out: Docker first, then ECS, then IAM for permissions

🏗️ The Architecture (3 Layers)

┌──────────────────────────────┐
│ User Goal (natural language) │  
│ "Deploy scalable production  │
│  stack with auto-scaling"    │
└──────────┬───────────────────┘
           │
           ▼
    ┌─────────────────────────────────────────────┐
    │  Claude (Bedrock)                           │
    │  ← Reads goal + available tools             │
    │  ← Decides sequence of actions              │
    │  ← Adapts when things fail                  │
    └──────────────┬────────────────────────────┘
                   │
    ┌──────────────┴──────────────┐
    │                             │
    ▼                             ▼
┌─────────────┐  ┌──────────────────┐
│ AWS APIs    │  │ Conversation     │
│ (via boto3) │  │ Memory (DynamoDB)│
│ ← Executes  │  │ ← Recalls setup  │
│   decisions │  │   from last week │
└─────────────┘  └──────────────────┘
    │                    │
    └────────┬───────────┘
             │
    ┌────────▼─────────┐
    │ Actual Resources │
    │ EC2, ECS, Docker │
    │      IAM, etc    │
    └──────────────────┘

3 moving parts:

  1. Claude — Reasons about the task
  2. Tools — Execute AWS API calls
  3. Memory — Remember past deployments for coherence

🔧 The Setup (Code Foundation)

Here's the entire base system in ~100 lines. Full code on GitHub.

import boto3
from agents.memory import load_history, save_message

MODEL = "apac.amazon.nova-lite-v1:0"  # Bedrock model ID (fast, cheap; swap in a Claude model ID if you prefer)

class BaseAgent:
    """Foundation for all agents (Docker, EC2, ECS, IAM)"""
    AGENT_KEY = None  # "docker", "ec2", etc.
    SYSTEM_PROMPT = ""  # Agent's personality
    CAPABILITIES = []  # What this agent can do

    def __init__(self, name: str, region: str = "us-east-1"):
        self.name = name
        self.session = boto3.Session(region_name=region)
        self._bedrock = None

    @property
    def bedrock(self):
        """Lazy-load the Bedrock client (only when needed)"""
        if not self._bedrock:
            self._bedrock = self.session.client("bedrock-runtime")
        return self._bedrock

    def run(self, task: str, session_id: str = None) -> dict:
        """The agentic loop: think → decide → act → repeat"""
        tools = self.get_tools()
        dispatcher = self.get_dispatcher()
        messages = load_history(session_id) + [{"role": "user", "content": [{"text": task}]}]

        for _ in range(10):  # Max 10 iterations to prevent infinite loops
            # Ask the model what to do
            response = self.bedrock.converse(
                modelId=MODEL,
                system=[{"text": self.SYSTEM_PROMPT}],
                messages=messages,
                toolConfig={"tools": tools},
            )
            output_message = response["output"]["message"]

            # Is the model done?
            if response["stopReason"] == "end_turn":
                return {"status": "SUCCESS", "message": output_message["content"]}

            # Execute the tools the model wants to call
            tool_results = []
            for block in output_message["content"]:
                if "toolUse" not in block:
                    continue
                tool_use = block["toolUse"]
                try:
                    result = dispatcher[tool_use["name"]](tool_use["input"])
                except Exception as e:
                    result = {"error": str(e), "status": "failed"}

                tool_results.append({"toolUseId": tool_use["toolUseId"],
                                     "content": [{"json": result}]})

            # Add the model's decision + results to the conversation
            messages.append({"role": "assistant", "content": output_message["content"]})
            messages.append({"role": "user", "content": [{"toolResult": tr} for tr in tool_results]})

            # Save for future reference
            if session_id:
                save_message(session_id, messages[-1])

        return {"status": "FAILED", "reason": "Max iterations reached"}

What's happening:

  • Lazy loading (@property bedrock): Don't connect to Claude until needed
  • Tool loop: Call tools, capture results, show Claude the output
  • Error handling: Don't crash—tell Claude what failed, let it adapt
  • Memory: Save each turn so Claude remembers next week

🚀 Agents in Action: 3 Real Examples

DockerAgent: Deploy Locally

class DockerAgent(BaseAgent):
    AGENT_KEY = "docker"
    SYSTEM_PROMPT = """You manage Docker containers.
Rules: Pull image first, check if container exists, use sensible defaults."""

    CAPABILITIES = [
        {"name": "list_containers", "description": "List running containers"},
        {"name": "run_container", "description": "Pull and run image"},
        {"name": "stop_container", "description": "Stop a container"},
    ]

Usage:

docker_agent = DockerAgent()
docker_agent.run("Deploy FastAPI on port 8000 with health check")
# Claude calls: list_containers → run_container → confirms it's running
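
The base class calls `self.get_tools()`, which isn't shown here. One way to derive the Bedrock Converse tool specs from `CAPABILITIES` is a plain data transformation (a sketch; the helper name is mine, and the repo's version should carry real input schemas per tool):

```python
def build_tool_specs(agent_key: str, capabilities: list) -> list:
    """Turn a CAPABILITIES list into Bedrock Converse toolSpec entries.

    Sketch only: the placeholder inputSchema accepts anything; real tools
    should declare the exact JSON schema of their parameters.
    """
    return [
        {
            "toolSpec": {
                "name": f"{agent_key}__{cap['name']}",
                "description": cap["description"],
                "inputSchema": {"json": {"type": "object", "properties": {}}},
            }
        }
        for cap in capabilities
    ]
```

A subclass's `get_tools()` could then simply return `build_tool_specs(self.AGENT_KEY, self.CAPABILITIES)`.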

EC2Agent: Scale to Cloud

class EC2Agent(BaseAgent):
    AGENT_KEY = "ec2"
    SYSTEM_PROMPT = """You manage EC2 instances.
Rules: Tag for organization, verify security groups, create new→test→retire old."""

    CAPABILITIES = [
        {"name": "describe_instances", "description": "List instances"},
        {"name": "create_instance", "description": "Launch new instance"},
        {"name": "stop_instance", "description": "Stop instance"},
    ]

Usage:

ec2_agent = EC2Agent()
ec2_agent.run("Create 2 t2.micro instances, tag as app-server")
# Claude: checks if instances exist → creates 2 → returns IPs → confirms running

ECSAgent: Production Auto-Scaling

class ECSAgent(BaseAgent):
    AGENT_KEY = "ecs"
    SYSTEM_PROMPT = """You manage ECS services at production scale.
Rules: Task definition first, use Fargate, always configure auto-scaling."""

    CAPABILITIES = [
        {"name": "create_cluster", "description": "Create ECS cluster"},
        {"name": "register_task_definition", "description": "Register blueprint"},
        {"name": "create_service", "description": "Deploy service"},
        {"name": "setup_autoscaling", "description": "Configure scaling rules"},
    ]

Usage:

ecs_agent = ECSAgent()
ecs_agent.run("Deploy with 5 tasks, auto-scale to 20 on high CPU, health monitoring")
# Claude orchestrates: cluster → task definition → service → autoscaling → verification

Real Usage: From Local to Production

Scenario: Deploy FastAPI from laptop to production in minutes for most cases, with significantly reduced debugging.

Stage 1: Test locally

docker_agent.run("Run myapp:latest on port 8000", session_id="prod_001")
# ✅ Docker container running

Stage 2: Scale to cloud

ec2_agent.run("Create 2 instances for load balancing", session_id="prod_001")
# ✅ 2 EC2 instances up (same session = Claude remembers port 8000)

Stage 3: Auto-scaling production

ecs_agent.run("Deploy with 5-20 auto-scaling, CloudWatch monitoring", session_id="prod_001")
# ✅ 5 ECS tasks running, auto-scaling 5-20 based on CPU

All with the same session_id: Claude remembers the image name, port, and configuration from Stage 1. Everything connects seamlessly.


📚 Memory: Agents Remember

# Day 1
agent.run("Deploy on Docker with port 8000", session_id="user_123")

# Week later
agent.run("Move this to ECS for production", session_id="user_123")
# Claude reads history from DynamoDB:
# "I remember this app used port 8000. I'll keep that for ECS too."

DynamoDB stores every conversation turn. Claude recalls past decisions and uses them for coherence.


🛡️ Production-Ready Code Patterns

This is what reader feedback emphasized: real code needs safety.

Pattern 1: Error Handling (Don't Crash, Recover)

# ❌ Wrong: crashes on error
result = dispatcher[tool_name](tool_input)

# ✅ Right: tell Claude about the error
try:
    result = dispatcher[tool_name](tool_input)
except Exception as e:
    result = {
        "error": str(e),
        "status": "failed",
        "request_id": request_id,  # For debugging
    }
# Claude sees the error and tries a different approach

Why this matters: If Docker pull fails, don't give up. Tell Claude: "Pull failed, but let me check if the image is locally cached." Claude adapts.
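
Concretely, the failure has to travel back to the model as a `toolResult` block so the next `converse` call can see it. A minimal shape, per the Converse API's message format (the helper name is mine):

```python
def error_tool_result(tool_use_id: str, exc: Exception) -> dict:
    """Package a tool failure as a Converse toolResult block so the
    model sees what went wrong and can try another approach."""
    return {
        "toolResult": {
            "toolUseId": tool_use_id,
            "content": [{"json": {"error": str(exc), "status": "failed"}}],
            "status": "error",  # Signals that this tool call did not succeed
        }
    }
```

The block goes into the next `user` message; the model reads the error text and decides its next step.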


Pattern 2: Timeouts (Prevent Hanging)

from functools import wraps
import signal

import boto3

def with_timeout(seconds=30):
    """Prevent tools from running forever.
    Note: signal.alarm only works on Unix, in the main thread."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutError(f"Tool execution exceeded {seconds}s")

            signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            except TimeoutError as e:
                return {"error": str(e), "status": "timeout"}
            finally:
                signal.alarm(0)  # Always cancel the pending alarm
        return wrapper
    return decorator

@with_timeout(seconds=60)
def create_instance(params):
    """If EC2 creation hangs, time out after 60 seconds"""
    ec2 = boto3.client("ec2")
    return ec2.run_instances(**params)

Pattern 3: Input Validation (Prevent Bad Requests)

def create_instance(params):
    """Validate before executing"""
    # Check required fields
    required = ["ImageId", "InstanceType"]
    for field in required:
        if field not in params:
            return {"error": f"Missing required field: {field}", "status": "failed"}

    # Validate instance type
    valid_types = ["t2.micro", "t2.small", "t3.medium"]
    if params["InstanceType"] not in valid_types:
        return {
            "error": f"InstanceType must be one of {valid_types}",
            "status": "failed"
        }

    # Now safe to execute
    ec2 = boto3.client("ec2")
    return ec2.run_instances(**params)

Pattern 4: Audit Logging (Proof for Compliance)

import json
from datetime import datetime, timezone

import boto3

def log_action(request_id: str, user: str, action: str, params: dict, result: dict):
    """Log everything for audits"""
    log_entry = {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "params": json.dumps(params),  # What was requested
        "result_status": result.get("status"),
        "result_error": result.get("error"),
    }

    # Save to CloudWatch Logs or DynamoDB
    table = boto3.resource("dynamodb").Table("agent_audit")
    table.put_item(Item=log_entry)

# In the tool dispatcher:
result = dispatcher[tool_name](tool_input)
log_action(request_id, user_id, tool_name, tool_input, result)

Pattern 5: Cost Guards (Don't Deploy Expensive Mistakes)

MONTHLY_BUDGET = 1000  # $1000/month limit

def estimate_cost(action: str, params: dict) -> float:
    """Estimate AWS cost before executing"""
    if action == "create_instance":
        instance_type = params.get("InstanceType", "t2.micro")
        hourly_rates = {
            "t2.micro": 0.012,
            "t2.small": 0.023,
            "t3.medium": 0.042,
        }
        months_running = 1
        return hourly_rates.get(instance_type, 0) * 730 * months_running

    elif action == "create_rds":
        # RDS: ~$400/month for 20GB
        return 400

    return 0

# In the tool loop:
estimated = estimate_cost(tool_name, tool_input)
if estimated > MONTHLY_BUDGET:
    return {
        "error": f"Cost ${estimated:.2f} exceeds budget ${MONTHLY_BUDGET}",
        "status": "rejected",
    }
# Only execute if under budget
result = dispatcher[tool_name](tool_input)

Pattern 6: Role-Based Access (Security Boundaries)

# Define who can do what
USER_PERMISSIONS = {
    "admin": ["create_instance", "delete_instance", "create_role", "delete_role"],
    "developer": ["create_instance", "stop_instance", "deploy_to_ecs"],
    "readonly": ["describe_instances", "list_containers", "get_logs"],
}

def check_permission(user_role: str, action: str) -> bool:
    """Prevent unauthorized actions"""
    return action in USER_PERMISSIONS.get(user_role, [])

# Before executing:
if not check_permission(user_role, tool_name):
    return {
        "error": f"User role '{user_role}' cannot perform '{tool_name}'",
        "status": "permission_denied",
    }
result = dispatcher[tool_name](tool_input)

✅ Why Agents > Scripts

|                | Scripts          | Agents                |
| -------------- | ---------------- | --------------------- |
| New workflow   | Rewrite code     | Claude adapts         |
| Error recovery | Crashes          | Tries alternatives    |
| Reasoning      | None (hardcoded) | Full decision log     |
| Maintenance    | Grows linearly   | One framework         |
| Learning       | Manual proof     | Automatic audit trail |

🎯 Getting Started

Just 3 commands:

git clone https://github.com/jayakrishnayadav24/ai-agents
cd ai-agents
pip install -r requirements.txt

Then try:

from agents.docker_agent import DockerAgent

agent = DockerAgent()
response = agent.run(
    "Deploy FastAPI app on port 8000 with health check"
)
print(response)

That's it. Claude handles the rest.


📚 What's Next (Part 2)

This post covered the concept. The GitHub repo includes:

  • ✅ Full working code (all agents)
  • ✅ DynamoDB setup for memory
  • ✅ Deploying agents to Lambda
  • ✅ Real production patterns (request tracing, cost estimation)
  • ✅ Demo with actual deployments

⚠️ Where This Breaks

  • Complex compliance environments (manual approvals still needed)
  • Cost estimation is approximate
  • Requires well-defined tools (garbage in → garbage out)

The Bottom Line

Before: 47 scripts, 3 hours/deployment, constant debugging

After: 1 agent framework, minutes per deployment for most cases, with significantly reduced debugging.

Why? Because Claude doesn't follow scripts. Claude plans and executes the steps based on the goal.

It adapts. It learns. It remembers.

That's not automation anymore. That's the future of infrastructure.


How many deployment scripts are you still maintaining?

10+? 20+? 50+?

I want to see how bad this problem is. Drop a comment 👇

Your feedback shapes Part 2.
