Aaron VanSledright

I Replaced My Agent Framework With Markdown Files and 140 Lines of Python

Every AI agent framework I tried added complexity I didn't need. LangChain, CrewAI, AutoGen — they're powerful, but for deploying a Slack bot that answers questions using a few tools, I was pulling in hundreds of dependencies to do something boto3 already handles natively.

So I built something different: a Terraform module where agent behavior lives in markdown files, tools are plain Python functions, and the entire runtime engine is ~140 lines of code with zero external dependencies.

I open-sourced it: terraform-module-markdown-agent

The Problem With Agent Frameworks

Most agent frameworks want to own your entire stack. You get:

  • Heavyweight dependencies — hundreds of packages for what amounts to a loop calling an LLM
  • Framework lock-in — custom decorators, base classes, and abstractions that couple your business logic to the framework
  • Deployment friction — designed for containers or servers, not serverless
  • Opaque behavior — hard to debug when the agent does something unexpected because the prompt is buried in framework internals

If you're running agents on AWS Lambda with Bedrock, you already have boto3. The Bedrock Converse API handles tool use natively. The framework is mostly just getting in the way.

The Core Idea: Markdown as Configuration

What if agent behavior was just a markdown file?

```markdown
---
name: support-agent
version: 1.0.0
description: "Handles customer support queries"
tags: [support, customer]
---

# Support Agent

## When to Use
Activated for all customer-facing support requests.

## Process
1. Greet the customer
2. Use `search_docs` to find relevant documentation
3. If the issue requires escalation, use `create_ticket`
4. Summarize the resolution

## Guardrails
- Never share internal pricing or roadmap details
- Always confirm before creating tickets
- Keep responses under 3 paragraphs
```

This markdown file is the system prompt. The frontmatter provides metadata for routing. The sections give the LLM structured instructions. You can read it, diff it, review it in a PR — no code changes needed to adjust agent behavior.
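Because the runtime has zero external dependencies, even the frontmatter has to be parsed with the standard library. Here's a minimal sketch of how that can work (this is my illustration, not the module's actual loader; values are kept as plain strings, so a list like `tags: [support, customer]` stays unparsed):

```python
import re

def parse_skill(text: str) -> tuple[dict, str]:
    """Split a skill file into frontmatter metadata and the prompt body.

    Stdlib-only sketch: frontmatter values stay as raw strings.
    """
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return {}, text  # no frontmatter; the whole file is the prompt
    meta = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, match.group(2).strip()
```

The metadata dict drives routing; the body becomes the system prompt verbatim.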

How the Runtime Works

The engine is a simple loop:

  1. Load the skill markdown file as the system prompt
  2. Append any shared rules (company context, formatting guidelines)
  3. Call bedrock-runtime.converse() with the user message and tool specs
  4. If the model wants to use a tool, route it to the handler function
  5. Feed the tool result back and loop
  6. Return the final text response
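The steps above boil down to surprisingly little code. Here's a stripped-down sketch of that loop; `client` stands in for a boto3 `bedrock-runtime` client, the message and tool shapes follow the Converse API, and everything else (`run_loop`, error handling, retries) is simplified relative to the real engine:

```python
def run_loop(client, model_id, system_prompt, messages,
             tool_specs, tool_handler, max_turns=10):
    """Minimal Bedrock Converse tool-use loop (illustrative sketch)."""
    for _ in range(max_turns):
        resp = client.converse(
            modelId=model_id,
            system=[{"text": system_prompt}],
            messages=messages,
            toolConfig={"tools": tool_specs},
        )
        message = resp["output"]["message"]
        messages.append(message)
        if resp["stopReason"] != "tool_use":
            # Final answer: return the first text block
            return next(b["text"] for b in message["content"] if "text" in b)
        # Run each requested tool and feed the results back
        results = []
        for block in message["content"]:
            if "toolUse" in block:
                use = block["toolUse"]
                out = tool_handler(use["name"], use["input"])
                results.append({"toolResult": {
                    "toolUseId": use["toolUseId"],
                    "content": [{"text": out}],
                }})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max turns exceeded")
```

Because the client is passed in, the whole loop is testable with a fake that returns canned Converse responses.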

Here's the actual function signature:

```python
from runtime.engine import run_agent

result = run_agent(
    skill_name="support-agent",
    user_input="I can't log in to my account",
    tool_specs=my_tool_specs,
    tool_handler=my_handler,
    history=conversation_history,
)
```

The full engine handles Bedrock throttling with exponential backoff, safe error messages (no internal details leaked to users), a max-turns safety limit, and S3 or local filesystem skill loading. And it does all of this in ~140 lines using only boto3.
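The backoff part is the kind of thing every Bedrock caller ends up writing. A hedged sketch of the pattern (the real engine's policy may differ; in production you'd match on `botocore.exceptions.ClientError` codes rather than the string check used here for brevity):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry a throttled call with jittered exponential backoff.

    Simplified: real code would inspect ClientError error codes.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            if "Throttling" not in str(err) or attempt == max_attempts - 1:
                raise  # non-throttling errors and the last attempt propagate
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The jitter matters: without it, concurrent Lambda invocations retry in lockstep and re-throttle each other.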

Tools Are Just Functions

No decorators. No base classes. Define a JSON schema for Bedrock's tool spec, write a Python function, register it:

```python
# tools/specs/support.py
SUPPORT_TOOL_SPECS = [
    {
        "toolSpec": {
            "name": "search_docs",
            "description": "Search the knowledge base",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"]
                }
            }
        }
    }
]
```
```python
# tools/support.py
import json

def search_docs(query: str) -> str:
    # Your actual search logic here
    results = my_search_index.query(query, limit=5)
    return json.dumps(results)
```
```python
# tools/registry.py
TOOL_HANDLERS = {
    "search_docs": lambda name, inp: search_docs(**inp),
}
```

That's it. The registry is a dictionary. The spec is JSON. The handler is a function. You can test each piece independently.
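When the model asks for a tool, dispatch is a dictionary lookup. A sketch of how the runtime might route a request (`handle_tool` is my name, not the module's, and the inline registry is a stub):

```python
# Assumed registry shape from the examples above: name -> callable(name, input_dict)
TOOL_HANDLERS = {
    "search_docs": lambda name, inp: f"results for {inp['query']}",
}

def handle_tool(name: str, tool_input: dict) -> str:
    """Look up the handler by tool name; unknown tools get a safe message."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"  # surfaced to the model, not the user
    return handler(name, tool_input)
```

Because handlers receive the name too, one function can back several related tool specs if you want.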

Multi-Agent Delegation

A coordinator skill can delegate to specialized sub-skills:

```markdown
---
name: coordinator
version: 1.0.0
description: Routes requests to specialized agents
---

# Coordinator

## Process
1. Analyze the user's request
2. Delegate to `support-agent` for customer issues
3. Delegate to `ops-agent` for infrastructure questions
4. Handle general conversation directly
```

The `delegate_to_skill` tool handles the routing. Recursion depth is limited (default: 3 levels) to prevent infinite loops between skills.
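The depth guard is the important part: without it, two skills that each decide to delegate to the other would recurse until the Lambda times out. A deliberately contrived sketch (mine, not the module's code) where the worst-case always-delegating skill still terminates:

```python
MAX_DELEGATION_DEPTH = 3  # the module's documented default

def delegate_to_skill(skill_name: str, prompt: str, depth: int) -> str:
    """Depth guard only; a real implementation would run the target skill here."""
    if depth >= MAX_DELEGATION_DEPTH:
        return "delegation limit reached"
    # Simulate the pathological case: a skill that always re-delegates.
    return delegate_to_skill("other-skill", prompt, depth + 1)
```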

What Terraform Deploys

The module provisions everything you need:

| Resource | Purpose |
| --- | --- |
| Lambda Function + Layer | Agent runtime |
| IAM Role | Least-privilege Bedrock + DynamoDB access |
| API Gateway (optional) | HTTP endpoint for Slack webhooks |
| DynamoDB Table (optional) | Thread-based conversation memory |
| EventBridge Rules (optional) | Scheduled agent tasks (cron) |

```hcl
module "agent" {
  source = "github.com/45squaredLLC/terraform-module-markdown-agent"

  name        = "support-agent"
  environment = "prod"

  source_dir       = "${path.module}/src"
  layer_path       = "${path.module}/dist/layer.zip"
  bedrock_model_id = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

  ssm_parameter_prefixes = ["/support-agent/slack/*"]

  enable_api_gateway  = true
  enable_memory_table = true
}
```

`terraform apply` and you have a working agent with an HTTPS endpoint, conversation memory, and IAM policies scoped to exactly what it needs.

Conversation Memory

DynamoDB stores conversation history per Slack thread:

  • Partition key: THREAD#{thread_id}
  • Sort key: MSG#{timestamp}#{uuid} (collision-safe)
  • TTL: Auto-expires after 30 days (configurable)
  • Cap: 100 messages per thread to stay within context windows

The runtime loads history automatically when processing a message in an existing thread. No session management code needed.
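The key scheme and the cap are both one-liners. A sketch under assumed attribute names (`pk`/`sk` are my placeholders; the table's real attribute names may differ):

```python
import time
import uuid

def make_keys(thread_id: str) -> dict:
    """Build the collision-safe item keys described above."""
    return {
        "pk": f"THREAD#{thread_id}",
        "sk": f"MSG#{int(time.time() * 1000)}#{uuid.uuid4()}",
    }

def trim_history(messages: list, cap: int = 100) -> list:
    """Keep only the newest `cap` messages to stay within the context window."""
    return messages[-cap:]
```

The UUID suffix in the sort key is what makes two messages landing in the same millisecond safe.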

Scheduled Agents

Need an agent that runs on a cron schedule? EventBridge handles it:

```hcl
scheduled_tasks = [
  {
    name                = "daily-report"
    schedule_expression = "cron(0 13 * * ? *)"
    input = {
      source        = "scheduled"
      task          = "daily-report"
      slack_channel = "C123ABC"
      prompt        = "Generate the daily operations summary"
    }
  }
]
```

Same agent, same skills, same tools — just triggered by a schedule instead of a Slack message.

Project Structure

```
src/
├── orchestrator/
│   ├── handler.py        # Lambda entry point
│   └── agent.py          # Wires skills + tools
├── runtime/              # Provided by the module
│   ├── engine.py         # ~140-line Bedrock Converse loop
│   ├── handler.py        # Slack event handling
│   ├── memory.py         # DynamoDB conversation store
│   └── delegation.py     # Skill-to-skill routing
├── skills/
│   ├── coordinator.md    # Entry point skill
│   └── support-agent.md  # Domain skill
├── rules/
│   └── formatting.md     # Shared context
└── tools/
    ├── registry.py       # Tool routing
    ├── specs/
    │   └── support.py    # Tool JSON schemas
    └── support.py        # Tool implementations
```

Changing agent behavior = editing a markdown file. Adding a tool = writing a function + JSON schema. No framework upgrades, no breaking API changes.

Security

A few things I cared about getting right:

  • IAM scoping: Policies are locked to the deployment region and specific resource ARNs. Bedrock access is limited to Anthropic models only.
  • Skill validation: Skill names are regex-validated to prevent path traversal. S3-loaded skills are size-limited to 1MB.
  • Tool error isolation: Internal errors return only the exception type to the model — no stack traces or secrets leak into responses.
  • Slack verification: HMAC-SHA256 signature verification runs before any event processing.
  • SSM least-privilege: Lambda can only read the specific SSM parameter prefixes you declare.
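Slack's request signing is fully specified, so the verification step is short. A stdlib-only sketch of the check described above (header extraction omitted; the version prefix and HMAC recipe follow Slack's documented scheme):

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str,
                           body: str, signature: str) -> bool:
    """Verify Slack's X-Slack-Signature header (v0 signing scheme)."""
    if abs(time.time() - int(timestamp)) > 60 * 5:
        return False  # reject stale requests to block replay attacks
    basestring = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

`hmac.compare_digest` is the constant-time comparison; a plain `==` would leak timing information.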

When To Use This (and When Not To)

Good fit:

  • Slack bots and chat agents on AWS
  • Agents with a handful of well-defined tools
  • Teams that want agent behavior in version-controlled markdown
  • Serverless-first deployments

Look elsewhere if:

  • You need multi-model orchestration (different LLMs per step)
  • Your agent requires complex stateful workflows with branching
  • You're not on AWS or don't want Bedrock

Getting Started

```shell
# Clone the example
git clone https://github.com/AIOpsCrew/terraform-module-markdown-agent
cd terraform-module-markdown-agent/examples/slack-bot

# Build the Lambda layer
bash ../../scripts/build_layer.sh .

# Deploy
terraform init
terraform apply
```

The example includes a working Slack bot with `get_time` and `get_weather` tools. Swap the skills and tools for your use case.


The repo is Apache 2.0 licensed. If you're building agents on AWS and tired of fighting frameworks, give it a look: [github.com/AIOpsCrew/terraform-module-markdown-agent](https://github.com/AIOpsCrew/terraform-module-markdown-agent)

Questions or feedback? Drop a comment or open an issue.
