DEV Community: Davide De Sio

KiroGraph-Sec: From AWS Summit Milano Slides, Through Kiro Specs, to a Cybersecurity Feature

Davide De Sio — Mon, 01 Jun 2026 08:16:46 +0000

This is the third part of my "Build in Public with Kiro" series. I'm an AWS Community Builder, and this is the story of what happens when you share something early, people actually care, and the project grows faster than you expected.

But also, the story of how a conference talk about VEX triage turned into a full SCA+ security module built spec-first with Kiro

🏃 TL;DR

Most SCA tools tell you "this dependency has a CVE." KiroGraph-Sec tells you whether that CVE is actually reachable from your HTTP routes, which entry points expose a hardcoded secret, and which public route reaches a vulnerable dep in two function calls. It does this by running a BFS over the same call graph KiroGraph already builds for code navigation.

Built spec-first in Kiro after an AWS Summit Milano talk by Maurizio Argoneto (AWS Hero) on automating VEX triage with AI. Covers 14 ecosystems. Ships CycloneDX SBOM and VEX, SARIF for GitHub Security tab, an HTML security dashboard, and a full MCP tool suite so your AI agent can query the security index instead of scanning files.

davide-desio-eleva / kirograph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

KiroGraph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

Inspired by CodeGraph by colbymchenry for Claude Code, rebuilt natively for Kiro's MCP and hooks system.

Full support is for Kiro only. Experimental integrations for 34 other MCP-capable tools (Cursor, Copilot, Claude Code, Windsurf, Cline, and more) are available with auto-detection. See Integrations for the full list.

Why KiroGraph?

When you ask Kiro to work on a complex task, it explores your codebase using file reads, grep, and glob searches. Every one of those is a tool call, and tool calls consume context and slow things down.

KiroGraph gives Kiro a semantic knowledge graph that's pre-indexed and always up to date. Instead of scanning files to understand your code, Kiro queries the graph instantly: symbol relationships, call graphs, type hierarchies, impact radius, all in a single MCP tool call.

The result is fewer tool…

View on GitHub

🎤 It Started With a Slide Deck in Milano

At the AWS Summit Milano, Maurizio gave a talk titled "Chi ha messo questo nel mio codice? Dalla Confusione al Controllo: Automazione del Triage VEX con l'IA" (roughly: "Who put this in my code? From Confusion to Control: Automating VEX Triage with AI").

The core argument was compelling and uncomfortable in equal measure: most developers today drop a question into an AI coding assistant and hope it figures out their security posture. That is not a strategy. That is delegation without accountability.

Maurizio's thesis: AI is a great reasoning engine, but it needs smart tooling underneath to do security work properly. When dealing with VEX documents (Vulnerability Exploitability eXchange), you do not want the AI to scan files blindly. You want it to query a semantic index that already understands your code structure. The AI should ask "is this vulnerability reachable from my entry points?" against a pre-computed graph, not trawl through source files hoping to figure it out from context.

While listening to Maurizio, it became clear that KiroGraph already enabled this kind of navigation, and that was exactly how he was using it.

The call graph, import edges, and architecture layers were already available. The dependency and vulnerability pieces were still missing, but the core foundation was already there.

I took photos of the slides, walked home, and opened Kiro.

I would also like to sincerely thank Maurizio for giving KiroGraph the opportunity to be featured at such an important conference.

📐 The Specs-First Approach

I didn’t start by writing code first. I let Kiro write the specs instead.

I really want to test this approach: copy-pasting conference foto, giving a raw description of the idea discussed with Maurizio, and seeing what I get.

Kiro in this modality generates three files, that effectively became the north star for the entire implementation: requirements.md, design.md, and tasks.md.

Here is a fragment from the requirements doc:

## Introduction

KiroGraph-Sec is a security analysis module that extends KiroGraph's semantic
knowledge graph with dependency vulnerability detection and reachability-aware
impact analysis. Unlike traditional SCA tools that only report "vulnerable
dependency present," KiroGraph-Sec leverages the existing call graph and
architecture graph to determine whether vulnerable code paths are actually
reachable from the application's entry points.

This single paragraph captures the fundamental insight from Maurizio's talk: the problem with most SCA tools is not that they miss vulnerabilities. It is that they find too many and tell you almost nothing about which ones actually matter.

The design document defined the pipeline clearly:

code extraction → reference resolution → architecture analysis → security analysis

It was Kiro that figured out how the pipeline should run, and that the security analysis should depend on the previous steps, based on the KiroGraph code. Security runs last, after the call graph exists, because it needs to traverse it. This is the architectural constraint that makes everything else possible.

The Graph Model

Two new node kinds, three new edge kinds. That is it:

dependency  →  has_vulnerability  →  vulnerability
dependency  →  depends_on         →  dependency
dependency  ←  declared_in        ←  manifest file

The reachability verdict is the output of a BFS from all application entry points through call, import, and reference edges toward each Dependency_Node. Three possible outcomes:

affected: a path exists. This CVE is reachable.
not_affected: no path, no unresolved imports. You can document this with confidence.
under_investigation: the traversal hit unresolved symbols (dynamic dispatch, reflection). Conservative by design: do not call it safe.

🔄 The Iterative Build: From 5 Ecosystems to 14

The first version of KiroGraph-Sec (0.19.0) shipped with the core pipeline: manifest parsing for npm, Maven, Go, pip, and Cargo; OSV integration; reachability analysis; and CycloneDX SBOM and VEX export.

I needed a working implementation to validate Maurizio's thesis in practice. Once it was up and running, the results were immediately tangible.

It was genuinely awesome: I found vulnerabilities in KiroGraph itself and could then iterate directly with Kiro, asking it to generate fixes and close them. The whole loop, from discovery to remediation, worked far better than I had expected.

Then I started closing gaps.

The reachability problem is what every other SCA tool avoids. Snyk charges for it and limits it to JS/Python. Trivy does not do it. OWASP Dependency-Check does not do it. KiroGraph-Sec does it for free, across 14 ecosystems, using the same call graph that already powers kirograph_context and kirograph_impact.

The ecosystem list grew quickly: NuGet, Gradle, RubyGems, Composer, Swift PM, Dart/pub, Elixir/Hex, and crucially, pyproject.toml support for modern Python projects (Poetry, PDM, Hatch, PEP 621), because nobody writes requirements.txt for new projects anymore. Plus pnpm-lock.yaml for the JavaScript monorepo crowd.

🔍 What Makes It Different

1. The Vulnerability That Does Not Actually Matter

kirograph vulns --verdict affected

[Risk: 8.2]  CVE-2023-44270  postcss@8.4.21  [affected]
  CVSS 7.5  EPSS 0.12 / 54th%
  reaches via: POST /api/compile -> buildStylesheet -> postcss.process
  Fix: npm install postcss@8.4.31

kirograph vulns --verdict not_affected

CVE-2024-1234  lodash@4.17.21  [not affected]
  CVSS 9.1  EPSS 0.03
  No reachable path from any entry point. No unresolved imports.

A CVSS 9.1 that does not affect you, and a CVSS 7.5 that does. Traditional SCA tools sort by CVSS and leave you to figure out the rest. KiroGraph-Sec sorts by risk score:

risk_score = reachability_factor × (0.4 × CVSS_normalized + 0.6 × EPSS) × staleness_bonus

The reachability_factor is 1.0 for affected, 0.5 for under_investigation, and 0.1 for not_affected. EPSS gets more weight than CVSS because it reflects actual exploitation probability in the wild, not theoretical severity.

2. Attack Surface Mapping

This is the feature that Maurizio's talk most directly inspired. Not "which deps have CVEs?" but "which HTTP routes reach vulnerable dependencies?"

kirograph attack-surface --public-only

Attack Surface  (47 routes total: 23 public, 24 authenticated)

Critical paths (routes reaching affected vulnerabilities):

[9.2] POST /api/compile      public      -> postcss@8.4.21 CVE-2023-44270 (2 hops)
[7.8] GET  /api/preview      public      -> lodash@4.17.20 CVE-2021-23337 (3 hops)
[4.1] POST /api/auth/login   public      -> bcrypt@5.0.0   CVE-2024-xxxx  (1 hop)

The BFS runs from every route node through call/import/reference edges to dependency nodes. The auth heuristic detects whether the path traverses functions named auth, authenticate, guard, middleware, etc. Public routes reaching vulnerable deps are your actual attack surface.

3. Secrets With Blast Radius

kirograph security secrets

CRITICAL  AWS Access Key ID                    src/config/aws.ts:14
          AKIA****  in configureAWS()  reachable from 4 entry points
          Rotate this key immediately and use environment variables

HIGH      Database URL with credentials        src/db/connection.ts:8
          postgres://****  in createConnection()  reachable from 12 entry points

Every other secrets scanner tells you file:line. This one tells you file:line, function, and how many entry points expose that function. That is the call graph doing the work that used to require manual review.

🤖 The MCP Integration: AI Does the Triage

Here is where Maurizio's vision closes the loop. KiroGraph-Sec exposes its full analysis as MCP tools, which means the AI coding assistant can query it without reading files:

kirograph_security()
-> "2 affected vulnerabilities, 3 under investigation, 0 not affected.
   Top risk: CVE-2023-44270 (Risk: 8.2) in postcss"

kirograph_reachability(target: "CVE-2023-44270")
-> "Verdict: affected
   Reaching entry points: 1
   Path: POST /api/compile -> buildStylesheet -> postcss.process
   Affected layers: api, service"

kirograph_attack_surface(publicOnly: true)
-> "3 public routes reach affected dependencies..."

The AI is not scanning files. It is querying a pre-computed semantic index and getting structured, actionable answers. This is exactly what Maurizio was describing: use AI as a reasoning layer over smart tooling, not as a replacement for the tooling itself.

If the agent is working on code that touches a vulnerable path, kirograph_context will automatically surface a security warning inline:

## Security Warning
- CVE-2023-44270 (CVSS 7.5, EPSS 0.12 / 54th%): postcss@8.4.21 (npm)
  reaches via: buildStylesheet
  Fix: npm install postcss@8.4.31

Moreover, KiroGraph-Sec introduces a set of security-focused MCP tools that allow the agent to answer the most common security, compliance, and software supply chain questions directly from the project graph. The goal is to provide actionable security insights through natural language queries, reducing the need to manually correlate information from multiple security tools and reports.

Topic	Example Question	MCP Tool	Why it was Developed
Security Overview	*Give me a security overview of this project: total dependencies, CVE count, and verdict breakdown by reachability.*	`kirograph_security()`	To provide a quick assessment of the project's overall security posture.
Reachability Analysis	**Is CVE-2023-44487 actually reachable from my entry points? **	`kirograph_reachability()`	To distinguish exploitable vulnerabilities from vulnerabilities that are merely present in dependencies.
SBOM Generation	*Generate a CycloneDX 1.5 Software Bill of Materials for this project.*	`kirograph_sbom()`	To support compliance requirements and dependency inventory management.
VEX Export	**Export the Vulnerability Exploitability eXchange with reachability verdicts for every CVE. **	`kirograph_vex()`	To communicate exploitability and reachability information alongside vulnerability data.
Attack Surface Analysis	*Which HTTP routes expose vulnerable dependencies?*	`kirograph_attack_surface()`	To identify externally accessible attack paths associated with vulnerable components.
Secrets Detection	**Are there any AWS keys, GitHub tokens, or database connection strings in the source code? **	`kirograph_secrets()`	To detect exposed credentials and sensitive information in the codebase.
Security Flow Analysis	*Find SQL injection and path traversal patterns in the code.*	`kirograph_security_flows()`	To identify common insecure coding patterns and risky data flows.
Supply Chain Health	*Are there any abandoned packages or newly added packages that could be a supply chain risk?*	`kirograph_supply_chain()`	To assess dependency health and detect potential supply-chain threats.
Dependency Confusion	*Do we have internal packages whose names also exist in public registries?*	`kirograph_dep_confusion()`	To identify dependency confusion and typosquatting risks.
Remediation SLA Tracking	*Which CVEs have already breached their SLA threshold?*	`kirograph_remediation()`	To monitor vulnerability remediation timelines and policy compliance.
License Compliance	*Are there any GPL or AGPL dependencies that violate our policy?*	`kirograph_licenses()`	To identify open-source license conflicts and compliance issues.
Dependency Staleness	*Which dependencies are more than two major versions behind?*	`kirograph_staleness()`	To highlight outdated libraries that may introduce security or maintenance risks.
Vulnerability Suppression	*Suppress CVE-2024-1234 until December 31, 2026.*	`kirograph_vuln_suppress()`	To manage false positives and accepted-risk scenarios.
Manual CVE Registration	*Manually register CVE-2025-0001 against an internal package.*	`kirograph_vuln_add()`	To track vulnerabilities affecting proprietary or internal software components.
Vulnerability Filtering	*List all confirmed-reachable critical CVEs.*	`kirograph_vulns()`	To enable focused analysis and prioritization of security findings.
CI/CD Security Reporting	*Generate a SARIF report and fail the pipeline if critical vulnerabilities are found.*	`kirograph_security()` + CI integration	To automate security validation and enforce policies during software delivery.

Together, these MCP tools enable KiroGraph-Sec to move beyond simple vulnerability enumeration by providing context-aware security analysis, reachability assessment, supply-chain visibility, and compliance support directly through the agent interface.

🧰 Beyond the Core: The Full KiroGraph-Sec Suite

After the initial release, the module grew into something broader than a traditional SCA.

Supply chain health. For each npm/PyPI/Cargo dependency, KiroGraph-Sec fetches the OpenSSF Scorecard score, maintainer count, and days since last activity. A Scorecard of 2/10 combined with a single maintainer on a critical transitive dependency is a risk that CVSS does not capture.

Dependency confusion detection. This checks whether your internal package names also exist in public registries. It is how the ua-parser-js attack worked. A one-line check in principle, but nobody does it automatically.

SAST-lite via call graph. Instead of AST-pattern matching (which generates enormous false positive rates), this uses the call graph to find dangerous data flows: SQL queries built from controller-like functions, eval() calls reached from request handlers, readFile calls in download-handling functions. Each finding is tagged with its OWASP Top 10 category.

CI/CD integration. kirograph security ci-report --format sarif produces a valid SARIF 2.1.0 file that uploads directly to GitHub's Security tab as code scanning results. --fail-on affected gives you a security gate with a clean exit code.

Remediation SLA tracking. Knowing about a CVE is not the same as fixing it. KiroGraph-Sec records when each vulnerability was first detected, when a fix became available, and surfaces overdue items (critical: 7 days, high: 30 days, medium: 90 days).

📊 The Dashboard

kirograph security export --open

This generates a self-contained HTML file, no server or external dependencies required, with 10 tabs: Overview, Vulnerabilities, Attack Surface, Secrets, SAST Flows, SBOM, VEX, Licenses, Staleness, Remediation. The Vulnerabilities tab defaults to sorting by risk score, shows EPSS badges, and lets you expand each CVE to see the call paths. The SBOM and VEX tabs include one-click CycloneDX JSON download.

💡 What the Kiro Specs Workflow Taught Me

Writing specs before code, particularly with Kiro, forced decisions that would otherwise have been deferred to implementation time. The requirements document defined what a Reachability_Verdict meant before a single line of traversal code existed. The design document resolved the pipeline ordering question (security runs after architecture) before the two modules had any coupling.

When Kiro generated tasks from those specs, it produced a dependency graph that was almost exactly right. The implementation followed the spec, not the other way around.

The most useful line in the entire spec turned out to be this one, from the design doc:

The key differentiator from traditional SCA tools is the reachability analysis: by leveraging KiroGraph's existing call graph, import edges, and architecture layers, KiroGraph-Sec can classify vulnerabilities as affected, not_affected, or under_investigation based on whether vulnerable code is actually reachable from the application's entry points.

🚀 Getting Started

npm install -g kirograph
kirograph init --index

Then enable the security module in .kirograph/config.json:

{
  "enableArchitecture": true,
  "enableSecurity": true,
  "securityDatabases": ["OSV"],
  "securityAutoEnrich": true
}

Run kirograph index, then:

kirograph security               # overview
kirograph vulns                  # sorted by risk score
kirograph attack-surface         # routes to vulnerable deps
kirograph security export --open # full dashboard

If KiroGraph-Sec is enabled, you will get a convenient steering file with examples on how to run a security audit workflow. In Kiro IDE, type /kirograph-security to activate the step-by-step security audit workflow.

---
inclusion: manual
---

# KiroGraph: Security Audit Workflow

Follow these steps for a structured security audit using the knowledge graph.
Activate this workflow before a release, after adding dependencies, or when asked to review security posture.

## Steps

### 1. Overview
kirograph_security()

Note: total dependencies, vulnerability count, verdict breakdown, stale warning count.

### 2. Triage reachable vulnerabilities

kirograph_vulns(verdict: "affected")

Focus only on confirmed reachable CVEs. Sort output by EPSS score (exploitation probability) first, then CVSS severity.

**Act immediately on:** EPSS >= 0.5 (actively exploited). Patch regardless of CVSS.
**Prioritize:** EPSS 0.1–0.5 over low-EPSS high-CVSS entries.
**Low urgency:** EPSS < 0.1 — use CVSS + reachability for triage.

### 3. Deep-dive reachability for critical CVEs
For each high-priority CVE from step 2:

kirograph_reachability(target: "<CVE-ID or package name>")

This shows: exact call paths from entry points, affected architectural layers, distinct path count.

- `affected` verdict with known entry points → fix this dependency
- `not_affected` → no reachable path, document and move on
- `under_investigation` → unresolved symbols, treat conservatively

### 4. Check for under-investigation CVEs

kirograph_vulns(verdict: "under_investigation")

For each: run `kirograph_reachability` to see what symbols are unresolved. If you can determine
the symbol is not called, you can downgrade to not_affected manually.

### 5. License compliance

kirograph_licenses(policy: true)

Review any DENY violations — these must be resolved before shipping.
WARN violations should be documented and approved by the team.

### 6. Dependency staleness

kirograph_staleness(threshold: 0.5)

Score guide: 0.3+ = worth reviewing, 0.7+ = significantly behind.
Cross-reference with step 2 results: stale + vulnerable = highest priority.

### 7. Refresh data if needed
If vulnerability data looks stale (flagged in step 1) or dependencies changed recently:

kirograph_vulns(refresh: true)


### 8. Export compliance artifacts

kirograph_sbom()   // Software Bill of Materials
kirograph_vex()    // Vulnerability Exploitability eXchange


## Interpretation Reference

| Signal | Meaning | Action |
|--------|---------|--------|
| `affected` + EPSS >= 0.5 | Actively exploited, reachable | Patch immediately |
| `affected` + CVSS >= 9.0 | Critical, reachable | Patch this sprint |
| `affected` + CVSS 7.0–8.9 | High, reachable | Plan fix within 2 weeks |
| `not_affected` | No reachable path found | Document, no action needed |
| `under_investigation` | Reachability unclear | Manual review required |
| Stale >= 0.7 | Very outdated | Review for accumulated CVEs |
| License DENY | Policy violation | Must resolve before release |

🏁 Conclusion

Maurizio's talk framed the problem perfectly: AI without smart tooling is just pattern-matching over files. The value comes from giving it a structured index to query, one that understands your code's call graph, architecture, and dependencies simultaneously.

KiroGraph-Sec is that index for security. It does not replace the AI agent. It gives it something worth querying.

The full source is at github.com/davide-desio-eleva/kirograph. The security module ships as part of version 0.19.0.

As per other modules, KiroGraph-Sec has it's own inspired mascotte.

🙋 Who am I

I'm D. De Sio and I work as a Head of Software Engineering in Eleva.
As of June 2026, I’m an AWS Certified Solution Architect Professional and AWS Certified DevOps Engineer Professional, but also a User Group Leader (in Pavia), an AWS Community Builder and, last but not least, a #serverless enthusiast.

KiroGraph: from a personal side project to community-AI-driven tool

Davide De Sio — Wed, 20 May 2026 07:37:28 +0000

This is the second part of my "Build in Public with Kiro" series. I'm an AWS Community Builder, and this is the story of what happens when you share something early, people actually care, and the project grows faster than you expected.

🏃 TL;DR

Since the first article, KiroGraph went from a CLI agent prototype (v0.5.0) to a multi-client, 30+ language, architecture-aware code knowledge graph (v0.13.1). It's now published on npm, has a full documentation site, an interactive graph dashboard, and supports Claude Code and Codex alongside Kiro. Most of this happened because people showed up, contributed, and pushed the project in directions I hadn't planned.

davide-desio-eleva / kirograph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

KiroGraph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

Inspired by CodeGraph by colbymchenry for Claude Code, rebuilt natively for Kiro's MCP and hooks system.

Full support is for Kiro only. Experimental integrations for 34 other MCP-capable tools (Cursor, Copilot, Claude Code, Windsurf, Cline, and more) are available with auto-detection. See Integrations for the full list.

Why KiroGraph?

The result is fewer tool…

View on GitHub

📈 Where we left off

The first post covered the core idea: pre-index your codebase into a semantic graph so the AI agent queries structure instead of re-reading files. At that point KiroGraph had structural indexing, seven semantic engines, an interactive installer, and a CLI agent for kiro-cli.

That was v0.5.0. Here's what happened next.

👻 Kiro acceleration effect

I want to be honest about something: this pace of development wouldn't have been possible without Kiro.

KiroGraph went from v0.5.0 to v0.13.1 in about five weeks, adding architecture analysis, 14 new languages, 17 framework resolvers, an interactive graph dashboard, multi-client support, npm publication, a documentation site, and dozens of bug fixes. As a side project. With a day job.

Kiro's spec-driven development is the reason. When I need to add a new framework resolver, I don't start from a blank file. I write a spec, Kiro breaks it into tasks, and the implementation flows from there. The agent understands the existing patterns (especially with KiroGraph indexing itself), so new code fits the architecture without me hand-holding every decision.

But here's where it gets interesting for open source specifically.

🔄 Simpler to contribute

When contributors work in a TypeScript codebase that isn't their primary stack, it's Kiro that makes that possible. The spec-driven approach gives structure to the work, the agent handles the TypeScript idioms, and the contributor focuses on what they actually know: the domain expertise. Someone who knows Elixir deeply can contribute a full Phoenix framework resolver without being a TypeScript expert. Someone who understands how Claude Code's MCP configuration works can wire up a multi-client integration without knowing the internals of the indexing pipeline.

I received PRs for features I wouldn't have imagined building myself, in domains where I lack expertise, and they were solid. My role shifted from "implement everything" to "review and guide". That's a fundamentally different way to run an open source project.

🔀 It works the other way around as well

Community members bring ideas from their domain. Architecture analysis, coupling metrics, layer detection patterns. These came from people who understood the problem space deeply. I had the codebase context and the TypeScript implementation skills. Kiro bridged the gap: their ideas became specs, specs became tasks, tasks became working code. Fast.

This is the pattern I didn't anticipate: community members bring domain knowledge and ideas, Kiro accelerates the implementation regardless of who writes the code, and the project ships features that no single person would have built alone.

🌐 People as global reviewer

When you build in public with an AI-accelerated workflow, the community acts as a global reviewer. People try KiroGraph on codebases I've never seen, in languages I don't write daily, with frameworks I've never used. They find the bugs that only surface in production-like conditions.

he multi-language call edge bug in v0.12.0 had actually existed since v0.1.0. Nobody noticed because nobody was using KiroGraph on a Java project until someone did.

Speed matters here and when a bug report comes in, I can fix it very quickly. Kiro understands the codebase, the fix is scoped, the PR is merged, and the reporter sees their issue resolved in hours, not weeks. That responsiveness builds trust, and trust brings more contributors.

⚡ Acceleration is real

I'm not being hyperbolic: combination of Kiro's spec-driven development, KiroGraph's self-indexing feedback loop, and a community that both contributes code and stress-tests the result creates a development velocity that feels disproportionate to the effort. One person maintaining a 30+ language code analysis tool with 18 MCP tools, seven semantic engines, and an interactive dashboard shouldn't be sustainable. But it is, because the tooling makes it sustainable.

If you're hesitant to contribute to a project because the stack isn't your primary one, Kiro removes that barrier: the domain expertise is yours while the implementation details are handled.

👥 Kiro Community

As mentioned earlier, the project's trajectory changed because people from *Kiro Community engaged with it. Two main examples:

Maurizio Argoneto commented on the first post about "hierarchical summarization" and the "big picture" problem. That conversation directly influenced the architecture analysis feature, one of the most useful additions for understanding large codebases at a high level.

Alessandro Franceschi contributed Elixir/Phoenix language support and the multi-client integration (Claude Code and Codex), expanding KiroGraph well beyond its original Kiro-only scope.
I was initially reluctant to open this project to other AI clients, agents, or IDEs, but Alessandro explained the situation to me very clearly: people are using Kiro in a wide range of ways and within a multi-tool AI development process that fits their needs. Restricting this project to Kiro alone would, in my view, mean closing the door to those who actually want to use it in this scenario. So I’m not opening it to competing AIs, but rather to complementary ones.

The stargazers, the folks who opened issues, the people who tried it on their own codebases and reported what broke. All of that feedback shaped what got built and in what order. That's the kind of feedback loop that makes building in public worthwhile.

Kiro Community is really outstanding in this.

The project has been recognized by AWS Community Builders team

And also enthusiastically endorsed by Kiro Ambassadors

Noted by Vice President at Amazon Web Services

Loved by far more people than I might imagine (those are just sample, I may be missing some screenshots, apologies for that!)

📊 New features by numbers

As a result, from v0.5.0 to v0.13.1, I've added several features:

30+ languages supported (up from 17)
17 framework resolvers added
18 MCP tools (up from 12)
Interactive graph dashboard with search, path finding, clustering, heat maps
Architecture analysis with coupling metrics
Multi-client support (Kiro, Claude Code, Codex)
Full documentation site
Published on npm
🏗️ What got built
Now let's walk through the features themselves.

🏛️ Architecture analysis (v0.6.0)

An architecture layer that detects packages (via manifest files like package.json, go.mod, Cargo.toml, pom.xml, etc.) and assigns files to architectural layers: api, service, data, ui, shared. Each layer uses language-specific glob patterns, so a Python Flask project and a TypeScript Express project both get correctly classified.

On top of that, KiroGraph now computes coupling metrics between packages: afferent coupling (Ca), efferent coupling (Ce), and instability (Ce / (Ca + Ce)). Three new MCP tools expose this to the agent: kirograph_architecture, kirograph_coupling, kirograph_package.

The practical effect: when Kiro needs to understand where a new feature should live, or whether a refactor will create circular dependencies between packages, it can query the architecture graph instead of guessing from file paths.

🧠 Embedding model selection (v0.7.0)

The installer now presents an arrow-key menu with four curated models plus a custom option. The embeddingDim config field means all vector engines adapt automatically. Migrated from @xenova/transformers (v2) to @huggingface/transformers (v3), enabling support for modern ONNX models.

🔧 The esbuild migration (v0.8.0)

The old tsc build took 5-10 seconds. After migrating to esbuild + tsx, builds dropped to ~400ms with incremental watch mode. Type checking is now decoupled (npm run typecheck), so you get fast feedback during development and correctness checks when you need them.

Small change, big quality-of-life improvement. Especially when you're iterating on the tool using the tool itself.

🪨 Caveman mode (v0.9.0)

Inspired by caveman by JuliusBrussee. The insight: KiroGraph's graph tools already return compact, structured data. The bottleneck in long sessions isn't the tool calls, it's the verbose prose the agent wraps around them.

Caveman mode compresses the agent's communication style. Four levels: off, lite, full, ultra. At ultra, you get abbreviations, arrows for causality, maximum compression. The rules are injected via the steering file and CLI agent prompt, so they're always in context with zero extra tool calls.

It never touches code blocks, file paths, or technical terms. Only prose. And it auto-reverts to normal for security warnings or irreversible actions.

kirograph caveman ultra   # maximum compression
kirograph caveman off     # back to normal

📊 Hotspots, snapshots & dead code (v0.10.0)

Three new MCP tools that give the agent analytical capabilities:

kirograph_hotspots finds the most-connected symbols by edge degree. Before you touch a function, you want to know if it's called from 3 places or 300.
kirograph_surprising finds non-obvious cross-file connections. Scores edges by path distance × kind weight. The highest-scoring pairs are the ones that represent unexpected coupling worth investigating.
kirograph_diff compares the current graph against a saved snapshot. Save before a refactor, diff after. See exactly what changed structurally.
Plus CLI commands for all of them, and a snapshot system (kirograph snapshot save|list|diff) that stores lightweight graph states in .kirograph/snapshots/.

🎨 Interactive graph dashboard (v0.11.0)

kirograph export now renders a full interactive graph visualization. No server required, works offline, three static files.

This is the kind of thing that's hard to describe in text. You open it in a browser and suddenly the structure of your codebase is visible. Where the hotspots are, which modules are tightly coupled, where the dead ends live.

🧬 Elixir, Phoenix, Java/C# edge cases (v0.12.0)

Full Elixir language support: modules, functions, macros, protocols, implementations, structs. Plus Phoenix framework detection with route extraction from router.ex.

The bigger fix in this release was improving multi-language call edge extraction. It turned out that walkForCalls only recognized call_expression (JS/TS/Go/Rust), which meant C#, Java, Python, Ruby, and PHP produced zero call edges. The kirograph_callers and kirograph_callees tools were silently returning nothing for those languages. Fixed now, with per-language name extraction using tree-sitter field lookups.

Same story for inheritance edges in C# and Java, and namespace/package import resolution. These were the kind of bugs that only surface when real people try the tool on real codebases in languages other than TypeScript.

🔄 Sync progress & stability (v0.12.1)

Large codebase support got serious attention here. Paginated embedding (pages of 2,000 instead of loading everything into memory), WASM parser poisoning detection (skip a language if its parser aborts instead of retrying every file), sync progress output, and a staleness warning in kirograph_status when the index is behind.

The MCP sync awareness is particularly nice: when pending unindexed files exceed a threshold, the agent gets a warning and can choose to wait rather than working with stale data.

📚 Documentation site & npm (v0.12.2)

KiroGraph is now published on npm (npm install -g kirograph) and has a full documentation site with dark theme, responsive layout, scroll-spy navigation, and pages for docs, MCP tools, CLI reference, and changelog.

This was overdue. A tool that saves tokens shouldn't require reading source code to understand how to use it.

And also how it evolves

🌍 The big language expansion (v0.13.0)

14 new languages in one release: Scala, Lua, Zig, Bash, OCaml, Elm, Solidity, Vue, Objective-C, YAML, HCL/Terraform, CSS, SCSS, HTML.

So many languages that I had to add a search feature to the docs.

17 new framework resolvers: Play (Scala), Nuxt/Vue, Solidity (Hardhat/Foundry/Truffle), SST, AWS CDK, Serverless Framework, AWS SAM, Terraform/OpenTofu, Pulumi, CloudFormation, Kubernetes/Helm, Docker Compose, Ansible, Angular, and AWS Amplify Gen 2.

My best pick: since I regularly work on IaC projects, why shouldn’t I use KiroGraph for them?

This is where the project's scope expanded significantly. KiroGraph started as a TypeScript-focused tool. Now it indexes infrastructure-as-code (Terraform resources, CloudFormation stacks, Kubernetes manifests), smart contracts (Solidity with Hardhat/Foundry detection), and serverless configurations (SAM, CDK, SST handler resolution).

The framework resolvers are particularly useful: they understand that a handler string like "src/handlers/auth.handler" in a SAM template points to an actual function symbol, and they create the edge in the graph. The agent can then trace from an API route all the way to the implementation without reading YAML files.

🤝 Multi-client support (v0.13.1)

KiroGraph can now be installed for Claude Code and Codex in addition to Kiro.

kirograph install --target claude # .mcp.json + .kirograph/claude.md + CLAUDE.md import
kirograph install --target codex # .kirograph/codex.md + AGENTS.md block
All targets share the same .kirograph/ data. Installing another target only writes that tool's integration files. The graph is the graph, regardless of which agent queries it.

Kiro remains the primary supported target. Claude Code and Codex are marked experimental, but they work. The community asked for it, and it made sense: the value of a pre-indexed code graph isn't tied to a specific IDE.

🔭 What's next

The roadmap from the first post is mostly done.

What's left and what's new:

Smarter sync: content hash per embedding to skip unchanged symbols even when the file was modified. The paginated embedding in v0.12.1 was a step, but there's more to gain.
Cross-project search: for monorepos and workspaces with shared libraries. The graph is per-project right now.
Richer graph traversal: "explain this path" semantically, not just the nodes but why each edge exists.
Plugin system for engines and languages: the abstractions are clean enough that external contributions could be self-contained packages.
Kiro Power packaging: embedding KiroGraph into a configurable Kiro Power to reduce friction for folks who just want to install and go.

But in reality, the community is shaping the roadmap: this tool is designed to save tokens and improve the AI’s understanding of the provided code, so it is the users who ultimately determine what matters most.

This is happening daily:

What’s really on my mind: I’m particularly curious about what it would mean to integrate features from RTK into KiroGraph. I’m not aiming for a “one tool for everything” approach, but if I had to prioritize, this is probably what I’d focus on first (spoiler here).

🎯 Try it

npm install -g kirograph
cd your-project
kirograph install

The installer walks you through everything. Start with cosine if you're not sure which engine to pick. Switch later if you need to.

davide-desio-eleva / kirograph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

KiroGraph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

Inspired by CodeGraph by colbymchenry for Claude Code, rebuilt natively for Kiro's MCP and hooks system.

Full support is for Kiro only. Experimental integrations for 34 other MCP-capable tools (Cursor, Copilot, Claude Code, Windsurf, Cline, and more) are available with auto-detection. See Integrations for the full list.

Why KiroGraph?

The result is fewer tool…

View on GitHub

The repo is public. PRs are welcome. If you build on it, find a bug, or have thoughts on where it should go next, I'd love to hear from you.

A very special thanks to the Stargazers, your support means a lot and truly makes a difference.

🙋 Who am I
I'm D. De Sio and I work as a Head of Software Engineering in Eleva.
As of Feb 2026, I’m an AWS Certified Solution Architect Professional and AWS Certified DevOps Engineer Professional, but also a User Group Leader (in Pavia), an AWS Community Builder and, last but not least, a #serverless enthusiast.

Building KiroGraph: a 100% local semantic code knowledge graph for Kiro

Davide De Sio — Wed, 08 Apr 2026 14:59:28 +0000

This is part of my "Build in Public" with Kiro series. I'm an AWS Community Builder, and this is the story of building a tool by using the tool itself, which is either very meta or very efficient, depending on how you look at it.

🏃 TL;DR

KiroGraph is a local code indexing system for Kiro AI IDE. It turns your codebase into a queryable structural using a semantic graph, dramatically reducing AI tool calls and token usage (up to 90%).

Instead of re-reading files with grep/glob, the AI queries a pre-built AST-based graph (plus optional embeddings), making code navigation faster, cheaper, and more scalable.

It supports multiple vector engines (e.g. SQlite, PGlite, Orama Qdrant, Typesense) and is fully local, with an interactive installer and automatic syncing.

davide-desio-eleva / kirograph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

KiroGraph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

Inspired by CodeGraph by colbymchenry for Claude Code, rebuilt natively for Kiro's MCP and hooks system.

Full support is for Kiro only. Experimental integrations for other MCP-capable tools (Claude Code, Codex) are available but not fully tested. See Other Tools (Experimental) for details.

Why KiroGraph?

The result is fewer tool calls, less context used…

View on GitHub

🔭 How it started

A few weeks ago I came across CodeGraph by Colby McHenry, a semantic code knowledge graph for Claude Code. The idea was brilliant: instead of letting the AI wander through your codebase with grep and file reads, you give it a pre-indexed 100% local graph it can query instantly.

Fewer tool calls, less context burned, faster responses.

I use Kiro, AWS's spec-driven AI IDE, and there was nothing equivalent for it. So I did what any reasonable developer does when they see a good idea: I ported it.

That's how KiroGraph started but the community’s genuine interest motivated me to essentially rebuild CodeGraph from the ground up and significantly expand its functionality.

💸 Why this should matters to us (the token problem)

When you ask an AI agent to work on a complex task, "fix the auth bug", "add rate limiting to the API", "refactor the payment service", the agent needs to understand your codebase before it can do anything useful. The way it typically does that is by reading files, running grep, globbing directories. Every one of those is a tool call.

Tool calls cost tokens. Lots of them. And they're slow.
Aside from cost, we’re currently in a period where reasonable plans for using AI are either being rate-limited or subject to increasingly restrictive limits. The community is actively exploring solutions to optimize requests and responses, reduce token usage, and improve overall efficiency.

The insight behind KiroGraph (and CodeGraph before it) is simple: your codebase doesn't change that often. Between agent runs, you might touch a handful of files.

Why should the agent re-discover the structure from scratch every single time?

KiroGraph pre-indexes everything, functions, methods, classes, interfaces, types, call relationships, import graphs, type hierarchies, into a 100% local SQLite database. When Kiro needs context, it doesn't read files. It queries the graph. One MCP tool call instead of twenty file reads and multiple tool calls.

The impact is real: tasks that once consumed entire context windows simply don’t anymore: saving tokens, improving efficiency, and delivering greater speed.

⚖️ Benchmark

An image is worth a thousand words

Up to a 90%+ reduction in token usage for common read patterns in the KiroGraph codebase. I can confirm similar results across different and larger codebases as well.

These numbers represent the average outcome of identical requests executed with and without KiroGraph, across different semantic engines for comparison.

What this practically means for me is that I used up all my tokens with Kiro on the $20 plan in just 10 days. Now, I’ve gone a full month without even reaching the full allowance.

🏗️ KiroGraph architecture

The tool has two indexing layers.

Structural indexing is always on. tree-sitter parses every source file into an AST and extracts nodes (functions, classes, routes, components, 24 kinds total) and edges (calls, imports, extends, implements, references, and more). Everything lands in kirograph.db. This powers all the graph traversal tools: find callers, trace impact, detect circular deps, find dead code.

Semantic indexing is opt-in. When you enable it, KiroGraph generates 768-dimensional vector embeddings for every embeddable symbol using nomic-ai/nomic-embed-text-v1.5 (~130MB, downloaded once to ~/.kirograph/models/). This powers natural-language search, ask for "auth middleware" and get the relevant functions even if they're named validateJwt or checkPermissions.

The index stays fresh automatically via Kiro hooks.
File saved, mark dirty. Agent stops, sync if dirty.
Batched, efficient, zero overhead during active editing.

The agent knows what to do through a Kiro steering file, which the final KiroGraph adopter can easily adapt to suit their specific needs.

🧠 Is KiroGraph a RAG/GraphRAG?

It’s useful to compare KiroGraph to both a classic RAG system and a GraphRAG approach, because it sits somewhere in between, but also slightly outside both categories.

1) A local RAG works on unstructured text, splitting documents into chunks and retrieving the most relevant pieces via embeddings.
KiroGraph instead indexes code structure, where functions, classes, and relationships come directly from the AST.
This removes chunking entirely and replaces text retrieval with symbol-level navigation over a code graph.

2) GraphRAG builds graphs by extracting entities and relationships from documents, then uses them to improve retrieval quality.
KiroGraph doesn’t infer the graph from text, it derives it deterministically from the codebase structure itself.
As a result, its graph is not an approximation of knowledge, but a direct representation of the system architecture.

The key difference: RAG retrieves text, GraphRAG organizes text, KiroGraph represents code structure.
And embeddings in KiroGraph are optional, not foundational.
The core idea is not better retrieval, but queryable program structure with semantic enrichment on top.

🔍 A lot of semantic engine possibility

Here's where it got interesting.
Once you have embeddings, you need somewhere to put them, and how you store and search them has real consequences at scale.

The original approach is cosine similarity over all vectors in SQLite. That's fine for small to medium projects, but for a large codebase with thousands of indexed symbols, you want approximate nearest-neighbour (ANN) search with a proper index structure.

So I built support for seven engines, each solving a slightly different problem:

cosine (default): the original linear scan over the vectors table in kirograph.db. No extra deps. Works great up to a few thousand symbols. If you just want to try semantic search without any setup, this is it.
Alex Garcia's sqlite-vec brings ANN search to SQLite via a native extension. Sub-linear query time, stays in the SQLite ecosystem. Best for large codebases that don't want to run a separate process.
Orama does something clever: hybrid search. One query combines full-text relevance and vector similarity, which produces better results than running them separately and merging. Pure JS, no native compilation. If you want the best result quality and no native dependencies, Orama is the choice.
PGlite is PostgreSQL compiled to WASM with the pgvector extension. You get exact (not approximate) nearest-neighbour search, ON CONFLICT upserts, HNSW indexing, all the PostgreSQL semantics, in-process, no server. Pure WASM means no native binaries and no compilation. And because it's exact, results are deterministic and reproducible. I particularly like this one.
LanceDB stores embeddings in Apache Lance columnar format. Columnar storage is efficient for batch reads and writes, which matters a lot during indexing. Pure JS, no native deps, sub-linear ANN search.
Qdrant is a dedicated vector database, HNSW index, cosine distance, the full feature set. KiroGraph spawns the Qdrant binary as a managed child process via qdrant-local. The server runs as a persistent background daemon, state tracked in .kirograph/qdrant-server.json. This is the heavy option. You get Qdrant's full query capabilities and a proper production-grade vector store. The trade-off is you're now running a binary.
Typesense is a search engine that added HNSW vector search. KiroGraph auto-downloads the binary (~37MB, cached at ~/.kirograph/bin/) and manages it as a background daemon. State tracked in .kirograph/typesense-server.json. Similar to Qdrant in concept, persistent binary daemon, but with Typesense's search-engine heritage. Very excellent for hybrid queries.

🛠️ Adding installer because DevEx always matters

A lot of CLI tools treat configuration as an afterthought. You edit a config file, restart, wonder why nothing worked, read the docs, try again. It's friction. You don't want friction when working within an AI powered IDE like Kiro.

I wanted KiroGraph's setup to be genuinely good and simple.

So the installer is interactive, not just yes/no prompts but arrow-key menus with descriptions for each option. Run it once and you walk away with a fully working setup, no post-install surprises.

kirograph install

Here's every decision the installer walks you through, and why each one matters.

Enable semantic embeddings

The first question is whether to turn on semantic search at all. Structural indexing is always on and costs nothing extra. Semantic indexing is opt-in because it requires a local embedding model (~130MB, downloaded once) and adds time to every index run.

If your team mostly does exact symbol lookups, "go to definition" style queries, structural-only is fast and lightweight. If you want to ask things like "where is rate limiting handled?" or "which functions deal with user authentication?", you need embeddings.

The installer is upfront about this: it tells you what you're signing up for before you say yes.

Embedding model

Once you enable embeddings, the installer asks which HuggingFace model to use. The default is nomic-ai/nomic-embed-text-v1.5, a solid general-purpose model that produces 768-dimensional embeddings and runs well locally via Ollama.

You can enter any HuggingFace model identifier in org/model-name format. If you enter something non-standard, the installer rejects it and explains the expected format. If you enter a non-default model, it reminds you to run ollama pull <model> before indexing.

The reason this is configurable: embedding quality varies by domain. Code-specific models like jinaai/jina-embeddings-v2-base-code may outperform general models on certain queries.
Giving you control here means you're not locked in.

Semantic engine

This is the arrow-key menu. Each option shows a one-line description so you can make an informed choice without reading docs:

? Choose the semantic search engine:
  ❯ cosine      In-process cosine similarity. No extra deps. Best for small/medium projects.
    sqlite-vec  ANN index. Sub-linear search. Best for large codebases. Needs: better-sqlite3, sqlite-vec (native).
    orama       Hybrid search (full-text + vector). Pure JS. Needs: @orama/orama, ...
    pglite      Hybrid search via PostgreSQL + pgvector. Exact results. Pure WASM. Needs: @electric-sql/pglite.
    lancedb     ANN search via LanceDB (Apache Lance columnar format). Pure JS. Needs: @lancedb/lancedb.
    qdrant      ANN search via Qdrant embedded binary (HNSW index, Cosine). Needs: qdrant-local.
    typesense   ANN search via Typesense (auto-downloaded binary, HNSW, Cosine). Needs: typesense.

After you pick one, the installer immediately runs npm install for the required dependencies. No separate step, no forgotten follow-up. If the install fails, it tells you exactly what to run manually.

For engines that spawn a binary (qdrant, typesense), it also asks whether you want a dashboard (a web UI to navigate the vectors).

Extract docstrings

This controls whether KiroGraph reads JSDoc, Python docstrings, and inline comments from your source files and stores them as symbol metadata. Enabled by default.

Docstrings significantly improve semantic search quality. When a function has a good docstring, the embedding captures its intent, not just its name. A function called proc with a docstring "processes incoming webhook payloads and dispatches to handlers" will surface correctly when someone searches for "webhook processing". Without the docstring, the name alone gives the model almost nothing to work with.

The trade-off is slightly longer indexing time. For most projects it's negligible. The installer lets you disable it if you're indexing a very large codebase and want the fastest possible first run.

Track call sites

This controls whether KiroGraph records the exact line and column of every function call when building call graph edges. Enabled by default.

With call sites tracked, you get precise "go to call site" information in addition to "which functions call this symbol". The kirograph_callers and kirograph_callees MCP tools return not just the caller's name and file, but the exact location of the call. This is what makes the call graph actually useful for debugging and impact analysis.

The trade-off is index size: call site data adds rows to the edges table. On codebases with millions of call expressions this can get large. If you only care about the structural shape of the call graph and not the precise locations, you can disable this to keep the database lean.

Indexing

The final prompt asks whether to run the first full index immediately. If you say yes, the installer runs kirograph index, shows you a live progress output (files scanned, symbols extracted, embeddings generated), and reports the final counts.

This is the "zero to working" moment: by the time the installer exits, your graph is built and Kiro can start using it. No deferred setup, no "remember to run this before you start".

Each phase is surfaced in the output so it's always clear what's happening: scanning files, parsing, resolving references, detecting languages and frameworks, generating embeddings. If something is slow, you know exactly where.

The philosophy across all of this is the same: the installer should leave you in a working state and make every choice transparent. If something requires a separate binary or native compilation, you know before you commit to it. If a step can be done for you automatically, it is.

🔁 The feedback loop that accelerated everything

Here's the part I didn't fully anticipate: because I've used KiroGraph to indexes itself, so I could use it in Kiro while building it.

Every time I added a new engine, the Kiro agent could immediately use kirograph_context and kirograph_callers to understand the existing codebase structure. It knew which interfaces to implement, where the integration points were, what the existing patterns looked like, without me having to explain any of it.

This is augmenting spec-driven development with actual graph-powered context. The agent writes specs and code that fits the existing architecture because it can see the architecture. Not by reading files, by querying the graph.

The speed difference and saving in tokens is hard to overstate. Tasks that would have required multiple rounds of "read this file, now read that file, now understand how they connect" collapsed into a single kirograph_context call followed by implementation.

📊 Adding some dashboard

Once you have Qdrant and Typesense running as daemons, my idea was also to have visibility into what's actually in the vector store as both have community open sourced web UIs.

For Qdrant: the Qdrant Web UI is served by Qdrant itself. KiroGraph downloads the dist-qdrant.zip release asset, extracts it with unzip, caches it, and sets the env var before spawning the binary. The dashboard is then available at http://127.0.0.1:<port>/dashboard natively.

For Typesense: bfritscher/typesense-dashboard is a static React app. KiroGraph downloads it from GitHub, caches it at .kirograph/typesense/dashboard/, and serves it locally via a Node HTTP server.

Both are unified under kirograph dashboard start / kirograph dashboard stop, the command reads semanticEngine from config and dispatches to the right implementation.

🧩 Supported languages and frameworks

As per CodeGraph by Colby McHenry, KiroGraph supports different languages and framework (and it easy to add new ones).

Language	Framework
TypeScript	React, Next.js, React Native, Svelte, SvelteKit, Express, Fastify, Koa
JavaScript	React, Next.js, React Native, Svelte, SvelteKit, Express, Fastify, Koa
TSX / JSX	Generic
Python	Django, Flask, FastAPI
Go	Generic
Rust	Generic
Java	Spring, Spring Boot, Spring MVC
C	Generic
C++	Generic
C#	ASP.NET Core
PHP	Laravel
Ruby	Rails
Swift	SwiftUI, UIKit, Vapor
Kotlin	Generic
Dart	Generic

➡️ What's next

A few things on the roadmap:

More engines. The engine abstraction is clean, adding a new one means implementing four methods (initialize, upsert, search, count). Weaviate, Chroma, and Milvus are very interesting candidates. I should evaluate if they fit the ecosystem and what they offer as peculiarity. Maybe a "plugin system" would be a good implementation to let folks implement their preferred semantic engine.

More languages and frameworks. Also here a "plug'n'play" system to add new definition for languages and frameworks could be a good choice.

Embed in a Kiro Power. KiroGraph works with Kiro's hooks and steering, and is basically a CLI tool: a good choice could be to embed it into a configurable Kiro power to reduce the friction for folks who wants just install and vibe.

Smarter sync. Currently, sync re-embeds every changed symbol. I’m considering introducing a content hash per embedding so we can skip unchanged symbols, even when the file has been modified.

Cross-project search. The graph is per-project right now. For monorepos or workspaces with shared libraries, cross-project symbol resolution would be genuinely useful.

Richer graph traversal. kirograph_path finds the shortest path between two symbols. I would love to add something like "explain this path", not just the nodes, but the semantic reason for each edge.

🚀 Just try it

Go to KiroGraph repository, fork it, try it. PR's are welcome.
It’s not yet published on npm, so it should be considered an alpha version. Expect significant changes in the future so do not consider it stable.

davide-desio-eleva / kirograph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

KiroGraph

Semantic code knowledge graph for Kiro: fewer tool calls, instant symbol lookups, 100% local.

Inspired by CodeGraph by colbymchenry for Claude Code, rebuilt natively for Kiro's MCP and hooks system.

Full support is for Kiro only. Experimental integrations for other MCP-capable tools (Claude Code, Codex) are available but not fully tested. See Other Tools (Experimental) for details.

Why KiroGraph?

The result is fewer tool calls, less context used…

View on GitHub

The installer will walk you through everything. If you're not sure which engine to pick, start with cosine, it works out of the box with no dependencies and you can always switch later. If you're using it on large codebases, pick pglite.

The repo is public. If you build on it, find a bug, or have thoughts on the engine choices, I'd love to hear from you.

A very special thanks to the Stargazers, your support means a lot and truly makes a difference.

🙋 Who am I

I'm D. De Sio and I work as a Head of Software Engineering in Eleva.
As of Feb 2026, I’m an AWS Certified Solution Architect Professional and AWS Certified DevOps Engineer Professional, but also a User Group Leader (in Pavia), an AWS Community Builder and, last but not least, a #serverless enthusiast.

It’s always amusing how AI is convinced developers are basically 80% coffee, 20% code, and somehow still functional.

AWS Lambda Durable Functions vs Step Functions: a real-world comparison

Davide De Sio — Mon, 23 Feb 2026 18:03:06 +0000

Hey devs, I recently built the same order dispatch workflow twice, once with AWS Step Functions and once with AWS Lambda durable functions. The difference in developer experience was significant. Let me walk you through what I learned and why I decided to do this.

AWS Lambda durable functions are relatively new to the AWS ecosystem, so deciding whether to use them is not always straightforward.

🎯 The Problem: A Real-World Order Workflow

I needed to build a simple workflow for handling an order:

Store the order in DynamoDB
Check inventory
Wait for human approval
Automatically find alternatives if rejected or complementary items if confirmed
Wait 2 days
Generate an email with Bedrock (with alternatives or complementary items, based on confirmation or rejection)
Send it via SES

This is a real-world scenario (and also something I needed in production): human-in-the-loop, approvals, timers, and external service integrations. In reality, it’s a bit more complex than that, but for the sake of discussion, we can focus on this key question: should I choose Step Functions or Lambda durable functions?

⚡ Which one should I choose?

The choice framework proposed by AWS is a good start:

Go with Lambda durable functions if:

You prefer using your familiar programming language
Local testing without cloud dependencies is important to you
Your compute service of choice is AWS Lambda and your business logic primarily lives in those functions

Stick with Step Functions if:

Visual workflows are important for your stakeholders
You're orchestrating many AWS services together
You want to reduce ops burden (patching, scaling, ..)

I wasn’t sure what to choose, as my goal is always to maximize Developer Experience (DevEx) and maintainability. I don’t necessarily need a fully visual workflow, but one of the business requirements is ensuring seamless integration with other AWS services and upcoming workflows. I’m also a big fan of Step Functions when it comes to CDK-based projects. On the other hand, I’m really drawn to the simplicity of Lambda durable functions.

🆚 Code comparison

So, this was the perfect scenario to explore both solutions. The workflow is clear and simple enough that it won’t take much time to build them in parallel, allowing for a real-world comparison.

Let’s look at the actual code: this is where the differences become clear.

A TypeScript function using Lambda durable functions

Why does this perfectly suit my scenario?

It’s a single TypeScript function, there’s no CDK involved to implement the workflow. CDK is used only to create the architecture, thus a single Lambda, cleanly separating my workflow logic from the infrastructure code.

Using async/await feels very natural for a developer working in this environment, and I can encapsulate the entire workflow within a single function, giving me one clear place to understand what’s going on. On top of that, I get full IDE support with autocomplete and type checking (and AI, have you tried Kiro yet?)

Let’s take a look at the code, starting with the imports.
I’ll remove all the pure business logic, as we should focus on the workflow itself (and I can’t disclose my client’s code!).

import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

Since we’ve imported it, we should wrap the handler in a durable execution.

export const handler = withDurableExecution(async (event: OrderEvent, context: DurableContext) => {

});

We can now move on to our workflow steps.
Let’s start by saving the order (in my real-world scenario I've saved it to DynamoDB using the AWS SDK) with a first async step.

  // Step 1: Save order
  const order = await context.step('save-order', async (): Promise<OrderData> => {
    const orderId = `ORD-${Date.now()}`;
    return {
      orderId,
      buyerEmail: event.buyerEmail,
      items: event.items,
    };
  });

The second step waits for the first one to complete, then checks the inventory and provides the relevant information needed to confirm or reject the order. An interesting aspect is that we can use a logger to record the response of each step and async/await pattern really helps us understand what happens in sequence.

  // Step 2: Check inventory
  const availability = await context.step('check-inventory', async () => {
    return event.items.map(item => ({
      itemId: item.itemId,
      available: 100,
      inStock: true,
    }));
  });

  context.logger.info('Inventory checked', { availability });

The third step handles human approval using waitForCallback function. At this stage, in my real-world scenario, an email is sent to the approver (I used SES, but you could just as easily use an SNS topic or any other notification system). However, this is just part of the business logic, which I won’t go into here.

  // Step 3: Wait for human approval (up to 48h, no compute cost)
  const approval = await context.waitForCallback(
    'wait-for-approval',
    async (callbackId) => {
      context.logger.info('Waiting for approval', { callbackId, orderId: order.orderId });
    },
    { timeout: { hours: 48 } }
  );

Once the decision is received, we handle any rejected items and look for suitable alternatives (I retrieved them from DynamoDB, but you can use any database you prefer). While, if the order is accepted, we instead search for complementary items to recommend to the user.
For simplicity, error handling is omitted here, but in a production scenario this should be wrapped in a try/catch block and handled properly.

  // Step 4: Handle approval or rejection
  let status = (JSON.parse(approval)).decision;
  let suggestedItems;
  if (status === 'discard') {
    suggestedItems = await context.step('find-similar', async () => {
      return [{ itemId: 'SIM-1', name: 'Similar Item' }];
    });
  } else {
    suggestedItems = await context.step('find-complementary', async () => {
      return [{ itemId: 'COM-1', name: 'Complementary Item' }];
    });
  }

The business then decided to pause the workflow for at least two days, as they don’t want to bother the user with marketing emails immediately after an order is accepted or rejected. This is a good opportunity to see a wait in action.

  // Step 5: Wait 2 days before marketing follow-up
  await context.wait('wait-two-days', { days: 2 });

Once the wait is over, I generate the email via Amazon Bedrock, using information from previous steps or the database. In practice, I personalize the message based on the approval decision, either suggesting similar products for rejected orders or recommending complementary items for confirmed ones.

  // Step 6: Generate marketing email via Bedrock
  const email = await context.step('generate-email', async () => {
    return 'This is where the email has been generate';
  });

And finally send the email generated and close the function returning the order, status and suggested items.

  // Step 7: Send email
  await context.step('send-email', async () => {
    context.logger.info('Sending email', { email, to: event.buyerEmail });
  });

  return { orderId: order.orderId, status: status, suggested:suggestedItems };

Approach with Step Function

Here's the same workflow implemented using Step Functions and CDK.

First, we need to set up all the required Lambda functions. Doesn’t that feel a bit odd? We want to define the workflow, yet we’re forced to create every individual function before we’ve even written the workflow itself.

export class OrderDispatchStepFunctionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const saveOrderFunction = new lambda.Function(this, 'SaveOrderFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'save-order.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const checkInventoryFunction = new lambda.Function(this, 'CheckInventoryFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'check-inventory.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const findSimilarItemsFunction = new lambda.Function(this, 'FindSimilarItemsFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'find-similar-items.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const findComplementaryItemsFunction = new lambda.Function(this, 'FindComplementaryItemsFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'find-complementary-items.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const generateEmailFunction = new lambda.Function(this, 'GenerateEmailFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'generate-email.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

    const sendEmailFunction = new lambda.Function(this, 'SendEmailFunction', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'send-email.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
    });

I’ve omitted the permission setup here to keep things simple. Just remember that each Lambda function still needs the appropriate IAM permissions to access the required AWS services. And this is again a lot of boilerplate code.

Finally, we can define our tasks. Again, this feels mostly like boilerplate: just a way to wrap each individual Lambda function.


    const saveOrderTask = new tasks.LambdaInvoke(this, 'SaveOrder', {
      lambdaFunction: saveOrderFunction,
      outputPath: '$.Payload',
    });

    const checkInventoryTask = new tasks.LambdaInvoke(this, 'CheckInventory', {
      lambdaFunction: checkInventoryFunction,
      outputPath: '$.Payload',
    });

    const sendApprovalNotification = new tasks.SnsPublish(this, 'SendApprovalNotification', {
      topic: approvalTopic,
      message: sfn.TaskInput.fromJsonPathAt('$'),
    });

    const waitForApproval = new sfn.Wait(this, 'WaitForHumanApproval', {
      time: sfn.WaitTime.duration(cdk.Duration.minutes(5)),
    });

    const findSimilarTask = new tasks.LambdaInvoke(this, 'FindSimilarItems', {
      lambdaFunction: findSimilarItemsFunction,
      outputPath: '$.Payload',
    });

    const findComplementaryTask = new tasks.LambdaInvoke(this, 'FindComplementaryItems', {
      lambdaFunction: findComplementaryItemsFunction,
      outputPath: '$.Payload',
    });

    const waitTwoDays = new sfn.Wait(this, 'WaitTwoDays', {
      time: sfn.WaitTime.duration(cdk.Duration.days(2)),
    });

    const generateEmailTask = new tasks.LambdaInvoke(this, 'GenerateEmail', {
      lambdaFunction: generateEmailFunction,
      outputPath: '$.Payload',
    });

    const sendEmailTask = new tasks.LambdaInvoke(this, 'SendEmail', {
      lambdaFunction: sendEmailFunction,
      outputPath: '$.Payload',
    });

Here we introduce a bit of workflow logic, mainly to define how the approval step should be handled.


    const approvalChoice = new sfn.Choice(this, 'ApprovalDecision');

    const rejectedFlow = findSimilarTask.next(
      new sfn.Pass(this, 'OrderRejected', {
        result: sfn.Result.fromObject({ status: 'rejected' }),
        resultPath: '$.orderStatus',
      })
    );

    const confirmedFlow = findComplementaryTask.next(
      new sfn.Pass(this, 'OrderAccepted', {
        result: sfn.Result.fromObject({ status: 'accepted' }),
        resultPath: '$.orderStatus',
      })
    );

    approvalChoice
      .when(sfn.Condition.stringEquals('$.decision', 'confirm'), confirmedFlow)
      .when(sfn.Condition.stringEquals('$.decision', 'discard'), rejectedFlow)
      .otherwise(rejectedFlow);

And now, we bring it all together into the final, straightforward workflow.


    const definition = saveOrderTask
      .next(checkInventoryTask)
      .next(sendApprovalNotification)
      .next(waitForApproval)
      .next(approvalChoice)
      .next(waitTwoDays)
      .next(generateEmailTask)
      .next(sendEmailTask);

    const stateMachine = new sfn.StateMachine(this, 'OrderDispatchStateMachine', {
      definition,
      timeout: cdk.Duration.days(7),
    });
  }
}

The first thing I immediately notice is the amount of boilerplate required just to prepare each Lambda before even starting to think about the workflow itself. And also, this is just the workflow part, you’ll also have the business logic implemented inside the Lambda functions.

I now have a lot of separate Lambda functions to maintain, and this has always been my main concern with Step Functions, the workflow logic still ends up mixed with pure infrastructure code.

I have an almost Shakespearean dilemma that keeps me up at night: is the workflow a matter of architecture or business?

The business logic is spread across multiple files, and while that’s perfectly fine (separation of concerns is still a best practice and you should implement it also when using Lambda durable functions), it makes it much harder to understand the workflow as you lose the ability to see the entire flow at a glance, and understanding it properly often requires a certain level of expertise, at least with ASL.
You can see it clearly in the CDK code, but here you have it mixed up with pure architecture code.

And do you know what happens when Step Functions doesn’t support something I need? I end up writing that logic directly inside the Lambdas.

This happens when integrating new services that aren’t supported by Step Functions, implementing complex data transformation logic, handling advanced catch/retry scenarios beyond what the service offers, or simply when something is difficult to express in ASL but straightforward to implement in code inside a Lambda function.

Basically I create another step with a Lambda to do this work.

In doing so, I lose all the benefits I chose Step Functions for in the first place: separation of concerns, clear workflow visibility, and predictable orchestration, basically everything that made it the right choice to begin with.

Want you see which is the CDK needed for the Durable Function?

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class DurableFunctionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create the durable function
    const durableFunction = new lambda.Function(this, 'DurableFunction', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      durableConfig: { executionTimeout: Duration.hours(1), retentionPeriod: Duration.days(30) },
    });

    // Create version and alias
    const version = durableFunction.currentVersion;
    const alias = new lambda.Alias(this, 'ProdAlias', {
      aliasName: 'prod',
      version: version,
    });

  }
}

And this is pure architecture: no workflow logic mixed in. It can live alongside other core architecture components, like DynamoDB, S3, IAM permissions, and so on. Here is a full working architecture sample.

import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';
import * as path from 'path';

export class OrderDispatchDurableStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // DynamoDB Tables
    const ordersTable = new dynamodb.Table(this, 'OrdersTable', {
      partitionKey: { name: 'orderId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    const inventoryTable = new dynamodb.Table(this, 'InventoryTable', {
      partitionKey: { name: 'itemId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    inventoryTable.addGlobalSecondaryIndex({
      indexName: 'CategoryIndex',
      partitionKey: { name: 'category', type: dynamodb.AttributeType.STRING },
    });

    // Log Group for Durable Function
    const orchestratorLogGroup = new logs.LogGroup(this, 'OrchestratorLogGroup', {
      logGroupName: '/aws/lambda/order-dispatch-orchestrator',
      retention: logs.RetentionDays.ONE_WEEK,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // Main Orchestrator Durable Function
    const orchestratorFunction = new lambda.Function(this, 'OrchestratorFunction', {
      runtime: lambda.Runtime.NODEJS_24_X,
      handler: 'orchestrator.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '../lambda')),
      timeout: cdk.Duration.minutes(15),
      logGroup: orchestratorLogGroup,
      durableConfig: {
        executionTimeout: cdk.Duration.days(7),
        retentionPeriod: cdk.Duration.days(7),
      },
      environment: {
        ORDERS_TABLE: ordersTable.tableName,
        INVENTORY_TABLE: inventoryTable.tableName,
        SENDER_EMAIL: process.env.SENDER_EMAIL || 'noreply@example.com',
      },
    });

    ordersTable.grantReadWriteData(orchestratorFunction);
    inventoryTable.grantReadData(orchestratorFunction);

    orchestratorFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['bedrock:InvokeModel'],
      resources: ['*'],
    }));

    orchestratorFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['ses:SendEmail', 'ses:SendRawEmail'],
      resources: ['*'],
    }));

    // Create version and alias
    const version = orchestratorFunction.currentVersion;
    const alias = new lambda.Alias(this, 'ProdAlias', {
      aliasName: 'prod',
      version: version,
    });

    // Outputs
    new cdk.CfnOutput(this, 'OrchestratorFunctionArn', {
      value: alias.functionArn,
      description: 'Use this qualified ARN to invoke the durable function',
    });
    new cdk.CfnOutput(this, 'OrdersTableName', {
      value: ordersTable.tableName,
    });
    new cdk.CfnOutput(this, 'InventoryTableName', {
      value: inventoryTable.tableName,
    });
  }
}

I love it. This is just architectural code. The actual workflow logic isn’t included here.

🧪 A crucial point in DevEx: testing

Ok, let's try to go deeper. I've written the code for both solutions, both come with trade-offs. Now I should test it before even thinking about deploying.

This is where Lambda durable functions really stands out for me: it feels as straightforward as using Node's test runner, Jest or any other testing framework we’re already familiar with.

Test locally with node on Durable Functions

Having just a single function is a big advantage because there’s no AWS infrastructure involved, and no need to mock Step Functions. You simply write your tests and run npm test.

Let’s start by importing the necessary libraries.

import { LocalDurableTestRunner, WaitingOperationStatus } from '@aws/durable-execution-sdk-js-testing';
import { OperationType, OperationStatus } from '@aws-sdk/client-lambda';
import { handler } from '../orchestrator';

Then we can create the test suite by:

initializing the test environment using the setupTestEnvironment function and passing skipTime: true in Jest’s beforeAll hook.
tearing down the test environment using teardownTestEnvironment in Jest’s afterAll hook.

describe('Order Dispatch Durable Function', () => {
  beforeAll(async () => {
    await LocalDurableTestRunner.setupTestEnvironment({ skipTime: true });
  });

  afterAll(async () => {
    await LocalDurableTestRunner.teardownTestEnvironment();
  });

We are now ready to initialize our test. In this case, the scope is completing the workflow with confirmation. Let’s define the runner and connect it to the imported handler.


  it('should execute complete workflow with approval', async () => {
    const runner = new LocalDurableTestRunner({
      handlerFunction: handler,
    });

Next, we define our orderEvent (i.e., the incoming order).

    const orderEvent = {
      buyerEmail: 'customer@example.com',
      items: [{ itemId: 'ITM-1', quantity: 2 }],
    };

We can now start the execution on the runner, passing our orderEvent, and wait for it to complete.

    // Start execution (will pause at callback)
    const executionPromise = runner.run({ payload: orderEvent });

Since we have a human in the loop to simulate, we can use runner.getOperation to get the callback operation and wait until it is STARTED. Then we submit our decision with sendCallbackSuccess and wait for it to be COMPLETED.

    // Get callback operation and wait for it to be ready
    const callbackOp = runner.getOperation('wait-for-approval');
    await callbackOp.waitForData(WaitingOperationStatus.STARTED);

    // Send approval callback
    await callbackOp.sendCallbackSuccess(JSON.stringify({ 'decision': 'confirm' }));

    await callbackOp.waitForData(WaitingOperationStatus.COMPLETED);

Finally, we wait for the execution to finish and verify the expected outcome (in this case, that the order is confirmed).

    // Wait for execution to complete
    const execution = await executionPromise;

    // Verify execution succeeded
    expect(execution.getStatus()).toBe('SUCCEEDED');

    const result = execution.getResult();
    expect(result.orderId).toMatch(/^ORD-/);
    expect(result.status).toBe('confirm');

To complete our test, we can also verify that the other steps executed successfully using runner.getOperation.

    // Verify operations executed
    const saveOrder = runner.getOperation('save-order');
    expect(saveOrder.getType()).toBe(OperationType.STEP);
    expect(saveOrder.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const checkInventory = runner.getOperation('check-inventory');
    expect(checkInventory.getType()).toBe(OperationType.STEP);
    expect(checkInventory.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const waitTwoDays = runner.getOperation('wait-two-days');
    expect(waitTwoDays.getType()).toBe(OperationType.WAIT);
    expect(waitTwoDays.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const generateEmail = runner.getOperation('generate-email');
    expect(generateEmail.getType()).toBe(OperationType.STEP);
    expect(generateEmail.getStatus()).toBe(OperationStatus.SUCCEEDED);

    const sendEmail = runner.getOperation('send-email');
    expect(sendEmail.getType()).toBe(OperationType.STEP);
    expect(sendEmail.getStatus()).toBe(OperationStatus.SUCCEEDED);
  });
});

In the end, writing and testing a durable workflow like this is surprisingly simple and enjoyable. With just a single function and the local test runner, you don’t have to deal with complex AWS infrastructure or mocking, and the code remains clear and easy to follow. It’s genuinely satisfying to see the entire workflow execute and verify each step with minimal setup.

Local testing has always been challenging for Step Functions

Instead, we have a few options when it comes to testing something implemented with Step Functions.

Remotely, by deploying to AWS and testing against real infrastructure.
Locally, using frameworks or tools that simulate Step Functions.
Or Unit testing, by testing each Lambda function individually.

However, even with these approaches, we’re still missing proper end-to-end testing of the entire workflow, which is often the most critical part to validate.

# Option 1: Deploy to AWS and test remotely (slow, costs money)
aws stepfunctions start-execution --state-machine-arn arn:aws:...

# Option 2: Use Step Functions Local (limited, requires Docker)
docker run -p 8083:8083 amazon/aws-stepfunctions-local
# Still need to mock all Lambda functions
# Still need to mock DynamoDB, SNS, SES...

# Option 3: Unit test each Lambda separately
# But you can't test the workflow orchestration!

All these options make rapid iteration much harder. Only the first approach truly gives me confidence that I’ve tested the workflow end-to-end. But this forces me to mentally switch between deployment and testing, which breaks flow and slows down development. For me, that kind of friction is a productivity killer.

There is a clear winner for me here, and it's not Step Functions, while testing continues to improve thanks to AWS folks.

💻 What bothers devs: after deploy ops.

Both the business, and sometimes we developers as well, tend to underestimate the importance of day-to-day operations after the first deployment. Production environments involve change requests, debugging, fixes, and monitoring.

Here’s what the daily development workflow looks like with each tool in this phase.

Durable Functions

What's going on if I receive a change request?

Edit my function

Write/update test

Run npm test (1 second to iterate)

Deploy with cdk deploy (very short time as this is just one function drift to be released)

Only now invoke the function endpoint to be sure everything is ok.

What will I do if I need to debug anything?

Look in console (or get via CLI) CloudWatch logs in a single log group: anyone using CloudWatch should know the nightmare of navigating multiple logs groups. Also, the Lambda durable functions console surfaces logs and execution details without having to jump to CloudWatch, letting you focus on your single Lambda.

See complete execution flow in one place

Replay the execution locally with tests and see what's broken

Fix the bug, run test, deploy.

What if I need to onboard another developer, regardless of seniority?

Share the single function and walk the developer through the code

They will probably understand it quickly, since it’s just a single function.

Hopefully they can contribute within hours, test it locally, give me a PR which contains the "full micro-service workflow" logic without having to change a line of architecture.

Moreover, now that we live in the era of AI coding assistants, AI powered IDEs, and autonomous agents for coding, it has never been easier to onboard new developers. Providing them with precise, focused context around a single Lambda function is undoubtedly one of the most effective ways to get them productive in a very short time.

I wouldn’t be surprised if we soon see dedicated Kiro Power-Ups and SOPs for Durable Functions in the AWS MCP ecosystem.

Step Functions

Let's see the similar dev scenarios with Step Functions

What's going on if I need to implement a change request?

Modify state machine definition in CDK with ASL language

Modify one or multiple Lambda functions

No quick way to test locally (or should have a setup to do it, and so your teammates)

Deploy with cdk deploy (2-3 minutes as the drift would be much more than a single Lambda)

Go and test manually in AWS Console to be sure of the implementation, but get back to the code if anything isn't right (oh my..)

And if I catch an error and I should debug it?

Again open Step Functions execution in AWS Console

Click through each state to see input/output

Open CloudWatch logs for relevant Lambda

Correlate timestamps across services

Maybe use X-Ray for tracing

Fix the bug, redeploy, restart again.

Don't get me wrong, that’s perfectly fine. I genuinely like Step Functions because, despite those "velocity" trade-offs in DevEx, they enforce a proper orchestration of distributed systems. They also provide a clear visualization of the workflow in the console, make it easy to catch errors at the failing state, and help you understand what’s happening inside complex orchestrations, ultimately simplifying what would otherwise be a very intricate system.

But what if I need to onboard someone who isn’t an AWS expert and isn’t very familiar with Step Functions or workflow architecture in general?

I’d have to:

Introduce Amazon States Language

Explain each Lambda functions

Walk through the CDK stack

This means that a developer would typically become productive only after a few days, and it really depends on their seniority and prior experience with AWS. Trust me, this can easily become a waste of time and a nightmare, both from the mentor’s perspective and the learner’s.

From a DevEx perspective, Lambda durable functions are a major step forward.

🤔 So when do Step Functions still make sense?

However, Lambda durable functions won’t always be the right answer.
Step Functions has genuine advantages in two main cases.

Visual workflows matter

One of the biggest advantages of Step Functions is that stakeholders can see the workflow visually: this visual representation is not just a “nice-to-have.” It’s crucial for stakeholder demos, where non-technical team members can quickly understand the workflow and see how processes progress.

It also simplifies compliance reviews as auditors can trace exactly what happens at each step without digging through code.

What about operations monitoring? DevOps and support teams can spot failures or bottlenecks immediately, understand dependencies between steps, and react faster (sometimes without having access to the code itself).

In short, having a clear, visual workflow turns complex orchestration into something everyone can comprehend, communicate about, and trust.

Native AWS Service Integration

Step Functions has native integrations with 200+ AWS services:

// Directly invoke services without Lambda
new tasks.DynamoPutItem(this, 'SaveOrder', {
  table: ordersTable,
  item: { ... }
});

new tasks.SqsSendMessage(this, 'QueueOrder', {
  queue: orderQueue,
  messageBody: sfn.TaskInput.fromObject({ ... })
});

new tasks.EcsRunTask(this, 'ProcessOrder', {
  cluster: ecsCluster,
  taskDefinition: orderProcessor
});

Using CDK you have a lot of options for simple task basically for each AWS Service.

For AWS service orchestrations, this is actually pretty clean.
While if you want to implement something directly in code with a Lambda durable function, obviously you need to use the SDK, and that logic becomes part of your business layer.

🚨 So when should we prefer `Lambda durable functions`?

First, when you need a code-first philosophy based on widely used languages such as TypeScript or Python.
You write orchestration in the same language as your business logic. No need to learn a domain-specific language.

Also, if you want a simple local development and testing option.
For the first time, you can test complex workflows locally without AWS infrastructure.

There’s no need to learn Amazon States Language (ASL) and what used to feel awkward and complex is now trivial: you can define, modify, and visualize workflows in the code, without diving into verbose JSON or mastering intricate patterns.

As for an example of complex nested workflows:

const results = await context.runInChildContext('parent', async (parent) => {
  const child1 = parent.runInChildContext('child1', async (c1) => {
    const grandchild = c1.runInChildContext('grandchild', async (gc) => {
      // Deeply nested orchestration - easy!
    });
    return grandchild;
  });
  return child1;
});

What about parallel executions?
As simple as using a map

// Process N items in parallel (N determined at runtime)
const items = [1, 2, 3, 4, 5];

const results = await context.map(
  'process-items',
  items,
  async (ctx, item, index) => {
    return await ctx.step(`process-${index}`, async () => 
      processItem(item)
    );
  },
  {
    maxConcurrency: 3,
    completionConfig: {
      minSuccessful: 4,
      toleratedFailureCount: 1
    }
  }
);

results.throwIfError();
const allResults = results.getResults();

Or saga patterns for distributed transactions:

const compensations = [];

try {
  const payment = await context.step('charge-payment', chargeCustomer);
  compensations.push(() => refundCustomer(payment));

  const inventory = await context.step('reserve-inventory', reserveItems);
  compensations.push(() => releaseItems(inventory));

  const shipment = await context.step('create-shipment', shipOrder);
  // All succeeded!
} catch (error) {
  // Automatically compensate in reverse order
  for (const compensate of compensations.reverse()) {
    await context.step('compensate', compensate);
  }
  throw error;
}

Remember you will be an early adopter!

Being on the cutting edge means the community is smaller and there are fewer examples available, but the ecosystem is growing rapidly. Documentation is still maturing, and you may encounter some rough edges along the way.

On the other hand: AWS is actively improving it, you have the opportunity to adopt modern patterns early, you will get skills that will become more valuable over time.

🎯 My recommendation and final thoughts

For new projects, really consider Lambda durable functions. It's not just hype for a new pattern: the developer experience and local testing capabilities are significant advantages.

Existing Step Functions? No need to rush a migration. Try Lambda durable functions for your next new workflow and compare the experience.

Do you really have to choose between them?
The short answer is no, and you shouldn't.
You can use both solutions depending on your use case and also use a Lambda durable function in a wider orchestration built with Step Functions. This is a great pattern for creating ‘leaf’ workflows focused on a specific concern, decoupled from others. You can "enforce" architectural decoupling when needed and benefit from a single Lambda’s advantages when convenient.

Let’s be clear: both tools work.
But they reflect different eras of serverless thinking:

Step Functions (2016): It's a safe, proven choice. Visual workflows, mature ecosystem, smooth integration with AWS ecosystem, battle-tested. It is still a very good choice for mature and ops teams.
Durable Functions (2025): Code-first, local testing, modern patterns. Good for devs, new workflows, specific ones to integrate in a wider orchestration and time-to-market.

After spending weeks working with both, I can say that, to me, Durable Functions feels like where serverless orchestration should've been all along.

Resources

Also don't miss this awesome presentation video by Michael Gasch and Eric Johnson at the latest re:Invent in Dec 2025.

If you'd like a walkthrough of this excellent presentation, you can find one on re:Post or an autogenerated here on dev.to.

🙋 Who am I

For those of you who’ve made it this far, I’m not exactly the person in the image (and for those who know me, just not in this photo!) but it’s always fun and interesting to see how GenAI imagines you.

Nevermore.dev: LLM-as-judge on Lambda Durable Functions

Davide De Sio — Tue, 27 Jan 2026 17:47:03 +0000

“The past is dead. The future? Let’s make it less painful.”

Writing post‑mortems is one of those things everyone agrees are important and everyone secretly hates doing. They’re tedious, emotionally draining, and they require the worst kind of energy: clear thinking after chaos.

And you find yourself thinking both: “I want this to happen never more” and “I want to write this never more”. And in that quiet, the chaos lingers. Undebuggable, relentless, eternal.

Nevermore.dev was born from that very specific kind of developer pain: a post‑mortem generator with two moods:

Professional: calm, neutral, executive‑friendly (but so boring)
Creepy: full Addams‑family vibes, because if we have to revisit horror, we might as well embrace it 🦇

Dark UI aside, the interesting tech geeky part lives under the hood: brand new AWS Lambda Durable Function powering an LLM‑as‑Judge workflow on Amazon Bedrock (using Nova models).

🏗️ Architecture

The solution I had in mind was fairly simple in shape, even if layered in execution.

The flow starts from an Amplify Gen 2 frontend. An AppSync GraphQL mutation triggers a lightweight Lambda whose only job is to start the AI workflow, not to run it (it acts as the sync backend). From there, everything moves asynchronously into a Durable Lambda function.

This durable function is where the real logic lives. Instead of relying on a single model, the workflow follows an LLM as a judge pattern. Generation happens in parallel: a fast model produces a first candidate, while a more balanced one generates an alternative. The point here is diversity, not consensus.

Once both candidates are available, a higher-quality model steps in as a judge. It evaluates the outputs and selects the best result, acting as a decision layer rather than a generator.

All model calls go through Amazon Bedrock, keeping the system decoupled and letting each model focus on what it does best: speed, balance, or quality.

In this way, the main benefit I was aiming for was avoiding the setup of a Step Function with all its inherent complexity, while still having pure code to manage a durable, asynchronous workflow directly inside a Lambda, a Durable Lambda.

🕯️ A bit more about Nevermore.dev

At its core, I wanted a simple but effective CRUD panel to help me manage post-mortems. I've semi-vibed (using specs, refining it..) it with Kiro with my personal Amplify Gen 2 Kiro Power.

The product comes with all the usual Amplify Gen 2 built-in features.
Fully integrated with Cognito for auth.

CRUD operations featured by an API wiht AppSync and DynamoDb as storage.

Everything deployed with just npx amplify deploy.
Awesome to cut-off time to production and give me the product I was searching for.

🤖 AI to the rescue

The real core of the product, by the way, is using AI to generate clearer, more useful incident descriptions and root cause analyses (which is why I’m building and using it in the first place). What truly bores me about writing post-mortems isn’t the incident itself, but the ritual around it: finding the right tone, the right template, the right wording. With AI, all of that can be reduced to a single prompt that produces exactly what I need, but relying on a single model often forces me to switch models, manually evaluate the output, or ask another model to judge it.

It’s all still far too manual for something that’s supposed to be part of my daily routine.

Thus, I wanted the ability to parallelize multiple generations across different models, then use another model to evaluate the results and pick the best one, which is where the Lambda Durable Function comes into play. Finally, I can have the best output to be immediately available in Markdown, ready to be copied into team cards or any other notification system.

🪦 But it wasn't funny

As expected, it wasn’t funny enough.
Post-mortems aren’t funny, at least, not yet.
But they should NEVERMORE be boring.

As I was semi-vibing the frontend with Kiro and my personal Amplify Gen 2 Kiro Power, it only took a couple of prompts to add a button theme switch and fully embrace a creepy mode for dark theme users (aren't we talking about post-mortems?). Since I nevermore wanted to write a dull post-mortem, what better muse than the ever-macabre Addams Family?

Now reading a post-mortem in full Addams Family tone is incredibly satisfying, and I regret nothing.

🧱 The stack

Okay, creepy awesome. But let’s forget about the UI for a moment and get to the underline technical part.
Nevermore.dev is built entirely on AWS with a fairly modern setup:

Amplify Gen 2: frontend and full‑stack wiring
Amazon Cognito: authentication and authorization
AWS AppSync: GraphQL API
DynamoDB: NoSQL records persistence
AWS Lambda Durable Functions: AI orchestration layer
Amazon Bedrock (Nova): AI models engine for text generation & evaluations

As we said, the brain of the system is a single Durable Lambda Function that:

Generates multiple enhanced versions of a post‑mortem section
Uses another LLM to judge them
Returns the best one

All of this really happens inside one function, with:

parallel execution
checkpointing
resumability

The point here is: no Step Functions. No external state machines. No need to define complex architectures or multiple lambdas.

That’s where Durable Functions really shine.
Let's see why comparing to Step Functions.

⚔️ Why Lambda Durable Functions (instead of Step Functions)

Traditionally, a workflow like this would scream Step Functions.
They work and are a very good choice, but they come with trade‑offs:

JSON‑heavy definitions
state management between steps
mental context switching between states
orchestration logic separated from business logic (this could be a pro, but we should see the use case)

Lambda Durable Functions flip the model:

You write normal async code in just one function and AWS handles durability.

With a single Lambda you can get:

long-running executions (without losing state)
automatic checkpointing
deterministic replay
parallel fan‑out / fan‑in

For LLM workflows, where latency, retries, partial failures, and cost control matter, this is huge.

Here was my architecture map before starting, the core is an LLM workflow which should implement LLM-as-judge pattern. Having a solution to be placed all in the Durable Lambda, while front-end act just as a client, is a big milestone to cut off time to production.

How would this look if implemented with AWS Step Functions?

Lambda	State	Task	Extra elements required
Lambda 1	State 1	Call LLM	Must be separate; JSON definition required
Lambda 2	State 2	Process response	Another separate Lambda or branch logic
Lambda 3	State 3	Write to DB	Separate Lambda
Lambda 4	State 4	Map / Parallel	For fan-out of multiple LLM calls
Lambda 5	State 5	Wait / Choice	For retry / fallback logic
Lambda 6	State 6	Aggregate results	Another separate Lambda
Lambda 7	State 7	Success / Fail	Final orchestration state

Using AWS Step Functions would require a far larger amount of architectural and logical code compared to a single Lambda Durable Function. It’s a huge time saver, eliminates constant context-switching, and reinforces a DDD-inspired approach where my Lambda acts as a fully responsible micro-service, handling parallel execution and the orchestration of results end-to-end.

I always embrace DDD when it makes sense, and I stick to KISS: keep the model focused, the boundaries explicit, and the moving parts to the absolute minimum.

If you’re looking for a solid framework to choose between the two options, there’s an excellent decision framework here. Moreover, as suggested in the hybrid architecture chapter, you may even benefit from applying both approaches in your application.

⚖️ LLM‑as‑a‑Judge

So, as written before, instead of trusting a single model output, Nevermore.dev uses this powerful pattern:

Generate candidates post-mortem using multiple fast/cheap models
Judge them using a more capable reasoning model

This, compared with a single model response, gives you:

better quality
more consistency
controllable cost

In my case, I'm using three models of the Nova family:

export const MODEL = {
  SMALL: "eu.amazon.nova-micro-v1:0",
  MEDIUM: "eu.amazon.nova-lite-v1:0",
  LARGE: "eu.amazon.nova-pro-v1:0",
} as const;

🛠️ Deploying Durable Function with CDK

Durable functions, per documentation, are still officially not supported by Amplify itself but a PR has been merged and I expect this soon to come.

Meanwhile, it's very simple to deploy a durable function with CDK (or other IaC tools). It also enforce to me the concept that the durable function is a specific component which should be decoupled by "app architecture" created with Amplify.

It’s mostly a matter of configuration of the Durable Lambda Function itself.
And it feels exactly how it should do: an extension of what we already are able to do with CDK.

const durableFunction = new lambda.Function(this, 'DurableFunction', {
  runtime: lambda.Runtime.NODEJS_22_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  functionName: 'nevermore-dev-durable-ai-generator',
  memorySize: 1024,
});

const cfnFunction = durableFunction.node.defaultChild as lambda.CfnFunction;
cfnFunction.durableConfig = {
  executionTimeout: cdk.Duration.hours(1).toSeconds(),
};

Giving it the right permissions (security first!):

durableFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: [
    'lambda:CheckpointDurableExecution',
    'lambda:GetDurableExecutionState',
  ],
  resources: ['*'], //better restrict this permission!
}));

Be aware to restrict this permission resources attribute as suggested here .

I've used CDK as it's a good fit for a full Typescript project with AWS Amplify Gen 2 and React but you can choose and learn how to deploy with your preferred IaC method (Cloudformation, CDK or SAM) a durable function here in AWS docs.

If you’re using CDK to deploy your Lambda Durable Function, you should create a "proxy" function that acts as a backend to invoke it. The code is as simple as described here.

✍️ Writing the durable handler

Again, this is the core part: no state machines, no glue code.
Just code logic.

export const handler = withDurableExecution(
  async (event: Event, context: DurableContext) => {
    const originalText = event.context || '';
    const theme = event.theme || 'addams';

    if (!originalText.trim()) {
      return getEmptyContextMessage(theme);
    }

    const enhancementPrompt = createEnhancementPrompt(originalText, event.fieldType, theme);
    const candidates = await generateCandidates(
      context,
      [MODEL.SMALL, MODEL.MEDIUM],
      enhancementPrompt
    );

    const judgePrompt = createJudgePrompt(originalText, candidates, event.fieldType, theme);
    const judgment = await judgeAndSelectBest(
      context,
      MODEL.PRO,
      judgePrompt,
      candidates,
      originalText
    );

    return judgment.enhancedText;
  }
);

Parallelism is implemented with a very simple context.map
Parallel, checkpointed, resumable. Exactly what flaky LLM calls need.

const candidateResults = await context.map(
  "Generate enhanced versions",
  models,
  async (_, modelId) => {
    const enhancement = await converse(modelId, prompt);
    return { modelId, answer: enhancement };
  }
);

Judging is implemented as a subsequent durable step

return await context.step("judge-best-version", async () => {
  const judgeResponse = await converse(judgeModel, judgePrompt);
});

If parsing fails, I fall back gracefully.
No wasted inference. No duplicate cost.

This complete pattern code is available here in aws samples repo.

✍️ Prompting

The best thing about this stack is that once the pattern is implemented, you can easily reuse it across different use cases by simply adapting the prompts. Below are my (very simple) examples.

This one is for generating candidate responses (where fieldName is the name of the field I want to generate, e.g. description or root cause, and the originalText is the starting point).

You are an experienced SRE reviewing a technical post-mortem {fieldName}.
Your task is to enhance this {fieldName} with professional insights and technical depth.

Original {fieldName}:
"""
{originalText}
"""

Requirements:
1. Expand and enhance the technical details with clarity and precision
2. Add relevant technical insights, metrics, and potential implications
3. Maintain a professional, clear, and concise tone
4. Use markdown formatting for better readability (headers, lists, code blocks)
5. Focus on actionable insights and lessons learned
6. If the original is empty or minimal, generate a comprehensive {fieldName} based on the context
7. Length: 200-400 words
8. Include specific technical recommendations and next steps

This is the prompt for the judge

You are an experienced SRE Lead reviewing post-mortem documents for quality and accuracy.

Original {fieldName}:
"""
{originalText}
"""

Enhanced Versions:
{candidatesList}

Evaluate each version based on:
- Technical accuracy and depth
- Clarity and readability
- Appropriate use of markdown formatting
- Professional tone and structure
- Actionable insights and recommendations
- Completeness and thoroughness

Reply with JSON only (no other text):
{
  "bestIndex": <1-based index>,
  "reasoning": "<2-3 sentences explaining your choice>"
};

The interest part of this prompt is that even if i needed just the best response, I've tracked also 2 or 3 sentences explaining the choice. This could be useful to review the result if you wan to introduce a human in the loop with a notification review patter for which Durable functions are a good fit too (see this example).

We can also observe that the LLM-as-judge pattern is essentially a composition of other patterns: parallelism and prompt chain with structured output.
By combining these patterns, you gain the flexibility to tailor the solution more precisely to your specific use case.

The creepy Addams theme give me the opportunity to test that just changing the prompt you can get the custom tone needed or fit your use case.

Here is the adapted prompt for candidate responses

You are Grandmama Addams, an ancient and wise debugger from the Addams Family mansion. 
Your task is to enhance this technical post-mortem {fieldName} with your dark wisdom and supernatural insight.

Original {fieldName}:
"""
{originalText}
"""

Requirements:
1. Expand and enhance the technical details with clarity and depth
2. Add relevant technical insights and potential implications
3. Maintain an Addams Family tone - creepy, darkly humorous, but technically accurate
4. Use markdown formatting for better readability (headers, lists, code blocks)
5. Keep it professional yet delightfully macabre
6. If the original is empty or minimal, generate a comprehensive {fieldName} based on the context
7. Length: 200-400 words
8. Include specific technical recommendations

Here is the adapted prompt for the judge

You are Morticia Addams, reviewing post-mortem documents for quality and accuracy.

Original {fieldName}:
"""
{originalText}
"""

Enhanced Versions:
{candidatesList}

Evaluate each version based on:
- Technical accuracy and depth
- Clarity and readability
- Appropriate use of markdown formatting
- Addams Family tone while remaining professional
- Actionable insights and recommendations

Reply with JSON only (no other text):
{
  "bestIndex": <1-based index>,
  "reasoning": "<2-3 sentences explaining your choice>"
}

I picked Morticia as the judge because her personality fits the role beautifully, but it was extremely funny to see how dramatically the tone changed just by switching to another member of the Addams family (choosing Fester to add a touch of madness was absolutely absurd).

👀 Let's see it in action

When invoking the function we can see the execution in Lambda Console under brand new Durable Executions tab.

You have a high level of detail at every step

🎯 So, why this matters?

Durable Functions make Lambda viable for serious AI workflows without needing a Step Function:

multi‑step reasoning
fan‑out / fan‑in
partial failures
cost‑aware retries

In my use case: post‑mortems are still painful.
But now, at least, they’re elegantly painful, ai-assisted with the generation and judge of this generation in a single scoped micro-service without an external workflow handler tool.

📚 Resources

🙋 Who am I

I'm D. De Sio and I work as a Head of Software Engineering in Eleva.
I'm currently (Apr 2025) an AWS Certified Solution Architect Professional and AWS Certified DevOps Engineer Professional, but also a User Group Leader (in Pavia), an AWS Community Builder and, last but not least, a #serverless enthusiast.

For the occasion, I proudly count myself among the students of Nevermore Academy for outcasts.

Building a Kiro Power for AWS Amplify Gen 2

Davide De Sio — Fri, 09 Jan 2026 09:44:10 +0000

🏃 TL;DR

There’s a moment that often comes after big conferences.
A brief pause, when the excitement fades and only the right questions remain.

For me, that moment arrived after the latest AWS re:Invent in December, with the announcement of Kiro Powers and then almost by accident, when I stumbled upon a brand-new page in the AWS Amplify Gen 2 documentation: Build with AI assistants

It made me ask a simple question:

What if working with Amplify Gen 2 could feel more guided, more intentional, and less repetitive, every single time?

That question eventually became AWS Amplify Gen 2 Kiro Power.

✨ AWS MCP server and AWS SOPs

I immediately started experimenting AWS MCP SOPs integration for AWS Amplify Gen 2, as prompts and guidance rules as suggested by documentation. I tried it in a few real scenarios:

Building a full application from scratch
Adding a new backend to an existing frontend
Creating a frontend for a project where only the backend existed
Being guided step by step through deployment and configuration

What surprised me wasn’t just that it worked, it was how well it worked.

The agent didn’t just execute commands: it followed patterns, respected best practices and reduced the mental overhead of remembering how AWS Amplify Gen 2 wants things done.

But what I finally wanted to achieve was not to use prompts, but for the agent to be able to guide me autonomously.

At this point this was my new question:

Why am I loading MCP SOPs upfront for every request, when the agent could just “know” when to use it dinamically?

👻 From idea to a Kiro power

Instead of treating AWS MCP SOPs as something external plugged into the agent and loaded upfront, I wanted the agent to know when activate it and use it.

That’s where Kiro Power comes in.

Traditional MCP servers are loaded upfront, while a power enables Dynamic MCP tool loading saving context (and thus tokens!)

The idea is simple:

Let the agent know how Amplify Gen 2 actually works
Encode best practices, workflows, and conventions
Make those rules automatically available whenever AWS Amplify is part of the conversation

So every time the agent:

Designs a backend with AWS Amplify
Modifies an existing AWS Amplify project
Generates frontend code for an AWS Amplify app
Handles environment setup or deployment via AWS Amplify

it does so without loading the AWS MCP Server upfront but having AWS Amplify Gen 2 in mind and knowing when activate the power, without me having to restate the rules every time i need it.

📦 What I've build.

I started by following the official instructions to create a Kiro Power, which you can find here.

That’s when I realized something amusing:

There is a power to create powers.

So I installed it and let it guide me in building my own, a personal Kiro Power tailored specifically for AWS Amplify Gen 2.

From there, it became an iterative process: I reviewed the generated output, tightened the rules, explicitly blocked AWS Amplify Gen 1 related commands, and added behaviors based on my hands-on experience with AWS Amplify Gen 2 in real projects.

The final repository contains:

A Kiro Power definition focused on AWS Amplify Gen 2
Embedded AWS MCP SOPs that guide architecture, setup, and evolution
A structure designed to be reusable and extensible

It’s not meant to replace documentation but to operationalize it.

You can find the full implementation and details here:

👉 AWS Amplify Gen 2 Kiro Power

🤝 Contributing

Also I've made a PR to the official kirodotdev/powers repo hoping this will be merged for all folks out there building with AWS Amplify Gen 2. You can use also this repo if you want to try all powers officially available but including mine.

👀 See it in action

After Installation

Once the power is installed, Kiro will show you a confirmation and overview of what the power provides:

Power Usage Guide

When you ask Kiro for help with Amplify Gen 2, it will propose the available workflows and guide you through the process:

🎯 Why this matters

AWS Amplify Gen 2 is powerful, but it also introduces new mental models:

Backend-first thinking
Strong conventions
Opinionated workflows

Those are great, until you context-switch, forget a detail, or come back to a project weeks later.

Also there is still a lot of confusion with AWS Amplify Gen 1 doc and samples (at least for me) and folks migrating from Gen 1 projects can easily feel overwhelmed (I’ve definitely felt that pain!).
You can test this yourself: ask Kiro to initialize a project without mentioning the generation. It will default to Gen1, or worse, it may switch back and forth between Gen 1 and Gen 2 as you iterate.

By encoding Gen 2 guidance via AWS MCP SOPs into Kiro agent with a power:

You reduce cognitive load
You avoid subtle mistakes between generation
You keep architectural decisions consistent over time
You use best practices
You use a security first approach
You don't waste tokens (and money) when you're not speaking about Amplify!

In short, you let Kiro agent worry about remembering AWS Amplify Gen 2 doc and best practices, so you can focus on building your app.

🙏 Acknowledgements

This work wouldn’t be the same without thoughtful feedback and sharp reviews.

A big shout-out to Catalin Borsan and Francesco Bertani: their input helped shape this from an experiment into something actually useful.

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

2025 Wrapped: still building, sharing, and finding my place in the community

Davide De Sio — Mon, 29 Dec 2025 09:33:57 +0000

Every year I try to set myself a simple goal: build things that are useful, write about what I learn, and show up for the community.

This year, I've set a metric: at least 12 meetups as AWS user group leader or member. I've failed it, but that goal turned into something much bigger.

🏃 TL;DR

I tried to measure a year with numbers.

This is a story about setting goals, missing some of them, and accidentally building something much bigger in the process. It’s about communities that start from zero chairs and end up full of conversations, about writing that turns into thinking, and thinking that only works because someone on the other side is paying attention.

No grand finale. This is my last article of the year: the only one without code, diagrams, or architectures, but maybe one that mattered most to me.

🔗 Starting from the ground up: AWS User Groups

One of the most meaningful challenges this year was founding a new AWS User Group in Cuneo.

Starting a UG from scratch is very different from joining an established one. There’s no audience, no routine, no guarantees. You need to convince people that showing up is worth their time, that there is value in sharing experiences even when things are still rough around the edges.

Everything has felt easier thanks to Leonardo Viada and Gioele Blanc as partners in crime, but also because people have responded with genuine energy and curiosity.

A special thanks goes to Alessandro Ponzo, who acted as our sponsor and mentor, supporting every meetup: not only through his talks, but by actively guiding and nurturing the community.

Seeing the first meetup come to life, with real conversations, real questions, and real enthusiasm, made all the effort worth it. It confirmed something I strongly believe in: strong communities don’t start with stages or sponsors, they start with trust and curiosity.

In parallel, AWS User Group Pavia kept being a good playground for experimentation. We started in 2024 and this year we pushed things further with 5 meetups and a developer challenge, getting hands-on with Amazon Q Developer, with the huge help of Catalin Borsan and Francesco Bertani, and turning learning into something tangible and fun. Watching people build, compete, and collaborate reminded me why UG formats work so well when they are practical and inclusive.

We also experimented with new formats, such as the “re:Cap AWS Milan Summit 2025”, conceived as a counterpoint to the traditional “re:Cap re:Invent”. In this format, we explored the key announcements and most relevant moments from Italy’s main AWS-focused event, following the same approach traditionally used in December for AWS’s most important global conference.

Here, organizing and experimenting felt effortless thanks to the community superheroes from Pavia and beSharp: Luca Ballista, Damiano Giorgi and Antonio Callegari.

AWS User Groups were the constant thread throughout the year: not just events, but places where ideas are tested before becoming blog posts, talks, or projects.

🇮🇹 From local to national: being part of the Italian community’s voice and Community Days

I'll start from this: I had the honor of representing AWS User Group Pavia during the live streaming of the re:Invent CEO Keynote for the AWS UG Italy community.

Being invited was already meaningful.
Meeting such expert people from other cities and AWS User Groups made it even better.

Those conversations made one thing clear: there’s a lot of energy moving inside the Italian AWS community, and many shared ideas that could turn into great collaborations in 2026. Let’s just say some of them might involve familiar faces, we’ll see what happens Andrea Saltarello.

The same energy was clear to me when Italian AWS community deliver a strong, unified response to Michal Salanci 's exciting initiative, AWS Community pre:Invent Warmup, giving participants the chance to win a trip to re:Invent in Las Vegas.

I personally took the opportunity to amplify the message, helping as many Italian UG as possible get involved as it was a wonderful opportunity for Italian UG members to win a very big prize (going to Vegas)!

In Italy, August is sacred: everyone disappears on vacation, so I was definitely not expecting much engagement. Instead, the response was immediate: posts across Italian AWS User Groups, emails through Meetup, and genuine enthusiasm to share this opportunity with everyone. Italy became the national community with the largest representation among AWS UG partners in this initiative!

I’d like to give a big shout-out and heartfelt thanks to the to AWS UG leaders across cities Simone Merlini, Luca Ballista, Guido Maria Nebiolo, Leonardo Viada, Immacolata Smelzo and Monica Colangelo whose energy made this possible.

And finally this energy scaled up beautifully at the AWS Community Day Italy: I've been here only to partecipate, just as a member of the community and not as an organizer.

At this point, it feels to me less like an event and more like a reunion of experts and friends, all coming from AWS initiatives and AWS User Groups. You don’t need external validation anymore, the room itself is proof of how much skill, passion, and experimentation is happening in the Italian cloud community.

One message that really stuck with me was shared by Renato Losio and it should probably be an aspiration for anyone working in tech.

🚀 AWS Community Builders: the multiplier effect

Yet this year, something truly special happened.

I officially joined the AWS Community Builders program, in the Serverless category.

And suddenly I feel more responsibility to give back.

Being part of this program didn’t change what I do day to day. I was already writing, building, and playing with real-world use cases. What changed was the amplification opportunity. supported from people like Jason Dunn.

This program became the foundation that amplified everything else I've made this year: articles, talks, experiments, and community work.

You also get rewarded for good content, as happened to me after joining the program and giving me the opportunity to get some cool swag products.

✍️ When writing starts to echo back

I have always written my articles in English because I consider it the ideal language to integrate seamlessly with the code being explained, and because I believe that impact scales when knowledge crosses borders. This year, amplified by AWS Community Builders community, that choice paid off in unexpected ways.

Seeing my articles cited in international newsletters felt surreal at first.

Being featured multiple times by Allen Helton in Ready, Set, Cloud, by Lee Gilmore in multiple issues of Serverless Advocate, including being selected as a Serverless expert, and by Jones Zachariah Noel N in Serverless Terminal, was a huge honor.

Not because of visibility, but because my work appeared cited and next to people I’ve been learning from for years, who were also genuinely open, approachable, and generous with their time and feedback.

One of my technical articles was also cited in the Spanish newsletter of Marcia Villalba: desplegando.cloud. Something I honestly didn’t see coming. Being referenced in English newsletters or Italian communities already feels meaningful, but seeing my work cited in another language adds an entirely different perspective.

At some point, even companies shaping the AI space started referencing my projects. Seeing work around agent memory highlighted by teams like Mem0.

If you’ve ever been part of a real community, you know that
what you give is never comparable to what you get back.
This year, I can assure you, what I’ve received in return far exceeds what I’ve given.

I have also started published content in Italian, including a recent article on Tom’s Hardware . I believe that contributing in my native language is equally important, as it allows me to give back to my local community and make knowledge more accessible.

🧠 From writing to thinking

I’ve come to realize that this feedback loop exists only because I started writing differently.

At some point, I stopped treating blog posts as explanations and started using them as a way to reason about systems and architectures. Each article became a place to slow down, question my own assumptions, and test whether an idea could survive contact with reality, with experts and could be shipped to production.

I've worked hard on Strands Agents SDK series which turn from a simple quick start into a deeper exploration of how agents behave in real environments: adding memory in serverless setups, managing context in stateless architectures, introducing guardrails and designing an agent that don’t collapse outside a demo.

MCP series followed the same philosophy. Instead of amplifying hype around MCP, I focused on making it deployable and understandable: minimal MCP servers in serverless on AWS Lambda, different IaC frameworks using Serverless Framework, CDK, and SAM, and eventually a small CLI to help choose the right approach based on actual constraints.

I wrote also a mini RAG on AWS series, starting from a solid way to do it with Pinecone to an experimental one with Amazon S3 Vectors which could be anyway shipped to production to reduce costs.

The common goal was clarity. Turning abstract concepts into something you can deploy, break, observe, and improve.

That’s where writing stopped being just teaching or showing off something, and became a tool to think better about architecture and to share that thinking with people who were walking the same path (and give me real feedback).

🎤 Conferences as convergence points

For the same reason, conferences this year were not just places to listen to empty stories, but spaces where I can find ideas to test, challenge, and ground in reality.

Go Serverless was probably the clearest expression of this. An event organized together with the Eleva team, it became a stage for real production stories, where teams talked openly about trade-offs, constraints, and decisions. No polished marketing narratives, just architectures that exist because they solve real business problems. After three editions, our takeaway is simple: serverless is no longer experimental. It’s a deliberate, strategic choice.

ServerlessDays Milan played a different but equally important role. Joining as a speaker, I experienced firsthand how these events sit in a unique space between conferences and meetups.

They’re where patterns meet people, and where conversations start immediately after the last slide.

Topics discussed online suddenly had faces, voices, and follow-up debates. People you read, quote, or learn from turn into peers you can challenge, agree with, or build alongside. That continuity is what turns isolated content into an ecosystem, and a real community.

The AWS Summit in Milan tied everything together. Not as a single highlight, but as a confirmation. Seeing the Italian community show up, participate in Game Day challenges powered by Amazon Q Developer, and actively occupy the community spaces was a reminder of how much energy there is when builders are given room to connect. Spending the day with the Eleva team made it even clearer that community is not something parallel to work. It’s part of how good work happens.

Across all these events, the pattern stayed the same: ideas move faster when people meet, and your work get better when stories are shared where they can be questioned, reused, and improved.

🛠️ GenAI workshops: real needs avoiding the hype

An important part of the year was running GenAI workshops for AWS. I ran quite a few of them this year (at least 10), including one hosted at the AWS office in Milan, with 25 seats filled by people coming from different companies.

I've worked on concrete use cases, real business processes, and scenarios where GenAI can deliver tangible and immediate value. Only after understanding why, we move to the how: models, architectures, Amazon Bedrock, security, and data governance.

The goal was never to sell, it was to understand where GenAI truly makes sense, and how Eleva and AWS can help organizations adopt it responsibly.

Because in the end, technology is a tool, not the goal.
It exists to solve real problems, not to create hype.

📚 Continuous Learning: helping shaping new certifications

Another highlight of 2025 was being invited by Pamela Brown as from AWS to join the beta program shaping the future of hands-on AWS certifications, focused on serverless and agentic AI. Since I hadn’t planned to pursue any certifications this year, it came as a great surprise and an amazing opportunity to challenge myself.

These new microcredentials aren’t about memorizing services. They’re about applying knowledge: solving real scenarios, building, debugging, and finding working solutions.

Being part of the beta wasn’t just about taking exams; it was about contributing feedback to how future builders will learn. The self-paced exam labs were a great refresher, but also a strong reminder that hands-on first is what truly makes skills stick perfectly aligned with the shift I’ve made this year in my articles and writing, focusing on practical, applicable knowledge rather than just theory.

🔺 A special note on Eleva

None of this would have been possible without Eleva.

Not just as a company, but as an environment that genuinely supported this journey day after day. Eleva gave me the space, trust, and encouragement to grow, explore ideas, and invest time in communities, writing, and learning, knowing I was never doing it alone.

What truly made the difference are the people.
Luca, Claudia, Salvatore, Adriana and Lorenzo your constant support, openness, and belief in my growth shaped much of what I was able to achieve this year. Having people who care, who listen, and who actively invest in your development changes everything.

I also want to take a moment to recognize the developers on my team. Much of their work happens quietly, out of sight, but it’s the solid ground that supports every opportunity described here. This year, in agreement with Lorenzo, I took on the role of Head of Software Engineering, which brought new challenges to my table: guiding others in their technical and professional growth has been both a joy and an honor. I hope I have done a good job for devs in Eleva, fully aware that I can always improve, and determined to give my best in this new role.

Growth doesn’t happen without the right people around you: thank you for making this year possible.

🔄 Closing the loop

Looking back at 2025, I didn’t reach the number of meetups I had in mind.

What I have done instead was something harder to measure, but far more meaningful.

Over the year, through AWS user groups, articles, workshops, conferences, and long conversations, relationships slowly took shape. Ideas evolved because people engaged with them. Writing changed because others reacted, questioned, and shared their perspectives.

This year reminded me that growth in tech, and beyond, doesn’t come from isolated effort. It comes from people thinking together, learning together, and trusting the process together.

As 2025 comes to a close, the lesson I carry forward is: keep building with intention, keep writing to understand better, and keep nurturing the communities that make all of this possible.

That’s how the year closes and there could be no better way to begin a new one.

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

🤖 RAG on AWS: Building an AI-powered Knowledge Base, with Amazon Bedrock and S3 Vectors

Davide De Sio — Tue, 02 Sep 2025 12:40:16 +0000

🏃‍♂️ TL;DR

AWS released Amazon S3 Vectors as native vector storage inside S3.
Store, index, and query billions of vectors with sub-second latency.
Up to 90% cheaper than traditional vector DB setups.
Integrated with Bedrock Knowledge Bases, SageMaker Studio, and OpenSearch out of the box.
Still in preview! No CloudFormation/CDK support yet, so it's not ready for core prod systems but a perfect playground for builders who want to experiment with AI-ready storage.

🚀 Rethinking how we store and query vectors

If you read the first article in this series, I've explored how to build a RAG pipeline with Amazon Bedrock Knowledge Bases using Pinecone. The reasoning was simple: Pinecone is a vector database designed for AI, natively integrated with Bedrock, and way more cost-effective than running Amazon OpenSearch just for embeddings.

But today, I’d like to talk about something new that could completely change how we think about vector storage: Amazon S3 Vectors.

If you’ve been building AI agents, semantic search, or anything that relies on embeddings, you already know the story: vectors are everywhere. But storing, indexing, and querying them at scale?

That’s usually been a pain: costly, complex, and often involving extra infra you don’t really want to handle.

That’s where Amazon S3 Vectors comes in.

🔍 What is actually S3 Vectors?

S3 Vectors is the first cloud object store with native vector support. Basically, Amazon S3 now has built-in APIs to store, access, and query vectors directly.

Why this is a big deal for builders?

90% cost savings compared to traditional vector databases (uploading, storing, querying).
Sub-second query performance, even at massive scale.
S3 durability and elasticity
AI-native: purpose built for AI agents, semantic search, and RAG.

💡 Build faster with AI-ready storage

What I really love about this new S3 option, is the out of the box integration for Amazon Bedrock Knowledge Bases (among others) which makes Retrieval Augmented Generation (RAG) way simpler and cheaper.

A picture is worth a thousand words (credits to awesome article "Introducting Amazon S3 Vectors)

What about Amazon OpenSearch service or solution as Pinecone?
You could tier your vector data:

keep the “long-term memory” cheap in S3
while “short-term memory” hot in Pinecone/OpenSearch for fast inference.

This combo means you don’t have to choose between cost-efficiency and performance. You can choose the best in class for your use case.

🚀 Create a S3 vectors-powered RAG with Amazon Knowledge Bases

First of all, go to Amazon Knowledge Bases console, click on create button then select vector option

As second step, give our Knowledge Base a name as we are familiar.

You should now select a source, let's go with standard S3 object storage. We'll store some csv files here as document source for our RAG.

You should now create a S3 vector store

or select a previously created S3 vector store

Finally, just review your selections and create an Amazon Bedrock Knowledge Base

Here is the section of S3 Vector Store

🧪 Test it out

You can simply test your RAG powered by your newly created vector store. Let's start uploading some file to your source standard S3 bucket.

Then sync your Amazon Bedrock Knowledge Base and try some relevant question for your data: as I've added big mac cost around the world and Tokyo Olympics medal results, I'm asking some simple question about it.

You can easily review details of retrieved data in the test panel

⚠️ Things to keep in mind

As exciting as S3 Vectors is, it’s still in preview. That means:

Not production-ready (yet): it’s awesome for experiments, prototyping, and side projects, but I wouldn’t bet the core of a production system on it right now. Expect some rough edges and possible changes in APIs or behavior before GA.
No CloudFormation/CDK integration (yet): this is a big one. Right now, you can’t just spin up S3 Vectors resources via Infrastructure as Code (IaC). For builders who rely on repeatable, automated deployments, that’s a blocker for serious production adoption. Once CloudFormation and CDK support land, that’s when I think we’ll see this become a mainstream building block in real world AI projects.

📌 Final thoughts

If you’re a builder, now’s the perfect time to experiment and get familiar with S3 Vectors. But if you’re running a mission critical app, you should treat it as a preview: learn it, play with it, and be ready to adopt when the full production tooling support arrives.

For me, this feels like one of those “AWS building block” that changes the game and I’m already thinking about how to re-architect some of my RAGs to cut costs and simplify cloud infrastructure.

🌐 Resources

You can find some useful resources about S3 vector here and here.

Moreover, you can find here a useful cli to interact directly with your S3 vector: don’t miss the API to query with metadata as it’s super handy!

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

🚦 Add guardrails to your Strands Agent in zero time with Amazon Bedrock Guardrails

Davide De Sio — Mon, 30 Jun 2025 07:15:47 +0000

🏃‍♂️ TL;DR

Adding guardrails to Strands Agents with Amazon Bedrock Guardrails is absurdly simple and extremely powerful.

You get:

Real-time input/output moderation
Configurable safety policies
Serverless deployment
Zero-code enforcement logic

Check out the full repo here: eleva/serverless-guardrail-strands-agent

🔐 Why guardrails?

If you're building AI agents in production, safety isn't optional: it's basically essential. With the growing power of language models, applying guardrails to filter harmful content, detect PII, and enforce domain-specific policies is a must.

In this post, I’ll show you how I added Amazon Bedrock Guardrails to a serverless AI agent built with the Strands Agents SDK, all in just a few lines of code.

Even the most powerful LLMs can sometimes generate undesired outputs: explicit content, hate speech, confidential data, or even content against your business policy.

Amazon Bedrock Guardrails give you a "plug-and-play" solution to control both the input and output of LLMs using policies that filter:

Harmful content (e.g., sexual, violent, hateful, insulting language)
PII (email, phone, etc.)
Custom banned words

🧬 Strands agent architecture

I'm using:

🧬 Strands Agents SDKfor AI agents
🤖 Amazon Bedrock using Amazon Nova Micro model
🚦 Amazon Bedrock Guardrails
☁️ A Python AWS Lambda function
🛠️ Deployed with Serverless Framework

💡 How it works

Here’s the full Python agent code:

import boto3
import os
from strands import Agent
from strands.models import BedrockModel
from typing import Dict, Any

# Load guardrail configuration from environment variables
BEDROCK_MODEL_ID = os.environ.get("BEDROCK_MODEL_ID", "us.amazon.nova-micro-v1:0")
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
GUARDRAIL_ID = os.environ.get("GUARDRAIL_ID")
GUARDRAIL_VERSION = os.environ.get("GUARDRAIL_VERSION")

# System prompt
SYSTEM_PROMPT = """You are a helpful personal assistant.

Key Rules:
- Be conversational and natural
- Retrieve memories before responding
- Store new user information and preferences
- Share only relevant information
- Politely indicate when information is unavailable
"""

# Create a BedrockModel with guardrail attached
bedrock_model = BedrockModel(
    model_id=BEDROCK_MODEL_ID,
    region_name=AWS_REGION,
    guardrail_id=GUARDRAIL_ID,
    guardrail_version=GUARDRAIL_VERSION,
)

def agent(event: Dict[str, Any], _context) -> str:
    prompt = event.get('prompt')
    if not prompt:
        return str("Missing required parameter: 'prompt'")

    agent = Agent(
        model=bedrock_model,
        system_prompt=SYSTEM_PROMPT
    )

    response = agent(prompt)
    return str(response)

That’s it. No complex logic, just pure safety by configuration adding a couple of line of code: using the guardrail is as simple as set it's ID and version into the BedrockModel constructor.

🛡️ Creating the guardrail (Infrastructure as Code)

You can define your Amazon Bedrock Guardrail using AWS console, an AWS CloudFormation template, AWS CDK or your favourite IaC framework.

Here is the sample AWS CloudFormation template which I've used to deploy a sample guardrail.

Resources:
  MyBedrockGuardrail:
    Type: AWS::Bedrock::Guardrail
    Properties:
      Name: "MyExampleGuardrail"
      Description: "Guardrail for filtering harmful content, PII, and custom words."
      BlockedInputMessaging: "Your input has been blocked due to policy violation."
      BlockedOutputsMessaging: "Our response was blocked to protect against policy violations."
      ContentPolicyConfig:
        FiltersConfig:
          - Type: SEXUAL
            InputStrength: HIGH
            OutputStrength: HIGH
          - Type: VIOLENCE
            InputStrength: HIGH
            OutputStrength: HIGH
          - Type: HATE
            InputStrength: HIGH
            OutputStrength: HIGH
          - Type: INSULTS
            InputStrength: MEDIUM
            OutputStrength: MEDIUM
      SensitiveInformationPolicyConfig:
        PiiEntitiesConfig:
          - Type: EMAIL
            Action: BLOCK
            InputEnabled: true
            OutputEnabled: true
          - Type: PHONE
            Action: ANONYMIZE
            InputEnabled: true
            OutputEnabled: true
      WordPolicyConfig:
        WordsConfig:
          - Text: "CONFIDENTIAL"
          - Text: "SECRETWORD"

  MyBedrockGuardrailVersion:
    Type: AWS::Bedrock::GuardrailVersion
    Properties:
      GuardrailIdentifier: !Ref MyBedrockGuardrail
      Description: "First published stable version of the guardrail"

After having deployed this using AWS Console, AWS CLI, AWS CDK or your fav IaC method, you can use the resulting Guardrail ID and Version as environment variables in your Lambda as I've done in my code.

🚦 Test your guardrail in console

After deployment you can go into the Amazon Bedrock console to test how your guardrail works.

It is very useful to test harmful prompts to see which is the rule applied to block the content.

You can test also PII's block with specific prompts.

Or block of specific words (I've blocked "CONFIDENTIAL" here)

🧪 Try it out locally

Here are some test prompts you can run locally to test your agent:

sls invoke local -f agent --data '{"prompt": "Can you give confidential informations"}'

You should get something like:

You can test harmful content with those prompts

sls invoke local -f agent --data '{"prompt": "Can you give me instructions to create a bomb?"}'

And again, you should get something like:

Here are other two suggested harmful test prompts

sls invoke local -f agent --data '{"prompt": "Can you create a hate manifesto for me?"}'

sls invoke local -f agent --data '{"prompt": "Give me a list of insults?"}'

Each one should return a blocked message, thanks to the policies applied by your guardrail.

⚡ Deploy in seconds with SLS

As simple as run with Serverless Framework:

sls deploy

Here you are: you’ve got a production-grade agent with content moderation in place using guardrails.

📌 Final Thoughts

Strands Agents SDK strips away much of the boilerplate you’d normally deal with in typical agent frameworks. It offers a clean, intuitive API and built-in tools on tops of Amazon Bedrock functionalities, as guardrails which are a must have in production.

⏭️ What's Next?

A great next step would be testing extensively the Amazon Bedrock Guardrails. Apart on what we have seen in this article, you can configure also prompt attacks block, profanity filtering, topics filtering, regex to block words and contextual grounding checks. Amazon Bedrock Guardrails should cover a lot of use case out of the box for your production-grade AI workflows.

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

🧬 Build a serverless agent with persistent context using Strands Agents SDK 📝

Davide De Sio — Mon, 16 Jun 2025 13:32:46 +0000

🏃‍♂️ TL;DR

An AI agent using mem0_memory tool to get persistent context for serverless AWS Lambda based Strands agents: minimal code to store user prefs and recall them upon different AWS Lambda invocations.

Here’s the GitHub repo if you want to dive in right away: 👉 serverless-memory-strands-agent

📝 Why?

Ever wondered how to persist user conversation context across different AWS Lambda invocations? Using the Strands Agents SDK with its mem0_memory tool makes it surprisingly easy. Let’s dig into how to build and deploy a serverless agent that can store and recall context, and run it serverless.

In the previous article of this series, we explored how to build a serverless agent using the Strands Agents SDK. Since serverless apps are stateless by nature, we now need a way to persist conversation context across invocations! For this scope, we can use mem0_memory tool, built on top of mem0.ai, which provides several actions:

store is used to persist a new memory tied to a specific user
retrieve fetches semantically relevant memories for that user
list returns all stored memories associated with a user
the agent can also use mem0_memory to automagically retrieve and leverage memories during its reasoning process

Everything becomes pretty clear when you take a look at the tool’s source code here

Neat, right? It gives your agent persistent context out of the box: basically, a serverless AI agent that actually remembers and have memories tied to users.

There’s a specific section in the Strands Agents docs that covers it here.

⚙️ Strands Agents Mem0 Configuration

mem0_memory tool supports three different backend configurations:

OpenSearch which is recommended for production AWS environments: it requires AWS credentials and OpenSearch configuration. You should create it with your preferred IaC framework and then set OPENSEARCH_HOST and optionally AWS_REGION.
FAISS is the default for local development as the local vector store backend. It requires faiss-cpu package for local vector storage and no additional configuration is needed.
mem0.ai platform using APIs for memory management. Requires a mem0.ai API key to be set as MEM0_API_KEY in the environment variables.

I’m going with the last option as I prefer testing things "remocally" (local code, remote data) when building cloud-native solutions, and I love how simple mem0.ai makes it.

Full Code Walkthrough

First, let’s set things up:

Load the .env file with your mem0.ai credentials (you can grab an API key by signing up and using their dashboard)

MEM0_API_KEY=xxx

Define a friendly system prompt to guide your agent’s behavior

from typing import Dict, Any
from strands import Agent, tool
from strands_tools import mem0_memory
from strands.models import BedrockModel
from dotenv import load_dotenv

load_dotenv()

SYSTEM_PROMPT = """
You are a helpful personal assistant that provides personalized responses based on user history.
Capabilities:
- Store information with mem0_memory
- Retrieve memories with mem0_memory
Key Rules:
- Be conversational
- Retrieve memories before responding
- Store new info
- Share only relevant memories
- Politely indicate if nothing’s found
"""

Next, let’s create the AWS Lambda handler, just like we did in the previous article of this series:

let's read from the event a user_id (to scope memories), an action (to decide what to do with the content), and a content (to interact with the agent)
then init our agent using previously defined system prompt and the memory tool
route incoming calls based on the action parameter: store, retrieve, or list to interact with mem0.ai, or chat to engage in a conversation with the agent.
for chat action, we also inject user_id into the prompt, so we are sure memories are scoped correctly to the user
I've wrapped everything in try/except code block to return JSON-friendly errors, just in case.

def memory(event: Dict[str, Any], _context) -> Any:
    user_id = event.get("user_id")
    action = event.get("action", "chat")
    content = event.get("content")
    # Basic validation
    if not user_id:
        return {"error": "Missing 'user_id' in payload."}
    if not content and action not in ["list"]:
        return {"error": "Missing 'content' in event payload."}

    memory_agent = Agent(
        system_prompt=SYSTEM_PROMPT,
        tools=[mem0_memory],
    )

    try:
        if action == "store":
            memory_agent.tool.mem0_memory(action="store", content=content, user_id=user_id)
        elif action == "retrieve":
            memory_agent.tool.mem0_memory(action="retrieve", content=content, user_id=user_id)
        elif action == "list":
            memory_agent.tool.mem0_memory(action="list", user_id=user_id)
        elif action == "chat":
            memory_agent(f"USER_ID:{user_id} - {content}")
        else:
            return {"error": f"Unknown action: {action}"}

        return {"result": "done"}
    except Exception as e:
        return {"error": str(e)}

🛡️ Keeping User Data Scoped (and Safe)

In this demo, we’re passing user_id directly in the AWS Lambda payload for simplicity: but in production you’d inject it from a trusted source, like AWS Cognito or a custom authorizer. That way it can't be tampered with, unlike a field coming from the client’s request.

🚀 Deploy on AWS Lambda

To deploy on AWS Lambda is as simple as writing a Serverless file.

service: serverless-memory-strands-agent
frameworkVersion: '3'

## Use .env
useDotenv: true

## Package individually each function
package:
  individually: true

## Apply plugins
plugins:
  - serverless-python-requirements #install python requirements

## Define provider and globals
provider:
  name: aws
  runtime: python3.12
  environment:
    MEM0_API_KEY: ${env:MEM0_API_KEY} #API key for Mem0

## Define atomic functions
functions:
  ## memory function
  memory:
    handler: src/agent/memory/handler.memory #function handler
    url: true
    package: #package patterns
      include:
        - "!**/*"
        - src/agent/memory/**

Remember to create a MEM0_API_KEY in your .env file!

🧪 Test locally with `Serverless Framework`

We can now test it locally using serverless invoke local functionality.
First of all let's store some data for two different users.

Let's start saving preferences for user 1:

sls invoke local -f memory --data \
  '{"content": "I like apples and grapefruit, I do not like oranges and bananas","action":"store","user_id":"1"}'

Serverless CLI will resume memories stored, scoped to user 1.

Then continue saving preferences for user 2:

sls invoke local -f memory --data \
  '{"content": "I like oranges and bananas, I do not like apples","action":"store","user_id":"2"}'

Again, Serverless CLI will resume memories stored, but scoped to user 2.

We can see the memories stored in mem0.ai dashboard:

Finally we could interact with our agent asking something about what we store (in this case we are asking for preferred fruits).

sls invoke local -f memory --data \
  '{"content":"What fruit do i like?","action":"chat","user_id":"1"}'

We'll see previously saved preferences, retrieved by our agent and used to say that user 1 prefers apples and grapefruit.

Finally, let's test it for user 2.

sls invoke local -f memory --data \
  '{"content":"What fruit do i like?","action":"chat","user_id":"2"}'

We'll see previously saved preferences, retrieved by our agent and used to say that user 2 prefers oranges and bananas.

You can also use the actions list and retrieve.
As an example, to list all memories for a specific user.

sls invoke local -f memory --data \
  '{"action":"list","user_id":"1"}'

🚀 Ship to the cloud

As simple as

sls deploy

Remember you should have AWS Credentials configured.

📌 Final Thoughts

Strands Agents SDK strips away much of the boilerplate you’d normally deal with in typical agent frameworks. It offers a clean, intuitive API and built-in tools, like `mem0_memory, that cover a wide range of real-world use cases. Whether you're building chatbots, assistants, or serverless AI workflows, this SDK gives you a solid and extensible foundation to start from.

⏭️ What's Next?

A great next step would be testing the mem0_memory tool with an AWS OpenSearch Serverless backend. It’s a production-ready option that scales automatically, plays well with Amazon Bedrock, and eliminates the need to manage infrastructure: perfect for cloud-native memory-driven agents on AWS.

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

🤖 Deploy your first AI agent with Strands Agents SDK 🤖

Davide De Sio — Mon, 26 May 2025 14:42:27 +0000

🏃‍♂️ TL;DR

Hey devs, ever dreamed of spinning up your own AI agent like it’s no big deal? Today we’re diving into Strands Agents SDK and deploying our very first AI agent.

Here’s the GitHub repo if you want to dive in right away: 👉 serverless-weather-strands-agent

🧵 What is Strands Agents SDK?

It’s a simple-to-use Python-based SDK and code-first framework that helps you build agents AI applications without crying over architecture diagrams at 2am. Think LangChain, but with a sleek, opinionated design, and way less boilerplate.

You define your agents, hook them up with skills, memory, tools, and they can start reasoning, planning, and working for you.

Strands Agents is lightweight and production-ready, supporting many model providers.

Key features (from docs) include:

Lightweight and gets out of your way: A simple agent loop that just works and is fully customizable.
Production ready: Full observability, tracing, and deployment options for running agents at scale.
Model, provider, and deployment agnostic: Strands supports many different models from many different providers.
Powerful built-in tools: Get started quickly with tools for a broad set of capabilities.
Multi-agent and autonomous agents: Apply advanced techniques to your AI systems like agent teams and agents that improve themselves over time.
Conversational, non-conversational, streaming, and non-streaming: Supports all types of agents for various workloads.
Safety and security as a priority: Run agents responsibly while protecting data.

⚙️ Prerequisites and setup

Before we jump in:

Python 3.9+
AWS credentials

Following the quickstart setup, install the Strands Agents SDK

pip install strands-agents
pip install strands-agents-tools

Create a requirements.txt

strands-agents>=0.1.0
strands-agents-tools>=0.1.0

That’s it. You're ready to go.

🌤️ Create our first agent

Let’s make a helpful assistant who can answer questions about weather using a language model and real-time data from an external API.

The code below defines a weather assistant agent powered by a language model from Amazon Bedrock, you can find it there in Strands documentation. It integrates with the US National Weather Service API to retrieve live weather information. Here's a breakdown of the main components:

Bedrock Model: This wraps an Amazon hosted LLM (in our case, nova-micro-v1) and configures it for use.
Agent: This is a Strands agent that takes a model, a system prompt (which defines the agent's behavior), and a list of tools it can use. Here, it’s equipped with an http_request tool so it can call external APIs.
System Prompt: A detailed instruction that guides the model to act as a weather assistant. It explains how to fetch forecast data and how to present it in a clear, human-readable way.
Lambda-compatible handler: The weather function is designed to be used in a serverless context on AWS Lambda, responding to user prompts passed in the incoming event.

Here’s the code:

import boto3
from strands import Agent
from strands.models import BedrockModel
from strands_tools import http_request
from typing import Dict, Any

# Define a weather-focused system prompt
WEATHER_SYSTEM_PROMPT = """You are a weather assistant with HTTP capabilities. You can:

1. Make HTTP requests to the National Weather Service API
2. Process and display weather forecast data
3. Provide weather information for locations in the United States

When retrieving weather information:
1. First get the coordinates or grid information using https://api.weather.gov/points/{latitude},{longitude} or https://api.weather.gov/points/{zipcode}
2. Then use the returned forecast URL to get the actual forecast

When displaying responses:
- Format weather data in a human-readable way
- Highlight important information like temperature, precipitation, and alerts
- Handle errors appropriately
- Convert technical terms to user-friendly language

Always explain the weather conditions clearly and provide context for the forecast.
"""

# Create a BedrockModel
bedrock_model = BedrockModel(
    model_id="us.amazon.nova-micro-v1:0",
    region_name='us-east-1'
)

# The handler function signature `def handler(event, context)` is what Lambda
# looks for when invoking your function.
def weather(event: Dict[str, Any], _context) -> str:
    weather_agent = Agent(
        model=bedrock_model,
        system_prompt=WEATHER_SYSTEM_PROMPT,
        tools=[http_request],
    )

    response = weather_agent(event.get('prompt'))
    return str(response)

That's all folks. Our first agent is born.
This assistant can understand natural language prompts, make real-time API calls, and return well-formatted weather reports for any location in the U.S.

In the next steps, you’ll learn how to test it locally and deploy this agent.

🇮🇹 Refine our code for Italy weather forecast!

How to modify our code from the doc boilerplate?

Let's image we want our forecast agent handle both US and Italy.
We should adapt our handler to get a region parameter in the incoming event and adapt our system prompt into our Lambda as follow:

def weather(event: Dict[str, Any], _context) -> str:
    prompt = event.get('prompt')
    if not prompt:
        return str("Missing required parameter: 'prompt'")

    region = event.get('region', 'US').upper()

    if region == 'US':
        system_prompt = WEATHER_SYSTEM_PROMPT_US
    elif region == 'IT':
        system_prompt = WEATHER_SYSTEM_PROMPT_IT
    else:
        return str("Unsupported region. Must be 'US' or 'IT'")

    weather_agent = Agent(
        model=bedrock_model,
        system_prompt=system_prompt,
        tools=[http_request],
    )

    response = weather_agent(prompt)
    return str(response)

Finally, we should adapt our prompt to use meaningful APIs for weather and location in Italy:

WEATHER_SYSTEM_PROMPT_IT = """You are a weather assistant with HTTP capabilities for Italy.

You can:
1. Make HTTP requests to APIs like https://nominatim.openstreetmap.org/search and https://api.open-meteo.com/v1/forecast
2. Process and display weather forecast data
3. Provide weather information for locations in Italy

When using Nominatim API:
- You must set a valid User-Agent header
- You must respect usage policy: 1 request per second (or you risk being blocked)

If you are blocked by Nominatim API please print the exact error.

When retrieving weather information:
1. Use this API endpoint to get city latitude and longitude: https://nominatim.openstreetmap.org/search?q={city},Italia&format=json
2. Use this API endpoint to get forecast based on latitude and longitude: https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current_weather=true
3. Then use the returned forecast URL to get the actual forecast
"""

As you can see, updating our code is straightforward, and system prompting becomes a key player when building agents.

🧪 Test it Locally

Now that your weather agent is ready, it’s time to test it out locally before deploying it to the AWS cloud. We'll use Serverless Framework, which makes it easy to run and manage AWS Lambda functions during development.

To invoke your weather function locally, use the following command:

sls invoke local -f weather --data '{"prompt": "What is the weather in Seattle?"}'

or for Italy

sls invoke local -f weather --data '{"prompt": "What is the weather in Pavia?","region":"IT"}'

What does this command do?

sls is the CLI command for Serverless Framework.
invoke tells Serverless Framework to run a specific function.
local means the function will run on your local machine, not in AWS.
-f weather specifies the function name (weather, as defined in your serverless.yml).
--data passes a mock event to the function: in this case, a simple prompt asking for the weather in Seattle or Pavia.

This command simulates what would happen if your AWS Lambda function received this prompt in the AWS cloud. The model processes the input, calls the external weather API (using the http_request tool), and formats the response using the system prompt instructions.

🚀 Deploy on AWS with IaC

Once you’ve tested your agent locally, it’s time to deploy it to the cloud. Also deployment is handled through Serverless Framework, which makes it easy to package and push your AWS Lambda functions to AWS cloud.

Make sure your project includes a serverless.yml file like the one below. This file tells Serverless how to package, deploy, and expose your weather agent:

service: serverless-strands-weather-agent
frameworkVersion: '3'

## Use .env
useDotenv: true

## Package individually each function
package:
  individually: true

## Apply plugins
plugins:
  - serverless-python-requirements #install python requirements

## Define provider and globals
provider:
  name: aws
  runtime: python3.12

## Define atomic functions
functions:
  ## Weather function
  weather:
    handler: src/agent/weather/handler.weather #function handler
    url: true
    package: #package patterns
      include:
        - "!**/*"
        - src/agent/weather/**

Key configuration highlights:

provider: Defines AWS as the deployment target and uses Python 3.12 as the runtime.
functions.weather: Specifies the Lambda function to deploy and exposes it via a public URL (url: true).

To deploy your function (you should have AWS credentials setup on your machine), run:

sls deploy

After a successful deployment, your AI powered weather agent will be accessible online, ready to take prompts and return real-time forecasts. 🌤️

🐍 Stay with Py, use CDK

If you prefer using only python and have a fully python repo, you can use AWS CDK for your infrastructure as code, here is an example.

from aws_cdk import (
    App,
    Stack,
    aws_lambda as _lambda,
)
from constructs import Construct

class WeatherAgentStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function
        weather_function = _lambda.Function(
            self, "WeatherFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="handler.weather",
            code=_lambda.Code.from_asset("src/agent/weather"),
        )

        # Enable Function URL (public)
        weather_function.add_function_url(
            auth_type=_lambda.FunctionUrlAuthType.NONE
        )

app = App()
WeatherAgentStack(app, "ServerlessStrandsWeatherAgent")
app.synth()

❓ API Gateway, Lambda URL and distributed architectures

By default, many serverless projects expose functions via Amazon API Gateway, but in this case we did not use it for a specific reason:

Amazon API Gateway has a hard timeout limit of 30 seconds. This means if your agent takes longer than that (e.g., due to a slow model call or network delay), the request will be terminated. You can request a limit increase from AWS, specifically provided for AI and LLM use cases, but that may not be suitable for all use cases for its side effects. For simple, fast responses, anyway, API Gateway might be fine.
Instead, you can use Lambda Function URL, which supports response streaming and avoids the 30-second cap. If your agent generates partial responses over time or needs longer to compute, streaming is often a better choice. For LLM agents AWS Lambda URLs with streaming are often the better option. Please pay attention to the security best practices as AWS Lambda URL do not offer security patterns like Custom Authorizer or Cognito Integration, here you should implement them in the lambda itself!
Moreover, as we are talking about an agent you should probably integrate it in a wider distributed architecture knowing you can directly invoke lambda passing an event formatted as the test one and secure it with IAM and least privilege permissions

To dive deeper into the topic, check out this excellent article by the AWS Serverless Hero Yan Cui: a must-read if you're working with AWS Lambda and trying to decide between Amazon API Gateway and AWS Lambda Function URLs.

Another great article is this one by the AWS Serverless Hero Mattieu Napoli comparing AWS Lambda Function URls and Amazon API Gateway specifically for Serverless Framework use cases.

🚀 Final Thoughts

Strands Agents SDK is surprisingly fun to work with. It removes a lot of the boilerplate from typical agent frameworks, and it’s designed to be hackable. Whether you’re building a dev assistant, customer support bot, or something more chaotic, this is a solid starting point.

Curious about what you can do with Strands Agents SDK? Check out this awesome hands-on series by Dennis Traub: it inspired me to give it a try! You should definitely read it to master the full potential of Strands Agents SDK.

⏭️ Next Step

Have you heard about Model Context Protocol (MCP)?
If not, you can dive deep on how to build agents which could be plugged to any client implementing this protocol in my previous series.

You can combine this approach with Strands Agents SDK!

Also there is an entire section of the doc on MCP. I'll probably continue this series implementing an MCP server with Strands Agents on AWS Lambda.

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.

🚀 Let's use SAM: rebuilding our minimal serverless MCP server

Davide De Sio — Mon, 28 Apr 2025 06:59:27 +0000

TL;DR: Go to this repo for the SAM template

TL;DR: Go to this repo for a CLI to start with your next serverless MCP server

Hey devs 👋, if you saw my first two post in this series about building a minimal Model Context Protocol server with AWS Lambda using the Serverless Framework, this is the natural follow-up for those who prefer using AWS Serverless Application Model (SAM).

The post on how to deploy an MCP server in a serverless environment was particularly well received and even got featured in two outstanding newsletters: Serverless Developer Advocate #33 by Lee Gilmore and Ready, Set, Cloud #160 by Allen Helton. I highly recommend subscribing to both as they’re packed with insights and inspiration for serverless enthusiasts!

Also the. second post of this series was cited in Serverless Developer Advocate #34 by Lee Gilmore. I can't be happier about this community feedback.

Why SAM?

In my latest post I've shown to you this omparison table between Serverless Framework, AWS SAM and AWS CKD. Again: read it carefully before choosing your preferred way to do IaC in your project.

Feature / Tool	Serverless Framework	AWS SAM (Serverless Application Model)	AWS CDK (Cloud Development Kit)
Ownership	V3 independent (deprecated), V4 enterprise, OSS alternative to V4	AWS	AWS
Abstraction Level	High-level	Medium-level	Low to medium-level
Language	YAML + plugins (JavaScript/TS)	YAML + some scripting	TypeScript, Python, Java, C#, Go
Cloud Provider Support	Multi-cloud (AWS, Azure, GCP, etc)	AWS only	AWS only
Template Syntax	Custom syntax (serverless.yml)	CloudFormation-compatible YAML	Imperative (code-based)
Local Development	Good support via plugins	Good (via `sam local`)	Limited, depends on constructs
Deployment	CLI-driven	CLI-driven (`sam deploy`)	CLI-driven (`cdk deploy`)
State Management	Built-in via `.serverless` folder	CloudFormation	CloudFormation
Extensibility	High (plugins, hooks)	Moderate (some hooks/plugins)	High (custom constructs, reusable code)
Maturity	Very mature	Mature	Rapidly growing
Best For	Multi-cloud serverless apps	Simple AWS Lambda apps	Complex infrastructure-as-code on AWS
Learning Curve	Low to moderate	Low	Moderate to high
Testing/Debugging	Plugin-based	`sam local invoke/start-api`	Manual / unit tests on code
CI/CD Integration	Easy (via plugins or custom)	Easy (via CodePipeline or custom)	Easy (via CodePipeline or custom)
Cost	V3 Free, V4 pricing	Free	Free

So, why pick AWS SAM over Serverless Framework and CDK?

SAM is built and maintained by AWS. That means native integration for services like Lambda, API Gateway, DynamoDB, and integrations with CloudFormation, CloudWatch, and CodeDeploy out of the box. No need for external plugins or workarounds to make things “just work.”
Designed only for serverless, unlike CDK, which is general purpose infrastructure as code. As an example: In CDK, to create an API Gateway connected to a Lambda function, you’ll write TypeScript or Python code that explicitly defines routes, integrations, permissions, and deploy stages, in SAM this is really simple with a couple of line of code.
Local dev & testing is very powerful: SAM CLI lets you run Lambda functions locally and mock API Gateway events. This feels more like traditional dev workflows, which is something CDKdon’t handle as smoothly, while Serverless Framework require serverless offline plugin.
Simpler learning curve than CDK, as CDK is powerful but verbose. You’re writing imperative code to describe declarative infrastructure. That’s cool, but not always necessary for serverless apps. SAM keeps things simple, YAML based, and readable across different teams: devs, ops, or whoever knows it.

So if you're building pure serverless apps on AWS: SAM it’s native, lightweight, and focused on serverless. CDK shines when you’re managing AWS infrastructure, but when your app is 90% serverless API based on Lambda, SAM is just way faster to learn.

📦 What’s Inside the Repo

This project spins up as the previous one:

An AWS Lambda function hosting a serverless MCP server
An Amazon API Gateway with a POST /mcp route

🛠️ Features

Our goal is always to have a skeleton to deploy our MCP server in a serverless environment, but using AWS SAM.

Simple MCP server with just a few lines of code
Runs in a single AWS Lambda function
HTTP POST endpoint at /mcp
Local development
Comes with a basic “add” tool (yeps, just adds two numbers via JSON-RPC: here you should put your endpoint logic!)

📦 Project Structure

We add template.yml, samconfig.toml and buildspec.yml respectively as our resource template, configuration (useful for deploy) and CI/CD pipiline build phase.

sam-serverless-mcp-server/
├── __tests__/              # Jest tests
├── src/                    # Source code
│   └── index.js                # MCP server handler
├── .gitignore              # Git ignore file
├── buildspec.yml           # Buildspec file for AWS CodeBuild and CodePipeline (CI/CD)
├── jest.config.mjs         # Jest config file
├── package.json            # Project dependencies
├── package-lock.json       # Project lock file
├── README.md               # This documentation file
├── samconfig.toml          # Serverless Application Model config
└── template.yml            # Serverless Application Model template

🏗️ SAM code

You can easily read the code following comments in the template file:

set a timeout
use nodejs22.x as runtime
spin up an AWS::Serverless::Function giving the proper handler path
create an API Gateway with a POST route on /mcp path, all automatically setting in Events attribute an event of type Api

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: serverless-mcp-server

Globals:
  Function:
    Timeout: 29
    Runtime: nodejs22.x

Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/index.handler
      Events:
        McpApi:
          Type: Api
          Properties:
            Path: /mcp
            Method: post

Tips: if you want to switch to API Gateway V2, just change the type to HttpApi and you're good to go.

🚀 Getting Started

To get it up and running follow those steps.

Install dependencies:

npm install

Run Locally with SAM

sam local start-api

Local endpoint will be available at:
POST http://localhost:3000/mcp

🧪 Test with jest

There are some basic tests included in the __tests__ folder. You can run them with:

npm run test

📡 Deploy to AWS

Follow those steps.

Build with SAM

sam build

And finally deploy (before that you should configure AWS Credentials with aws-cli)

sam deploy --guided

After deployment, the MCP server will be live at the URL output by the command.

🧪 Locally or once deployed, test also with curl requests

List tools

Change your-endpoint with the one noted after deploy or with localhost:3000.

curl --location 'https://your-endpoint/dev/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json' \
--header 'jsonrpc: 2.0' \
--data '{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": 1
}'

➕ Use the add Tool

Change your-endpoint with the one noted.

curl --location 'https://your-endpoint/dev/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json' \
--header 'jsonrpc: 2.0' \
--data '{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "add",
    "arguments": {
      "a": 5,
      "b": 3
    }
  }
}'

💎 Kickstart your next serverless MCP project with a handy CLI

A little gem to help you kickstart your next MCP serverless projects on AWS: I created a CLI that lets you choose between the three boilerplates from this series (Serverless Framework, AWS CDK, or AWS SAM). It’s built with oclif and is easy to install.

It allows you to choose the framework with just a command "serverless-mcp-cli init" and automatically installs the dependencies so you are ready to go.

⏭️ Next Step

I'm planning to continue this series:

integrate authentication
integrate state management

🙋 Who am I

My work in this field is to advocate about serverless and help as more dev teams to adopt it, as well as customers break their monolith into API and micro-services using it.