DEV Community: Paul SANTUS

Zipping 15GB of S3 files in 6s. How the power of community made it possible.

Paul SANTUS — Thu, 25 Jun 2026 18:23:01 +0000

In my first article, I showed how parallelizing zip assembly across multiple Lambdas can beat the single-Lambda bandwidth ceiling. I zipped 6.9GB in 35 seconds with just 5 workers.

Since then, Jérémie published a follow-up article where a contributor (Fitz) introduced a brilliant optimization: UploadPartCopy. Instead of downloading (or even streaming) big files through Lambda just to upload them back into the zip, you can tell S3 to copy them server-side. This halves the bandwidth requirement and brought his single-Lambda solution down to 106 seconds.

I took Fitz's UploadPartCopy idea and combined it with my parallel approach. Here's what happened.

What I took from Jérémie and Fitz

The UploadPartCopy insight is elegant: since ZIP STORE mode has deterministic offsets, we know exactly where each file's data lands in the final archive. For big files (≥5MB), we can:

Write just the local file header (50 bytes plus the filename) in an UploadPart
Have S3 copy the file data directly via UploadPartCopy — no download, no upload, instant

This means workers barely use any memory or bandwidth for big files.

The only catch is that the S3 multipart upload API requires every part except the last one to be at least 5MB. So the local file header has to be appended to another file (or group of files).

My planner Lambda groups small files together until they reach 5MB, appends the LOC header of the next big file, then the worker fires an UploadPartCopy for that big file's data.

When we run out of small files, we stream the smallest remaining big file, append the LOC header of the largest remaining one, and pair that part with a server-side copy of the largest file's data.
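The grouping step described above can be sketched roughly as follows. This is a simplified illustration in Python (the actual project is written in Go); the names `plan_parts` and `MIN_PART` are hypothetical, header bytes are ignored when sizing groups, picking the largest big file first is an arbitrary choice, and the fallback for when small files run out is left out.

```python
MIN_PART = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def plan_parts(files):
    """files: list of (key, size) tuples.

    Returns (steps, leftovers). Each step streams a group of small files
    (followed by the LOC header of a big file) and then copies that big
    file's data server-side.
    """
    small = [f for f in files if f[1] < MIN_PART]
    big = sorted((f for f in files if f[1] >= MIN_PART), key=lambda f: f[1])
    steps, group, group_size = [], [], 0
    for key, size in small:
        group.append(key)
        group_size += size
        # Close the group once it is large enough to be a valid part.
        if group_size >= MIN_PART and big:
            copy_key, _ = big.pop()  # largest remaining big file
            steps.append({"stream": group, "copy": copy_key})
            group, group_size = [], 0
    # Anything left needs the big-file pairing fallback (not sketched here).
    leftovers = {"small": group, "big": [key for key, _ in big]}
    return steps, leftovers
```

A real planner would also have to count the header bytes in each group and assign part numbers in archive order.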

For CRC32 (required in zip headers): files uploaded with modern AWS SDKs already have CRC32 stored as object metadata. A simple HeadObject call retrieves it — no need to read the file.
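As a rough sketch of that lookup, assuming boto3 and an object that carries a full-object CRC32 checksum: S3 returns the checksum base64-encoded, so it has to be decoded into the integer a ZIP header expects. The helper names are hypothetical, and objects uploaded in multiple parts may carry a composite checksum, which this sketch does not handle.

```python
import base64


def crc32_from_s3_checksum(checksum_b64: str) -> int:
    """Decode S3's base64 ChecksumCRC32 value into a 32-bit integer."""
    raw = base64.b64decode(checksum_b64)
    if len(raw) != 4:
        raise ValueError("expected a 4-byte CRC32 checksum")
    return int.from_bytes(raw, "big")


def fetch_crc32(s3_client, bucket: str, key: str) -> int:
    """Read the stored CRC32 without downloading the object body."""
    # ChecksumMode must be ENABLED, otherwise the checksum fields are omitted.
    head = s3_client.head_object(Bucket=bucket, Key=key, ChecksumMode="ENABLED")
    return crc32_from_s3_checksum(head["ChecksumCRC32"])
```

Here `s3_client` would be a `boto3.client("s3")`; an object stored without a CRC32 checksum would raise a `KeyError` and need a fallback that reads the data.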

Step Functions: three limitations

My original architecture used Step Functions to orchestrate workers. Here's what I hit.

1. Inline Map caps at ~40 concurrent iterations

The AWS documentation says the Inline Map state supports "up to 40 concurrent iterations." In practice I saw up to 55, but never more. With 1500 duos to process, Step Functions queued them in batches of 55.

I switched to Distributed Map which launches Express child workflow executions. All 1120 iterations started within 2 seconds. Problem solved? Not quite.

2. Distributed Map: fast to dispatch, slow to collect

With Distributed Map, all workers started within 2 seconds. Every single one finished in under 1 second (mostly UploadPartCopy calls). Total Lambda compute: ~500ms average.

Yet the Map state took 38 seconds to complete.

The bottleneck? Step Functions' internal machinery for collecting and aggregating results from 1120 Express child executions. I confirmed: all workers started at 10:06:52-53, all finished by 10:06:54, but the Map state didn't report success until 10:07:28. That is roughly 35 seconds of pure orchestration overhead.

3. The 256KB payload limit

Step Functions states can pass at most 256KB between them. With 3000 files:

The planner's assignment list exceeds 256KB → had to write to S3
The aggregated worker results exceed 256KB → had to write CRC32s to S3, read them back in the finalizer

This added complexity and latency (the finalizer reading 1500 small S3 files — 29 seconds sequentially, until I parallelized it down to 1.5s).

After all these fixes, the Step Functions version ran in 41 seconds for 3000 × 5MB files. Respectable — 2.5× faster than Jérémie's 106s — but I knew most of that time was Step Functions overhead, not actual work.

The final version: direct Lambda invocation

I stripped out Step Functions entirely and wrote a single orchestrator Lambda that:

Lists files, computes zip layout (the job of the "planner" Lambda in my Step Functions architecture), and initiates multipart upload (~0.5s)
Invokes all worker Lambdas synchronously in parallel using goroutines + the Lambda SDK (~0.5s to dispatch)
Collects results (workers return inline, no S3 round-trip for parts)
Reads CRC32 files from S3 in parallel, builds central directory, completes multipart upload (~1s)

Orchestrator Lambda (15min timeout, 1024MB)
    │
    ├─── goroutine → Invoke Worker 1 (sync) → return {parts}
    ├─── goroutine → Invoke Worker 2 (sync) → return {parts}
    ├─── ...
    └─── goroutine → Invoke Worker N (sync) → return {parts}
    │
    └─── All done → Build CD → CompleteMultipartUpload

The Lambda SDK's synchronous Invoke blocks until the worker returns. With 200 concurrent goroutines, all workers are dispatched instantly. No orchestration overhead, no state size limits for the parts (only CRC32s go to S3), no 35-second result aggregation.
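The real orchestrator is written in Go with goroutines. As an illustration only, the same fan-out/fan-in shape looks roughly like this in Python with boto3 and a thread pool; `invoke_worker`, `fan_out` and the payload shape are hypothetical names for this sketch.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def invoke_worker(lambda_client, function_name, assignment):
    """Synchronously invoke one worker and return its decoded JSON result."""
    resp = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # blocks until the worker returns
        Payload=json.dumps(assignment).encode(),
    )
    return json.loads(resp["Payload"].read())


def fan_out(lambda_client, function_name, assignments, max_concurrency=200):
    """Dispatch all assignments in parallel and collect results in order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() preserves input order, so results line up with assignments.
        return list(
            pool.map(
                lambda a: invoke_worker(lambda_client, function_name, a),
                assignments,
            )
        )
```

A production version would also check each response for a `FunctionError` field and retry failed workers, which Step Functions otherwise handles for you.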

The theoretical time is now the orchestration time plus the time to upload the smallest large file: the one left orphaned once every other large file has been paired with a group of small files or with another large file.

Results: 3000 × 5MB benchmark

Approach	Time	Notes
Jérémie Gen1 (Rust, streaming)	212s	Single Lambda, 512MB
Jérémie Gen2 (Rust, UploadPartCopy)	106s	Single Lambda, 640MB
My Step Functions version	41s	Distributed Map, 1120 workers
My orchestrator Lambda	6s	Direct invoke, ~1500 workers

6 seconds to zip 15GB into a single valid ZIP64 archive. That's an 18× improvement over the optimized single-Lambda approach, and 35× over the original.

Worker stats:

Max memory: 85 MB (I initially allocated 3008MB — massively over-provisioned thanks to UploadPartCopy)
Average duration: 516ms per worker
Max duration: 1035ms

What I learned (round 2)

Step Functions Map states add seconds, not milliseconds. For latency-sensitive fan-out/fan-in, direct Lambda invocation is faster. Step Functions shines when you need retries, error handling, visual debugging, long-running workflows, or fast step-to-step transitions, but that performance only lasts until you need more than about 40 parallel processes.
UploadPartCopy is the killer optimization. When most files are ≥5MB, workers barely do any work — they just tell S3 to copy data server-side. Memory stays under 100MB regardless of file sizes.
The orchestrator pattern is underrated. A single Lambda with goroutines can invoke hundreds of child Lambdas synchronously, collect results, and finalize — all within one execution context. No state machine, no payload limits between states, no aggregation overhead.
Over-parallelization can hurt. 1500 separate assignments created more Step Functions overhead than the actual compute. Grouping into fewer, larger batches would have been better for the SFN approach.

Try it

Code: github.com/psantus/on-demand-archive-on-s3

The repo has both approaches: Step Functions (cmd/planner + cmd/worker + cmd/finalizer) and the orchestrator Lambda (cmd/orchestrator).

Jérémie's challenge repo: github.com/RustyServerless/demo-s3-archiving

What's next?

The theoretical minimum is bounded by Lambda cold start time (~200ms) plus the slowest UploadPart call (if we lack small files, we may need to upload a large file manually to append another file's LOC to it) plus orchestrator overhead (~500ms).

Your move, Jérémie 😏

Edit: with 73.2GB (15,000 files), my solution still gives quite acceptable performance: just 20s (probably due to my account's default concurrency limit of 1000; it would likely be faster on an unbounded account :D)

Paul out.

AWS Security Agent: 34 Findings in Under 10 Hours. A Real-World Test

Paul SANTUS — Mon, 08 Jun 2026 12:00:34 +0000

Last week, I ran AWS Security Agent against an app I'm building for a client. The app is fairly typical: a React front-end and a backend powered by a CMS (one of my customer's requirements), on top of which I built a custom API. Both run on Lambda, with DSQL as the database layer and quite a lot of AI inside (more on that below). The results were impressive.

Two scans, overnight

I kicked off both scans in the evening:

Code Review completed in 1h26m:

For this, I had to grant read access to my application's GitHub repository (you can provide write access to get fixes, but that was a step I wasn't ready to take just yet).
18 findings (9 High, 8 Medium, 1 Low) which covered SQL injection, SSRF, XSS, privilege escalation, secret exposure, IAM misconfigurations
2h42m of agent task time

Penetration Test completed in 7h56m:

The pentest was not a full blackbox test. It started with 3 URLs I provided (the app's front-end, the API and the admin app) but I also submitted the repository.
16 findings (2 Critical, 2 High, 12 Medium)
29.16 hours of agent task time

By the time I woke up, I had a downloadable 60+ page report with reproduction steps, CVSS scores, and suggested fixes, plus a pleasant UI showing a results summary along with finding details, test logs, etc.

What impressed me

It thinks like a real pentester

The agent didn't just scan for known CVEs. It understood my application's architecture and chained vulnerabilities together:

It discovered that one API method had no authentication
It found that Lambda code, called by a Step Functions workflow that this method triggers, built a URL from a user-supplied, unvalidated lang parameter
It crafted a payload using a # fragment to redirect requests to an attacker-controlled domain it owned
It verified the SSRF by actually receiving DNS callbacks!!

That's a multi-step attack requiring deep understanding of Python's URL parsing, the AWS Step Functions workflow, and the application's data flow from API to Lambda to the Wikipedia API.
(I provided the repo.)

SQL injection: genuinely clever

The code review found a SQL injection vector I would never have caught manually. Our DSQL driver had a shortcut: if a value starts with CAST(, it's passed through unescaped (intended for internal type conversions). The agent traced the full path from user input (POST /api/my-route body) through the CMS' abstraction layer down to the raw pg_query() call, proving the injection was reachable.

An unexpected category

I was expecting the Agent to report SSRF, path traversal, etc. One category I didn't expect was "Cost Abuse". Since my app runs on serverless, the Agent also provided valuable insights into paths where attackers could inflate my AWS bill, especially through Bedrock, Polly and other AI services.

Findings were validated through exploitation

The penetration test didn't just flag theoretical issues. A bug left the CMS "Installation Wizard" accessible even after installation was complete. AWS Security Agent:

Accessed the install page without auth
Successfully connected to the production database (DSQL with IAM auth meant empty credentials worked)
Enumerated installed plugins with exact version numbers
Documented each step with HTTP requests and responses

Both approaches brought original content

Despite the white-box approach, only half of the findings overlapped between the two scans. The code review caught architectural issues (IAM over-privilege due to several Lambdas sharing the same IAM role, hardcoded secrets, missing log retention) while the pentest found issues exploitable at runtime (auth bypass, path traversal, IDOR). Together they covered more ground than either alone.

A word on cost

Warning: AWS Security Agent CAN be expensive (yet cost-effective): at standard pricing of $50/agent-hour, the total would have been:

Code review: 2.7h × $50 = $135
Pentest: 29.2h × $50 = $1,460
Total: ~$1,595

For context, a human pentest engagement of equivalent scope (3 URLs, mixed tech stack, 8 hours of active testing) would probably run $8,000–$20,000 and take 1-2 weeks to deliver results.

But here's the kicker: AWS Security Agent includes a generous free tier. New customers get a 2-month trial with up to 400 pentesting task-hours per month. Both my scans (about 31.9 task-hours total) fit comfortably within that allowance. So the first real-world security audit of my production application cost me exactly $0.

From Findings to Fixes

The actionable output let me fix all findings within a single day:

SQL injection → removed CAST bypass, added intval() on inputs
SSRF → URL scheme allowlist + private IP blocking
Auth bypass on install page → overlay file blocking access
Path traversal → regex validation on URL path parameters
Task token exposure → stripped from API response
Info disclosure → CloudFront response headers policy removes version headers
Missing logging → API Gateway access logs + 30-day retention

The suggested fixes in the report were specific enough to implement directly, not generic "validate your inputs" advice, but exact code locations and replacement patterns.
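To give an idea of what the SSRF fix (scheme allowlist plus private IP blocking) can look like, here is a minimal Python sketch. It is not the code from my application: `is_safe_url` and `ALLOWED_SCHEMES` are hypothetical names, the resolver is injectable only to make the function testable, and the sketch does not protect against DNS rebinding between the check and the actual request.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Reject URLs with a non-allowlisted scheme or a non-public target IP."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, parts.port or 443)
    except OSError:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])  # sockaddr[0] is the IP string
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Parsing with `urlsplit` and checking `hostname` (rather than string matching on the raw URL) matters here, since fragment tricks like the `#` payload above rely on the validator and the HTTP client disagreeing about where the host ends.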

A few more things

To be exhaustive, I must share that:

it didn't find existing application logic bugs (the model is instructed to focus solely on security. Attention is all we need, right?)
because our pentest was white-box, a couple of findings, while technically correct, required knowledge of our deployment model to be exploited. If you need to know the value of a secret "consider-i-m-an-admin" header, then maybe the risk is not that high.. but again, the agent thinks like security folks, and probably considered lateral movement after log access as a possible path.
some findings are CMS upstream issues that I can't fix without modifying vendor code. I submitted those findings to their security team.

Verdict

AWS Security Agent is not a full replacement for security expertise: you still need to understand your architecture to prioritize and fix findings. But as a first pass that runs overnight and produces a professional-grade report? It's remarkably good; not a single finding was irrelevant. The multi-step attack chains, the code-level precision, and the actual exploitation validation put it well above traditional SAST/DAST tools.

For a solo developer or small team shipping on AWS, this is a no-brainer at the free tier. Run it before every major release, fix what it finds, and sleep better.

S3 zipper challenge: a parallel zip assembly that beats the single Lambda approach

Paul SANTUS — Mon, 01 Jun 2026 21:27:25 +0000

I recently read Jérémie Rodon's excellent article On-Demand Archives on S3, where he describes an elegant Rust solution for zipping 3,000 × 5MB files from S3 within a single Lambda function.

His approach is impressive: streaming a ZIP archive through a custom Rotating Slab Buffer, saturating bandwidth with concurrent downloads, all within 512MB of RAM. The result: 3 minutes 35 seconds.

I thought beating that performance would be a good challenge. His article ends with an open invitation: "do you think you can do better with your favorite language?" Well, my favorite language is neither Rust nor Go... however, I'm fluent in serverless ;) so I took a different angle entirely.

A Different Approach: Why Not Parallelize the Problem?

Jérémie's constraint was a single Lambda. That's elegant, but it means you're bound by one machine's network bandwidth (~600 Mbps). No matter how perfect your streaming is, physics wins: 15GB at 600 Mbps ≈ 200 seconds minimum.
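The bandwidth floor can be checked with a few lines of Python. The function name is hypothetical, and the calculation ignores latency and protocol overhead, so it is a lower bound only.

```python
def transfer_seconds(size_bytes: float, bandwidth_mbps: float) -> float:
    """Lower bound on transfer time at a given bandwidth (decimal units)."""
    bits = size_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000)


# One Lambda moving the whole 15GB at ~600 Mbps.
single_lambda = transfer_seconds(15e9, 600)

# One worker moving a 150MB share at the same bandwidth.
per_worker = transfer_seconds(150e6, 600)

print(f"single Lambda: {single_lambda:.0f}s, per worker: {per_worker:.0f}s")
```

Splitting the same volume across workers divides the floor by the number of workers, which is the whole point of the parallel approach.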

My question was: what if we break that single-machine bottleneck?

The key insight is that ZIP files in STORE mode (no compression) have deterministic byte offsets. Each entry is exactly 50 + len(filename) + filesize bytes (local header + ZIP64 extra field + data). If you know all filenames and sizes upfront, you can pre-calculate exactly where every file will land in the final archive, before downloading a single byte.

This means independent workers can each build their portion of the zip in parallel, and S3's multipart upload lets them write their chunks independently (parts can be uploaded in any order by different processes sharing the same upload ID).
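The offset pre-calculation can be sketched in a few lines of Python (the project itself is in Go). `compute_offsets` is a hypothetical name; the 50-byte constant follows the entry size given above, and the returned total does not include the central directory that the finalizer appends.

```python
LOCAL_HEADER_FIXED = 50  # local file header + ZIP64 extra field, excluding the filename


def compute_offsets(files):
    """files: list of (name, size) in archive order.

    Returns ([(name, header_offset, data_offset)], total_bytes) for a
    STORE-mode archive, before the central directory.
    """
    layout, offset = [], 0
    for name, size in files:
        header_len = LOCAL_HEADER_FIXED + len(name.encode("utf-8"))
        layout.append((name, offset, offset + header_len))
        offset += header_len + size
    return layout, offset
```

For example, `compute_offsets([("a.txt", 100), ("bb.bin", 200)])` places the second entry's header at byte 155, since the first entry occupies 55 header bytes plus 100 data bytes.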

Architecture

Planner Lambda → Step Functions Distributed Map → N Worker Lambdas → Finalizer Lambda
     │                        │ │ │                        │
     │ CreateMultipartUpload  │ │ │ UploadPart (parallel)  │ CompleteMultipartUpload
     ▼                        ▼ ▼ ▼                        ▼
                         S3 Output Bucket

Planner: Lists all source files, computes zip byte offsets, initiates multipart upload, divides work into balanced batches (equal data volume per worker).
Workers (N concurrent): Each downloads its assigned files, constructs zip local file headers + raw data, computes CRC32 on the fly, streams to S3 as multipart parts.
Finalizer: Builds the central directory with real CRC32 values, uploads it as the final part, calls CompleteMultipartUpload.
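One simple way to get batches of roughly equal data volume, while keeping each batch a contiguous run of files so it maps onto a contiguous byte range of the archive, is sketched below. This is an assumption about how such a planner could work, not a copy of the project's Go code; `split_batches` is a hypothetical name.

```python
def split_batches(files, workers):
    """Split (name, size) files, in archive order, into at most `workers`
    contiguous batches of roughly equal total size."""
    total = sum(size for _, size in files)
    target = total / workers
    batches, current, current_size = [], [], 0
    for f in files:
        current.append(f)
        current_size += f[1]
        # Close the batch once it reaches the target, keeping the last
        # batch open to absorb whatever remains.
        if current_size >= target and len(batches) < workers - 1:
            batches.append(current)
            current, current_size = [], 0
    if current:
        batches.append(current)
    return batches
```

A real planner would additionally make sure every batch except the last yields parts of at least 5MB, as required by S3 multipart uploads.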

Results

With a quota-constrained training account (I had a concurrency limit of 10, so I used only 5 concurrent Lambdas, 3008MB each), zipping 6.9GB across 160 files:

Metric	Single Lambda (Jérémie's Rust)	Parallel (this project)
Approach	Stream within 1 Lambda	Fan-out N workers
Time (15GB, 3000 files)	~215s	Estimated ~10-15s with 100+ workers
Time (6.9GB, 160 files, 5 workers)	-	35s
Memory per worker	512MB	3008MB (could be lower)
Language	Rust 🦀	Go

With a production account (1000 concurrent Lambdas available), the 3000 × 5MB scenario would complete in under 15 seconds with around 100 workers (each worker handles ~150MB, downloads take ~2s at 600Mbps, upload ~2s). The bottleneck shifts from bandwidth to Lambda cold start (~200ms for Go on ARM64).

Tradeoffs

Jérémie's approach is simpler to deploy (one Lambda, no orchestration) and cheaper per invocation (512MB × 215s vs N × 3008MB × few seconds). It's the right choice when you want minimal infrastructure.

The parallel approach wins on wall-clock time, and dramatically so. It's the right choice when the user is waiting and you want the archive ready in seconds, not minutes.

	Single Lambda	Parallel Fan-Out
Wall-clock time	Bounded by bandwidth	Bounded by slowest worker
Complexity	Low	Medium (Step Functions + 3 Lambdas)
Cost per archive	Lower	Higher (more Lambda-seconds total)
Scalability	Fixed ceiling (~600Mbps)	Linear with concurrency
Memory efficiency	Excellent (512MB)	Good (3GB, could optimize)

If I were to use it in prod, there is plenty of room for optimization (our Lambdas currently use at most 1875MB, well below the 3GB allocated, and we could use Jérémie's streaming optimizations to cut that by a factor of 10). Yet we'd probably still have some overhead compared to Jérémie's solution (cold starts, TLS negotiations...), and so far it's just a vanity project :)

What I Learned

ZIP STORE mode is embarrassingly parallel: deterministic offsets mean zero coordination between workers during the data phase.
S3 multipart upload is the perfect primitive: parts uploaded out of order, by different processes, assembled by S3 at the end.
Step Functions Distributed Map is ideal for this pattern: it handles fan-out, concurrency limits, retries, and result collection.
The real bottleneck at scale is Lambda concurrency limits, not bandwidth or compute. With sufficient concurrency, you can zip 15GB in the time it takes to download one 5MB file.

Try It

The code is at github.com/psantus/on-demand-archive-on-s3.

And if you want to try Jérémie's challenge with the single-Lambda constraint, his demo project is at github.com/RustyServerless/demo-s3-archiving.

Both approaches are valid; it just depends on whether you're optimizing for simplicity or speed.

Keep the challenge going?

So, « do you think you can do better with your favorite ~~language~~ architecture?»

And what does "better" even mean for you? :)

Hack your AWS CLI to add CloudShell support and turn your terminal into a bastion

Paul SANTUS — Fri, 29 May 2026 12:43:23 +0000

I've been using AWS CloudShell from the Console for a while. It's handy: a pre-authenticated shell in your browser, right inside the AWS Console. But I always wondered: why can't I use it from my terminal? Why is there no aws cloudshell command?

It turns out you can. The API exists; it just isn't public. And once you have CloudShell access from the CLI, you can do interesting things with it, such as using a VPC-attached CloudShell as a bastion to reach your private RDS instances.

Follow along with the companion repository while reading this article.

CloudShell: an undocumented API

AWS CloudShell has no official SDK or CLI support. But the Console has to talk to something, right? By watching what the browser does when you open CloudShell, you can reverse-engineer the API.

Fortunately, Jérôme Guyon already did that work and published a boto3-compatible service model. His work made all of this possible.

The API is simple: create environments, start/stop them, create sessions, upload/download files. The session mechanism uses SSM's WebSocket protocol under the hood, which means session-manager-plugin (the same binary that powers aws ssm start-session) can connect to CloudShell sessions.

Teaching the AWS CLI a new trick

The AWS CLI has a little-known feature: aws configure add-model. Give it a JSON service model, and suddenly the CLI knows a new service. AWS uses this internally for private previews.

(The boto3 model from Jérôme's repository just needs a "version": "2.0" field added at the root level to become CLI-compatible.)
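Adding that root-level field can be done by hand or with a tiny script. Here is a minimal Python sketch; the function names and the input filename are hypothetical, and only the output name matches the file used in the command in this article.

```python
import json


def make_cli_compatible(model: dict) -> dict:
    """Return a copy of a boto3 service model with a root-level "version" field."""
    patched = dict(model)
    patched.setdefault("version", "2.0")  # keep an existing value if present
    return patched


def patch_file(src_path: str, dst_path: str) -> None:
    """Read a boto3 model from src_path and write the patched model to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        model = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(make_cli_compatible(model), dst, indent=2)


# Example (hypothetical input filename):
# patch_file("service-2.json", "cloudshell-cli-model.json")
```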

Run:

aws configure add-model \
  --service-model file://cloudshell-cli-model.json \
  --service-name cloudshell

That's it. Now I have aws cloudshell, with auto-completion and everything:

$ aws cloudshell help

AVAILABLE COMMANDS
       create-environment
       create-session
       delete-environment
       describe-environments
       get-environment-status
       start-environment
       stop-environment
       ...

Connecting to CloudShell from the terminal

The workflow is simple:

# Create or find an environment
aws cloudshell create-environment --region eu-west-1

# Wait until it is RUNNING
aws cloudshell get-environment-status --environment-id <ID> --region eu-west-1

# Create a session and connect
session-manager-plugin "$(aws cloudshell create-session \
  --environment-id <ID> \
  --session-type TMUX \
  --tab-id "$(uuidgen | tr '[:upper:]' '[:lower:]')" \
  --q-cli-disabled \
  --region eu-west-1 \
  --query '{SessionId:SessionId,TokenValue:TokenValue,StreamUrl:StreamUrl}' \
  --output json)" eu-west-1 StartSession

And you're in. A full shell on a CloudShell instance, from your terminal. No browser needed.

The credentials problem

There's a catch. When you use CloudShell from the Console, AWS injects your credentials automatically through a PutCredentials API call. That call uses your console session token (the cookie-based auth from your browser login) to feed temporary credentials to the container's metadata endpoint.

When you connect programmatically, that doesn't happen. The container's credentials endpoint returns a 500 error. You have to inject the credentials yourself:

# Run locally, then paste the output into your CloudShell session
aws configure export-credentials --profile my-profile --format env

Not ideal, but it works.

The bastion use case

This is where it gets interesting. You can create a CloudShell environment attached to a VPC:

aws cloudshell create-environment \
  --environment-name db-access \
  --vpc-config '{
    "VpcId": "vpc-abc123",
    "SubnetIds": ["subnet-private-1"],
    "SecurityGroupIds": ["sg-allowed-by-rds"]
  }' \
  --region eu-west-1

Put it in the same security group as the one your RDS allows, and suddenly you can connect to your database straight from the shell:

mysql -h my-instance.xxx.eu-west-1.rds.amazonaws.com -u admin -p

No EC2 bastion instance to maintain. No SSH keys to manage. No hourly cost when you're not using it (CloudShell is free). The environment suspends after 20 minutes of inactivity, and you can keep it alive with aws cloudshell send-heart-beat.

What doesn't work (and I tried..)

I spent quite some time trying to make CloudShell work as a real port-forwarding bastion, so that I could use local tools like DBeaver against a remote RDS through it. Here's what I found:

SSM-based port forwarding doesn't work.

ECS, for example, registers containers as SSM targets. Its SSM identifier is undocumented, but once you know it, it works well, as I described in a previous article. That way you can run aws ssm start-session --document-name AWS-StartPortForwardingSessionToRemoteHost.
SageMaker notebooks behave similarly.

CloudShell instances/containers don't seem to be registered as SSM managed instances. Or if they are, it's hidden, and so far nobody at AWS has disclosed the format of their ID :) I tried every combination of environment ID, session ID and prefix format I could think of. None of them works.

Local port forwarding through the PTY doesn't work either. The session is a terminal, not a raw TCP stream. You can't push binary MySQL protocol data through it. I even tried setting up an ncat relay inside CloudShell and tunneling through the session. The relay works fine internally, but there is no way to expose it as a local TCP port on your machine.

UDP hole punching is theoretically possible but requires the CloudShell to have internet access (a NAT Gateway on its subnet), and even then you're fighting NAT symmetry issues on both sides. I got STUN working from CloudShell, but the full hole punch is fragile and impractical for production use.

So what is it good for?

Honestly, quite a lot:

Quick database access without maintaining an EC2 bastion instance. Connect, run your queries, disconnect. Free.
Automation. You can script command execution on CloudShell with Python + session-manager-plugin. Useful for running things inside a VPC without deploying a Lambda or a Fargate task.
Network connectivity debugging. Launch a CloudShell in a specific subnet/SG combination and test what can reach what.
File transfer (from public environments). The get-file-upload-urls and get-file-download-urls APIs give you presigned S3 URLs.

The main limitation is that you're restricted to running commands inside the shell. You can't use it as a transparent tunnel for your local tools. For that, you still need an EC2 instance with the SSM agent, or an ECS task with execute-command enabled.

Try it yourself

I published the model and a sample script here: github.com/psantus/cloudshell-cli

Installation is a single command. The whole thing is one JSON file that teaches your AWS CLI a new service. Just remember: this is an undocumented API. AWS can change or break it at any time. Don't build anything critical on it.

But for quick VPC access from your terminal? It's pretty great.

Generating structured data with an LLM: a few tricks for more reliability

Paul SANTUS — Fri, 29 May 2026 12:41:31 +0000

LLMs are excellent at generating text. They are bad at generating structured data reliably. If you've ever tried to get an agent to produce a JSON object with a precise schema, you know the painful result: missing fields, hallucinated keys, inconsistent types, and outputs that break your downstream pipeline.

Moving past demo code to work on real AI applications in production, I ran into this problem and found an approach that works remarkably well for an AI application I'm building: using tools like the Builder pattern from object-oriented programming. Instead of asking the model to produce a final JSON blob, you give it tools that build the output incrementally, like calling methods on an object. The model never sees or produces the final structure directly. It simply calls tools, and the structured output emerges as a side effect.

This matters especially when your agent processes large documents (insurance forms, legal files, medical records) that consume most of the context window. When the input is large and the task has several steps, you can't afford to also reserve space for a massive structured output at the end. The accumulator pattern lets you compress the conversation along the way without losing any of the structured data already collected, because that data lives entirely outside the context window.

Challenges

"Generate me a big JSON": the problems

The naive approach, asking the model to produce a complete JSON structure, fails almost systematically as the volume grows:

Schema drift. The model forgets required fields, invents new ones, or changes types from one run to the next. A date field can be a string one time and an object the next.
All-or-nothing. If the model makes a single mistake in a 200-line JSON output, the whole thing is impossible to parse. You either rerun the entire generation or write fragile repair code.
No incremental progress. When an agent has to collect information and produce structured output, asking it to do both in a single pass means it can't iterate. It commits to a structure before it has all the facts.

Why `response_format` and function-calling schemas aren't enough

Structured output modes (such as OpenAI's response_format: json_schema or Bedrock's tool result schemas) help with syntax: you will get valid JSON. But they don't solve the semantic problem. The model still has to produce the entire structure in a single pass, and it still hallucinates content to fill required fields.

A widespread problem

Every team building autonomous or semi-autonomous agents faces this problem, not just me. Kiro CLI, AWS's agentic development companion, for example, struggled a lot with large data structures at launch.

Since then, its maintainers have equipped its harness with JSON capabilities (jq manipulations, for example) and multiple strategies (extensive use of grep, glob, tail..) to avoid filling the context window.

Still, it's nice to know I'm not the only one who struggled :)

My solutions

Here are a few tricks I've used successfully to control both the agent's output and the context window. Since I don't claim to have every recipe, feel free to comment with yours or tag me in your own posts :)

Using tools as Builder methods

The core idea: define tools that act like Builder methods in OOP. Each tool call adds a well-typed item to an accumulator. The model's job shifts from "produce this structure" to "call these functions in the right order."

Here's the pattern. Imagine an agent that processes insurance claims by reading documents and building a structured assessment:

from strands import tool

# The accumulator: this is your structured output
claim_output = {
    "parties": [],
    "events": [],
    "damages": [],
    "evidence": [],
    "assessment": None,
}

def reset_output():
    claim_output["assessment"] = None
    for k in ["parties", "events", "damages", "evidence"]:
        claim_output[k] = []


@tool
def add_party(name: str, role: str, policy_id: str = "") -> str:
    """Register a party involved in the claim.

    Args:
        name: Full name of the person or organization.
        role: One of: claimant, insured, witness, adjuster, third_party
        policy_id: Policy number if applicable.

    Returns:
        Confirmation with party details.
    """
    if role not in ("claimant", "insured", "witness", "adjuster", "third_party"):
        return f"Error: invalid role '{role}'. Must be one of: claimant, insured, witness, adjuster, third_party"

    claim_output["parties"].append({
        "name": name,
        "role": role,
        "policy_id": policy_id,
    })
    return f"Added {role}: {name}"


@tool
def add_event(description: str, date: str, location: str = "") -> str:
    """Record a chronological event relevant to the claim.

    Args:
        description: What happened (1-3 sentences).
        date: ISO date string (YYYY-MM-DD).
        location: Where it happened (optional).
    """
    claim_output["events"].append({
        "description": description,
        "date": date,
        "location": location,
    })
    return f"Recorded event on {date} ({len(claim_output['events'])} events total)"


@tool
def add_damage(item: str, amount: float, category: str, evidence_ref: str = "") -> str:
    """Register a damage item with estimated cost.

    Args:
        item: Description of the damaged item or cost.
        amount: Estimated cost in dollars.
        category: One of: property, medical, liability, lost_income
        evidence_ref: Reference to supporting evidence (optional).
    """
    if category not in ("property", "medical", "liability", "lost_income"):
        return f"Error: invalid category '{category}'."

    claim_output["damages"].append({
        "item": item,
        "amount": amount,
        "category": category,
        "evidence_ref": evidence_ref,
    })
    total = sum(d["amount"] for d in claim_output["damages"])
    return f"Added damage: {item} (${amount:.2f}). Running total: ${total:.2f}"

The agent is given these tools and a system prompt that tells it to process a claim. As it reads documents and discovers information, it calls add_party, add_event, and add_damage. The structured output builds up incrementally.

Validation at the boundary

Each tool call is a validation checkpoint. You can reject invalid input immediately:

@tool
def add_damage(item: str, amount: float, category: str, evidence_ref: str = "") -> str:
    if category not in ("property", "medical", "liability", "lost_income"):
        return f"Error: invalid category '{category}'."
    if amount <= 0:
        return f"Error: amount must be positive, got {amount}."
    if evidence_ref and evidence_ref not in [e["id"] for e in claim_output["evidence"]]:
        return f"Error: evidence '{evidence_ref}' not registered. Call add_evidence first."
    # ...

The model gets instant feedback. If it tries to reference evidence it hasn't registered yet, the tool tells it. The model self-corrects on the next turn. Compare this to validating a 500-line JSON blob after the fact - by then, the model has moved on and can't fix its mistakes in context.

Decoupling the thinking phase from output construction

A key benefit: the same agent can have reading tools and writing tools. Reading tools fetch and explore data. Writing tools construct the output. The model interleaves them naturally:

agent = Agent(
    model=model,
    system_prompt=prompt,
    tools=[
        # Reading tools
        read_document,
        search_policy,
        get_weather_report,
        # Writing tools (Builder methods)
        add_party,
        add_event,
        add_damage,
        add_evidence,
        set_assessment,
        # Progress tracking
        mark_step_done,
    ],
)

# One call - the agent reads documents AND builds the structured output
agent("Process this claim: " + claim_text)

# The output is ready
print(claim_output)

The model reads a police report, extracts a party, reads a medical bill, registers a damage item, checks the insurance policy, and so on. Research and output construction are interleaved rather than sequential.

Progress tracking and recovery

Because the output accumulates incrementally, you get crash recovery for free:

STEPS = [
    "1. Identify all parties",
    "2. Establish timeline of events",
    "3. Catalog damages with evidence",
    "4. Cross-reference policy coverage",
    "5. Produce assessment",
]
completed_steps: list[int] = []

@tool
def mark_step_done(step_number: int) -> str:
    """Mark a processing step as completed."""
    completed_steps.append(step_number)
    remaining = [s for i, s in enumerate(STEPS, 1) if i not in completed_steps]
    return f"Step {step_number} done. Remaining: {', '.join(remaining)}"

If the agent hits a context window limit or crashes, you already have partial results - every party identified, every event recorded, every damage item cataloged up to that point. You can resume or use what you have.

Context management with state injection

This is where the pattern really pays off. When your agent ingests a 30-page document and then makes dozens of tool calls to fetch additional sources, the context window fills up fast. In a traditional approach, you would lose your structured output along with the conversation when you hit the limit. But because the accumulator lives in Python memory - not in the message history - you can aggressively compress the conversation without losing a single data point.

A custom conversation manager (a possibility offered, for instance, by the Strands Agents SDK) replaces old messages with a compact state summary derived from the accumulator:

class ClaimConversationManager(ConversationManager):
    def apply_management(self, agent, **kwargs):
        messages = agent.messages
        # With 3 or fewer messages there is nothing between the first and the last two to replace
        if len(messages) <= 3:
            return

        # Keep the first message + the last 2 messages
        # Replace everything in between with a state summary
        first_msg = messages[0]
        recent = messages[-2:]

        state = self._build_state_summary()
        state_msg = {
            "role": "user",
            "content": [{"text": f"[STATE]\n{state}\n\nContinue."}],
        }
        messages[:] = [first_msg, state_msg] + recent

    def _build_state_summary(self) -> str:
        """Summarize what's been done using the accumulator state."""
        lines = []
        if claim_output["parties"]:
            parties = [f"{p['name']} ({p['role']})" for p in claim_output["parties"]]
            lines.append(f"Parties: {', '.join(parties)}")
        if claim_output["damages"]:
            total = sum(d["amount"] for d in claim_output["damages"])
            lines.append(f"Damages: {len(claim_output['damages'])} items, ${total:.2f} total")
        if claim_output["events"]:
            lines.append(f"Events: {len(claim_output['events'])} recorded")
        return "\n".join(lines)

Because the structured output lives in Python (not in the conversation), context compression doesn't lose any data. The model can always see what it has already produced by reading the state summary.

Benefits

Type safety without type coercion

Each tool has typed parameters, and every call goes through the tool's own checks. The model must provide a category that's one of property, medical, liability, lost_income - not because you're parsing JSON and checking after the fact, but because the tool validates it at call time. Invalid calls get rejected with clear error messages.

Composability

Tools compose naturally. You can add new output fields by adding new tools without changing existing ones. Want to track evidence attachments? Add an add_evidence tool. Want a final recommendation? Add a set_assessment tool. The model discovers new capabilities through its tool list.

Testability

Each tool is a pure function (or close to it). You can unit test them independently:

def test_add_damage_rejects_invalid_category():
    reset_output()
    result = add_damage(item="Roof repair", amount=5000, category="cosmetic")
    assert "Error" in result
    assert len(claim_output["damages"]) == 0

def test_add_damage_tracks_total():
    reset_output()
    add_damage(item="Roof repair", amount=5000, category="property")
    add_damage(item="Water damage", amount=2000, category="property")
    assert len(claim_output["damages"]) == 2
    assert sum(d["amount"] for d in claim_output["damages"]) == 7000

Deterministic output schema

The output schema is defined by your Python code, not by the model's interpretation of a prompt. claim_output always has the same keys with the same types. Downstream consumers can rely on the structure unconditionally.

Graceful degradation

If the model runs out of context or hits an error, you have everything it produced up to that point. You can even detect empty output and retry with a nudge:

try:
    agent(claim_text)
except Exception:
    pass

if not claim_output["parties"] and not claim_output["events"]:
    agent("You haven't started processing. Begin by identifying the parties involved.")

Natural agent behavior

The model doesn't have to switch between "thinking" and "formatting." It thinks by calling tools. The structured output is a byproduct of the agent doing its job, not an additional formatting burden layered on top.

This pattern - tools as Builder, accumulator as output, validation at the boundary - is the most reliable way I've found to get structured data out of an agentic workflow. It works because it aligns with how tool-calling models already behave: they reason, they act, they observe results, and they act again. You're just making "act" mean "build one piece of the output."

LLMs suck at generating large, structured data. Tips on how to get your AI agent to do it reliably

Paul SANTUS — Fri, 29 May 2026 12:06:19 +0000

LLMs are great at generating text. They're terrible at generating structured data reliably. If you've ever tried to get an agent to produce a JSON object with a specific schema, you know the pain: missing fields, hallucinated keys, inconsistent types, and outputs that break your downstream pipeline.

As I moved past toy examples and labs to work on real, production-grade AI apps, I ran into this problem and found an approach that works remarkably well for an AI app I'm building: use tools the way object-oriented programming uses the Builder pattern. Instead of asking the model to produce a final JSON blob, you give it tools that incrementally build the output - like calling methods on an object. The model never sees or produces the final structure directly. It just calls functions, and the structured output emerges as a side effect.

This matters especially when your agent processes large documents (like insurance forms, legal filings, medical records) that eat up most of the context window. When the input is big and the task is multi-step, you can't afford to also reserve space for a massive structured output at the end. The accumulator pattern lets you compress the conversation mid-flight without losing any of the structured data you've already collected, because that data lives outside the token window entirely.

Challenges

The "generate JSON" problem

The naive approach - asking a model to output a complete JSON structure - fails in predictable ways:

Schema drift. The model forgets required fields, invents new ones, or changes types between runs. A date field might be a string one time and an object the next.
All-or-nothing failure. If the model makes one mistake in a 200-line JSON output, the entire thing is unparseable. You either retry the whole generation or write brittle fixup code.
No incremental progress. If the model hits a context limit or stops mid-generation, you lose everything. There's no partial result to recover from.
Hallucination in structure. Models are more likely to hallucinate when producing structured output in one shot. They fill in fields they're uncertain about rather than leaving them empty, because the structure demands completeness.
Coupling research and output. When an agent needs to gather information and produce structured output, asking it to do both in one pass means it can't iterate. It commits to a structure before it has all the facts.

Why `response_format` and function-calling schemas aren't enough

Structured output modes (like OpenAI's response_format: json_schema or Bedrock's tool result schemas) help with syntax - you'll get valid JSON. But they don't solve the semantic problem. The model still has to produce the entire structure in one shot, and it still hallucinates content to fill required fields.

A widespread issue

Any team building autonomous or semi-autonomous agents faces this, not just me. Kiro CLI, AWS's agentic dev companion, for instance, struggled hard with large data structures when it first launched.

Since then, its maintainers have equipped its harness with JSON capabilities (jq manipulations, for instance) and multiple strategies (extensive use of grep, glob, tail, etc.) to avoid filling the context window.

Still, happy to know I'm not alone in facing this :)

My solutions

Here are a few tricks I have used successfully to control both agent output and context window. As I don't claim to have all the recipes, don't hesitate to comment with your own or tag me in your own posts :)

Tools as Builder methods

The core idea: define tools that act like OOP builder methods. Each tool call adds one well-typed element to an accumulator. The model's job shifts from "produce this structure" to "call these functions in the right order."

Here's the pattern - imagine an agent that processes insurance claims by reading documents and building a structured claim assessment:

from strands import tool

# The accumulator - this is your structured output
claim_output = {
    "parties": [],
    "events": [],
    "damages": [],
    "evidence": [],
    "assessment": None,
}

def reset_output():
    claim_output["assessment"] = None
    for k in ["parties", "events", "damages", "evidence"]:
        claim_output[k] = []


@tool
def add_party(name: str, role: str, policy_id: str = "") -> str:
    """Register a party involved in the claim.

    Args:
        name: Full name of the person or organization.
        role: One of: claimant, insured, witness, adjuster, third_party
        policy_id: Policy number if applicable.

    Returns:
        Confirmation with party details.
    """
    if role not in ("claimant", "insured", "witness", "adjuster", "third_party"):
        return f"Error: invalid role '{role}'. Must be one of: claimant, insured, witness, adjuster, third_party"

    claim_output["parties"].append({
        "name": name,
        "role": role,
        "policy_id": policy_id,
    })
    return f"Added {role}: {name}"


@tool
def add_event(description: str, date: str, location: str = "") -> str:
    """Record a chronological event relevant to the claim.

    Args:
        description: What happened (1-3 sentences).
        date: ISO date string (YYYY-MM-DD).
        location: Where it happened (optional).
    """
    claim_output["events"].append({
        "description": description,
        "date": date,
        "location": location,
    })
    return f"Recorded event on {date} ({len(claim_output['events'])} events total)"


@tool
def add_damage(item: str, amount: float, category: str, evidence_ref: str = "") -> str:
    """Register a damage item with estimated cost.

    Args:
        item: Description of the damaged item or cost.
        amount: Estimated cost in dollars.
        category: One of: property, medical, liability, lost_income
        evidence_ref: Reference to supporting evidence (optional).
    """
    if category not in ("property", "medical", "liability", "lost_income"):
        return f"Error: invalid category '{category}'."

    claim_output["damages"].append({
        "item": item,
        "amount": amount,
        "category": category,
        "evidence_ref": evidence_ref,
    })
    total = sum(d["amount"] for d in claim_output["damages"])
    return f"Added damage: {item} (${amount:.2f}). Running total: ${total:.2f}"

The agent is given these tools and a system prompt that tells it to process a claim. As it reads documents and discovers information, it calls add_party, add_event, and add_damage. The structured output builds up incrementally.

Validation at the boundary

Each tool call is a validation checkpoint. You can reject bad input immediately:

@tool
def add_damage(item: str, amount: float, category: str, evidence_ref: str = "") -> str:
    if category not in ("property", "medical", "liability", "lost_income"):
        return f"Error: invalid category '{category}'."
    if amount <= 0:
        return f"Error: amount must be positive, got {amount}."
    if evidence_ref and evidence_ref not in [e["id"] for e in claim_output["evidence"]]:
        return f"Error: evidence '{evidence_ref}' not registered. Call add_evidence first."
    # ...

The model gets instant feedback. If it tries to reference evidence it hasn't registered yet, the tool tells it. The model self-corrects on the next turn. Compare this to validating a 500-line JSON blob after the fact - by then, the model has moved on and can't fix its mistakes in context.
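The validation above looks up e["id"] in claim_output["evidence"], and the agent setup later lists add_evidence and set_assessment, but neither tool is shown in the post. Here is a minimal sketch of what they might look like. The parameter names, the evidence fields, and the VALID_DECISIONS values are assumptions made for illustration, and the functions are left undecorated so the snippet runs without any agent framework installed:

```python
# Stand-in for the article's accumulator, reduced to the keys used here.
claim_output = {"evidence": [], "assessment": None}

# Hypothetical decision values, not taken from the article.
VALID_DECISIONS = ("approve", "deny", "needs_review")


def add_evidence(evidence_id: str, kind: str, summary: str) -> str:
    """Register a piece of evidence so damage items can reference it by id."""
    if any(e["id"] == evidence_id for e in claim_output["evidence"]):
        return f"Error: evidence '{evidence_id}' already registered."
    claim_output["evidence"].append(
        {"id": evidence_id, "kind": kind, "summary": summary}
    )
    return f"Registered evidence '{evidence_id}' ({kind})"


def set_assessment(decision: str, rationale: str) -> str:
    """Set the final assessment; calling it again overwrites the previous one."""
    if decision not in VALID_DECISIONS:
        allowed = ", ".join(VALID_DECISIONS)
        return f"Error: invalid decision '{decision}'. Must be one of: {allowed}"
    claim_output["assessment"] = {"decision": decision, "rationale": rationale}
    return f"Assessment set to '{decision}'"
```

Rejecting duplicate ids keeps the evidence_ref lookup unambiguous, and letting set_assessment overwrite means the model can revise its conclusion after reading more documents.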

Separating research from output construction

A key benefit: the same agent can have reading tools and writing tools. Reading tools fetch and explore data. Writing tools construct the output. The model interleaves them naturally:

agent = Agent(
    model=model,
    system_prompt=prompt,
    tools=[
        # Reading tools
        read_document,
        search_policy,
        get_weather_report,
        # Writing tools (builder methods)
        add_party,
        add_event,
        add_damage,
        add_evidence,
        set_assessment,
        # Progress tracking
        mark_step_done,
    ],
)

# One call - the agent reads documents AND builds structured output
agent("Process this claim: " + claim_text)

# Output is ready
print(claim_output)

The model reads a police report, extracts a party, reads a medical bill, registers a damage item, cross-references the policy, and so on. Research and output construction are interleaved rather than sequential.

Progress tracking and recovery

Because output accumulates incrementally, you get crash recovery for free:

STEPS = [
    "1. Identify all parties",
    "2. Establish timeline of events",
    "3. Catalog damages with evidence",
    "4. Cross-reference policy coverage",
    "5. Produce assessment",
]
completed_steps: list[int] = []

@tool
def mark_step_done(step_number: int) -> str:
    """Mark a processing step as completed."""
    completed_steps.append(step_number)
    remaining = [s for i, s in enumerate(STEPS, 1) if i not in completed_steps]
    return f"Step {step_number} done. Remaining: {', '.join(remaining)}"

If the agent hits a context window limit or errors out, you already have partial results - every party identified, every event recorded, every damage item cataloged up to that point. You can resume or use what you have.
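Partial results only survive a process crash if they are written somewhere durable. One possible way to do that, sketched here under the assumption that a local JSON file is good enough, is to checkpoint the accumulator and the completed steps after each tool call. The save_checkpoint and load_checkpoint helpers are hypothetical names, not part of any SDK:

```python
import json
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, output: dict, completed: list) -> None:
    """Write accumulator and progress to a temp file, then rename it over the target."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"output": output, "completed_steps": completed}))
    tmp.replace(path)  # a rename avoids leaving a half-written checkpoint behind


def load_checkpoint(path: Path) -> Optional[tuple]:
    """Return (output, completed_steps) from a previous run, or None if absent."""
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["output"], data["completed_steps"]
```

On restart, a non-None result can be copied back into claim_output and completed_steps before the agent is invoked again.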

Context management with state injection

Here's where this pattern really pays off. When your agent ingests a 30-page document and then makes dozens of tool calls to fetch additional sources, the context window fills up fast. In a traditional approach, you'd lose your structured output along with the conversation when you hit the limit. But because the accumulator lives in Python memory - not in the message history - you can aggressively compress the conversation without losing a single data point.

A custom conversation manager (a possibility offered, for instance, by the Strands Agents SDK) replaces old messages with a compact state summary derived from the accumulator:

class ClaimConversationManager(ConversationManager):
    def apply_management(self, agent, **kwargs):
        messages = agent.messages
        # With 3 or fewer messages there is nothing between the first and the last two to replace
        if len(messages) <= 3:
            return

        # Keep first message + last 2 messages
        # Replace everything in between with a state summary
        first_msg = messages[0]
        recent = messages[-2:]

        state = self._build_state_summary()
        state_msg = {
            "role": "user",
            "content": [{"text": f"[STATE]\n{state}\n\nContinue."}],
        }
        messages[:] = [first_msg, state_msg] + recent

    def _build_state_summary(self) -> str:
        """Summarize what's been done using the accumulator state."""
        lines = []
        if claim_output["parties"]:
            parties = [f"{p['name']} ({p['role']})" for p in claim_output["parties"]]
            lines.append(f"Parties: {', '.join(parties)}")
        if claim_output["damages"]:
            total = sum(d["amount"] for d in claim_output["damages"])
            lines.append(f"Damages: {len(claim_output['damages'])} items, ${total:.2f} total")
        if claim_output["events"]:
            lines.append(f"Events: {len(claim_output['events'])} recorded")
        return "\n".join(lines)

Because the structured output lives in Python (not in the conversation), context compression doesn't lose any data. The model can always see what it's already produced by reading the state summary.
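The trimming logic can also be pulled out into a plain function and checked without an agent. The sketch below is a hypothetical helper (compact is not an SDK function) that applies the same "first message + state + last N" rule to a list of message dicts. It assumes that cutting at an arbitrary index is safe. In a real conversation you would probably also have to avoid separating a tool call from its result:

```python
def compact(messages: list, state: str, keep_recent: int = 2) -> list:
    """Return first message + one state message + the last `keep_recent` messages."""
    if len(messages) <= keep_recent + 1:
        return list(messages)  # nothing in between to replace
    state_msg = {
        "role": "user",
        "content": [{"text": f"[STATE]\n{state}\n\nContinue."}],
    }
    return [messages[0], state_msg] + messages[-keep_recent:]
```

Applied repeatedly, the result never grows beyond keep_recent + 2 messages, because the previous state message falls into the replaced middle section.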

Benefits

Type safety without type coercion

Each tool has typed parameters, and every call goes through the tool's own checks. The model must provide a category that's one of property, medical, liability, lost_income - not because you're parsing JSON and checking after the fact, but because the tool validates it at call time. Invalid calls get rejected with clear error messages.
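In the tools shown earlier, category is annotated as a plain str and the allowed values are enforced by an explicit check inside the function. One way to keep that list in a single place is a typing.Literal alias that both the annotation and the runtime check read from. This is only a sketch: DamageCategory and check_category are hypothetical names, and whether a given agent framework turns a Literal annotation into an enum in the tool schema is an assumption you should verify for your SDK:

```python
from typing import Literal, get_args

# Single source of truth for the allowed values.
DamageCategory = Literal["property", "medical", "liability", "lost_income"]


def check_category(value: str) -> str:
    """Return an error string for unknown categories, or an empty string if valid."""
    allowed = get_args(DamageCategory)
    if value not in allowed:
        return f"Error: invalid category '{value}'. Must be one of: {', '.join(allowed)}"
    return ""
```

Listing the allowed values in the error message gives the model what it needs to correct itself on the next turn, as add_party already does for roles.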

Composability

Tools compose naturally. You can add new output fields by adding new tools without changing existing ones. Want to track evidence attachments? Add an add_evidence tool. Want a final recommendation? Add a set_assessment tool. The model discovers new capabilities through its tool list.

Testability

Each tool is a pure function (or close to it). You can unit test them independently:

def test_add_damage_rejects_invalid_category():
    reset_output()
    result = add_damage(item="Roof repair", amount=5000, category="cosmetic")
    assert "Error" in result
    assert len(claim_output["damages"]) == 0

def test_add_damage_tracks_total():
    reset_output()
    add_damage(item="Roof repair", amount=5000, category="property")
    add_damage(item="Water damage", amount=2000, category="property")
    assert len(claim_output["damages"]) == 2
    assert sum(d["amount"] for d in claim_output["damages"]) == 7000

Deterministic output schema

The output schema is defined by your Python code, not by the model's interpretation of a prompt. claim_output always has the same keys with the same types. Downstream consumers can rely on the structure unconditionally.

Graceful degradation

If the model runs out of context or hits an error, you have everything it produced up to that point. You can even detect empty output and retry with a nudge:

try:
    agent(claim_text)
except Exception:
    pass

if not claim_output["parties"] and not claim_output["events"]:
    agent("You haven't started processing. Begin by identifying the parties involved.")

Natural agent behavior

The model doesn't have to context-switch between "thinking" and "formatting." It thinks by calling tools. The structured output is a byproduct of the agent doing its job, not an additional formatting burden layered on top.

This pattern - tools as Builder, accumulator as output, validation at the boundary - has been the most reliable way I've found to get structured data out of an agentic workflow. It works because it aligns with how tool-calling models already behave: they reason, they act, they observe results, and they act again. You're just making "act" mean "build one piece of the output."

Hack your AWS CLI to add CloudShell support and turn your terminal into a bastion

Paul SANTUS — Thu, 28 May 2026 10:10:55 +0000

I've been using AWS CloudShell from the Console for a while. It's convenient: a pre-authenticated shell in your browser, right there in the AWS Console. But I always wondered: why can't I use it from my terminal? Why is there no aws cloudshell command?

Turns out, you can make it happen. The API exists, it's just not public. And once you have CLI access to CloudShell, you can do interesting things with it, like using a VPC-attached CloudShell as a bastion to reach your private RDS instances.

Check out the companion repository as you read through this blog post.

CloudShell: an undocumented API

AWS CloudShell has no official SDK or CLI support. But the Console has to talk to something, right? By looking at what the browser does when you open CloudShell, you can reverse-engineer the API.

Thankfully, Jérôme Guyon already did that work and published a boto3-compatible service model. His work made this whole thing possible.

The API is straightforward: create environments, start/stop them, create sessions, upload/download files. The session mechanism uses SSM's WebSocket protocol under the hood, which means session-manager-plugin (the same binary that powers aws ssm start-session) can connect to CloudShell sessions.

Teaching the AWS CLI a new trick

The AWS CLI has a little-known feature: aws configure add-model. Give it a JSON service model, and suddenly the CLI knows about a new service. AWS uses this internally for private previews.

(The boto3 model from Jérôme's repo just needs a "version": "2.0" field added at the top level to become CLI-compatible.)
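That one-field change can be scripted. A minimal sketch, assuming the boto3 model has been saved locally (the input file name in the usage note is a placeholder; the output name matches the command below):

```python
import json
from pathlib import Path


def make_cli_compatible(model: dict) -> dict:
    """Return a copy of a service model with the top-level "version" field set."""
    patched = dict(model)
    patched["version"] = "2.0"
    return patched


def convert(src: Path, dst: Path) -> None:
    """Read a boto3 service model from src and write the CLI-ready model to dst."""
    model = json.loads(src.read_text())
    dst.write_text(json.dumps(make_cli_compatible(model), indent=2))
```

For example, convert(Path("service-2.json"), Path("cloudshell-cli-model.json")) would produce the file passed to aws configure add-model.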

Run:

aws configure add-model \
  --service-model file://cloudshell-cli-model.json \
  --service-name cloudshell

That's it. Now I have aws cloudshell with tab completion and everything:

$ aws cloudshell help

AVAILABLE COMMANDS
       create-environment
       create-session
       delete-environment
       describe-environments
       get-environment-status
       start-environment
       stop-environment
       ...

Connecting to CloudShell from the terminal

The workflow is simple:

# Create or find an environment
aws cloudshell create-environment --region eu-west-1

# Wait for it to be RUNNING
aws cloudshell get-environment-status --environment-id <ID> --region eu-west-1

# Create a session and connect
session-manager-plugin "$(aws cloudshell create-session \
  --environment-id <ID> \
  --session-type TMUX \
  --tab-id "$(uuidgen | tr '[:upper:]' '[:lower:]')" \
  --q-cli-disabled \
  --region eu-west-1 \
  --query '{SessionId:SessionId,TokenValue:TokenValue,StreamUrl:StreamUrl}' \
  --output json)" eu-west-1 StartSession

And you're in. A full shell on a CloudShell instance, from your terminal. No browser needed.
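The "wait for it to be RUNNING" step is a polling loop when scripted. Since the response shape of get-environment-status is not shown here, this sketch takes a hypothetical fetch_status callable (which you would implement by wrapping the CLI call and extracting the status string) rather than guessing field names:

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "RUNNING",
    timeout: float = 120.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status() until it returns target; False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if fetch_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The sleep parameter is injectable so the loop can be tested without actually waiting.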

The credentials problem

There's a catch. When you use CloudShell from the Console, AWS injects your credentials automatically via a PutCredentials API call. This uses your console session token (the cookie-based auth from your browser login) to feed temporary credentials into the container's metadata endpoint.

When you connect programmatically, that doesn't happen. The container's credential endpoint returns a 500 error. You need to inject credentials yourself:

# Run locally, then paste the output into your CloudShell session
aws configure export-credentials --profile my-profile --format env

Not ideal, but it works.

The bastion use case

Here's where it gets interesting. You can create a VPC-attached CloudShell environment:

aws cloudshell create-environment \
  --environment-name db-access \
  --vpc-config '{
    "VpcId": "vpc-abc123",
    "SubnetIds": ["subnet-private-1"],
    "SecurityGroupIds": ["sg-allowed-by-rds"]
  }' \
  --region eu-west-1

Put it in the same security group that your RDS allows, and suddenly you can connect to your database directly from the shell:

mysql -h my-instance.xxx.eu-west-1.rds.amazonaws.com -u admin -p

No EC2 bastion instance to maintain. No SSH keys to manage. No hourly cost when you're not using it (CloudShell is free). The environment suspends after 20 minutes of inactivity and you can keep it alive with aws cloudshell send-heart-beat.

What doesn't work (and I tried...)

I spent a fair amount of time trying to make CloudShell work as a proper port-forwarding bastion, so you could use local tools like DBeaver against a remote RDS through it. Here's what I found:

SSM-based port forwarding doesn't work.

ECS, for instance, registers containers as SSM targets. The target identifier format is undocumented, but once you know it, it works well, as I described in a previous blog post. This way you can run aws ssm start-session --document-name AWS-StartPortForwardingSessionToRemoteHost.
SageMaker notebooks behave in much the same way.

CloudShell instances/containers do not seem to be registered as SSM managed instances. Or if they are, it's hidden and, as of today, no one at AWS has leaked their ID format :) I tried every combination of environment ID, session ID, and prefix format I could think of. None of them work.

Local port forwarding through the PTY doesn't work either. The session is a terminal, not a raw TCP stream. You can't pipe binary MySQL protocol data through it. I even tried setting up an ncat relay inside CloudShell and tunneling through the session. The relay works fine internally, but there's no way to expose it as a local TCP port on your machine.

UDP hole punching is theoretically possible but requires the CloudShell environment to have internet access (NAT Gateway on its subnet), and even then you're fighting symmetric NAT on both ends. I got STUN working from CloudShell, but the full hole punch is fragile and impractical for production use.

So what is it good for?

Honestly, quite a lot:

Quick database access without maintaining a bastion EC2 instance. Connect, run your queries, disconnect. Free.
Automation. You can script command execution on CloudShell via Python + session-manager-plugin. Useful for running things inside a VPC without deploying a Lambda or Fargate task.
Debugging network connectivity. Spin up a CloudShell in a specific subnet/SG combination and test what can reach what.
File transfer (from public environments). The get-file-upload-urls and get-file-download-urls APIs give you presigned S3 URLs.

The main limitation is that you're stuck running commands inside the shell. You can't use it as a transparent tunnel for local tools. For that, you still need an EC2 instance with SSM agent, or an ECS task with execute-command enabled.

Try it yourself

I published the model and a sample script here: github.com/psantus/cloudshell-cli

Installation is one command. The whole thing is a single JSON file that teaches your AWS CLI a new service. Just remember: this is an undocumented API. AWS can change or break it at any time. Don't build anything mission-critical on top of it.

But for quick VPC access from your terminal? It's pretty great.

Don't give your LLM free rein

Paul SANTUS — Fri, 15 May 2026 13:27:27 +0000

Shannon predicted it: a good prompt will never replace a feedback loop.

A seductive idea is making the rounds: give an LLM a good prompt, and it generates a complete application for you. Simple, fast, magical. Vibe!

Except that it is physically impossible. Claude Shannon explained why as early as 1948.

The information problem

Information theory teaches us a fundamental principle: you cannot create information out of nothing. A communication channel cannot output more information than it receives as input.

Now, let's look at what we are asking for:

Input: a prompt of a few lines. A few hundred bits of useful information. Vague intentions, implicit constraints, unstated design choices.
Expected output: a complete application. Thousands of decisions about architecture, design, UX, error handling, edge cases. Millions of bits of information.

What the LLM brings (and what it doesn't)

Let's be honest: the LLM does add information. It isn't flipping a coin. It draws on an immense training corpus to fill in the blanks; its billions of parameters enrich the 300 tokens you painstakingly produced.

But what kind of information is it, exactly?

Boilerplate. Formal knowledge. The classic patterns of a REST API. The idiomatic way to connect to a database in Python. The standard structure of a React component. Naming conventions. The usual imports.

Everything that falls under "how is this usually done?" is covered, and that's valuable. It's what makes LLMs so impressive in demos: the code looks like a real application because the formal shell is correct.

But your business logic? The architectural trade-offs specific to your context? The exact behavior expected in that edge case only your user knows about? That information isn't in any corpus. It's in your head, and nowhere else.

The LLM provides the skeleton. You provide the soul. In formal terms: the Kolmogorov complexity of your application (the minimum amount of information needed to describe it completely) is far greater than that of your prompt. The LLM fills the gap with generic information. But the information specific to your context is incompressible. Only you can provide it.

A thought experiment.

Take a working application. Ask the best LLM in the world: "write me the perfect prompt to regenerate this exact application." Then submit that prompt to the same LLM. You won't get the same application. Ever. You can try again once, twice, a hundred times: a hundred different results, none identical to the original.

A prompt is to an application what a SHA-1 hash is to a file: an irreversible reduction. Nobody expects to reconstruct a file from its fingerprint. Why would we expect to reconstruct an application from its prompt?
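To make the hash analogy concrete, here is a minimal Python sketch. The two inputs are arbitrary placeholders, not taken from any real project: whatever the size of the input, the SHA-1 digest is 160 bits, so most of the original content cannot be recovered from it.

```python
import hashlib


def sha1_digest(data: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 digest of `data`."""
    return hashlib.sha1(data).digest()


small_input = b"hello"            # 5 bytes
large_input = b"x" * 1_000_000    # one million bytes

digest_small = sha1_digest(small_input)
digest_large = sha1_digest(large_input)

# Both digests have the same size, whatever the input size.
print(len(digest_small) * 8, len(digest_large) * 8)  # prints "160 160"
```

A million-byte input and a five-byte input both collapse to 20 bytes, which is the sense in which the reduction is irreversible.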

A sense of déjà vu

This situation isn't new. In fact, the software world lived through it for decades.

The "tunnel effect" (waterfall) projects of the 2000s worked on exactly this principle: you wrote a requirements specification (the equivalent of a big prompt), handed it to a development team (the equivalent of an LLM), and waited months for the final result.

The result? Consistently disappointing. And yet those specifications ran to hundreds of pages, vastly more detailed than any prompt. Entire teams spent months writing them. Even so, the delivered product never matched the real expectations.

Why? Because even hundreds of pages of specifications don't contain enough information to describe a complete piece of software. Real needs emerge through use. Good decisions are made when facing something concrete, not in the abstract.

Agile had the answer

It took the industry fifteen years to understand and adopt the solution: shorten the feedback loop.

Agile doesn't say "don't specify". It says: "specify a little, ship fast, observe, adjust, repeat". The information missing from the initial specification is injected iteration after iteration, through feedback from reality.

This is exactly the mechanism that compensates for the information deficit Shannon describes: each iteration is a new message on the channel, carrying the information the previous message lacked.

With AI, the trap is the same, only worse

LLMs speed up code generation spectacularly. That's undeniable. But this speed creates a dangerous illusion: because the code comes out fast, we assume the product is moving fast.

But the product work (understanding the need, validating choices, checking fit) remains incompressible. It requires a human in the loop, frequent feedback, and course corrections.

Generating 2,000 lines of code in 30 seconds only to discover afterwards that the architecture is wrong is waterfall at the speed of light. It's worse than the classic tunnel effect because the perceived cost is so low that you start over without questioning the approach.

The context window illusion

"But LLMs have huge context windows now!" Yes. And that makes the problem worse.

A large context window gives the illusion that the LLM can work through a subject in depth, on its own, by accumulating internal reasoning. In practice, without external input, it locks itself into its own "intuition". It goes in circles. It rephrases. It tries variants of the same bad idea. It burns tokens.

We've all seen those LinkedIn posts: "Claude burned through my monthly token quota in 2 hours." That's not a bug. It's Shannon at work: the LLM hasn't received any new information, so it can't converge. It generates volume, not value.

I've experienced this repeatedly: an LLM stuck in a downward spiral for twenty minutes, piling up increasingly convoluted attempts. The fix? Shrink the context (a simple /compact command in my favorite tool, Kiro CLI), then inject a tiny human correction: "try this instead" or "I think X is the root of the problem". Five words. And the LLM immediately heads off in the right direction.

Those five words contain more useful information than the 50,000 tokens the LLM had just generated for itself. Because it's external information, which breaks the circularity. This is exactly the signal on the channel that Shannon describes: without a new message from the sender, the receiver cannot correct its course.

The right posture

Don't give your LLM free rein. Work with it the way you would work in agile.

Cybernetics calls this the law of requisite variety (Ashby, 1956): to steer a complex system, your control mechanism must have at least as much variety as the system itself. An application has enormous variety (all its possible behaviors, states, and edge cases). A single prompt is a single control action. One action cannot constrain a system with millions of states. You need many, applied sequentially. In other words: iterations.

Short iterations: ask for a small piece, validate it, then move on to the next. (This post, for example ^^: a few ideas in two sentences, expanded by an AI, then at least five or six rounds of human feedback to arrive at what you're reading. QED.)
Constant feedback: review, test, and correct course at every step.
Explicit decisions: every design choice you don't spell out is a choice the LLM will make for you. Probably not the way you would have wanted.
Functional increments: prefer a partial result that works over a complete result that matches nothing.

Iterate!

Shannon has been telling us this for 78 years: information does not create itself. A short prompt cannot produce a complete application that matches your needs. The LLM fills the void with what it knows (boilerplate, patterns, conventions), but not with what it cannot know: your precise intent.

The information gap has to be closed somewhere, and that somewhere is the feedback loop between you and your tool.

AI speeds up the typing, not the thinking. It amplifies your capacity to execute, not your capacity to decide. Stay in control. Iterate. Don't confuse speed of generation with speed of value creation.

The real superpower isn't the perfect prompt. It's continuous decision-making.

Replace AWS Transfer Family SFTP with S3 Files + Atmoz SFTP

Paul SANTUS — Wed, 08 Apr 2026 08:37:49 +0000

AWS has just launched one of the most anticipated storage features: S3 Files. S3 Files puts an EFS-compatible file system interface directly in front of your S3 buckets. When I heard about this feature (through the Community Builders program), I immediately thought of the SFTP-on-AWS use case.

If you're currently paying for AWS Transfer Family to give your partners SFTP access to S3, read what follows carefully. There is now a much cheaper and more powerful alternative.

What is S3 Files?

S3 Files creates a high-performance NFS file system backed by an S3 bucket. Think of it as an EFS layer that reads and writes directly to S3 objects, with automatic bidirectional synchronization. Any file written through the file system appears as an S3 object, and any object uploaded to S3 becomes visible through the file system.

The key properties:

Sub-millisecond latency for file operations
Automatic synchronization between the file system and the S3 bucket (via EventBridge under the hood)
Mountable on ECS Fargate, ECS Managed Instances, EKS, and EC2
Standard NFS protocol: no special client needed on the compute side (ECS/EKS handle it natively)
Access points with POSIX identity enforcement (uid/gid)
S3 Versioning required and leveraged for consistency

The problem with AWS Transfer Family

AWS Transfer Family is the "official" solution for exposing SFTP endpoints backed by S3. It works, but with serious drawbacks:

It's expensive

Transfer Family charges $0.30/hour for the endpoint alone, which is ~$216/month before you transfer a single byte. Add data transfer costs on top. For a service that many teams use for a few daily file drops, that's hard to justify.

It's a black box

You get an SFTP endpoint, but you don't control the server. Custom authentication requires Lambda hooks. Logging is limited. You can't SSH in to debug. You can't customize the SFTP server's behavior, add pre/post-processing scripts, or run anything alongside it.

The new architecture: atmoz/sftp + S3 Files on ECS Fargate

Here's what we're going to set up:

The components:

S3 bucket with versioning enabled (required by S3 Files)
S3 Files file system pointed at the bucket, with mount targets in your VPC
EFS volume for persistent SSH host keys (stable fingerprint across restarts and scaling)
ECS Fargate service running atmoz/sftp with the S3 Files volume mounted at /home
Network Load Balancer exposing port 22
DNS record for sftp.yourdomain.com (optional)

Files uploaded via SFTP land on the S3 Files mount → appear in S3 within seconds → trigger S3 notifications for downstream processing.

Cost comparison

Component	Transfer Family	S3 Files + Fargate
Base cost	$0.30/hr (~$216/month)	NLB: ~$16/month
Compute	Included	Fargate 0.25 vCPU / 512 MB: ~$9/month
Storage	S3 pricing	S3 pricing (same)
Data transfer	$0.04/GB over SFTP	Standard NLB pricing
Monthly minimum	~$216	~$25

That's roughly 8 times cheaper at the base level. For low-to-medium traffic SFTP use cases (that is, most of them), the savings are significant.

Why this is better than SFTP on EFS

Before S3 Files, the classic DIY approach was to mount EFS on Fargate and run atmoz/sftp. That's exactly what we did. It worked, but with a fundamental limitation: your files lived in EFS, not in S3.

That meant:

No S3 notifications when files arrived
No S3 lifecycle policies
No S3 cross-region replication
No direct access to the files through the S3 API
EFS pricing ($0.30/GB for Standard) vs S3 ($0.023/GB)
Separate backup strategy needed

With S3 Files, the data lives in S3. You get the whole S3 ecosystem (notifications, lifecycle rules, replication, analytics, Glacier tiering) while still having a mountable file system for your SFTP server.

Event-driven file processing

Both Transfer Family and our S3 Files approach write to S3, so you get the same event-driven capabilities either way:

S3 notifications → SQS/SNS/Lambda for immediate processing when a file arrives
S3 notifications → EventBridge for complex routing rules
S3 Inventory for auditing
S3 Object Lock for compliance
S3 Replication to replicate uploaded files to another region or account

The difference isn't in the features; it's in the cost. You get exactly the same S3 event-driven pipeline for ~$25/month instead of ~$216/month.

The Terraform implementation

Since aws_s3files_file_system isn't in the Terraform AWS provider yet (PR #47325 is open and prioritized), we manage the S3 Files resources through terraform_data with local-exec provisioners calling the AWS CLI.

The key resources:

# S3 Files file system, created via the AWS CLI
resource "terraform_data" "s3files_file_system" {
  provisioner "local-exec" {
    command = <<-EOT
      aws s3files create-file-system \
        --bucket "$BUCKET_ARN" \
        --role-arn "$ROLE_ARN" \
        --accept-bucket-warning \
        --region "$REGION"
    EOT
  }
}

# Mount targets in each private subnet
resource "terraform_data" "s3files_mount_targets" {
  for_each = toset(var.private_subnet_ids)
  provisioner "local-exec" {
    command = <<-EOT
      aws s3files create-mount-target \
        --file-system-id "$FS_ID" \
        --subnet-id "${each.value}" \
        --security-groups "$SG_ID"
    EOT
  }
}

# The ECS task definition uses s3filesVolumeConfiguration
volume = {
  sftp-home = {
    s3files_volume_configuration = {
      file_system_arn = local.s3files_fs_arn
      root_directory  = "/"
    }
  }
}

The complete Terraform code is available as a Terraform module. The Terraform AWS provider doesn't support aws_s3files_file_system yet (PR #47325 is open and prioritized), so the S3 Files resources are currently managed via terraform_data + AWS CLI. I commit to updating this module to use native Terraform resources as soon as the provider ships S3 Files support.

IAM setup

Two IAM roles are needed:

S3 Files service role: assumed by elasticfilesystem.amazonaws.com to sync between the file system and the S3 bucket. Needs S3 read/write access on the bucket + EventBridge permissions for change detection.
ECS task role: needs s3files:ClientMount, s3files:ClientWrite, and s3:GetObject/s3:ListBucket on the bucket for optimized reads.

When Transfer Family still makes sense

To be honest, Transfer Family isn't dead for every use case:

Managed SFTP keys and user management: Transfer Family has built-in identity provider integration (AD, custom Lambda authentication). With atmoz/sftp, you manage users through configuration.
AS2 protocol support: if you need AS2, Transfer Family is still the only managed option.
FTPS: Transfer Family supports FTPS natively.
Zero ops tolerance: if you truly cannot manage a container, Transfer Family is fully managed.

But for the vast majority of SFTP use cases (partners dropping files that need processing), the S3 Files approach is cheaper, more flexible, and gives you better observability.

Getting started

Prerequisite: the aws s3files commands require AWS CLI v2.34.26 or later. You also need jq (used by the Terraform provisioner scripts). Update the CLI with brew upgrade awscli or see the AWS CLI install guide.

Create an S3 bucket with versioning enabled
Create an IAM role for S3 Files with the required trust and permissions policies
Create an S3 Files file system via the console or aws s3files create-file-system
Create mount targets in your VPC subnets
Create an EFS for persistent SSH host keys
Deploy an ECS Fargate service with atmoz/sftp, mounting S3 Files at /home and EFS at /etc/ssh/
Put an NLB in front, point your DNS at it
Set up S3 notifications on the bucket for downstream processing

Or just use the Terraform module: the whole thing deploys in under 10 minutes.

End-to-end test

After terraform apply, the SFTP server is ready in about 8 minutes (most of that time is spent waiting for the S3 Files mount targets to become available). Here's a quick test:

# Upload a file
echo "Hello from S3 Files SFTP!" > test.txt
sshpass -p demo sftp -o StrictHostKeyChecking=no -P 22 demo@<sftp_endpoint> <<EOF
cd upload
put test.txt
bye
EOF

# Verify it landed in S3 (wait ~30-60s for sync)
aws s3 cp s3://<sftp_bucket_name>/demo/upload/test.txt -
# Output: Hello from S3 Files SFTP!

We also verified that SSH host keys persist across task restarts: the server fingerprint stays the same after a forced redeployment, thanks to the EFS volume mounted at /etc/ssh/.

Conclusion

S3 Files bridges the gap between file system and object storage in a way that makes a lot of expensive AWS services redundant. For SFTP in particular, the combination of atmoz/sftp + S3 Files on Fargate gives you:

~8x lower cost than Transfer Family
Full control over the SFTP server
Native S3 notifications for event-driven processing
S3 as the source of truth: lifecycle rules, replication, and analytics all work
Infrastructure as Code with Terraform (even before native provider support)

The days of paying $216/month minimum for a managed SFTP endpoint are over for most teams. S3 Files is the missing piece that makes DIY SFTP on AWS not just viable, but ~8x cheaper.

AWS S3 Files just made Transfer Family SFTP obsolete for most use cases

Paul SANTUS — Wed, 08 Apr 2026 08:15:29 +0000

AWS just launched one of the most impactful storage features in years: S3 Files. It puts an EFS-compatible file system interface directly in front of your S3 buckets. When I was introduced to S3 Files (as part of the AWS Community Builders program), I immediately thought of SFTP as the most obvious use case for my clients.

If you're currently paying for AWS Transfer Family to give your partners SFTP access to S3, you should read this carefully. There's now a dramatically cheaper and more powerful alternative.

What is S3 Files?

S3 Files creates a high-performance NFS file system backed by an S3 bucket. Think of it as an EFS-like layer that reads and writes directly to S3 objects, with automatic bidirectional synchronization. Any file written through the file system appears as an S3 object, and any object uploaded to S3 becomes visible through the file system.

The key properties:

Sub-millisecond latency for file operations
Automatic sync between file system and S3 bucket (powered by EventBridge under the hood)
Mountable on ECS Fargate, ECS Managed Instances, EKS, and EC2
Standard NFS protocol — no special client needed on the compute side (ECS/EKS handle it natively)
Access points with POSIX user/group enforcement
S3 Versioning required and leveraged for consistency

The Problem with AWS Transfer Family

AWS Transfer Family has been the "official" way to expose SFTP endpoints backed by S3. It works, but it comes with serious pain points:

It's expensive

Transfer Family charges $0.30/hour just for the endpoint — that's ~$216/month before you transfer a single byte. Add data transfer costs on top. For a service that many teams use for a handful of daily file drops, this is hard to justify.

It's a black box

You get an SFTP endpoint, but you don't control the server. Custom authentication requires Lambda hooks. Logging is limited. You can't SSH in to debug. You can't customize the SFTP server behavior, add pre/post-processing scripts, or run anything alongside it.

The New Architecture: atmoz/sftp + S3 Files on ECS Fargate

Here's what you could run instead:

The components:

S3 bucket with versioning enabled (required by S3 Files)
S3 Files file system pointed at the bucket, with mount targets in your VPC
EFS volume for persistent SSH host keys (stable fingerprint across restarts and scaling)
ECS Fargate service running atmoz/sftp with the S3 Files volume mounted at /home
Network Load Balancer exposing port 22
DNS record for sftp.yourdomain.com (optional)

Files uploaded via SFTP land on the S3 Files mount → appear in S3 after sync (the end-to-end test below allows ~30-60s) → trigger S3 event notifications for downstream processing.

Cost Comparison

Component	Transfer Family	S3 Files + Fargate
Base cost	$0.30/hr (~$216/mo)	NLB: ~$16/mo
Compute	Included	Fargate 0.25 vCPU / 512MB: ~$9/mo
Storage	S3 pricing	S3 pricing (same)
Data transfer	$0.04/GB over SFTP	Standard NLB pricing
Monthly minimum	~$216	~$25

That's roughly 8x cheaper at the base level. For low-to-medium traffic SFTP use cases (which is most of them), the savings are significant.
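The arithmetic behind the table can be checked with a short Python sketch. It assumes a 30-day month (720 hours) and uses only the prices quoted above; the function and constant names are illustrative.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_baseline(hourly_rate: float = 0.0, fixed_monthly: float = 0.0) -> float:
    """Monthly cost before any data is transferred."""
    return hourly_rate * HOURS_PER_MONTH + fixed_monthly


# Transfer Family: $0.30/hour for the endpoint.
transfer_family = monthly_baseline(hourly_rate=0.30)

# DIY setup: ~$16/month for the NLB plus ~$9/month for the Fargate task.
diy = monthly_baseline(fixed_monthly=16 + 9)

ratio = transfer_family / diy
print(f"Transfer Family: ~${transfer_family:.0f}/mo, DIY: ~${diy:.0f}/mo, ratio: {ratio:.1f}x")
```

The ratio comes out a little above 8, which is where the "roughly 8x" figure comes from. Data transfer charges are excluded on both sides.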

Why This Is Better Than EFS-Backed SFTP

Before S3 Files, the common DIY approach was to mount EFS on Fargate and run atmoz/sftp. We did exactly this. It worked, but had a fundamental limitation: your files lived in EFS, not S3.

That meant:

No S3 event notifications when files arrived
No S3 lifecycle policies
No S3 cross-region replication
No direct S3 API access to the files
EFS pricing ($0.30/GB-month for Standard) vs S3 ($0.023/GB-month)
Separate backup strategy needed

With S3 Files, the data lives in S3. You get the full S3 feature set — notifications, lifecycle rules, replication, analytics, Glacier tiering — while still having a mountable file system for your SFTP server.

Event-Driven File Processing

Both Transfer Family and our S3 Files approach write to S3, so you get the same event-driven capabilities either way:

S3 Event Notifications → SQS/SNS/Lambda for immediate processing when a file arrives
S3 Event Notifications → EventBridge for complex routing rules
S3 Inventory for auditing
S3 Object Lock for compliance
S3 Replication to replicate uploaded files to another region or account

The difference isn't in features — it's in cost. You get the exact same S3 event-driven pipeline for ~$25/mo instead of ~$216/mo.
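As an illustration of the downstream side, here is a minimal Python sketch of a Lambda handler that reacts to S3 event notifications. It assumes the standard S3 notification record layout (Records → s3 → bucket/object) and that files arriving through S3 Files surface as ObjectCreated events; the bucket name, key, and function names are hypothetical.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every ObjectCreated record in the event."""
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def handler(event, context):
    """Lambda entry point: log each new upload (replace with real processing)."""
    for bucket, key in extract_uploaded_objects(event):
        print(f"New SFTP upload: s3://{bucket}/{key}")


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-sftp-bucket"},
                "object": {"key": "demo/upload/my+report.txt"},
            },
        }
    ]
}

handler(sample_event, None)
```

Keeping the parsing in its own function makes it easy to unit-test without deploying anything.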

The Terraform Implementation

Since aws_s3files_file_system isn't in the Terraform AWS provider yet (PR #47325 is open and prioritized), we manage S3 Files resources through terraform_data with local-exec provisioners calling the AWS CLI.

The key resources:

# S3 Files file system — created via AWS CLI
resource "terraform_data" "s3files_file_system" {
  provisioner "local-exec" {
    command = <<-EOT
      aws s3files create-file-system \
        --bucket "$BUCKET_ARN" \
        --role-arn "$ROLE_ARN" \
        --accept-bucket-warning \
        --region "$REGION"
    EOT
  }
}

# Mount targets in each private subnet
resource "terraform_data" "s3files_mount_targets" {
  for_each = toset(var.private_subnet_ids)
  provisioner "local-exec" {
    command = <<-EOT
      aws s3files create-mount-target \
        --file-system-id "$FS_ID" \
        --subnet-id "${each.value}" \
        --security-groups "$SG_ID"
    EOT
  }
}

# ECS task definition uses s3filesVolumeConfiguration
volume = {
  sftp-home = {
    s3files_volume_configuration = {
      file_system_arn = local.s3files_fs_arn
      root_directory  = "/"
    }
  }
}

The full working Terraform code is available as a Terraform module. Until the provider ships S3 Files support, the module manages S3 Files resources via terraform_data + AWS CLI, as described above. I pledge to update it to use native Terraform resources as soon as that support lands.

IAM Setup

Two IAM roles are needed:

S3 Files service role — assumed by elasticfilesystem.amazonaws.com to sync between the file system and S3 bucket. Needs S3 read/write on the bucket + EventBridge permissions for change detection.
ECS task role — needs s3files:ClientMount, s3files:ClientWrite, and s3:GetObject/s3:ListBucket on the backing bucket for optimized reads.
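The two roles can be sketched as policy documents, built here in Python so they can be dumped to JSON. Only the service principal and the action names come from the text above; the resource scoping and the ARN values are assumptions to adapt. The service role's own permissions policy (S3 read/write and EventBridge) is not shown.

```python
import json


def s3files_service_trust_policy() -> dict:
    """Trust policy letting the S3 Files service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "elasticfilesystem.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def ecs_task_policy(file_system_arn: str, bucket_arn: str) -> dict:
    """Permissions for the ECS task role (resource scoping is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3files:ClientMount", "s3files:ClientWrite"],
                "Resource": file_system_arn,
            },
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn},
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": f"{bucket_arn}/*"},
        ],
    }


# Placeholder values: substitute your own file system ARN and bucket name.
task_policy = ecs_task_policy("<file-system-arn>", "arn:aws:s3:::my-sftp-bucket")
print(json.dumps(s3files_service_trust_policy(), indent=2))
print(json.dumps(task_policy, indent=2))
```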

When Transfer Family Still Makes Sense

To be fair, Transfer Family isn't dead for every use case:

Managed SFTP keys and user management — Transfer Family has built-in identity provider integration (AD, Lambda custom auth). With atmoz/sftp, you manage users via config.
AS2 protocol support — if you need AS2, Transfer Family is still the only managed option.
FTPS — Transfer Family supports FTPS natively.
Zero ops tolerance — if you truly cannot manage a container, Transfer Family is fully managed.

But for the vast majority of SFTP use cases — partners dropping files that need processing — the S3 Files approach is cheaper, more flexible, and gives you better observability.

Getting Started

Prerequisite: The aws s3files commands require AWS CLI v2.34.26 or later. You also need jq installed (used by the Terraform provisioner scripts). Update the CLI with brew upgrade awscli (on macOS with Homebrew) or see the AWS CLI install guide.

Create an S3 bucket with versioning enabled
Create an IAM role for S3 Files with the required trust and permissions policies
Create an S3 Files file system via the console or aws s3files create-file-system
Create mount targets in your VPC subnets
Create an EFS for persistent SSH host keys
Deploy an ECS Fargate service with atmoz/sftp, mounting S3 Files at /home and EFS at /etc/ssh/
Put an NLB in front, point your DNS at it
Set up S3 event notifications on the bucket for downstream processing

Or just use the Terraform module — the whole thing deploys in under 10 minutes.

Testing it end-to-end

After terraform apply, the SFTP server is ready in about 8 minutes (most of that time is spent waiting for the S3 Files mount targets to become available). Here's a quick test:

# Upload a file
echo "Hello from S3 Files SFTP!" > test.txt
sshpass -p demo sftp -o StrictHostKeyChecking=no -P 22 demo@<sftp_endpoint> <<EOF
cd upload
put test.txt
bye
EOF

# Verify it landed in S3 (wait ~30-60s for sync)
aws s3 cp s3://<sftp_bucket_name>/demo/upload/test.txt -
# Output: Hello from S3 Files SFTP!

We also verified that SSH host keys persist across task restarts — the server fingerprint stays the same after a forced redeployment, thanks to the EFS volume mounted at /etc/ssh/.
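One way to script that fingerprint check is sketched below in Python. It computes the OpenSSH-style SHA256 fingerprint (unpadded base64 of the SHA-256 of the decoded key blob) from a public key line, so you can compare the value captured before and after a redeployment. The key lines here are fake placeholders, not real host keys.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(public_key_line: str) -> str:
    """Fingerprint of a line shaped like 'ssh-ed25519 <base64-blob> [comment]'."""
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_unchanged(before: str, after: str) -> bool:
    """True when both public key lines yield the same fingerprint."""
    return openssh_sha256_fingerprint(before) == openssh_sha256_fingerprint(after)


# Fake key material, for illustration only.
fake_blob = base64.b64encode(b"not-a-real-host-key").decode("ascii")
other_blob = base64.b64encode(b"another-fake-host-key").decode("ascii")
key_before = f"ssh-ed25519 {fake_blob} sftp-host"
key_after_redeploy = f"ssh-ed25519 {fake_blob} sftp-host"
key_regenerated = f"ssh-ed25519 {other_blob} sftp-host"

print(host_key_unchanged(key_before, key_after_redeploy))  # True
print(host_key_unchanged(key_before, key_regenerated))     # False
```

The input lines could come from the .pub files under /etc/ssh/, or from ssh-keyscan output once the leading host name field is dropped.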

Conclusion

S3 Files bridges the gap between file system and object storage in a way that makes a lot of expensive AWS services feel redundant. For SFTP specifically, the combination of atmoz/sftp + S3 Files on Fargate gives you:

~8x lower cost than Transfer Family
Full control over the SFTP server
Native S3 event notifications for event-driven processing
S3 as the source of truth — lifecycle rules, replication, analytics all work
Infrastructure as Code with Terraform (even before native provider support)

The days of paying $216/month minimum for a managed SFTP endpoint are over for most teams. S3 Files is the missing piece that makes DIY SFTP on AWS not just viable, but ~8x cheaper.

Bedrock AgentCore agents in VPC mode: watch out for NAT Gateway costs!

Paul SANTUS — Sat, 04 Apr 2026 08:59:14 +0000

Last week, I received an AWS cost anomaly alert. The alert pointed to my training account (where I run my demos and also my POCs), flagging an unexpected $29 charge under, oddly enough, Amazon Elastic Block Store. The usage type, however, told a very different story: NatGateway-Bytes. 659 GB of data had passed through my NAT Gateway in six days.

I had recently deployed a voice agent on Bedrock AgentCore Runtime in VPC mode, using a NAT Gateway for outbound internet access (needed for the WebRTC TURN relay); see my blog post here. The VPC had been created specifically for this agent, so the suspect was obvious. But I wanted hard evidence before jumping to conclusions. Was it the WebRTC traffic? Something else?

Starting the investigation

My first instinct was to check the NAT Gateway's CloudWatch metrics. The BytesOutToDestination metric (traffic from the container to the internet) showed only 2.1 GB in total over the six days. Negligible. But BytesInFromDestination (traffic from the internet to the container through the NAT) told a very different story:

Date	Inbound through the NAT
March 26	6.3 GB
March 27	240.3 GB
March 28	149.1 GB
March 29	149.8 GB
March 30	102.3 GB
March 31	15.0 GB
April 1	5.4 GB (partial day)

This imbalance between inbound and outbound flows argued against WebRTC being responsible for the traffic.

In addition, the ActiveConnectionCount metric showed a steady count of about 90 connections around the clock, even when nobody was using the agent. The hourly pattern was remarkably regular, alternating between ~850 MB and ~430 MB per hour, continuously.

To be sure, I checked CloudTrail for InvokeAgentRuntime events between March 28 and 30. Zero. No user activity during the period with the heaviest traffic. The agent was completely idle.

Enabling VPC Flow Logs

I needed to see where the traffic was coming from. I enabled VPC Flow Logs on the VPC (should I have done that from day one? Well, it was a POC!), sent them to a CloudWatch log group, and ran a Logs Insights query to identify the top talkers:

stats sum(bytes) as totalBytes by srcAddr, dstAddr, dstPort
| sort totalBytes desc
| limit 20

Results over a two-hour window showed a handful of IP addresses responsible for all the heavy traffic:

    52.216.58.42 ->       10.0.0.144: 31175     270.1 MB
    16.15.207.229 ->      10.0.0.144: 62935     263.7 MB
    16.15.191.63 ->       10.0.0.144: 25320     263.6 MB
    52.216.12.24 ->       10.0.0.144: 12542     115.8 MB
    3.5.16.209 ->         10.0.0.144: 30762     113.4 MB
    16.15.199.52 ->       10.0.0.144: 49632     113.3 MB
    54.231.160.154 ->     10.0.0.144: 55754      29.6 MB

The address 10.0.0.144 is the NAT Gateway's private IP. All the traffic was flowing from external IPs, through the NAT, to the AgentCore container ENIs in the private subnets.

Identifying the source

I needed to know which service these IPs belonged to. I used my does-this-ip-belong-to-aws tool, which checks IPs against the official AWS IP ranges published at https://ip-ranges.amazonaws.com/ip-ranges.json.
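The kind of lookup such a tool performs can be sketched in a few lines of Python. This is not the code of does-this-ip-belong-to-aws, just a minimal illustration: it assumes the published file's layout (a "prefixes" list whose entries carry "ip_prefix", "region", and "service") and runs against a made-up sample built from documentation address ranges, not real AWS prefixes.

```python
import ipaddress


def services_for_ip(ip: str, ranges: dict) -> set[tuple[str, str]]:
    """Return every (service, region) whose prefix contains the given IPv4 address."""
    address = ipaddress.ip_address(ip)
    matches = set()
    for entry in ranges.get("prefixes", []):
        if address in ipaddress.ip_network(entry["ip_prefix"]):
            matches.add((entry["service"], entry["region"]))
    return matches


# Made-up sample in the same shape as ip-ranges.json (documentation ranges only).
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "S3"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
    ]
}

print(services_for_ip("203.0.113.42", SAMPLE_RANGES))
print(services_for_ip("192.0.2.1", SAMPLE_RANGES))  # no match: empty set
```

To check real addresses, load the JSON from the URL above (for example with urllib.request and json.load) and pass it in place of the sample. An address can match several entries, since a prefix may be listed under more than one service.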

Every high-traffic IP matched Amazon S3 in us-east-1!

All the traffic, down to the last gigabyte, was S3 downloads going through the NAT Gateway.

The fix: S3 Gateway Endpoint

The fix is simple and free. An S3 Gateway VPC Endpoint routes S3 traffic directly over the AWS network, bypassing the NAT Gateway entirely. Unlike interface endpoints, gateway endpoints have neither hourly charges nor data processing charges.

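# Gateway endpoint: S3 traffic from the associated route tables stays on the AWS network
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway" # the provider default, stated explicitly

  # Every route table listed here gets a route to S3 via the endpoint
  route_table_ids = [
    aws_route_table.private.id,
    aws_route_table.public.id,
  ]
}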

One terraform apply and the NAT Gateway data transfer cost drops to near zero.

This raises a broader question: why would you ever not have an S3 Gateway Endpoint in a VPC? It's free, takes a single resource to create, and prevents exactly this kind of surprise. If you create VPCs with private subnets and NAT Gateways, add an S3 Gateway Endpoint by default. There's no downside. S3 Gateway endpoints are good for your wallet, if not for your soul.

The root cause: warm pool recycling

After I filed a support case, the Bedrock AgentCore service team identified the root cause.

AgentCore Runtime maintains a warm pool of VMs to ensure low-latency invocations. Each VM in the pool pulls the container image from ECR, and ECR stores image layers in S3. My container image was about 435 MB compressed.

Three factors combined to produce the 659 GB bill:

First, the 21 UpdateAgentRuntime API calls I made on March 27 (a day of heavy debugging and redeployment) each triggered an asynchronous warm pool re-provisioning cycle. Multiple rounds of 10-VM provisioning, each pulling the 435 MB image, produced the ~240 GB spike observed that day.

Second, the warm pool kept recycling VMs over the following days to keep them fresh and ready. With 10 VMs each pulling the image periodically, the steady ~150 GB/day from March 28 to 30 is consistent with regular recycling.

Third, after about 72 hours with no invocations, the warm pool automatically scaled down from 10 VMs to 1 VM. This explains the drop from ~150 GB/day to ~15 GB/day on March 31.

Warm pool recycling is expected platform behavior: it's what lets AgentCore serve requests with low latency. The problem was that all those S3 downloads went through my NAT Gateway at $0.045/GB instead of staying on the AWS internal network.

Launching this many VMs for so few invocations seems to me a bit like firing a bazooka to kill a fly; I wonder how sustainable that is. Then again, AWS has a good track record of running profitable businesses at scale: who am I to judge?

In any case, the service team promised to update the documentation so that not (too) many users end up facing these (frankly) undue charges.

Takeaways

If you're running Bedrock AgentCore Runtime in VPC mode, three things to keep in mind:

Add an S3 Gateway Endpoint to your VPC. It's free and eliminates what turned out to be the dominant source of NAT Gateway data transfer costs: ECR image pulls by the warm pool. AWS has confirmed they are updating their VPC documentation to recommend this more prominently. There is genuinely no reason not to have one in every VPC with private subnets.
Be mindful of your container image size. My 435 MB image, pulled by a 10-VM warm pool with regular recycling, generated hundreds of gigabytes of transfer. Slimming the image (multi-stage builds, fewer dependencies, Alpine base) directly reduces this cost, and even with the S3 endpoint in place, smaller images mean faster cold starts.
Monitor your NAT Gateway metrics early. The BytesInFromDestination and BytesOutToSource metrics in CloudWatch will show you if something unexpected is happening. I only noticed because of the cost anomaly alert, and by then $29 had already been spent. VPC Flow Logs combined with CloudWatch Logs Insights made the diagnosis simple once I looked.

Paul Santus is an independent cloud consultant at TerraCloud. He helps organizations build and deploy AI applications on AWS. Find him on LinkedIn.

VPC-connected Bedrock AgentCore Runtime-hosted agents: beware of NAT Gateway costs!

Paul SANTUS — Fri, 03 Apr 2026 14:05:35 +0000

Last week I received a cost anomaly alert from AWS. The alert pointed at my training account, flagging an unexpected $29 charge under — oddly enough — Amazon Elastic Block Store. The usage type, however, told a different story: NatGateway-Bytes. 659 GB of data had flowed through my NAT Gateway in six days.

I had recently deployed a voice agent on Bedrock AgentCore Runtime in VPC mode, using a NAT Gateway for outbound internet access (required for WebRTC TURN relay); I described that setup in an earlier blog post. The VPC had been created specifically for this agent, so the suspect was obvious. But I wanted ground truth before jumping to conclusions. Was it WebRTC traffic? Something else?

Starting the investigation

My first stop was CloudWatch metrics on the NAT Gateway. The BytesOutToDestination metric (traffic from the container to the internet) showed only 2.1 GB total over the six days. Negligible. But BytesInFromDestination (traffic from the internet into the container through the NAT) told a very different story:

Date	Inbound through NAT
Mar 26	6.3 GB
Mar 27	240.3 GB
Mar 28	149.1 GB
Mar 29	149.8 GB
Mar 30	102.3 GB
Mar 31	15.0 GB
Apr 01	5.4 GB (partial)

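This imbalance between inbound and outbound traffic argued against WebRTC as the culprit: a voice session pushes audio in both directions, so it would not produce hundreds of gigabytes inbound against about 2 GB outbound.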

Moreover, the ActiveConnectionCount metric showed a steady ~90 connections 24/7, even when nobody was using the agent. The hourly pattern was remarkably regular — alternating between ~850 MB and ~430 MB per hour, around the clock.

Just to be sure, I checked CloudTrail for InvokeAgentRuntime events between March 28 and March 30. Zero. No user activity at all during the period with the heaviest traffic. The agent was completely idle.

Enabling VPC Flow Logs

I needed to see where the traffic was coming from. I enabled VPC Flow Logs on the VPC (should I have done that on day one? Nah, this was a POC workload!), sent them to a CloudWatch log group, and ran a Logs Insights query to identify the top talkers:

stats sum(bytes) as totalBytes by srcAddr, dstAddr, dstPort
| sort totalBytes desc
| limit 20

The results over a two-hour window showed a handful of IP addresses responsible for all the heavy traffic:

    52.216.58.42 ->       10.0.0.144: 31175     270.1 MB
    16.15.207.229 ->      10.0.0.144: 62935     263.7 MB
    16.15.191.63 ->       10.0.0.144: 25320     263.6 MB
    52.216.12.24 ->       10.0.0.144: 12542     115.8 MB
    3.5.16.209 ->         10.0.0.144: 30762     113.4 MB
    16.15.199.52 ->       10.0.0.144: 49632     113.3 MB
    54.231.160.154 ->     10.0.0.144: 55754      29.6 MB

The 10.0.0.144 address is the NAT Gateway's private IP. All the traffic was flowing from external IPs, through the NAT, to the AgentCore container ENIs in the private subnets.
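
To make the aggregation in that query concrete, here is a minimal Python sketch that does the same grouping on raw flow log lines. It assumes the default (version 2) flow log record format; the function name `top_talkers` and the sample records are made up for illustration and are not taken from my actual logs.

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def top_talkers(lines, limit=20):
    """Sum bytes per (srcaddr, dstaddr, dstport), largest total first."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not match the default format
        rec = dict(zip(FIELDS, parts))
        if rec["log_status"] != "OK":
            continue  # NODATA / SKIPDATA records carry "-" instead of numbers
        key = (rec["srcaddr"], rec["dstaddr"], rec["dstport"])
        totals[key] += int(rec["bytes"])
    return totals.most_common(limit)


# Hypothetical records (documentation IP ranges), for illustration only.
SAMPLE = [
    "2 123456789012 eni-0aaa 198.51.100.7 10.0.0.144 443 31175 6 10 4000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0aaa 198.51.100.7 10.0.0.144 443 31175 6 12 5000 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0aaa 203.0.113.9 10.0.0.144 443 62935 6 3 1500 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0aaa - - - - - - - 1700000000 1700000060 - NODATA",
]

for (src, dst, port), total in top_talkers(SAMPLE):
    print(f"{src} -> {dst}:{port}  {total} bytes")
```

Logs Insights does this server-side and at scale; the sketch is only useful for reasoning about what the query returns or for a quick look at exported log files.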

Identifying the source

I needed to know what service these IPs belonged to. I used my does-this-ip-belong-to-aws tool, which checks IPs against the official AWS IP ranges published at https://ip-ranges.amazonaws.com/ip-ranges.json.

Every single high-traffic IP resolved to Amazon S3 in us-east-1!

All the traffic — every last gigabyte — was S3 pulls flowing through the NAT Gateway.
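
For readers who want to reproduce this kind of lookup, here is a minimal sketch of the idea using only the Python standard library. It is not the implementation of my tool: the function name `aws_services_for_ip` is hypothetical, and `SAMPLE_RANGES` is a hand-written excerpt in the shape of ip-ranges.json that uses documentation-only prefixes rather than real AWS ranges.

```python
import ipaddress


def aws_services_for_ip(ip, ip_ranges):
    """Return sorted (service, region) pairs whose IPv4 prefix contains `ip`."""
    addr = ipaddress.ip_address(ip)
    matches = set()
    for entry in ip_ranges.get("prefixes", []):
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            matches.add((entry["service"], entry["region"]))
    return sorted(matches)


# Placeholder data in the ip-ranges.json shape. The prefixes are reserved
# documentation ranges, NOT real AWS ranges; load the live file for real use.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1",
         "service": "AMAZON", "network_border_group": "us-east-1"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1",
         "service": "S3", "network_border_group": "us-east-1"},
        {"ip_prefix": "203.0.113.0/24", "region": "eu-west-1",
         "service": "EC2", "network_border_group": "eu-west-1"},
    ]
}

print(aws_services_for_ip("198.51.100.7", SAMPLE_RANGES))
print(aws_services_for_ip("10.0.0.144", SAMPLE_RANGES))
```

To check real addresses, download the JSON from the URL above (for example with `urllib.request` and `json.load`) and pass the parsed dictionary as `ip_ranges`. A prefix can be listed under both the generic AMAZON service and a specific one such as S3, which is why the function returns every match instead of stopping at the first.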

The fix: S3 Gateway Endpoint

The fix is straightforward and free. An S3 Gateway VPC Endpoint routes S3 traffic directly through the AWS network, bypassing the NAT Gateway entirely. Unlike interface endpoints, gateway endpoints have no hourly charge and no data processing fee.

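# Gateway endpoint: S3 traffic from the associated route tables stays on the AWS network
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway" # the provider default, stated explicitly

  # Every route table listed here gets a route to S3 via the endpoint
  route_table_ids = [
    aws_route_table.private.id,
    aws_route_table.public.id,
  ]
}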

One terraform apply and the NAT Gateway data transfer cost drops to near zero.

This raises a broader question: why would you ever not have an S3 Gateway Endpoint in a VPC? It's free, takes one resource to create, and prevents exactly this kind of surprise. If you're creating VPCs with private subnets and NAT Gateways, add an S3 Gateway Endpoint by default. There's no downside. S3 Gateway endpoints are good for your wallet, if not for your soul.
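
If you want to audit existing VPCs for this, a small check along these lines can help. This is a sketch under assumptions: the function name is hypothetical, and `SAMPLE_RESPONSE` is a made-up dictionary shaped like the output of the EC2 DescribeVpcEndpoints API (keys `VpcEndpoints`, `VpcEndpointType`, `ServiceName`, `State`, `RouteTableIds`).

```python
def route_tables_missing_s3_gateway(endpoints_response, route_table_ids):
    """Return the route tables not associated with an available S3 gateway endpoint."""
    covered = set()
    for ep in endpoints_response.get("VpcEndpoints", []):
        is_s3_gateway = (
            ep.get("VpcEndpointType", "").lower() == "gateway"
            and ep.get("ServiceName", "").endswith(".s3")
            and ep.get("State", "").lower() == "available"
        )
        if is_s3_gateway:
            covered.update(ep.get("RouteTableIds", []))
    # Preserve the caller's ordering for readable reports.
    return [rt for rt in route_table_ids if rt not in covered]


# Hypothetical response: one S3 gateway endpoint attached to a single route table.
SAMPLE_RESPONSE = {
    "VpcEndpoints": [
        {
            "VpcEndpointId": "vpce-0example1",
            "VpcEndpointType": "Gateway",
            "ServiceName": "com.amazonaws.us-east-1.s3",
            "State": "available",
            "RouteTableIds": ["rtb-private"],
        },
        {
            "VpcEndpointId": "vpce-0example2",
            "VpcEndpointType": "Interface",
            "ServiceName": "com.amazonaws.us-east-1.ecr.dkr",
            "State": "available",
            "RouteTableIds": [],
        },
    ]
}

print(route_tables_missing_s3_gateway(SAMPLE_RESPONSE, ["rtb-private", "rtb-public"]))
```

With boto3 installed and credentials configured, the real response would come from `boto3.client("ec2").describe_vpc_endpoints(...)` filtered on the VPC ID. The check works per route table because a gateway endpoint only affects the subnets whose route table is associated with it.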

The root cause: warm pool recycling

After filing a support case, the Bedrock AgentCore service team identified the root cause.

AgentCore Runtime maintains a warm pool of VMs to ensure low-latency invocations. Each VM in the pool pulls the container image from ECR — and ECR stores image layers in S3. My container image was ~435 MB compressed.

Three things combined to produce the 659 GB bill:

First, the 21 UpdateAgentRuntime API calls I made on March 27 (a day of heavy debugging and redeployment) each triggered an asynchronous warm pool re-provisioning cycle. Multiple rounds of 10-VM provisioning, each pulling the 435 MB image, produced the ~240 GB spike that day.

Second, the warm pool continued recycling VMs over the following days to keep them fresh and ready. With 10 VMs each pulling the image periodically, the steady ~150 GB/day on March 28-30 is consistent with regular recycling.

Third, after approximately 72 hours with no invocations, the warm pool automatically downscaled from 10 VMs to 1 VM. This explains the drop from ~150 GB/day to ~15 GB/day on March 31.

The warm pool recycling is expected platform behavior — it's what makes AgentCore able to serve requests with low latency. The problem was that all those S3 pulls were routing through my NAT Gateway at $0.045/GB instead of staying on the AWS internal network.
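
The numbers can be sanity-checked with a bit of arithmetic. The sketch below uses only figures quoted in this post (the $0.045/GB rate, the ~435 MB image, the daily inbound volumes) and makes one simplifying assumption that I label loudly: it treats every inbound byte as part of a full image pull, so the pull counts are rough upper bounds, not measurements.

```python
NAT_PRICE_PER_GB = 0.045  # USD per GB processed, rate quoted in this post
IMAGE_SIZE_GB = 0.435     # ~435 MB compressed image, using 1 GB = 1000 MB


def nat_cost(gb):
    """NAT Gateway data processing cost in USD for `gb` gigabytes."""
    return gb * NAT_PRICE_PER_GB


def implied_pulls(gb):
    """Upper-bound number of full image pulls that `gb` of traffic represents.

    Assumption: all inbound traffic is image layers and every pull fetches
    the whole image (no layer caching).
    """
    return gb / IMAGE_SIZE_GB


# Daily inbound volumes taken from the CloudWatch table earlier in the post.
DAILY_INBOUND_GB = {"Mar 27": 240.3, "Mar 28": 149.1, "Mar 31": 15.0}

for day, gb in DAILY_INBOUND_GB.items():
    print(f"{day}: about {implied_pulls(gb):.0f} pulls, ${nat_cost(gb):.2f}")

print(f"Six-day total for 659 GB: ${nat_cost(659):.2f}")
```

Under that assumption, March 27 works out to several hundred image pulls, which is at least plausible for 21 redeployments each triggering multiple rounds of 10-VM provisioning, and the 659 GB total lands close to the $29 in the anomaly alert.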

Firing up this many VMs for so few invocations seems to me like shooting a bazooka to kill a fly; I wonder how sustainable that is. Then again, AWS has a good track record of operating profitable businesses at scale: who am I to judge?

Anyway, the service team promised they'll make an update to the documentation so that not (too) many users face these (frankly) undue charges.

Takeaways

If you're running Bedrock AgentCore Runtime in VPC mode, three things to keep in mind:

Add an S3 Gateway Endpoint to your VPC. It's free and eliminates what turned out to be the dominant source of NAT Gateway data transfer costs: ECR image pulls from the warm pool. AWS has confirmed they are updating their VPC documentation to more prominently recommend this. There is genuinely no reason not to have one in every VPC with private subnets.
Be mindful of container image size. My 435 MB image, pulled across a 10-VM warm pool with regular recycling, generated hundreds of gigabytes of transfer. Slimming the image (multi-stage builds, fewer dependencies, Alpine base) directly reduces this cost, and even with the S3 endpoint in place, smaller images mean faster cold starts.
Monitor your NAT Gateway metrics early. The BytesInFromDestination and BytesOutToSource metrics in CloudWatch will show you if something unexpected is happening. I only noticed because of the cost anomaly alert, and by then $29 had already been spent. VPC Flow Logs combined with CloudWatch Logs Insights made the diagnosis straightforward once I looked.
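
As one way to act on the monitoring advice, here is a hedged sketch of an hourly alarm on inbound NAT traffic. The helper name, the alarm name, and the 1 GB/hour default threshold are all assumptions to adapt to your own baseline; the dictionary keys are parameters of the CloudWatch PutMetricAlarm API.

```python
def nat_inbound_alarm_params(nat_gateway_id, threshold_gb_per_hour=1.0):
    """Build PutMetricAlarm parameters for hourly inbound bytes on a NAT Gateway."""
    return {
        "AlarmName": f"nat-inbound-bytes-{nat_gateway_id}",  # hypothetical naming scheme
        "Namespace": "AWS/NATGateway",
        "MetricName": "BytesInFromDestination",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 3600,            # one-hour buckets
        "EvaluationPeriods": 1,
        "Threshold": threshold_gb_per_hour * 1_000_000_000,  # bytes
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


# Placeholder NAT Gateway ID, for illustration only.
params = nat_inbound_alarm_params("nat-0123456789abcdef0", threshold_gb_per_hour=2)
print(params["AlarmName"], params["Threshold"])
```

With boto3 installed and credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**params)` would create the alarm; add `AlarmActions` with an SNS topic ARN if you want a notification. In my case traffic alternated between roughly 430 MB and 850 MB per hour while the agent sat idle, so a threshold well below 1 GB would have been needed to catch the steady-state recycling, while the March 27 spike would have tripped almost any setting.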

Paul Santus is an independent cloud consultant at TerraCloud. He helps organizations build and deploy AI-powered applications on AWS. Connect with him on LinkedIn.