Earlier this month I noticed a pattern in our performance (perf) environment that made my stomach drop—the number of ECS task replicas and the memory configuration in perf didn’t match production. Because we use our perf environment for load testing and cost analysis, any discrepancy between perf and production makes those metrics meaningless.
I started digging and discovered this mismatch wasn’t a one‑off. Across 27 different repositories, the AWS CloudFormation/ECS service definitions were configured correctly for production but had lower task counts or smaller memory allocations in perf. That meant our performance testing used fewer containers and less memory than production, leading us to underestimate memory pressure and concurrency issues. Fixing each repo manually would take ages, so I asked Claude Sonnet 4.6 for help.
## Why perf should mirror production
A core principle of performance testing is that your perf environment should behave like production. BlazeMeter’s performance‑testing guide notes that your testing environment should mimic the production environment’s hardware, server configurations, network settings and operating systems. Without matching resources and configurations, load tests are misleading. I knew that our mismatched task counts and memory limits were skewing our results.
In Amazon ECS/Fargate, a service’s replica scheduling strategy ensures that the desired number of tasks are running. Datadog’s monitoring guide explains that the replica strategy places and maintains a desired number of tasks across your cluster, distributing them evenly across availability zones and only to container instances with sufficient resources. If your perf environment uses fewer replicas than production, you aren’t testing under the same concurrency. Similarly, container memory settings influence both stability and cost. As Shih‑Ting Yuan explains, ECS defines memory at two levels:
- memoryReservation (soft limit) – the amount of memory a task needs when placed on a container instance. The sum of the memoryReservation values for all tasks on an instance cannot exceed the available memory on that instance.
- memory (hard limit) – the upper bound on how much memory a container can use; exceeding it causes the container to be killed. The available memory on a container instance is reduced by memoryReservation, or by memory if memoryReservation isn’t set.
Matching these settings in perf and production ensures that your application experiences the same memory pressure and scaling behaviour during tests as it will in the real world.
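The placement math behind those two limits can be sketched in a few lines. This is an illustrative model with made-up numbers, not ECS code: a task's footprint for placement purposes is its soft limit (`memoryReservation`), falling back to the hard limit (`memory`) when no soft limit is set.

```python
# Sketch of ECS memory placement semantics (illustrative values only).

def footprint(task: dict) -> int:
    # Soft limit counts toward placement; fall back to the hard limit.
    return task.get("memoryReservation") or task["memory"]

def can_place(running_tasks, instance_memory_mib, new_task) -> bool:
    """True if the new task's footprint still fits on the instance."""
    reserved = sum(footprint(t) for t in running_tasks)
    return reserved + footprint(new_task) <= instance_memory_mib

running = [{"memory": 2048, "memoryReservation": 1024},
           {"memory": 2048, "memoryReservation": 1024}]

# On a 4096 MiB instance with 2048 MiB already soft-reserved:
print(can_place(running, 4096, {"memory": 2048, "memoryReservation": 1024}))  # True
print(can_place(running, 4096, {"memory": 4096, "memoryReservation": 3072}))  # False
```

Note that a perf instance with half the memory of production hits the placement ceiling much earlier, which is exactly the scheduling behaviour our mismatched configs were hiding.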
## The challenge: 27 misaligned repos
Our infrastructure-as-code templates had grown organically over time. Some repos used CloudFormation, others used Terraform or CDK. Each defined its own ECS service with a desiredCount, CPU and memory settings. Engineers often tuned perf services to consume fewer resources, thinking it would save costs. But those changes resulted in perf using fewer replicas or lower memory allocations, which made our load tests less realistic.
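To make the drift concrete, here is a hypothetical pair of environment configurations (the numbers are invented, not taken from our repos) and the one-liner comparison that exposes it:

```python
# Hypothetical drifted service: same service, smaller perf values.
prod = {"desiredCount": 6, "memory": 4096, "memoryReservation": 2048}
perf = {"desiredCount": 2, "memory": 2048, "memoryReservation": 1024}

# Map each drifted key to its (perf, prod) pair.
drift = {k: (perf[k], prod[k]) for k in prod if perf[k] < prod[k]}
print(drift)
```

In this example perf runs a third of the replicas with half the memory, so each task in a load test absorbs roughly triple the per-task traffic a production task would see.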
Auditing them manually would take hours because I’d need to search every repo, find the perf configuration and compare it against production. That’s when I turned to Claude Sonnet 4.6.
## Using Claude Sonnet 4.6 to automate the fix
I provided Claude with the high‑level goal: “check every repository for ECS services and ensure that the perf environment uses the same number of replicas and memory configuration as production.” Within 25 minutes the agent understood the project structure, iterated through all 27 repositories and produced patches.
Here’s what the process looked like:
- Identification. Claude scanned each repository, locating the ECS service definitions in our CDK/CloudFormation/Terraform files. It captured the desiredCount, memory, and memoryReservation values for production and perf.
- Comparison. For each repo, the agent compared the perf settings to the production settings. If the perf desiredCount or memory settings were lower, Claude flagged them.
- Update generation. Based on its analysis of each template, Claude generated patches that aligned the perf configuration with production, setting the same desiredCount and memory values.
- Pull requests. The agent opened PRs in each repository with a clear description of the change, referencing the reasoning (e.g., “Perf environment should mirror production to ensure accurate performance testing” with links to relevant docs).
- Review & merge. I reviewed a couple of PRs manually to confirm they made the correct changes, then used batch‑merge tools to merge the rest. The entire process—from issue discovery to merged PRs—took less than half an hour.
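The identification and comparison steps above can be sketched roughly as follows. The repo layout, file names, and keys here are assumptions for illustration; our real repos mixed CDK, Terraform, and CloudFormation, so the agent had to handle several formats rather than one tidy JSON file per environment.

```python
# Sketch of the audit: compare perf vs. prod ECS settings across repos.
# Assumed layout: <repo>/config/<env>.json holding the ECS settings inline.
import json
from pathlib import Path

KEYS = ("desiredCount", "memory", "memoryReservation")

def load_env(repo: Path, env: str) -> dict:
    return json.loads((repo / "config" / f"{env}.json").read_text())

def find_mismatches(repos):
    report = {}
    for repo in repos:
        prod, perf = load_env(repo, "prod"), load_env(repo, "perf")
        bad = {k: (perf.get(k), prod.get(k))
               for k in KEYS if perf.get(k) != prod.get(k)}
        if bad:
            report[repo.name] = bad  # drifted keys with (perf, prod) values
    return report
```

A report keyed by repo name is easy to turn into per-repo patches and PR descriptions, which is essentially what the agent did at a larger scale.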
## Lessons learned
- Automate configuration drift detection. Manual audits across dozens of repos don’t scale. An LLM‑powered agent can quickly parse infrastructure code, compare environments and suggest fixes.
- Mirror perf and production environments. As BlazeMeter emphasizes, performance environments should match production as closely as possible. Aligning ECS desiredCount and memory settings ensures that load tests are meaningful.
- Understand ECS scheduling and memory limits. Knowing how replica scheduling works (tasks are spread across availability zones and require sufficient resources) and how memory settings impact scheduling (soft vs hard limits) helps you configure services correctly.
- Small changes have big effects. Reducing replicas or memory in perf might seem harmless, but it can hide concurrency issues or memory‑related crashes that only appear in production. Keeping the configurations identical helps catch these problems earlier.
## Final thoughts
This experience changed the way I think about managing infrastructure. Instead of accepting configuration drift as inevitable, we can use agents like Claude Sonnet 4.6 to constantly validate and align our environments. Ensuring perf mirrors production—not just in code but also in resource allocation—gives us confidence that our performance tests reflect real‑world behaviour. And with automation, keeping dozens of repositories consistent doesn’t have to be a full‑time job.
Have you used AI agents to manage your infrastructure configurations? I’d love to hear about your experiences in the comments.