Understanding AWS Autoscaling with Grafana

    aws ecs execute-command --cluster recipe-finder-prod-cluster \
      --task a55518997ca84f24bc2fd614cbc18f20 \
      --container recipe-finder-api \
      --interactive \
      --command "/bin/sh -c 'while true; do :; done & while true; do :; done'"

Event	Timestamp
CPU crossed 70%	12:09:00
Alarm triggered	12:13:25
Desired tasks increased	12:14:00
New task running	12:15:00
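
To make the delays in this timeline explicit, here is a minimal Python sketch that turns the four timestamps above into offsets from the moment CPU crossed 70%. The names `SCALE_OUT_EVENTS` and `delays_from_first` are made up for illustration; only the timestamps come from the experiment.

```python
from datetime import datetime

# Timestamps copied from the scale-out table above.
SCALE_OUT_EVENTS = [
    ("CPU crossed 70%", "12:09:00"),
    ("Alarm triggered", "12:13:25"),
    ("Desired tasks increased", "12:14:00"),
    ("New task running", "12:15:00"),
]


def delays_from_first(events):
    """Return (event, seconds elapsed since the first event) for each row."""
    parsed = [(name, datetime.strptime(ts, "%H:%M:%S")) for name, ts in events]
    start = parsed[0][1]
    return [(name, int((t - start).total_seconds())) for name, t in parsed]


scale_out_delays = delays_from_first(SCALE_OUT_EVENTS)
for name, seconds in scale_out_delays:
    print(f"{name:<26} +{seconds // 60}m{seconds % 60:02d}s")
```

Read this way, the alarm fired 4 minutes 25 seconds after the threshold was crossed, and new capacity was running 6 minutes after it.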

Event	Timestamp	Notes
CPU fell below scale-in threshold (<63%)	12:34	Based on Grafana
Low alarm triggered (OK → ALARM)	12:49	15-min evaluation period complete
ECS desired tasks decreased	12:50	ECS starts stopping tasks
Extra task stopped (scale-in complete)	12:52	Task fully terminated
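
The 15-minute gap between 12:34 and 12:49 is what you would expect if the Low alarm needs 15 consecutive low datapoints before it fires. The sketch below simulates that rule in Python. It assumes one datapoint per minute and a strict "all consecutive" rule, which simplifies how CloudWatch really evaluates alarms. The function name `minutes_until_alarm` and the sample CPU series are hypothetical.

```python
def minutes_until_alarm(cpu_by_minute, threshold=63.0, datapoints=15):
    """Return the 1-based minute at which `datapoints` consecutive samples
    below `threshold` have been observed, or None if that never happens."""
    streak = 0
    for minute, cpu in enumerate(cpu_by_minute, start=1):
        # Any sample at or above the threshold resets the streak.
        streak = streak + 1 if cpu < threshold else 0
        if streak == datapoints:
            return minute
    return None


# Hypothetical per-minute CPU averages (not measured data).
steady_low = [20.0] * 20
with_blip = [20.0] * 10 + [80.0] + [20.0] * 20

alarm_steady = minutes_until_alarm(steady_low)
alarm_blip = minutes_until_alarm(with_blip)
print(alarm_steady, alarm_blip)
```

Under these assumptions, a single high minute restarts the count. That is why a short burst of traffic during cool-down can push scale-in back by a full evaluation window.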

Closing Note:

Autoscaling ensures your application can handle spikes, but it comes with temporary performance trade-offs:

During scale-out: When CPU spikes and new Fargate tasks are being launched, your application may briefly return 5xx errors or respond more slowly. In our experiment, we saw a 5% error rate for a few minutes during the initial warm-up period, before the new tasks came fully online. This “warm-up latency” is inherent to reactive autoscaling.

During scale-in: ECS gradually terminates idle tasks once the Low alarm confirms sustained low CPU. This process is intentionally slow to avoid task flapping, ensuring that users aren’t suddenly impacted if traffic spikes again.

Observing CPU, alarm state, and task events together shows exactly how long users may experience degraded performance during scaling. That, in turn, informs decisions about pre-warming, thresholds, and evaluation periods that reduce the user-facing impact.
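
As a rough way to put a number on that impact, the sketch below multiplies a request rate by the error rate and the length of the degraded window. The 5% error rate comes from the experiment. The 360-second window is the full gap between CPU crossing 70% and the new task running, used here as an upper bound. The 50 requests per second figure is purely hypothetical, as is the function name `estimated_failed_requests`.

```python
def estimated_failed_requests(requests_per_second, error_rate, window_seconds):
    """Rough count of failed requests during a degraded window."""
    return round(requests_per_second * error_rate * window_seconds)


# 5% error rate from the experiment; 360 s is the scale-out window taken as
# an upper bound; 50 req/s is a hypothetical traffic level.
failed = estimated_failed_requests(
    requests_per_second=50, error_rate=0.05, window_seconds=360
)
print(failed)
```

Re-running the estimate with a shorter window, for example after lowering the threshold or pre-warming tasks, gives a quick sense of how much each change might help.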

GitHub link:

Shireenbanu / AI-recipe-finder

Application Overview

This application helps users manage their health by securely storing medical history, lab reports, and personal profile information. Based on a patient’s conditions, it generates personalized healthy recipes using a recommendation engine integrated with the Gemini API. The goal is to provide actionable nutrition guidance while maintaining HIPAA compliance, data privacy, and secure storage. It also caches generated recipes for quick retrieval and seamless user experience.

How to install:

    terraform apply
    terraform destroy  # to destroy the infra

Recipe Finder Application:

1) Profile (CRUD + Database Reads/Writes)

Users can view and update profile information.

This workflow represents the most typical web-app traffic pattern: read and write operations against the database.

2) Medical History (Uploads + Processing + AI Pipeline)

Users can upload lab reports and add medical conditions.

Once submitted, the backend processes the medical data and sends it to a recommendation engine, which then forwards structured…
