Final Goal
Build an observability system where:
EC2 Alloy
↓ sends logs
ALB for Loki
↓ forwards traffic
Loki running in ECS/Fargate
↓ stores logs
Grafana running in ECS/Fargate
↓ visualizes logs
Final architecture:
ECS/Fargate:
- prod-app-service
- prometheus-service-2
- grafana-service
- loki-service
EC2:
- Alloy agent
- Node Exporter
- system logs
Networking:
- ALB in front of Loki
- Target group using IP target type
Part 1: ECS Cluster
1. Created ECS Cluster
Cluster name:
prod-observability
Launch type:
AWS Fargate
Region:
us-east-2
Purpose:
Run observability services as serverless containers.
Part 2: IAM Roles
2. Task Execution Role
Used:
ecsTaskExecutionRole
Purpose:
Allows ECS/Fargate to pull container images and send logs to CloudWatch.
Attached policy:
AmazonECSTaskExecutionRolePolicy
3. Task Role
Used:
ecsAppTaskRole
Purpose:
Allows application containers to call AWS services if needed.
For this lab, it was not heavily used.
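Both roles are assumed by ECS tasks, so each one needs a trust policy that names the ECS tasks service principal. The lab notes do not show that policy; the sketch below is the standard form, printed as JSON so it can be compared with what the IAM console shows. The function name is hypothetical.

```python
import json


def ecs_task_trust_policy() -> dict:
    """Trust policy that allows ECS tasks to assume a role.

    Applies to both ecsTaskExecutionRole and ecsAppTaskRole.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Service principal used by ECS tasks (EC2 and Fargate).
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(ecs_task_trust_policy(), indent=2))
```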
Part 3: Demo Application Service
4. Created demo app task definition
Task family:
prod-observability
Container name:
demo-app
Final working image:
nginx:latest
Port:
80
Why we changed the image:
Earlier image failed:
ghcr.io/brancz/prometheus-example-app:v0.5.0
Error:
CannotPullContainerError 403 Forbidden
Reason:
ECS could not pull from GitHub Container Registry anonymously.
Fix:
Use public nginx image.
5. Created demo app ECS service
Service name:
prod-app-service
Desired tasks:
1
Result:
Running
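The service was created in the console. As a rough equivalent, the request below has the shape that boto3's ecs create_service call accepts. The subnet and security group IDs are placeholders, and assignPublicIp = ENABLED is an assumption (the lab does not state it). The function name is hypothetical.

```python
def fargate_service_request(cluster, service_name, task_definition,
                            subnets, security_groups, desired_count=1):
    """Build keyword arguments for ecs.create_service on Fargate."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # Assumption: tasks sit in public subnets and need a
                # public IP to pull images from a public registry.
                "assignPublicIp": "ENABLED",
            }
        },
    }


request = fargate_service_request(
    cluster="prod-observability",
    service_name="prod-app-service",
    task_definition="prod-observability",   # task family from step 4
    subnets=["subnet-EXAMPLE"],             # placeholder
    security_groups=["sg-EXAMPLE"],         # placeholder
)
# With credentials configured, this could be sent as:
#   boto3.client("ecs", region_name="us-east-2").create_service(**request)
```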
Part 4: Prometheus Service
6. Created Prometheus task definition
Task family:
prometheus
Container name:
prometheus
Image URI:
prom/prometheus:latest
Port:
9090
CloudWatch logs:
/ecs/prometheus-service
Stream prefix:
prometheus
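The Prometheus, Grafana, and Loki task definitions all follow the same pattern: one container, one port, and an awslogs configuration. A minimal sketch of that pattern as a register_task_definition request is shown here. The CPU and memory sizes, the account ID in the role ARN, and the function name are assumptions, not values from the lab.

```python
def fargate_task_definition(family, container_name, image, port,
                            log_group, stream_prefix, region,
                            execution_role_arn, cpu="256", memory="512"):
    """Build keyword arguments for ecs.register_task_definition."""
    return {
        "family": family,
        "networkMode": "awsvpc",               # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,                             # assumed size
        "memory": memory,                       # assumed size
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": stream_prefix,
                    },
                },
            }
        ],
    }


prometheus_task = fargate_task_definition(
    family="prometheus",
    container_name="prometheus",
    image="prom/prometheus:latest",
    port=9090,
    log_group="/ecs/prometheus-service",
    stream_prefix="prometheus",
    region="us-east-2",
    # Placeholder account ID; substitute the real one.
    execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
)
```

The same function can be called again with the Grafana (port 3000) and Loki (port 3100) values from the later parts.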
7. Created Prometheus service
Service name:
prometheus-service-2
Desired tasks:
1
Result:
Running
Purpose:
Prometheus stores and queries metrics.
Important concept:
Prometheus = metrics database
Loki = logs database
Grafana = visualization
Part 5: Grafana Service
8. Created Grafana task definition
Task family:
grafana
Container name:
grafana
Image URI:
grafana/grafana:latest
Port:
3000
CloudWatch logs:
/ecs/grafana
Stream prefix:
grafana
9. Created Grafana service
Service name:
grafana-service
Desired tasks:
1
Result:
Running
Opened Grafana:
http://GRAFANA_PUBLIC_IP:3000
Default login:
Username: admin
Password: admin
Purpose:
Grafana visualizes metrics and logs.
Part 6: Loki Service
10. Created Loki task definition
Task family:
loki
Container name:
loki
Image URI:
grafana/loki:latest
Port:
3100
CloudWatch logs:
/ecs/loki
Stream prefix:
loki
11. Created Loki ECS service
Service name:
loki-service
Desired tasks:
1
Result:
Running
Purpose:
Loki receives and stores logs.
Part 7: Why Alloy Failed in Fargate
12. Tried Alloy in ECS/Fargate
Image used:
grafana/alloy:latest
Port:
12345
It kept failing with:
Rollback failed
Reason:
Alloy is not like nginx, Grafana, or Loki.
Alloy is an agent.
It needs a config file.
Without config, Alloy exits.
We tried:
run
run,/etc/alloy/fargate.alloy
empty command
But it failed because:
/etc/alloy/fargate.alloy did not exist in the Fargate container.
Conclusion:
Alloy is easier to run on EC2 because EC2 provides a normal Linux filesystem where the config file can live.
Part 8: Decision — Use EC2 for Alloy
13. Final architecture decision
We kept ECS because the lab teaches:
ECS
Fargate
task definitions
services
networking
service communication
load balancing
observability
But we moved Alloy to EC2 because:
Alloy needs config files.
EC2 is easier for agents.
EC2 gives filesystem access.
EC2 is better for troubleshooting.
Final decision:
ECS/Fargate:
- demo app
- Prometheus
- Grafana
- Loki
EC2:
- Alloy
- Node Exporter
Part 9: Existing EC2 Alloy Machine
14. Checked Alloy on EC2
Command:
alloy --version
Output showed:
alloy version v1.16.1
This confirmed:
Alloy is installed directly on the Linux host, not in Docker.
15. Checked Alloy config
Config file:
/etc/alloy/config.alloy
It had:
local.file_match "system_logs" {
  path_targets = [
    {
      __path__ = "/var/log/syslog",
      job = "syslog",
    },
    {
      __path__ = "/var/log/auth.log",
      job = "auth",
    },
    {
      __path__ = "/var/log/nginx/access.log",
      job = "nginx_access",
    },
    {
      __path__ = "/var/log/nginx/error.log",
      job = "nginx_error",
    },
  ]
}
loki.source.file "log_scrape" {
  targets = local.file_match.system_logs.targets
  forward_to = [loki.write.local.receiver]
}
This means Alloy collects:
/var/log/syslog
/var/log/auth.log
/var/log/nginx/access.log
/var/log/nginx/error.log
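Each log file is paired with a job label, and that label is what the Grafana queries filter on later. To make the mapping explicit, here is a small sketch that renders the local.file_match block from a job-to-path table. The helper and its compact one-line-per-target layout are illustrative only; the lab edited the config file by hand.

```python
# Job label -> log file path, taken from the config above.
LOG_SOURCES = {
    "syslog": "/var/log/syslog",
    "auth": "/var/log/auth.log",
    "nginx_access": "/var/log/nginx/access.log",
    "nginx_error": "/var/log/nginx/error.log",
}


def render_file_match(name, sources):
    """Render a local.file_match block with one target per log file."""
    lines = [f'local.file_match "{name}" {{', "  path_targets = ["]
    for job, path in sources.items():
        lines.append(f'    {{ __path__ = "{path}", job = "{job}" }},')
    lines += ["  ]", "}"]
    return "\n".join(lines)


print(render_file_match("system_logs", LOG_SOURCES))
```

Adding another log file then means adding one entry to the table, and the new job label becomes queryable in Loki.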
Part 10: Created ALB for Loki
16. Why ALB was needed
Alloy on EC2 cannot reach Loki through an ECS-internal service name such as:
loki-service:3100
That name resolves only inside ECS networking.
So we created an ALB in front of Loki.
Flow:
EC2 Alloy
↓
Loki ALB
↓
Loki ECS task
17. Created Application Load Balancer
ALB name:
loki-alb
Scheme:
Internet-facing
Type:
Application Load Balancer
VPC:
vpc-02703ab5833607268
Listener:
HTTP:3100
ALB DNS:
loki-alb-838622355.us-east-2.elb.amazonaws.com
Part 11: Created Target Group
18. Important mistake we fixed
First target group was created as:
Target type: Instance
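That was wrong for Fargate.
Fargate requires:
Target type: IP
Fargate tasks run in awsvpc network mode, so each task gets its own ENI and private IP and must be registered by IP address rather than by instance ID.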
19. Correct Target Group
Target group name:
Loki-target-gr
Target type:
IP
Protocol:
HTTP
Port:
3100
Health check path:
/ready
VPC:
vpc-02703ab5833607268
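For reference, the same target group expressed as a request for boto3's elbv2 create_target_group call. This is a sketch of the console settings above, not something the lab ran; the function name is hypothetical.

```python
def loki_target_group_request(vpc_id):
    """Build keyword arguments for elbv2.create_target_group."""
    return {
        "Name": "Loki-target-gr",
        "Protocol": "HTTP",
        "Port": 3100,
        "VpcId": vpc_id,
        # "ip" is the setting that matters for Fargate; "instance"
        # was the mistake fixed in step 18.
        "TargetType": "ip",
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/ready",
    }


target_group = loki_target_group_request("vpc-02703ab5833607268")
# With credentials configured, this could be sent as:
#   boto3.client("elbv2", region_name="us-east-2").create_target_group(**target_group)
```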
Part 12: Registered Loki Task IP
20. Found Loki task private IP
Went to:
ECS → prod-observability → Tasks → Loki task → Networking
Found:
Private IP: 172.31.12.112
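The console lookup can also be done programmatically. The sketch below pulls the private IP out of a response shaped like the output of ecs describe_tasks. The sample response is trimmed and hand-written for illustration, and the subnet ID in it is a placeholder.

```python
def task_private_ips(describe_tasks_response):
    """Collect private IPv4 addresses from task ENI attachments."""
    ips = []
    for task in describe_tasks_response.get("tasks", []):
        for attachment in task.get("attachments", []):
            if attachment.get("type") != "ElasticNetworkInterface":
                continue
            for detail in attachment.get("details", []):
                if detail.get("name") == "privateIPv4Address":
                    ips.append(detail["value"])
    return ips


# Hand-written sample in the shape of a describe_tasks response.
sample_response = {
    "tasks": [
        {
            "attachments": [
                {
                    "type": "ElasticNetworkInterface",
                    "details": [
                        {"name": "subnetId", "value": "subnet-EXAMPLE"},
                        {"name": "privateIPv4Address", "value": "172.31.12.112"},
                    ],
                }
            ]
        }
    ]
}

print(task_private_ips(sample_response))
```

Keep in mind that a Fargate task usually gets a new private IP when it is replaced, so a manually looked-up address is only valid for the current task.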
21. Registered target
Went to:
EC2 → Target Groups → Loki-target-gr → Register targets
Added:
IP: 172.31.12.112
Port: 3100
Result:
Healthy = 1
This confirmed:
ALB can reach Loki ECS task.
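The registration step maps to boto3's elbv2 register_targets call. In this sketch the target group ARN is a placeholder, and the private-address check is a simplification added for illustration (AWS accepts a slightly wider set of ranges for IP targets).

```python
import ipaddress


def register_targets_request(target_group_arn, ip, port=3100):
    """Build keyword arguments for elbv2.register_targets."""
    addr = ipaddress.ip_address(ip)  # raises ValueError if malformed
    if not addr.is_private:
        # Simplified guard: Fargate task IPs are private VPC addresses.
        raise ValueError(f"{ip} is not a private address")
    return {
        "TargetGroupArn": target_group_arn,
        "Targets": [{"Id": str(addr), "Port": port}],
    }


registration = register_targets_request(
    # Placeholder ARN; copy the real one from the target group page.
    "arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/Loki-target-gr/EXAMPLE",
    "172.31.12.112",
)
print(registration["Targets"])
```

A manual registration like this has to be repeated whenever the task IP changes. Attaching the target group to the ECS service lets ECS register new task IPs on its own, which the lab did not do.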
Part 13: Fixed 503 Error
22. Browser showed:
503 Service Temporarily Unavailable
Meaning:
The ALB itself is reachable, but its target group has no healthy registered target.
After registering:
172.31.12.112:3100
Target became:
Healthy
Then the ALB worked.
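When debugging this kind of error, the status code returned through the ALB narrows down where the problem is. Here is a small sketch of that reasoning; the hint texts are summaries, not official AWS wording, and the probe function is not called here because it needs network access to the ALB.

```python
import urllib.error
import urllib.request

ALB_STATUS_HINTS = {
    200: "Loki answered through the ALB.",
    502: "The ALB reached a target, but the connection or response was bad.",
    503: "The target group has no registered targets to send traffic to.",
    504: "A target is registered but did not answer in time "
         "(check security groups and port 3100).",
}


def explain_alb_status(code):
    """Map an HTTP status seen through the ALB to a likely cause."""
    return ALB_STATUS_HINTS.get(code, f"Unexpected status {code}.")


def probe_status(url, timeout=5.0):
    """Return the HTTP status for url, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


print(explain_alb_status(503))
# Example use against the lab's ALB and Loki readiness path:
#   probe_status("http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/ready")
```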
Part 14: Updated Alloy Config
23. Old Alloy Loki URL
Old config pointed to local Loki:
url = "http://localhost:3100/loki/api/v1/push"
That meant:
Send logs to Loki running on the same EC2 instance.
But now Loki is in ECS.
24. New Alloy Loki URL
Changed to:
url = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/loki/api/v1/push"
Full section:
loki.write "local" {
  endpoint {
    url = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/loki/api/v1/push"
  }
}
Saved the file in nano:
CTRL + O
ENTER
CTRL + X
Restarted Alloy:
sudo systemctl restart alloy
Checked status:
sudo systemctl status alloy
Expected:
active (running)
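To see what Alloy is sending to that URL, the sketch below builds the push endpoint from the ALB DNS name and a JSON body in the shape Loki's push API accepts: a list of streams, each with a label set and [timestamp in nanoseconds, line] pairs. The function names and the test log line are made up for illustration.

```python
import json
import time


def loki_push_url(host, port=3100):
    """Build the Loki push endpoint URL for a host and port."""
    return f"http://{host}:{port}/loki/api/v1/push"


def loki_push_payload(labels, lines, now_ns=None):
    """Build a Loki push body with one stream and one entry per line."""
    timestamp = str(now_ns if now_ns is not None else time.time_ns())
    return {
        "streams": [
            {
                "stream": labels,
                # Each entry is [unix time in nanoseconds as a string, line].
                "values": [[timestamp, line] for line in lines],
            }
        ]
    }


url = loki_push_url("loki-alb-838622355.us-east-2.elb.amazonaws.com")
body = loki_push_payload({"job": "manual_test"}, ["hello from EC2"])
print(url)
print(json.dumps(body))
```

POSTing such a body with Content-Type: application/json is one way to test the ALB path by hand before restarting Alloy.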
Part 15: Grafana Loki Datasource
25. In Grafana
Went to:
Connections → Data sources
Selected:
Loki
URL:
http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100
Then:
Save & Test
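The same datasource can be described as a JSON body for Grafana's datasource HTTP API (POST /api/datasources). The lab used the UI; this sketch only shows the fields involved, and the datasource name and the proxy access mode are assumptions.

```python
import json


def loki_datasource_payload(url, name="Loki"):
    """Build a request body for creating a Loki datasource in Grafana."""
    return {
        "name": name,        # assumed display name
        "type": "loki",
        "url": url,
        "access": "proxy",   # assumed: Grafana's backend calls Loki
    }


payload = loki_datasource_payload(
    "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100"
)
print(json.dumps(payload, indent=2))
```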
Part 16: Query Logs in Grafana
26. First mistake
We first ran this query with the Prometheus datasource selected:
{job="syslog"}
That showed:
No data
Reason:
Prometheus is for metrics.
Loki is for logs.
27. Correct query
Changed datasource from:
Prometheus
to:
Loki
Then ran:
{job="syslog"}
Result:
84 lines displayed
Logs appeared successfully.
Final Working Queries
Use these in Grafana Explore with Loki datasource:
{job="syslog"}
{job="auth"}
{job="nginx_access"}
{job="nginx_error"}
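These queries can also be run outside Grafana against Loki's query_range endpoint, which helps separate Loki problems from Grafana problems. The sketch below only builds the URL; the limit of 100 and the function name are illustrative choices.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def loki_query_range_url(base_url, logql, limit=100):
    """Build a Loki query_range URL for a LogQL selector."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


query_url = loki_query_range_url(
    "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100",
    '{job="syslog"}',
)
print(query_url)

# Round-trip check: the selector survives URL encoding unchanged.
print(parse_qs(urlsplit(query_url).query)["query"][0])
```

Opening the printed URL with curl or a browser should return JSON containing log lines if the pipeline is working.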
Final Working Architecture
EC2 Machine
├── Alloy
├── Node Exporter
└── Linux logs
↓
Application Load Balancer
↓
ECS Fargate Loki Service
↓
Grafana Loki Datasource
↓
Grafana Explore
Full observability stack:
ECS/Fargate:
├── prod-app-service
├── prometheus-service-2
├── grafana-service
└── loki-service
EC2:
└── Alloy agent
What Each Tool Does
ECS
Runs containers as services.
Fargate
Serverless compute for containers.
Demo App
Application container.
Prometheus
Stores metrics.
Loki
Stores logs.
Grafana
Visualizes logs and metrics.
Alloy
Collects logs and sends them to Loki.
ALB
Exposes Loki from ECS so EC2 Alloy can send logs to it.
Target Group
Connects ALB to Loki ECS task private IP.
Most Important Lessons
1. Fargate hides the server
That is why agents like Alloy are harder in Fargate.
2. Alloy needs config
It cannot run empty.
3. Fargate target groups must use IP
Not Instance.
4. 503 from ALB means no healthy target
The ALB was working, but the target group was empty or unhealthy.
5. Prometheus is not for logs
Prometheus = metrics.
6. Loki is for logs
Loki + Grafana Explore shows log lines.
7. Hybrid architecture is realistic
EC2 agent → ALB → ECS Loki → Grafana is a real SRE-style pattern.
Final Success Proof
Grafana showed:
84 lines displayed
For:
{job="syslog"}
That means the full pipeline works:
EC2 logs
→ Alloy
→ Loki ALB
→ ECS Loki
→ Grafana
This lab is complete.