DEV Community

Aisalkyn Aidarova


Full Lab: ECS Fargate Observability with EC2 Alloy

Final Goal

Build an observability system where:

EC2 Alloy
   ↓ sends logs
ALB for Loki
   ↓ forwards traffic
Loki running in ECS/Fargate
   ↓ stores logs
Grafana running in ECS/Fargate
   ↓ visualizes logs

Final architecture:

ECS/Fargate:
- prod-app-service
- prometheus-service-2
- grafana-service
- loki-service

EC2:
- Alloy agent
- Node Exporter
- system logs

Networking:
- ALB in front of Loki
- Target group using IP target type

Part 1: ECS Cluster

1. Created ECS Cluster

Cluster name:

prod-observability

Launch type:

AWS Fargate

Region:

us-east-2

Purpose:

Run observability services as serverless containers.

Part 2: IAM Roles

2. Task Execution Role

Used:

ecsTaskExecutionRole

Purpose:

Allows ECS/Fargate to pull container images and send logs to CloudWatch.

Attached policy:

AmazonECSTaskExecutionRolePolicy

3. Task Role

Used:

ecsAppTaskRole

Purpose:

Allows application containers to call AWS services if needed.

For this lab, it was not heavily used.
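Both roles only work if ECS tasks are allowed to assume them. As a minimal sketch, assuming Python's standard `json` module and a hypothetical helper name, this is the trust policy document that lets the ECS tasks service principal assume ecsTaskExecutionRole or ecsAppTaskRole:

```python
import json

# Service principal that ECS tasks (including Fargate tasks) use to assume a role.
ECS_TASKS_PRINCIPAL = "ecs-tasks.amazonaws.com"


def ecs_task_trust_policy() -> dict:
    """Return an IAM trust policy that lets ECS tasks assume the role (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ECS_TASKS_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # This JSON goes into the role's "Trust relationships" tab.
    print(json.dumps(ecs_task_trust_policy(), indent=2))
```

The permissions themselves (for example AmazonECSTaskExecutionRolePolicy) are attached separately; the trust policy only controls who may assume the role.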


Part 3: Demo Application Service

4. Created demo app task definition

Task family:

prod-observability

Container name:

demo-app

Final working image:

nginx:latest

Port:

80

Why we changed the image:

The earlier image failed:

ghcr.io/brancz/prometheus-example-app:v0.5.0

Error:

CannotPullContainerError 403 Forbidden

Reason:

ECS could not pull from GitHub Container Registry anonymously.

Fix:

Use the public nginx image from Docker Hub.

5. Created demo app ECS service

Service name:

prod-app-service

Desired tasks:

1

Result:

Running

Part 4: Prometheus Service

6. Created Prometheus task definition

Task family:

prometheus

Container name:

prometheus

Image URI:

prom/prometheus:latest

Port:

9090

CloudWatch logs:

/ecs/prometheus-service

Stream prefix:

prometheus

7. Created Prometheus service

Service name:

prometheus-service-2

Desired tasks:

1

Result:

Running

Purpose:

Prometheus stores and queries metrics.

Important concept:

Prometheus = metrics database
Loki = logs database
Grafana = visualization

Part 5: Grafana Service

8. Created Grafana task definition

Task family:

grafana

Container name:

grafana

Image URI:

grafana/grafana:latest

Port:

3000

CloudWatch logs:

/ecs/grafana

Stream prefix:

grafana

9. Created Grafana service

Service name:

grafana-service

Desired tasks:

1

Result:

Running

Opened Grafana:

http://GRAFANA_PUBLIC_IP:3000

Default login (username / password):

admin
admin

Grafana asks you to set a new password after the first login.

Purpose:

Grafana visualizes metrics and logs.

Part 6: Loki Service

10. Created Loki task definition

Task family:

loki

Container name:

loki

Image URI:

grafana/loki:latest

Port:

3100

CloudWatch logs:

/ecs/loki

Stream prefix:

loki
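The console fields above map onto a task definition JSON document. A minimal sketch, assuming Python and a hypothetical helper, that builds the Loki task definition from the values in this step; the cpu/memory sizes and the account ID in the role ARN are assumptions, not values from the lab:

```python
import json


def fargate_task_definition(
    family: str,
    image: str,
    port: int,
    log_group: str,
    stream_prefix: str,
    execution_role_arn: str,
    region: str = "us-east-2",
    cpu: str = "512",      # assumed size (0.5 vCPU), not from the lab
    memory: str = "1024",  # assumed size (1 GB), not from the lab
) -> dict:
    """Build a single-container Fargate task definition body (hypothetical helper)."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        # Fargate only supports awsvpc: every task gets its own ENI and private IP.
        "networkMode": "awsvpc",
        "cpu": cpu,
        "memory": memory,
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": family,  # container name matches the family in this lab
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
                # Ship container stdout/stderr to CloudWatch Logs.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": stream_prefix,
                    },
                },
            }
        ],
    }


loki_td = fargate_task_definition(
    family="loki",
    image="grafana/loki:latest",
    port=3100,
    log_group="/ecs/loki",
    stream_prefix="loki",
    # Placeholder account ID: replace 123456789012 with your own.
    execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
)
print(json.dumps(loki_td, indent=2))
```

Saved to a file, the printed JSON can be registered with `aws ecs register-task-definition --cli-input-json file://loki.json`. The same helper covers the Prometheus (9090) and Grafana (3000) task definitions by changing the arguments.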

11. Created Loki ECS service

Service name:

loki-service

Desired tasks:

1

Result:

Running

Purpose:

Loki receives and stores logs.

Part 7: Why Alloy Failed in Fargate

12. Tried Alloy in ECS/Fargate

Image used:

grafana/alloy:latest

Port:

12345

It kept failing with:

Rollback failed

Reason:

Alloy is not like nginx, Grafana, or Loki.
Alloy is an agent.
It needs a config file that defines what to collect and where to send it.
Without a config, Alloy exits.

We tried these values for the container command:

run
run,/etc/alloy/fargate.alloy
empty command

But it failed because:

/etc/alloy/fargate.alloy did not exist in the Fargate container.

Conclusion:

Alloy is easier on EC2 because EC2 has a normal Linux filesystem where config files can be created and edited in place. On Fargate, the config file has to be baked into a custom image or mounted from a volume (for example, Amazon EFS).

Part 8: Decision — Use EC2 for Alloy

13. Final architecture decision

We kept ECS because the lab teaches:

ECS
Fargate
task definitions
services
networking
service communication
load balancing
observability

But we moved Alloy to EC2 because:

Alloy needs config files.
EC2 is easier for agents.
EC2 gives filesystem access.
EC2 is better for troubleshooting.

Final decision:

ECS/Fargate:
- demo app
- Prometheus
- Grafana
- Loki

EC2:
- Alloy
- Node Exporter

Part 9: Existing EC2 Alloy Machine

14. Checked Alloy on EC2

Command:

alloy --version

Output showed:

alloy version v1.16.1

This confirmed:

Alloy is installed directly on the Linux host as a native binary, not in a Docker container.

15. Checked Alloy config

Config file:

/etc/alloy/config.alloy

It had:

local.file_match "system_logs" {
  path_targets = [
    {
      __path__ = "/var/log/syslog",
      job      = "syslog",
    },
    {
      __path__ = "/var/log/auth.log",
      job      = "auth",
    },
    {
      __path__ = "/var/log/nginx/access.log",
      job      = "nginx_access",
    },
    {
      __path__ = "/var/log/nginx/error.log",
      job      = "nginx_error",
    },
  ]
}

loki.source.file "log_scrape" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.write.local.receiver]
}

// The loki.write "local" component referenced above lives in the same file.
// Before this lab it pushed to a Loki running on the same EC2 instance:
loki.write "local" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}

This means Alloy collects:

/var/log/syslog
/var/log/auth.log
/var/log/nginx/access.log
/var/log/nginx/error.log
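If one of these files does not exist on the instance, that job simply never shows up in Loki. A minimal troubleshooting sketch, assuming Python on the EC2 host; the mapping mirrors `local.file_match "system_logs"` above and the function name is hypothetical:

```python
import os

# Same path -> job label mapping as local.file_match "system_logs".
LOG_TARGETS = {
    "/var/log/syslog": "syslog",
    "/var/log/auth.log": "auth",
    "/var/log/nginx/access.log": "nginx_access",
    "/var/log/nginx/error.log": "nginx_error",
}


def missing_log_files(targets: dict, exists=os.path.isfile) -> list:
    """Return job labels whose log file is missing (exists is injectable for testing)."""
    return [job for path, job in targets.items() if not exists(path)]


if __name__ == "__main__":
    missing = missing_log_files(LOG_TARGETS)
    print("jobs with missing files:", missing or "none")
```

Any job listed here will return no data in Grafana even when the rest of the pipeline is healthy.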

Part 10: Created ALB for Loki

16. Why ALB was needed

Alloy on EC2 cannot use an ECS-internal service name such as:

loki-service:3100

That name resolves only inside ECS networking, so we created an ALB in front of Loki.

Flow:

EC2 Alloy
   ↓
Loki ALB
   ↓
Loki ECS task

17. Created Application Load Balancer

ALB name:

loki-alb

Scheme:

Internet-facing

Type:

Application Load Balancer

VPC:

vpc-02703ab5833607268

Listener:

HTTP:3100

ALB DNS:

loki-alb-838622355.us-east-2.elb.amazonaws.com
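Everything that talks to Loki from now on (Alloy, Grafana, health checks) uses this DNS name plus the listener port. A minimal sketch, assuming Python's standard library, that derives the push and readiness URLs from the ALB DNS and probes `/ready`; the helper names are hypothetical:

```python
import urllib.request
from urllib.parse import urlunsplit

# ALB DNS name and listener port from this lab.
LOKI_ALB_HOST = "loki-alb-838622355.us-east-2.elb.amazonaws.com"
LOKI_PORT = 3100


def loki_url(path: str, host: str = LOKI_ALB_HOST, port: int = LOKI_PORT) -> str:
    """Join the ALB host, listener port, and a Loki HTTP path into a URL."""
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


PUSH_URL = loki_url("/loki/api/v1/push")  # target for Alloy's loki.write endpoint
READY_URL = loki_url("/ready")            # same path the target group health check uses


def is_ready(url: str = READY_URL, timeout: float = 5.0) -> bool:
    """Return True if Loki answers HTTP 200 on /ready through the ALB."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection errors, timeouts, and HTTP errors (e.g. an ALB 503) all land here.
        return False
```

Running `is_ready()` from the EC2 instance is a quick way to separate network or security group problems from Alloy config problems.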

Part 11: Created Target Group

18. Important mistake we fixed

The first target group was created with:

Target type: Instance

That was wrong for Fargate.

Fargate requires:

Target type: IP

This is because Fargate tasks use the awsvpc network mode: each task gets its own elastic network interface (ENI) and private IP address, and there is no EC2 instance to register.
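The rule can be written as a small decision function. A simplified sketch, assuming Python and a hypothetical helper name, covering only the launch types and network modes discussed here:

```python
def target_type_for(launch_type: str, network_mode: str) -> str:
    """Pick an ALB target group type for an ECS service (simplified sketch)."""
    if launch_type == "FARGATE":
        # Fargate always uses awsvpc, so tasks are addressed by their ENI's private IP.
        return "ip"
    if network_mode == "awsvpc":
        # EC2-launched tasks in awsvpc mode also get their own ENI and IP.
        return "ip"
    # bridge or host mode on EC2: the container instance itself is the target.
    return "instance"


print(target_type_for("FARGATE", "awsvpc"))  # the Loki service in this lab
```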


19. Correct Target Group

Target group name:

Loki-target-gr

Target type:

IP

Protocol:

HTTP

Port:

3100

Health check path:

/ready

VPC:

vpc-02703ab5833607268

Part 12: Registered Loki Task IP

20. Found Loki task private IP

Went to:

ECS → prod-observability → Tasks → Loki task → Networking

Found:

Private IP: 172.31.12.112
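The same IP is available from the ECS DescribeTasks API (for example `aws ecs describe-tasks --cluster prod-observability --tasks <task-id>`), where each awsvpc task carries an `ElasticNetworkInterface` attachment with a `privateIPv4Address` detail. A minimal sketch, assuming Python and a hypothetical helper, that pulls the IP out of one task object from that response; the sample data in the test is hand-written for illustration:

```python
from typing import Optional


def task_private_ip(task: dict) -> Optional[str]:
    """Return the private IPv4 of an awsvpc task, or None if no ENI is attached yet."""
    for attachment in task.get("attachments", []):
        if attachment.get("type") != "ElasticNetworkInterface":
            continue
        for detail in attachment.get("details", []):
            if detail.get("name") == "privateIPv4Address":
                return detail.get("value")
    return None
```

A task that is still provisioning may not have the ENI details yet, which is why the function returns None instead of raising.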

21. Registered target

Went to:

EC2 → Target Groups → Loki-target-gr → Register targets

Added:

IP: 172.31.12.112
Port: 3100

Result:

Healthy = 1

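This confirmed:

ALB can reach the Loki ECS task.

A manually registered IP covers only that one task: if the task is replaced, its new private IP is not registered and the target group has to be updated again. Attaching the target group to loki-service in the ECS service's load balancing settings lets ECS register and deregister task IPs automatically.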
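The console step "Register targets" corresponds to the Elastic Load Balancing `RegisterTargets` call. A minimal sketch, assuming Python with boto3's `elbv2` client passed in as a parameter (so it can be exercised without AWS credentials); the function and the recording stand-in client are hypothetical:

```python
def register_loki_target(elbv2_client, target_group_arn: str, ip: str, port: int = 3100) -> None:
    """Register one task IP in an IP-type target group.

    elbv2_client is expected to be boto3.client("elbv2", region_name="us-east-2").
    """
    elbv2_client.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": ip, "Port": port}],
    )


class RecordingClient:
    """Stand-in for the elbv2 client that records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def register_targets(self, **kwargs):
        self.calls.append(kwargs)
```

With real credentials the call would look like `register_loki_target(boto3.client("elbv2", region_name="us-east-2"), "<Loki-target-gr ARN>", "172.31.12.112")`, where the ARN is copied from the target group details page.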

Part 13: Fixed 503 Error

22. Browser showed:

503 Service Temporarily Unavailable

Meaning:

The ALB itself works, but its target group had no registered targets.

After registering:

172.31.12.112:3100

Target became:

Healthy

Then the ALB worked.
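How an ALB answers depends on what is in the target group. A simplified sketch, assuming Python and a hypothetical helper, of the documented behavior; it ignores states such as draining and initial:

```python
def alb_behavior(target_states: list) -> str:
    """Summarize how an ALB treats requests, given one health state per registered target."""
    if not target_states:
        # No registered targets at all: the ALB itself answers 503.
        return "503 Service Temporarily Unavailable"
    if "healthy" in target_states:
        return "route to healthy targets"
    # Every registered target is unhealthy: the ALB fails open and uses all of them.
    return "fail open: route to all targets"


print(alb_behavior([]))           # before registering 172.31.12.112:3100
print(alb_behavior(["healthy"]))  # after registration and a passing /ready check
```

This is why the 503 pointed at an empty target group rather than at Loki: with a registered but unhealthy Loki target, requests would still have reached the task.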


Part 14: Updated Alloy Config

23. Old Alloy Loki URL

The old config pointed to a local Loki:

url = "http://localhost:3100/loki/api/v1/push"

That meant:

Send logs to a Loki running on the same EC2 instance.

But Loki now runs in ECS, behind the ALB.


24. New Alloy Loki URL

Changed to:

url = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/loki/api/v1/push"

Full section:

loki.write "local" {
  endpoint {
    url = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/loki/api/v1/push"
  }
}

Saved the file in nano:

CTRL + O
ENTER
CTRL + X

Restarted Alloy:

sudo systemctl restart alloy

Checked status:

sudo systemctl status alloy

Expected:

active (running)
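To test the ALB path without involving Alloy, one log line can be pushed by hand to the same endpoint, which also accepts JSON. A minimal sketch, assuming Python's standard library; the `job="manual_test"` label and the helper names are hypothetical:

```python
import json
import time
import urllib.request

PUSH_URL = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100/loki/api/v1/push"


def build_push_body(labels: dict, lines: list, ts_ns: int) -> bytes:
    """Encode log lines as a Loki JSON push request containing one stream."""
    body = {
        "streams": [
            {
                "stream": labels,
                # Each value is [timestamp in nanoseconds as a string, log line].
                "values": [[str(ts_ns), line] for line in lines],
            }
        ]
    }
    return json.dumps(body).encode("utf-8")


def push(lines: list, labels: dict, url: str = PUSH_URL) -> int:
    """POST the lines to Loki and return the HTTP status (204 means accepted)."""
    data = build_push_body(labels, lines, time.time_ns())
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

After `push(["hello from EC2"], {"job": "manual_test"})`, the query `{job="manual_test"}` in Grafana Explore should return that line.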

Part 15: Grafana Loki Datasource

25. In Grafana

Went to:

Connections → Data sources

Selected:

Loki

URL:

http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100

Then:

Save & Test
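The same datasource can be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. A minimal sketch, assuming Python's standard library and basic auth with the Grafana admin account; the helper names are hypothetical:

```python
import base64
import json
import urllib.request

LOKI_URL = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100"


def loki_datasource_payload(url: str = LOKI_URL) -> dict:
    """Request body that adds a Loki datasource to Grafana."""
    return {
        "name": "Loki",
        "type": "loki",
        "url": url,
        # "proxy": the Grafana server (running in ECS) calls Loki, not the browser.
        "access": "proxy",
    }


def create_datasource(grafana_url: str, user: str, password: str) -> int:
    """POST the datasource to Grafana using basic auth and return the HTTP status."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(loki_datasource_payload()).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A usage example would be `create_datasource("http://GRAFANA_PUBLIC_IP:3000", "admin", "<your password>")`; Save & Test in the UI remains the quickest way to confirm Grafana can reach the ALB.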

Part 16: Query Logs in Grafana

26. First mistake

We first ran this query with the Prometheus datasource selected:

{job="syslog"}

That showed:

No data

Reason:

Prometheus is for metrics and has no series with job="syslog".
Loki is for logs, and that is where the syslog lines are stored.

27. Correct query

Changed datasource from:

Prometheus

to:

Loki

Then ran:

{job="syslog"}

Result:

84 lines displayed
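Grafana Explore runs the query against Loki's `query_range` API. A minimal sketch, assuming Python's standard library and hypothetical helper names, that builds the same request and counts the returned lines; with no `start`/`end`, Loki defaults to roughly the last hour, so the count need not match Grafana's exactly:

```python
import json
import urllib.parse
import urllib.request

LOKI_URL = "http://loki-alb-838622355.us-east-2.elb.amazonaws.com:3100"


def query_range_url(logql: str, limit: int = 100, base: str = LOKI_URL) -> str:
    """URL for Loki's query_range API with a URL-encoded LogQL query."""
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return f"{base}/loki/api/v1/query_range?{params}"


def count_lines(response: dict) -> int:
    """Count log lines across all streams in a query_range 'streams' response."""
    return sum(len(stream["values"]) for stream in response["data"]["result"])


def fetch(logql: str) -> dict:
    """Run the query through the ALB and return the decoded JSON response."""
    with urllib.request.urlopen(query_range_url(logql), timeout=10) as resp:
        return json.load(resp)
```

For example, `count_lines(fetch('{job="syslog"}'))` gives the number of syslog lines Loki returned, capped by `limit`.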

Logs appeared successfully.


Final Working Queries

Use these in Grafana Explore with Loki datasource:

{job="syslog"}
{job="auth"}
{job="nginx_access"}
{job="nginx_error"}
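All four queries are LogQL stream selectors on the `job` label set in the Alloy config, and they can be narrowed with a line filter such as `|=`. A minimal sketch, assuming Python and a hypothetical helper; the "Failed password" filter is only an illustrative example:

```python
# Job labels defined in local.file_match "system_logs".
JOBS = ["syslog", "auth", "nginx_access", "nginx_error"]


def selector(job: str, contains: str = "") -> str:
    """Build a LogQL stream selector, optionally with a |= substring line filter.

    Assumes `contains` has no double quotes (no escaping is done).
    """
    query = f'{{job="{job}"}}'
    if contains:
        # |= keeps only log lines containing the exact substring.
        query += f' |= "{contains}"'
    return query


for job in JOBS:
    print(selector(job))
print(selector("auth", "Failed password"))  # e.g. narrow auth logs to failed logins
```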

Final Working Architecture

EC2 Machine
  ├── Alloy
  ├── Node Exporter
  └── Linux logs
        ↓
Application Load Balancer
        ↓
ECS Fargate Loki Service
        ↓
Grafana Loki Datasource
        ↓
Grafana Explore

Full observability stack:

ECS/Fargate:
  ├── prod-app-service
  ├── prometheus-service-2
  ├── grafana-service
  └── loki-service

EC2:
  ├── Alloy agent
  └── Node Exporter

What Each Tool Does

ECS

Runs containers as services.

Fargate

Serverless compute for containers.

Demo App

Application container.

Prometheus

Stores metrics.

Loki

Stores logs.

Grafana

Visualizes logs and metrics.

Alloy

Collects logs and sends them to Loki.

ALB

Exposes Loki from ECS so EC2 Alloy can send logs to it.

Target Group

Connects the ALB to the Loki ECS task's private IP.


Most Important Lessons

1. Fargate hides the server

There is no host to log into and no filesystem to edit, which is why agents like Alloy that depend on local config files are harder to run on Fargate.

2. Alloy needs config

It cannot run without a config file.

3. Fargate target groups must use IP

Not Instance.

4. 503 from ALB means no registered targets

The ALB was working, but its target group was empty. If targets are registered but all unhealthy, an ALB fails open and still routes requests to them.

5. Prometheus is not for logs

Prometheus = metrics.

6. Loki is for logs

Loki + Grafana Explore shows log lines.

7. Hybrid architecture is realistic

EC2 agent → ALB → ECS Loki → Grafana is a common, realistic pattern for environments that mix EC2 instances and containers.


Final Success Proof

Grafana showed:

84 lines displayed

For:

{job="syslog"}

That means the full pipeline works:

EC2 logs
→ Alloy
→ Loki ALB
→ ECS Loki
→ Grafana

This lab is complete.
