Discussion on: Deploy Rails in Amazon ECS: Part 4 - Create an ECS Cluster

Hi @jamby1100 ! Thank you for this great post!

I followed it step by step and ran into issues at the final step! When I connected through SSH and ran docker ps, I only saw "amazon/amazon-ecs-agent:latest" instead of the image with the name I created in the ECS part.
The only differences were that, as I'm migrating from Elastic Beanstalk, I created the DB by restoring it from a snapshot, the app I pushed was quite a bit more complex than yours, and I added a lot more ENV variables (I am using secrets.yml instead of MASTER_KEY).

Anyway, as I assumed I didn't need to create and migrate the database, I tried to open the DNS name of the Load Balancer in the browser and got a 503 error.
Any thoughts?

Again, thank you very much for this whole series of posts, they have been very helpful!!

Raphael Jambalos • Nov 30 '19 • Edited

Hi Nico,

Thank you for following through the post series. I’m glad you found it helpful.

The amazon-ecs-agent container should always be there. But I think the problem is that your application’s container is not being deployed. There are a number of things you can check:

(1) Check if the instance is detected by Amazon ECS. Go to Clusters > yourcluster, and then go to the ECS Instances tab. If you see your EC2 instance there, this is not the problem.

(2) Check if there is a problem with the Amazon ECS service. Go to Clusters > yourcluster > yourservice. In the Events tab, you should see logs of what happened to the service. The most common problem there is that your EC2 instance does not have enough CPU and memory for the demands of your containers. If so, you can either lower the CPU and memory requirements of your containers (by editing the task definition and deploying the new revision) or change the instance type of the EC2 instances in the fleet by going to Clusters > yourcluster and then the ECS Instances tab.
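To make the CPU and memory check in (2) concrete, here is a minimal Python sketch of the comparison. It assumes the ECS convention of 1,024 CPU units per vCPU; the function name and the example numbers are made up for illustration, and the real remaining capacity is whatever the ECS Instances tab shows for your instance.

```python
def task_fits(remaining_cpu_units, remaining_memory_mib, task_cpu_units, task_memory_mib):
    """Compare a task's reservations against what is left on one instance.

    Returns (fits, reasons), where reasons lists each resource that is short.
    """
    reasons = []
    if task_cpu_units > remaining_cpu_units:
        reasons.append(
            f"CPU: task needs {task_cpu_units} units, instance has {remaining_cpu_units} left"
        )
    if task_memory_mib > remaining_memory_mib:
        reasons.append(
            f"Memory: task needs {task_memory_mib} MiB, instance has {remaining_memory_mib} MiB left"
        )
    return (not reasons, reasons)


# Hypothetical instance: 1 vCPU (1024 CPU units) and 1900 MiB still unreserved.
print(task_fits(1024, 1900, 1024, 1024))
print(task_fits(1024, 1900, 2048, 4096))
```

If the second element of the result is not empty, the task cannot be placed on that instance until either the task definition or the instance type changes.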

Kindly let me know if this helped you resolve the issue :D

NicoBuchhalter • Dec 1 '19

Hi Raphael! Thank you for your quick response.

My EC2 instance is there (actually, that was the path I followed to get to it and then connect by SSH, which is where I realized I didn't have the image, just the ECS agent).

In the events tab of the service, I can see repeated logs indicating:

service myservice has started 1 tasks: task task_id.
service myservice deregistered 1 targets in target-group default-target
service myservice has begun draining connections on 1 tasks.

I made a new revision of the task definition with 1024 for CPU and memory and then updated the revision in the service, but nothing changed. If I connect to the instance, there's still only one image, the ecs-agent. I don't think it's the EC2 instance, as in EB I'm using a t2.small.

Any thoughts?

Raphael Jambalos • Dec 2 '19 • Edited

Hi Nico,

From your reply, it seems to me that your app fails the health check. I recommend the following:

(i) I think your application has some configuration problems that you might need to address. For this approach, we would look at your application logs.

Go to yourcluster > yourservice, and go to the "Tasks" tab. Inside the tab, find "Task Status:" and then click "Stopped". You will see a list of tasks that have been stopped. Click one of those tasks. You will be redirected to a page with information about your task. Look around this page: it shows the reason why the task was killed. If you don't find anything useful, go to the "Logs" tab. You will see application logs from that specific task (assuming you did steps 7.6-7.7 correctly).
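The same information is also available outside the console. As a hedged sketch: with boto3 you could list stopped tasks via `list_tasks(cluster=..., serviceName=..., desiredStatus="STOPPED")` and pass the ARNs to `describe_tasks(cluster=..., tasks=[...])`. The helper below only summarizes a response of that shape; the helper name and the sample data are made up for illustration and are not real output.

```python
def summarize_stopped_tasks(describe_tasks_response):
    """Pull the stop reason and per-container exit details out of a
    dict shaped like an ECS DescribeTasks response."""
    summaries = []
    for task in describe_tasks_response.get("tasks", []):
        containers = [
            {
                "name": container.get("name"),
                "exit_code": container.get("exitCode"),
                "reason": container.get("reason"),
            }
            for container in task.get("containers", [])
        ]
        summaries.append(
            {
                "task_arn": task.get("taskArn"),
                "stopped_reason": task.get("stoppedReason"),
                "containers": containers,
            }
        )
    return summaries


# Hand-written sample for illustration only.
sample_response = {
    "tasks": [
        {
            "taskArn": "example-task-arn",
            "stoppedReason": "Essential container in task exited",
            "containers": [{"name": "web", "exitCode": 1}],
        }
    ]
}

for summary in summarize_stopped_tasks(sample_response):
    print(summary)
```

A non-zero exit code on the essential container usually means the app itself quit, which points back to the application logs rather than to the cluster setup.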

(ii) If your app is perfectly normal, then I think the load balancer is killing your app even before it has the chance to turn on. For this approach, we would add a grace period.

Go to your service and find the option for the Health Check grace period. If it's zero, turn it to 300s. If it's more than zero, double it.
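The rule of thumb above can be written down as a tiny Python sketch. It is a simplified model that treats the app as safe once the grace period is at least as long as the measured boot time; the function names and the boot times are assumptions for illustration.

```python
def next_grace_period(current_seconds):
    """Apply the rule of thumb: start at 300s if unset, otherwise double."""
    if current_seconds <= 0:
        return 300
    return current_seconds * 2


def grace_period_covering(boot_seconds, current_seconds=0):
    """Keep applying the rule until the grace period covers the boot time."""
    grace = current_seconds
    while grace < boot_seconds:
        grace = next_grace_period(grace)
    return grace


# Hypothetical boot times, measured locally with something like `time docker run ...`.
print(grace_period_covering(boot_seconds=90))
print(grace_period_covering(boot_seconds=400, current_seconds=300))
```

If the tasks still stop after the grace period comfortably exceeds the boot time, the health check is probably not the cause.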

Kindly let me know if this helps!

NicoBuchhalter • Dec 3 '19

Hi Raphael! Thank you again for the help.

So, yes, clearly the tasks are being stopped but I can't understand the reason why. The logs don't give me any information. The only message I see in the task is "Essential container in task exited" and in the logs "Switch to inspect mode".
I tried to deploy a previous version of the app, to make sure it was stable, and still didn't create the image.
I changed the grace period and the image still didn't appear in the EC2 instance.

I think everything points to my app itself malfunctioning, but locally I can run the image just fine, and the same goes for docker-compose.

I don't want to spam this thread too much, so if you prefer, we can chat directly. Thanks a lot!

Raphael Jambalos • Dec 8 '19

Hi Nico,

Sorry for the delayed reply. It's been a long week at work. I think what you have to do is SSH to the EC2 instance directly. Then, do docker ps and find the container. If it's not there, do docker ps -a to see containers that recently exited. Try docker logs <chash> first.

Then, try to revive the container via docker start <chash>, and while it is running you'd be able to go inside the container via docker exec -it <chash> bash (or sh, if the image has no bash). Then, explore your app. Look for log files that may contain clues to why your app failed.
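When there are many containers on the instance, picking out the dead app container by eye gets tedious. Here is a minimal Python sketch that filters the listing for you. It assumes the listing was produced with `docker ps -a --format '{{.ID}}\t{{.Image}}\t{{.Status}}'` (tab-separated ID, image, status); the function name, the container IDs, and the image name in the sample are made up for illustration.

```python
def find_exited_app_containers(docker_ps_output, agent_image_prefix="amazon/amazon-ecs-agent"):
    """Return (id, image, status) for exited containers other than the ECS agent."""
    exited = []
    for line in docker_ps_output.strip().splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip lines that are not in the expected three-column format
        container_id, image, status = parts
        if image.startswith(agent_image_prefix):
            continue  # the agent is expected to be there, so ignore it
        if status.startswith("Exited"):
            exited.append((container_id, image, status))
    return exited


# Hand-written sample for illustration only.
sample_output = "\n".join(
    [
        "1a2b3c4d5e6f\tamazon/amazon-ecs-agent:latest\tUp 3 hours",
        "9f8e7d6c5b4a\tmy-rails-app:latest\tExited (1) 2 minutes ago",
    ]
)

for container_id, image, status in find_exited_app_containers(sample_output):
    print(f"docker logs {container_id}   # {image}: {status}")
```

Each printed line is a docker logs command you can paste back into the SSH session.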

If you prefer, you can also send screenshots of your task to me (via PM). [mycluster > myservice > tasks > click on one of the tasks].