Discussion on: Apply KISS to Infrastructure


The idea is fine, but the lack of knowledge about ECS and downtime bothers me. You should have NO downtime if you're doing it right.

Gregory Ledray • Oct 19 '21 • Edited

You're right, this assumes you are OK with prod downtime during deployment. I am OK with temporary prod downtime because my front-end code has a request wrapper that implements both retries and a call to the next-best environment if prod is down. For example, if I try to reach example.com/api/a and it is unreachable, the code then tries staging.example.com/api/a, which must be working or else I wouldn't have deployed to prod. Obviously, though, this requires additional setup that I didn't touch on in this post, and it isn't always practical.
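For readers who want to picture the wrapper described above, here is a minimal sketch. It is not the actual code from the post: the function name `requestWithFallback`, the `FetchLike` type, the retry count, and the rule that only network errors and 5xx responses count as "down" are all assumptions made for illustration. The hosts are the ones from the example above.

```typescript
// Minimal fetch-like signature so the sketch can run without a real network.
// In a browser you could pass (url) => fetch(url).
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hosts in order of preference (prod first, then staging), as in the example.
const HOSTS = ["https://example.com", "https://staging.example.com"];

// Hypothetical wrapper: retry each host a few times, then move to the next one.
async function requestWithFallback(
  path: string,
  fetchFn: FetchLike,
  hosts: string[] = HOSTS,
  retriesPerHost = 2, // assumption: 1 initial attempt + 2 retries per host
): Promise<{ ok: boolean; status: number; url: string }> {
  let lastError: unknown = new Error("no hosts configured");
  for (const host of hosts) {
    const url = host + path;
    for (let attempt = 0; attempt <= retriesPerHost; attempt++) {
      try {
        const res = await fetchFn(url);
        // Assumption: only 5xx responses mean the environment is down.
        // Anything else (including 4xx) is returned to the caller as-is.
        if (res.status < 500) {
          return { ok: res.ok, status: res.status, url };
        }
        lastError = new Error(`HTTP ${res.status} from ${url}`);
      } catch (err) {
        // Network-level failure (host unreachable, DNS error, and so on).
        lastError = err;
      }
    }
  }
  // Every host failed on every attempt.
  throw lastError;
}
```

A real version would probably add a delay between retries and would only fall back for requests that are safe to send to staging.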

I wish I knew a way to implement this easily in AWS on the networking side (perhaps API Gateway has a way to call endpoint B if the call to endpoint A fails?) but as you point out, I don't know how.

Tizi • Oct 19 '21

You can allow ECS to run an additional server while deploying, so it creates a new instance, drains the connections to the old one, then kills it: stackoverflow.com/questions/407311...
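As a rough illustration of what "an additional server while deploying" can look like, the sketch below uses the two fields of the ECS service `deploymentConfiguration` (`minimumHealthyPercent` and `maximumPercent`). The values 100 and 200 are one common choice, not necessarily what the linked answer recommends. The `taskBounds` helper is hypothetical and only shows the arithmetic; the rounding (minimum rounded up, maximum rounded down) reflects my understanding of the ECS docs.

```typescript
// Shape mirrors the deploymentConfiguration field of an ECS service.
interface DeploymentConfiguration {
  minimumHealthyPercent: number; // lower bound on running tasks during a deploy
  maximumPercent: number;        // upper bound on running tasks during a deploy
}

// Assumed values: never drop below the desired count, allow up to double.
const rollingNoDowntime: DeploymentConfiguration = {
  minimumHealthyPercent: 100,
  maximumPercent: 200,
};

// Hypothetical helper: how many tasks may run while a deployment is in flight.
function taskBounds(desiredCount: number, cfg: DeploymentConfiguration) {
  return {
    minRunning: Math.ceil((desiredCount * cfg.minimumHealthyPercent) / 100),
    maxRunning: Math.floor((desiredCount * cfg.maximumPercent) / 100),
  };
}

// With one desired task, the old task stays up while one new task starts.
const bounds = taskBounds(1, rollingNoDowntime);
console.log(bounds.minRunning, bounds.maxRunning);
```

With a desired count of 1, this allows 2 tasks to run at once during the deploy and requires at least 1 to stay up, so the old task is only stopped after the new one is running. Connection draining itself is handled by the load balancer's target group settings, which this sketch does not cover.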

Gregory Ledray • Oct 20 '21

This is good to know. I have no doubt that if I understood ECS better I could do deployments with zero downtime. But after spending dozens of hours debugging ECS only to realize the problem wasn't with ECS, it was with my VPC not having DNS set up properly, I've basically lost confidence in AWS documentation and debuggability and I'm trying out simpler solutions like the one in this post.