My GitHub Actions deploy workflow was 87 lines of YAML.
It had grown over 18 months from a clean 20-line file into something I was genuinely afraid to touch. It broke whenever a dependency updated. It had three hardcoded ARNs from an AWS account I was no longer using. It had a comment that said `# TODO: fix this` that had been there for 11 months.
Last month I deleted all 87 lines and replaced them with one command.
Here's exactly how I did it — and what I learned along the way.
## The YAML graveyard
This was my deploy workflow. See if any of this feels familiar:
```yaml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/my-app:$IMAGE_TAG .
          docker push $ECR_REGISTRY/my-app:$IMAGE_TAG
          echo "IMAGE=$ECR_REGISTRY/my-app:$IMAGE_TAG" >> $GITHUB_ENV

      - name: Download task definition
        run: |
          aws ecs describe-task-definition --task-definition my-app \
            --query taskDefinition > task-definition.json

      - name: Update ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: my-app
          image: ${{ env.IMAGE }}

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: my-app-service
          cluster: my-app-cluster
          wait-for-service-stability: true

      - name: Notify on failure
        if: failure()
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK_URL }} \
            -H 'Content-type: application/json' \
            --data '{"text":"Deploy failed! Check Actions."}'
```
I wrote this. I'm not proud of it.
The real problem wasn't the YAML itself. The problem was everything hidden underneath the YAML:
- An ECR repository I had to provision manually
- An ECS cluster, service, and task definition I had to set up in the console
- IAM roles with the exact right permissions (I guessed wrong twice)
- A `Dockerfile` I maintained separately
- AWS credentials rotated manually every 90 days
The pipeline was the visible part. The invisible part was 3 days of setup I did 18 months ago that I could no longer remember well enough to recreate.
When a new teammate joined and asked "how does deploy work?" — I sent them the workflow file and said "it's complicated."
That's not an answer. That's a warning sign.
## The breaking point
In February, I switched from Node 18 to Node 20. The Docker build broke because my base image was pinned to `node:18-alpine` in three different places — the `Dockerfile`, the Actions workflow, and a `.nvmrc` file I had forgotten existed.
The fix took 45 minutes. The error message was not helpful. I fixed it by diffing my Dockerfile against a Stack Overflow answer from 2023.
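The drift itself is preventable. A sketch of what I should have done back then, not what I actually ran: make `.nvmrc` the single source of truth and have the workflow read it — `actions/setup-node` supports this via its `node-version-file` input.

```yaml
# Sketch: pin the Node version in exactly one place (.nvmrc)
# and have the workflow step read it from there.
- name: Set up Node.js
  uses: actions/setup-node@v4
  with:
    node-version-file: '.nvmrc'  # single source of truth
    cache: 'npm'
```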
Two weeks later, AWS deprecated the amazon-ecs-render-task-definition@v1 action. The pipeline broke silently — it ran, reported success, but the new image never actually deployed. I found out because a user filed a bug for something I had definitely already fixed.
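In hindsight, that silent failure was detectable. A sketch, assuming the cluster and service names from the workflow above: after a deploy, ask ECS which image the service is actually running and compare it against the tag you just pushed. `verify_image` and the sample registry URL are hypothetical illustrations, not part of any tool.

```sh
#!/bin/sh
# Sketch: catch "pipeline reported success but the old image is still running".
# verify_image is a hypothetical helper: succeed iff the deployed image
# reference ends with the tag we expected to ship.
verify_image() {
  deployed="$1"; expected_tag="$2"
  case "$deployed" in
    *:"$expected_tag") return 0 ;;
    *) return 1 ;;
  esac
}

# In CI you'd fetch the deployed image with the AWS CLI, e.g.:
#   td=$(aws ecs describe-services --cluster my-app-cluster \
#         --services my-app-service --query 'services[0].taskDefinition' --output text)
#   deployed=$(aws ecs describe-task-definition --task-definition "$td" \
#         --query 'taskDefinition.containerDefinitions[0].image' --output text)
deployed="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:abc123"  # sample value
if verify_image "$deployed" "abc123"; then
  echo "deploy verified: $deployed"
else
  echo "MISMATCH: service is running $deployed" >&2
fi
```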
That was the moment I decided: the pipeline is not worth maintaining.
## What I tried first
I looked at Render and Railway. Both are good products. Neither deploys into my own AWS account — they provision their own infrastructure. My company has a compliance requirement that customer data stays in a customer-owned AWS environment. So those were out.
I looked at AWS CodePipeline. I wanted to solve complexity, not add more of it.
Then a colleague mentioned NEXUS AI. He described it as "a CLI that handles all the ECS stuff so you don't have to." I was skeptical. That's what everyone says.
## The initial deploy
I installed the CLI:
```bash
npm install -g @nexusai/cli
nexus login
```
Then I pointed it at my repo:
```bash
nexus deploy source \
  --repo https://github.com/myorg/my-app \
  --name my-app \
  --provider aws_ecs_fargate
```
I expected this to fail immediately. My expectations for new DevOps tools are calibrated by years of experience.
It didn't fail. Four and a half minutes later I got back a URL. The app was running. The same app. In my AWS account.
I checked the AWS console out of habit. There was an ECS cluster. A task definition. A service. An ECR repository with the image in it. NEXUS AI had provisioned all of it.
I had not written a Dockerfile. I had not configured any IAM roles. I had not touched the AWS console.
I sat with that for a moment.
## What actually happens under the hood
Here's what `nexus deploy source` does, in order:

1. Reads your repo — detects the runtime from `package.json`, `requirements.txt`, `go.mod`, etc.
2. Builds the container on NEXUS AI's build infrastructure — not your machine, not a GitHub runner
3. Pushes the image to an ECR repository it provisions in your account
4. Creates (or updates) the ECS infrastructure — cluster, task definition, service, load balancer
5. Issues a TLS certificate via ACM and wires it to the load balancer
6. Waits for health checks to pass before returning the live URL
Steps 3–5 are the 3 days of manual work I did 18 months ago. They now run in parallel and take about 3 minutes.
## The new GitHub Actions workflow
Here's my deploy workflow today:
```yaml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        run: nexus deploy redeploy --deployment-id ${{ secrets.NEXUSAI_DEPLOYMENT_ID }}
        env:
          NEXUSAI_TOKEN: ${{ secrets.NEXUSAI_TOKEN }}
```

24 lines, including `name:` fields and blank lines.
The deploy job has one step. It calls `nexus deploy redeploy`, which tells NEXUS AI to rebuild from the latest commit and roll it out with a rolling update. No Docker commands. No AWS credentials. No ECR. No ECS task definition wrangling.
I kept the test job. NEXUS AI doesn't replace your test suite — it replaces everything after tests pass.
## Secrets and environment variables
Before, I had secrets in three places: GitHub Actions secrets (for the pipeline), AWS Secrets Manager (for the app), and a `.env.example` file that was always slightly out of date.
Now:
```bash
nexus secret set \
  DATABASE_URL=postgres://user:pass@host/db \
  STRIPE_SECRET_KEY=sk_live_... \
  NODE_ENV=production
```
These are encrypted at rest and injected as environment variables when the container starts. The pipeline only needs NEXUSAI_TOKEN — one secret instead of seven.
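If you want the container to fail fast when a secret is missing rather than limp along, a small entrypoint guard works regardless of how the secrets are injected. A sketch — the `require` helper is my own, not part of any CLI:

```sh
#!/bin/sh
# Sketch: refuse to start if a required env var was not injected.
# Variable names match the secrets set above; `require` is a hypothetical helper.
require() {
  for name in "$@"; do
    eval "val=\${$name:-}"
    if [ -z "$val" ]; then
      echo "missing required env var: $name" >&2
      return 1
    fi
  done
}

if require DATABASE_URL STRIPE_SECRET_KEY NODE_ENV; then
  echo "env ok"   # e.g. exec node server.js here
else
  echo "refusing to start" >&2
fi
```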
After updating secrets, one command applies them:
```bash
nexus deploy redeploy --deployment-id <your-deployment-id>
```
## Rollback
Old workflow rollback: figure out the previous image SHA, manually update the ECS task definition, trigger a new deployment, hope the old image hasn't been cleaned up by the ECR lifecycle policy.
New rollback:
```bash
nexus deploy rollback --deployment-id <your-deployment-id>
```
Reverts to the previous container image. Health checks run. Done. I've used this twice. Both times took under 90 seconds.
## Redeployment speed
First deploy: ~4.5 minutes (infrastructure provisioning included).
Subsequent deploys: 60–90 seconds. Infrastructure is already provisioned, so it's just build → push → rolling update.
My old pipeline took 10–12 minutes. Most of that was the Docker build running on GitHub's shared runners plus the ECS service stability wait.
## What I lost
Every tool has trade-offs. These are the real ones:
**Less visibility into the build environment.** With a `Dockerfile` I wrote, I knew exactly what was in the image. With source-based deployment, NEXUS AI generates the image. You can inspect it — `nexus deploy logs` gives the full build output — but you're not authoring the Dockerfile. For most apps this is fine. If you have specific system dependencies (custom C extensions, obscure shared libraries), test carefully.

**The first deploy takes time.** Infrastructure provisioning isn't instant. If you need sub-30-second cold deploys for some reason, this isn't that. But once infrastructure exists, redeployments are fast.

**You're adding a dependency.** NEXUS AI is now in your deploy path: if its build service is down, you can't ship. Worth knowing.
## The numbers
| | Old workflow | New workflow |
|---|---|---|
| Lines of YAML | 87 | 24 |
| Pipeline runtime | 10–12 min | 60–90 sec |
| AWS console setup | ~3 days (one-time) | 0 |
| Secrets locations | 3 | 1 |
| Rollback steps | ~6 manual steps | 1 command |
| Last random breakage | November | Hasn't happened |
## How to try it
```bash
# Install
npm install -g @nexusai/cli

# Authenticate
nexus login

# First deploy — detects Node/Python/Go automatically, no Dockerfile needed
nexus deploy source \
  --repo https://github.com/your/repo \
  --name my-app \
  --provider aws_ecs_fargate  # or gcp_cloud_run, azure_container_apps

# Check status
nexus deploy status --deployment-id <id>

# Set environment variables
nexus secret set KEY=value KEY2=value2

# Redeploy (use this in CI)
nexus deploy redeploy --deployment-id <id>

# Rollback
nexus deploy rollback --deployment-id <id>
```
For CI/CD: add `NEXUSAI_TOKEN` and `NEXUSAI_DEPLOYMENT_ID` as secrets in your GitHub repo settings, then replace your deploy steps with the one-liner from the workflow above.
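If you use the GitHub CLI, both repo secrets can be set from a terminal — `gh secret set` is a real command; the values below are placeholders for your own token and deployment ID:

```bash
# Sketch: register the two secrets the slim workflow needs.
gh secret set NEXUSAI_TOKEN --body "<your-token>"
gh secret set NEXUSAI_DEPLOYMENT_ID --body "<your-deployment-id>"
```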
## Final thought
The 87-line YAML file wasn't the real cost. The real cost was the cognitive overhead of owning it — the 45-minute debugging session when Node versions drifted, the silent failure when an Action was deprecated, the "it's complicated" I sent to a new teammate.
I don't miss any of that.
If you're maintaining a pipeline like the one I had, it's worth spending 20 minutes to find out how much of it you can delete.
Building something and want to compare notes? Drop it in the comments.