My CI pipeline's git pushes kept getting rejected. Same error, every single run.
```text
error: failed to push some refs to 'https://github.com/...'
Updates were rejected because the remote contains work that
you do not have locally.
```
The first time it happened I thought it was a simple git issue. Pull, rebase, push. Done.
It wasn't done. It happened again on the next run. And the one after that.
This is the story of what was actually broken, why a retry loop didn't fix it, and the architectural change that finally solved it.
What the Pipeline Was Doing
The CI pipeline builds 11 Docker images — one per microservice — and pushes them to Amazon ECR. After each build, it updates a values.yaml file in the repo with the new image tag, commits the change, and pushes it back to Git. ArgoCD watches that file and deploys automatically when it changes.
Here is a simplified version of what each job was doing:
```yaml
# Simplified: setup of IMAGE_URI, IMAGE_TAG, and the git user identity is omitted.
- name: Build, tag, and push image to ECR
  run: |
    docker build -t "$IMAGE_URI" ./src/${{ matrix.service }}/
    docker push "$IMAGE_URI"

- name: Update image tag in Helm values
  run: |
    yq -i ".${{ matrix.service }}.image.tag = \"${IMAGE_TAG}\"" \
      helm-chart/values.yaml
    git add helm-chart/values.yaml
    git commit -m "ci: update ${{ matrix.service }} image to ${IMAGE_TAG}"
    git push
```
Looks reasonable. The problem is that the matrix strategy runs all 11 jobs in parallel.
The Actual Problem: A Race Condition
Here is what happens when 11 jobs all run at the same time:
- All 11 jobs check out the repo at the same commit
- All 11 build their image and push to ECR
- All 11 try to update values.yaml and push at roughly the same time
- The first job to push wins
- Every other job gets rejected because the remote has moved forward
```text
! [rejected] vivian -> vivian (fetch first)
error: failed to push some refs
hint: Updates were rejected because the remote contains work
hint: that you do not have locally.
```
This is a classic race condition. Multiple writers, one file, no coordination.
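You can watch the same rejection happen on a laptop. The script below is a minimal sketch, not part of the original pipeline: a bare repository stands in for GitHub, two clones stand in for two matrix jobs that checked out the same commit, and sed stands in for yq. The two-service values.yaml layout is hypothetical, and the script assumes git 2.28 or newer (for `init -b`).

```bash
#!/usr/bin/env bash
# Sketch: reproduce "Updates were rejected ... (fetch first)" locally.
# Assumptions: git >= 2.28 (for `init -b`), a hypothetical two-service
# values.yaml, and sed standing in for yq. Runs in a throwaway temp dir.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# A bare repository plays the role of the GitHub remote.
git init -q --bare -b main remote.git

# Seed the remote with one commit containing values.yaml.
git init -q -b main seed
printf 'frontend:\n  image:\n    tag: old\ncartservice:\n  image:\n    tag: old\n' \
  > seed/values.yaml
git -C seed add values.yaml
git -C seed commit -qm "initial values"
git -C seed push -q "$work/remote.git" main

# Two matrix "jobs" check out the same commit.
git clone -q "$work/remote.git" job-frontend
git clone -q "$work/remote.git" job-cart

# Each job bumps only its own service's tag and commits.
bump() {  # usage: bump <clone-dir> <service>
  sed "/^$2:/,/tag:/ s/tag: old/tag: new/" "$1/values.yaml" > "$1/values.tmp"
  mv "$1/values.tmp" "$1/values.yaml"
  git -C "$1" commit -qam "ci: update $2 image"
}
bump job-frontend frontend
bump job-cart cartservice

# Both jobs push; prints "<dir>: pushed" or "<dir>: rejected".
push() {  # usage: push <clone-dir>
  if git -C "$1" push -q origin main 2>/dev/null; then
    echo "$1: pushed"
  else
    echo "$1: rejected"
  fi
}
frontend_result=$(push job-frontend)
cart_result=$(push job-cart)
printf '%s\n%s\n' "$frontend_result" "$cart_result"
```

The second push fails for the same reason it fails in CI: job-cart's commit was built on a parent that is no longer the tip of main, so accepting it would not be a fast-forward.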
The First Fix Attempt: Retry with Rebase
The first thing I tried was adding a retry loop. Fetch the latest, rebase, try again. If the push fails, wait a random 5 to 14 seconds and try again, up to 5 attempts in total.
```yaml
- name: Update image tag in Helm values
  run: |
    for i in 1 2 3 4 5; do
      git fetch origin
      git rebase origin/${{ github.ref_name }}
      yq -i ".${{ matrix.service }}.image.tag = \"${IMAGE_TAG}\"" \
        helm-chart/values.yaml
      git add helm-chart/values.yaml
      # Commit only when yq changed something. On a retry, the commit from the
      # previous attempt already carries the new tag after the rebase, so there
      # is nothing new to commit, but that commit still has to be pushed.
      git diff --staged --quiet || \
        git commit -m "ci: update ${{ matrix.service }} image to ${IMAGE_TAG}"
      git push && echo "Push succeeded" && exit 0
      echo "Push failed, attempt $i of 5. Retrying..."
      sleep $((RANDOM % 10 + 5))   # random 5-14 s backoff
    done
    echo "All push attempts failed"
    exit 1
```
This helped but didn't fully solve it. On a retry, the rebase would sometimes stop with a merge conflict: two jobs had modified values.yaml in different ways, and git could not combine the edits automatically.
```text
CONFLICT (content): Merge conflict in helm-chart/values.yaml
error: could not apply fad9a46...
```
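Rebasing on a retry does not guarantee a clean result, because a rebase replays each commit with a three-way merge, and git reports a conflict whenever two sides change the same or adjacent lines. The sketch below reproduces that outcome under an assumed, hypothetical flat layout in which the two tags sit on neighboring lines; it shows one way to reach a CONFLICT like the one above, not necessarily the one this repo hit. It uses a throwaway bare repository as the remote, sed in place of yq, and assumes git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: the retry path (fetch, rebase) stopping with a content conflict.
# Assumptions: a hypothetical flat values.yaml where the two tags sit on
# adjacent lines, git >= 2.28, and sed standing in for yq. Temp dir only.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
export GIT_AUTHOR_NAME=ci GIT_AUTHOR_EMAIL=ci@example.com
export GIT_COMMITTER_NAME=ci GIT_COMMITTER_EMAIL=ci@example.com

# Bare repo as the remote, seeded with the hypothetical flat layout.
git init -q --bare -b main remote.git
git init -q -b main seed
printf 'images:\n  frontend: old\n  cartservice: old\n' > seed/values.yaml
git -C seed add values.yaml
git -C seed commit -qm "initial values"
git -C seed push -q "$work/remote.git" main

git clone -q "$work/remote.git" job-frontend
git clone -q "$work/remote.git" job-cart

# Each job edits only its own key; here the keys are on adjacent lines.
bump() {  # usage: bump <clone-dir> <service>
  sed "s/^  $2: old/  $2: new/" "$1/values.yaml" > "$1/values.tmp"
  mv "$1/values.tmp" "$1/values.yaml"
  git -C "$1" commit -qam "ci: update $2 image"
}
bump job-frontend frontend
bump job-cart cartservice

git -C job-frontend push -q origin main   # the first writer wins

# job-cart's push would now be rejected, so it takes the retry path.
git -C job-cart fetch -q origin
if git -C job-cart rebase origin/main >/dev/null 2>&1; then
  rebase_result="clean"
else
  rebase_result="conflict"
  git -C job-cart rebase --abort
fi
echo "rebase on retry: $rebase_result"
```

Put at least one unchanged line between the two keys and the same rebase completes cleanly, so whether a retry survives depends on where each job's edit lands in the file.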
A retry loop treats the symptom. The real problem is the design.
The Real Fix: One Writer, Not Eleven
The root cause is 11 jobs writing to the same file at the same time. That should never happen, so the solution is to separate the concerns completely.
Build jobs do one thing: build and push the image.
A single downstream job, running only after all 11 builds finish, updates values.yaml once with all the new tags in a single commit.
Here is the redesigned workflow:
```yaml
# Workflow name, triggers (on:), and the top-level env block that defines
# AWS_REGION are not shown.
jobs:
  build-and-push:
    name: Build ${{ matrix.service }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service:
          - frontend
          - cartservice
          - productcatalogservice
          - currencyservice
          - paymentservice
          - shippingservice
          - emailservice
          - checkoutservice
          - recommendationservice
          - adservice
          - loadgenerator
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          fetch-depth: 0

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, and push image to ECR
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Assumption: one ECR repository per service, named after the service.
          IMAGE_URI="${{ steps.login-ecr.outputs.registry }}/${{ matrix.service }}:${IMAGE_TAG}"
          docker build -t "$IMAGE_URI" ./src/${{ matrix.service }}/
          docker push "$IMAGE_URI"

  update-helm-values:
    name: Update Helm values
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/vivian'
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          fetch-depth: 0

      - name: Install yq
        run: |
          sudo wget -qO /usr/local/bin/yq \
            https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
          sudo chmod +x /usr/local/bin/yq

      - name: Update all image tags in Helm values
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          services=(
            frontend cartservice productcatalogservice currencyservice
            paymentservice shippingservice emailservice checkoutservice
            recommendationservice adservice loadgenerator
          )
          for service in "${services[@]}"; do
            yq -i ".$service.image.tag = \"${IMAGE_TAG}\"" \
              helm-chart/values.yaml
            echo "Updated $service to ${IMAGE_TAG}"
          done

      - name: Commit and push updated values
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          git config user.name "GitHub Actions Bot"
          git config user.email "actions@github.com"
          git add helm-chart/values.yaml
          git diff --staged --quiet && echo "No changes to commit" && exit 0
          git commit -m "ci: update all images to ${IMAGE_TAG}"
          git push origin ${{ github.ref_name }}
```
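The key line is needs: build-and-push. It tells GitHub Actions to wait until every build job in the matrix has completed successfully before running the update job. If any build fails, the update job is skipped, so values.yaml never receives a partial set of tags.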
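The shape that needs: creates (fan out, wait for everything, then a single writer) can be sketched in plain bash. This is an analogy, not how GitHub Actions schedules jobs: background processes play the matrix jobs, waiting on every PID plays needs:, and one loop plays the update job. The build function, the tag value, and the key=value output file are hypothetical stand-ins for docker build/push, github.sha, and the yq edits on values.yaml.

```bash
#!/usr/bin/env bash
# Sketch: fan out (parallel builds), fan in (wait on all), then one writer.
# build() and the key=value output file are hypothetical stand-ins for
# docker build/push and the yq edits on values.yaml.
set -euo pipefail

services=(
  frontend cartservice productcatalogservice currencyservice
  paymentservice shippingservice emailservice checkoutservice
  recommendationservice adservice loadgenerator
)
image_tag="abc1234"   # hypothetical; the workflow uses github.sha
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

build() {  # hypothetical stand-in for one matrix job
  sleep "$((RANDOM % 3))"   # simulate builds of varying length
  touch "$work/$1.built"
}

# Fan out: start every build in the background and remember its PID.
pids=()
for service in "${services[@]}"; do
  build "$service" &
  pids+=("$!")
done

# Fan in: wait on each PID so a failed build is noticed (like needs:).
failed=0
for pid in "${pids[@]}"; do
  wait "$pid" || failed=1
done
if (( failed )); then
  echo "at least one build failed; skipping the values update" >&2
  exit 1
fi

# Single writer: one pass over all services, one write to the file.
values="$work/values.txt"
for service in "${services[@]}"; do
  echo "$service.image.tag=$image_tag"
done > "$values"
echo "wrote $(wc -l < "$values" | tr -d ' ') tags in one pass"
```

Waiting on each PID individually matters: a bare wait with no arguments waits for all background jobs but returns zero, so it would let the writer run even after a failed build.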
What Changed and Why It Works
Before: 11 jobs, each writing to values.yaml as soon as its build finished. No coordination. First push wins, the rest fail.
After: 11 jobs build and push images. They do nothing else. One job runs after all of them finish, loops through all services, updates every tag in a single pass, commits once, pushes once.
One writer. No conflicts. Clean history.
The full workflow history tells the story clearly: red X runs on the left, where the old design kept failing, and green checkmarks from the point the fix went in.
The Broader Lesson
Race conditions in CI pipelines are easy to miss because each individual step looks correct. The build step is correct. The push step is correct. The commit step is correct. But the system design is wrong.
When you have parallel jobs touching shared state, you need to ask: who owns this resource? In this case, values.yaml should have exactly one writer. Once I framed it that way, the fix was obvious.
If you are building a similar GitOps pipeline with GitHub Actions and Helm, keep your build jobs and your manifest update job separate from the start. It saves you a frustrating debugging session later.
What Comes Next
This pipeline feeds directly into ArgoCD, which watches values.yaml and syncs the cluster automatically when the file changes. In the next post I'll walk through setting up ArgoCD on EKS, connecting it to the repo, and the pod scheduling problem that showed up once everything was deployed.
This is part of an ongoing series documenting a full DevOps project built on Google's Online Boutique microservices demo, deployed to AWS EKS with Terraform, GitHub Actions, ArgoCD, Helm, Prometheus, and Grafana.


