How a missing GitHub Actions output caused an ImagePullBackOff and the engineering lessons it taught me about building reliable CI/CD pipelines
Engineering Debug Diaries is a series where I document real debugging sessions, production incidents, and the engineering lessons they taught me.
Every article focuses on the investigation, the root cause, and the practical changes I made afterwards.
The Incident
A deployment failed with:
ImagePullBackOff
My first guesses were the obvious ones.
- Docker Hub is unavailable.
- Registry credentials expired.
- Kubernetes can't pull the image.
So I checked Docker Hub.
The image was there.
docker.io/mycompany/api-service:a1b2c3d-2026-05-24-17-52
Exactly where I expected it.
At that point I had two conflicting facts.
- The image existed.
- Kubernetes couldn't pull it.
One of my assumptions had to be wrong.
Time to follow the evidence.
The Investigation
Step 1 - Verify the Registry
First, I ruled out Docker Hub.
- Image exists
- Registry credentials are valid
- Build pipeline completed successfully
So the registry wasn't the problem.
Step 2 - Inspect the Pod
Next, I inspected the pod events.
kubectl describe pod api-service-staging-xxx -n staging
The output immediately caught my attention.
Failed to pull image
docker.io/mycompany/api-service:a1b2c3d-2026-05-24-17-54
Wait...
Docker Hub contained:
17:52
But Kubernetes was requesting:
17:54
Same commit.
Different timestamp.
Just two minutes apart.
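A two-digit difference like this is easy to miss by eye, so the comparison can be scripted. The following is a minimal sketch, not part of the original pipeline: the function names image_tag and same_image are hypothetical, and the parsing assumes every image reference carries an explicit tag after its last colon.

```shell
#!/usr/bin/env bash

# Print the tag part of an image reference (the text after the last colon).
# Assumption: the reference always includes a tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Succeed only when both references are character-for-character identical.
same_image() {
  [ "$1" = "$2" ]
}

# Values taken from the incident: what Docker Hub held vs. what the pod asked for.
pushed="docker.io/mycompany/api-service:a1b2c3d-2026-05-24-17-52"
requested="docker.io/mycompany/api-service:a1b2c3d-2026-05-24-17-54"

if same_image "$pushed" "$requested"; then
  echo "match: $(image_tag "$pushed")"
else
  echo "mismatch: pushed=$(image_tag "$pushed") requested=$(image_tag "$requested")"
fi
```

Against a live cluster, the requested value could be read with something like kubectl get pod <pod-name> -n staging -o jsonpath='{.spec.containers[*].image}' instead of being hard-coded.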
Step 3 - Follow the Pipeline
The GitHub Actions logs finally explained the mismatch.
Build
Generated image tag:
a1b2c3d-2026-05-24-17-52
Push
Successfully pushed:
a1b2c3d-2026-05-24-17-52
Deploy
Deploying image:
a1b2c3d-2026-05-24-17-54
There it was.
The deployment was trying to pull an image that had never been pushed.
The Root Cause
The issue wasn't Docker.
It wasn't Kubernetes.
It was the GitHub Actions workflow.
The build job generated the image tag.
build:
  outputs:
    image-tag: ${{ steps.meta.outputs.tag }}
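The workflow excerpt does not show how the meta step produced its tag output. As a hedged sketch, a step can publish an output by appending a name=value line to the file that GitHub Actions exposes as $GITHUB_OUTPUT. The function name publish_tag and its arguments are hypothetical; the tag format follows the one seen in the logs.

```shell
#!/usr/bin/env bash

# Compose the tag once and publish it as the step output named "tag".
# $1 = short commit SHA, $2 = timestamp,
# $3 = output file (optional; defaults to $GITHUB_OUTPUT and aborts if neither is set).
publish_tag() {
  local short_sha="$1" stamp="$2"
  local out="${3:-${GITHUB_OUTPUT:?GITHUB_OUTPUT is not set}}"
  echo "tag=${short_sha}-${stamp}" >> "$out"
}

# Demo with a temporary file standing in for $GITHUB_OUTPUT.
demo_out="$(mktemp)"
publish_tag a1b2c3d 2026-05-24-17-52 "$demo_out"
cat "$demo_out"
rm -f "$demo_out"
```

Inside a real step, the call might look like publish_tag "$(git rev-parse --short HEAD)" "$(date +%F-%H-%M)". The point is that the tag is computed exactly once, and every later job reads that value instead of recomputing it.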
The push job correctly consumed that output.
push:
  needs: build
But the deploy job only depended on push, while still trying to access build outputs.
deploy:
  needs: push
  env:
    IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
Since build wasn't a direct dependency, it never appeared in the deploy job's needs context, and the expression evaluated to an empty string.
No warning.
No error.
Just an empty variable.
The Silent Failure
The deployment script had a fallback.
if [ -z "${IMAGE_TAG:-}" ]; then
  export IMAGE_TAG=$(git rev-parse --short HEAD)-$(date +%F-%H-%M)
fi
At first glance, it looks like defensive programming.
In reality, it hid the configuration mistake.
Instead of failing immediately, the deployment generated a brand new image tag using the current timestamp.
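The sequence looked like this:
- Build generated a1b2c3d-2026-05-24-17-52.
- Push uploaded a1b2c3d-2026-05-24-17-52.
- Deploy received an empty IMAGE_TAG.
- The fallback generated a1b2c3d-2026-05-24-17-54.
- Kubernetes tried to pull a1b2c3d-2026-05-24-17-54.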
Kubernetes wasn't trying to pull the image that existed.
It was trying to pull an image that had never been built.
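The effect of the fallback can be reproduced without a cluster. This is a minimal sketch under stated assumptions: resolve_tag is a hypothetical stand-in for the deploy script's logic, and the commit and clock value are passed in as arguments so the result is repeatable.

```shell
#!/usr/bin/env bash

# Mirrors the fallback: keep the provided tag if it is non-empty, otherwise
# build a new tag from the commit and the clock value passed in.
# $1 = provided IMAGE_TAG (may be empty), $2 = short SHA, $3 = current time.
resolve_tag() {
  local provided="$1" short_sha="$2" now="$3"
  if [ -z "$provided" ]; then
    printf '%s-%s\n' "$short_sha" "$now"
  else
    printf '%s\n' "$provided"
  fi
}

pushed_tag="a1b2c3d-2026-05-24-17-52"

# Deploy job with the output wired correctly: the pushed tag is reused.
with_output="$(resolve_tag "$pushed_tag" a1b2c3d 2026-05-24-17-54)"

# Deploy job as it ran in the incident: empty input, clock two minutes later.
without_output="$(resolve_tag "" a1b2c3d 2026-05-24-17-54)"

echo "with output:    $with_output"
echo "without output: $without_output"
```

Any fallback that depends on the clock can only reproduce the original tag if the deploy happens to start in the same minute as the build, which is why the failure can look intermittent.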
The Fix
The solution turned out to be surprisingly small.
First, I re-exported the build output.
push:
  needs: build
  outputs:
    image-tag: ${{ needs.build.outputs.image-tag }}
Then I updated the deploy job to consume the output from push.
deploy:
  needs: [build, push]
  env:
    IMAGE_TAG: ${{ needs.push.outputs.image-tag }}
Finally, I removed the silent fallback and added validation.
# ${IMAGE_TAG:-} keeps this check working when the script runs with set -u.
if [ -z "${IMAGE_TAG:-}" ]; then
  echo "ERROR: IMAGE_TAG was not provided." >&2
  exit 1
fi
Now the pipeline fails immediately instead of silently deploying the wrong image.
What Changed Afterwards
This wasn't the most complicated bug I've ever debugged.
But it permanently changed how I think about CI/CD pipelines.
Since then, every pipeline I build follows a few simple rules.
1. Pass data explicitly
If one job produces important data, another job should explicitly consume it.
Don't rely on assumptions.
2. Validate critical inputs
Every deployment now checks that required variables exist before doing anything.
Missing data should stop the pipeline immediately.
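One way to apply this rule to more than one variable is a small reusable check. The sketch below is an illustration rather than the script from the incident: require_var is a hypothetical helper that fails when the named variable is unset or empty, and logs the value it received otherwise.

```shell
#!/usr/bin/env bash

# Fail unless the variable whose NAME is passed in is set and non-empty.
# On success, log the received value so it shows up in the pipeline output.
require_var() {
  local name="$1"
  local value="${!name:-}"   # indirect expansion: look up the variable by name
  if [ -z "$value" ]; then
    echo "ERROR: ${name} was not provided." >&2
    return 1
  fi
  echo "${name}=${value}"
}

# Demo value; in a pipeline this would come from the job's env block.
IMAGE_TAG="a1b2c3d-2026-05-24-17-52"
require_var IMAGE_TAG || exit 1
```

Logging the value on success costs one line of output and would have made the 17:52 versus 17:54 difference visible in the deploy log straight away.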
3. Remove silent fallbacks
Fallback logic often hides configuration mistakes.
It's usually better to fail fast than continue with incorrect data.
4. Enable strict shell mode
set -euo pipefail
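With -u, the shell aborts when a script expands a variable that was never set. With -e, it stops at the first failing command, and pipefail makes a pipeline fail if any command in it fails.
A variable that is set but empty still passes, so the explicit IMAGE_TAG check is still needed.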
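What strict mode does and does not catch can be checked directly. This sketch runs each case in a child bash so a failure does not end the script; the helper name try is hypothetical.

```shell
#!/usr/bin/env bash

# Run a command and report whether it succeeded, hiding its output.
try() {
  if "$@" >/dev/null 2>&1; then echo "passed"; else echo "failed"; fi
}

# -u: expanding a variable that was never set aborts the script.
unset_case="$(try env -u IMAGE_TAG bash -uc 'echo "tag=$IMAGE_TAG"')"

# Set but empty is not an error for -u, so an explicit check is still needed.
empty_case="$(try env IMAGE_TAG= bash -uc 'echo "tag=$IMAGE_TAG"')"

# pipefail: a failure early in a pipeline is no longer masked by the last command.
plain_pipe="$(try bash -c 'false | true')"
strict_pipe="$(try bash -o pipefail -c 'false | true')"

echo "unset=$unset_case empty=$empty_case pipe=$plain_pipe pipefail=$strict_pipe"
```

The unset case fails while the empty case passes. That matters here because an env entry whose expression resolves to nothing still sets the variable, only to an empty string.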
Lessons Learned
I spent several minutes investigating what looked like a Kubernetes problem.
In reality, Kubernetes was doing exactly what it had been told to do.
It was trying to pull an image that didn't exist.
The registry was healthy.
The cluster was healthy.
The mistake wasn't in the infrastructure.
It was in the automation feeding the infrastructure.
Just a few characters in a YAML file.
- needs: push
+ needs: [build, push]
That small change fixed the deployment.
More importantly, it reinforced an engineering principle I'll continue to follow:
Automation should fail loudly, not silently.
Key Takeaways
- Make dependencies explicit.
- Pass outputs deliberately between jobs.
- Validate every critical variable.
- Prefer failing fast over silent recovery.
- Log the values your pipeline receives.
The infrastructure can be as complex as it needs to be.
The interfaces between pipeline stages should never be.
Final Thoughts
The more distributed our systems become, the more important the interfaces between them become.
Kubernetes wasn't the problem.
GitHub Actions wasn't the problem.
The problem was an assumption that data would magically appear where it was needed.
Now every pipeline I build follows one simple principle:
Make data flow explicit. Validate it. Fail fast if it's missing.
Small habits like these prevent surprisingly large production issues.
Thanks for reading Engineering Debug Diaries #1.
If you've ever tracked down a production issue that turned out to have a surprisingly simple root cause, I'd love to hear about it in the comments.
Happy debugging!
