Recently, I ran into a strange Pod behavior in Google Kubernetes Engine (GKE).
When rolling out a deployment, most pods came up healthy, but some failed with CrashLoopBackOff.
The Pod events showed nothing unusual apart from the unhealthy status. The log of an unhealthy pod, however, gave a hint:
Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
at GoogleAuth.getApplicationDefaultAsync (/usr/src/app/node_modules/google-auth-library/build/src/auth/googleauth.js:183:19)
We found an interesting fix: adding an environment variable to the pod.
DETECT_GCP_RETRIES=3
🎉🎉🎉 Now all pods come up without errors.
The cause of this issue appears to be that, during a rolling deployment, multiple pods request authentication from GCP at the same time. Sometimes a request times out, and without a retry the pod fails.
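To illustrate why a retry count helps here, below is a minimal sketch of retry logic in plain Node.js. It is not the actual google-auth-library code: `withRetries` and `makeFlakyLookup` are hypothetical names invented for this example, and the "lookup" only simulates a request that times out a couple of times before succeeding.

```javascript
// Hypothetical helper (not part of google-auth-library): run an async
// operation, retrying up to `retries` extra times if it throws.
async function withRetries(operation, retries, delayMs = 0) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Optionally wait before the next attempt.
      if (attempt < retries && delayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Simulated credential lookup that times out on the first `failures` calls.
function makeFlakyLookup(failures) {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) {
      throw new Error("lookup timed out");
    }
    return { ok: true, calls };
  };
}

// With 3 retries, a lookup that fails twice still succeeds on the third call.
withRetries(makeFlakyLookup(2), 3).then((result) => {
  console.log("succeeded after", result.calls, "calls");
});
```

With `retries` set to 0, the same flaky lookup would fail on its first timeout, which matches the behavior we saw before adding the environment variable.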