
sudhesh G

DevOps Real-World Series #1 — Jenkins Pipelines Colliding on the Same Kubernetes Agent Pod

We recently hit a strange CI/CD failure pattern in our Kubernetes-based Jenkins setup. Under parallel load, multiple pipelines were triggered, but builds started failing randomly after the first job completed.

At first glance, it looked like Jenkins instability. It wasn't.

The real issue was Kubernetes pod scheduling and node resource pressure.

This post walks through the symptoms, investigation, root cause, and the production fix that stabilized our pipelines.

🧩 Environment Context
Our setup looked like this:

  • Jenkins running inside Kubernetes
  • Dynamic Kubernetes agents created per pipeline
  • Shared agent template across pipelines
  • Parallel builds enabled
  • Nodes with limited ephemeral storage
  • No pod spread rules defined

Expected behavior:
Each pipeline → separate agent pod → isolated execution.

Actual behavior:
Pipelines indirectly collided due to scheduling concentration.
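For context, a shared agent template in this kind of setup is typically a plain pod spec supplied to the Jenkins Kubernetes plugin. The sketch below is illustrative (the label, image, and sizes are examples, not our exact production values) — note that, like ours at the time, it defines no distribution rules at all, which is what left placement entirely up to the scheduler's default packing behavior:

```yaml
# Illustrative Jenkins agent pod template (example values, not our production config)
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: jenkins-agent          # the label that spread/affinity rules can match later
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:latest
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
  # No topologySpreadConstraints and no affinity rules here —
  # nothing tells the scheduler to spread agent pods across nodes.
```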

🚨 Symptoms
When multiple pipelines ran at the same time:

  • Only one agent pod appeared initially
  • First pipeline completed successfully
  • Agent pod terminated afterward
  • Other pipelines failed waiting for executors
  • Jenkins logs showed executor loss
  • Kubernetes events showed resource pressure
  • Agent pods repeatedly scheduled onto the same node

This made Jenkins look unstable, but Jenkins was not the failing layer.

🔍 Investigation Steps
We verified:

  • Jenkins executor configuration ✅
  • Kubernetes plugin pod templates ✅
  • Pipeline definitions ✅
  • Agent provisioning logs ✅
  • Pod lifecycle events ✅
  • Node describe output ✅

Key observation:
Agent pods were consistently landing on the same node under load.

That node showed:

  • Ephemeral storage pressure
  • Resource exhaustion warnings
  • Pod eviction events

Pods were being created correctly but placed poorly.

Root Cause
No scheduling distribution rules were defined for Jenkins agent pods.
Kubernetes scheduler packed multiple agent pods onto the same node.

That caused:

  • Rapid ephemeral storage consumption
  • Node pressure conditions
  • Pod termination after first job
  • Waiting pipelines losing executors

This was a pod placement problem, not a Jenkins provisioning problem.

🛠 Fix Implemented
We updated the Jenkins agent pod template to include topology spread constraints and better resource sizing.

Added topology spread constraints

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: jenkins-agent
```

Resource tuning

  • Increased ephemeral storage limits
  • Increased CPU & memory requests
  • Prevented node-level overload from agent bursts
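Concretely, the resources section of the agent pod template ended up along these lines (the numbers below are illustrative, not our exact values — tune them to your build workloads):

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
    ephemeral-storage: "4Gi"   # request it so the scheduler accounts for workspace usage
  limits:
    memory: "4Gi"
    ephemeral-storage: "8Gi"   # hard cap before the kubelet evicts the pod
```

Requesting ephemeral storage explicitly matters here: without a request, the scheduler places agent pods with no awareness of how much disk each build will consume, which is exactly how one node ended up under storage pressure.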

Result After Fix
After rollout:

  • Agent pods distributed across nodes
  • No repeated scheduling concentration
  • No executor loss after first pipeline
  • Stable parallel builds
  • No unexpected agent pod termination

CI behavior became predictable again.

🎯 Key Lesson
When Jenkins pipelines fail under parallel load:
Do not inspect Jenkins alone.
Also check:

  • Pod scheduling patterns
  • Node resource pressure
  • Ephemeral storage limits
  • Pod distribution rules

Poor pod spread can look exactly like CI instability.
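As an aside on pod distribution rules: the same `app: jenkins-agent` label can also drive a pod anti-affinity rule, either instead of or alongside topology spread constraints, if you want the scheduler to actively prefer putting agent pods on different nodes. A sketch (this is a standard Kubernetes construct, not part of our original fix):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: jenkins-agent
```

Using `preferred` rather than `required` keeps scheduling soft: under heavy load, pods can still co-locate instead of staying Pending, which mirrors the `ScheduleAnyway` choice in the spread constraint above.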

🔜 Next in This Series

DevOps Real-World Series #2 — Ephemeral Storage: The Silent CI/CD Pipeline Killer
