
sudhesh G

DevOps Real-World Series #1 — Jenkins Pipelines Colliding on the Same Kubernetes Agent Pod

We recently hit a strange CI/CD failure pattern in our Kubernetes-based Jenkins setup. Under parallel load, multiple pipelines were triggered, but builds started failing randomly after the first job completed.

At first glance, it looked like Jenkins instability. It wasn't.

The real issue was Kubernetes pod scheduling and node resource pressure.

This post walks through the symptoms, investigation, root cause, and the production fix that stabilized our pipelines.

🧩 Environment Context
Our setup looked like this:

  • Jenkins running inside Kubernetes
  • Dynamic Kubernetes agents created per pipeline
  • Shared agent template across pipelines
  • Parallel builds enabled
  • Nodes with limited ephemeral storage
  • No pod spread rules defined

Expected behavior:
Each pipeline → separate agent pod → isolated execution.

Actual behavior:
Pipelines indirectly collided due to scheduling concentration.
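For context, a shared agent template in this kind of setup is typically a plain pod spec supplied to the Jenkins Kubernetes plugin. The sketch below is illustrative (the label, image, and sizes are examples, not our exact production values) — note that, like ours at the time, it defines no distribution rules at all, which is what left placement entirely up to the scheduler's default packing behavior:

```yaml
# Illustrative Jenkins agent pod template (example values, not our production config)
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: jenkins-agent          # the label that spread/affinity rules can match later
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent:latest
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
  # No topologySpreadConstraints and no affinity rules here —
  # nothing tells the scheduler to spread agent pods across nodes.
```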

🚨 Symptoms
When multiple pipelines ran at the same time:

  • Only one agent pod appeared initially
  • First pipeline completed successfully
  • Agent pod terminated afterward
  • Other pipelines failed waiting for executors
  • Jenkins logs showed executor loss
  • Kubernetes events showed resource pressure
  • Agent pods repeatedly scheduled onto the same node

This made Jenkins look unstable, but Jenkins was not the failing layer.

🔍 Investigation Steps
We verified:

  • Jenkins executor configuration ✅
  • Kubernetes plugin pod templates ✅
  • Pipeline definitions ✅
  • Agent provisioning logs ✅
  • Pod lifecycle events ✅
  • Node describe output ✅

Key observation:
Agent pods were consistently landing on the same node under load.

That node showed:

  • Ephemeral storage pressure
  • Resource exhaustion warnings
  • Pod eviction events

Pods were being created correctly but placed poorly.

Root Cause
No scheduling distribution rules were defined for Jenkins agent pods.
Kubernetes scheduler packed multiple agent pods onto the same node.

That caused:

  • Rapid ephemeral storage consumption
  • Node pressure conditions
  • Pod termination after first job
  • Waiting pipelines losing executors

This was a pod placement problem, not a Jenkins provisioning problem.

🛠 Fix Implemented
We updated the Jenkins agent pod template to include topology spread constraints and better resource sizing.

Added topology spread constraints

```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: jenkins-agent
```

Resource tuning

  • Increased ephemeral storage limits
  • Increased CPU & memory requests
  • Prevented node-level overload from agent bursts
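Concretely, the resources section of the agent pod template ended up along these lines (the numbers below are illustrative, not our exact values — tune them to your build workloads):

```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
    ephemeral-storage: "4Gi"   # request it so the scheduler accounts for workspace usage
  limits:
    memory: "4Gi"
    ephemeral-storage: "8Gi"   # hard cap before the kubelet evicts the pod
```

Requesting ephemeral storage explicitly matters here: without a request, the scheduler places agent pods with no awareness of how much disk each build will consume, which is exactly how one node ended up under storage pressure.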

Result After Fix
After rollout:

  • Agent pods distributed across nodes
  • No repeated scheduling concentration
  • No executor loss after first pipeline
  • Stable parallel builds
  • No unexpected agent pod termination

CI behavior became predictable again.

🎯 Key Lesson
When Jenkins pipelines fail under parallel load:
Do not inspect Jenkins alone.
Also check:

  • Pod scheduling patterns
  • Node resource pressure
  • Ephemeral storage limits
  • Pod distribution rules

Poor pod spread can look exactly like CI instability.
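As an aside on pod distribution rules: the same `app: jenkins-agent` label can also drive a pod anti-affinity rule, either instead of or alongside topology spread constraints, if you want the scheduler to actively prefer putting agent pods on different nodes. A sketch (this is a standard Kubernetes construct, not part of our original fix):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: jenkins-agent
```

Using `preferred` rather than `required` keeps scheduling soft: under heavy load, pods can still co-locate instead of staying Pending, which mirrors the `ScheduleAnyway` choice in the spread constraint above.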

🔜 Next in This Series

DevOps Real-World Series #2 — Ephemeral Storage: The Silent CI/CD Pipeline Killer
