Vinay Veerappaji

My Kubernetes Background Jobs Were Running 3x (Here's the Simple Fix)

Had a background job in Kubernetes. Super simple - check for pending work every 5 minutes:

setInterval(() => {
  checkPendingJobs(); // Send emails, process data, cleanup tasks
}, 5 * 60 * 1000);

Deployed to 3 pods for "high availability." Big mistake.

What Actually Happened

Monday morning: Users getting duplicate emails, database CPU spiking. Checking logs:

Pod-A: Processing job #123 - sending welcome email
Pod-B: Processing job #123 - sending welcome email  
Pod-C: Processing job #123 - sending welcome email

All 3 pods wake up at the exact same time, query the database, find the same pending jobs, and process them in parallel.

Classic race condition.

How I Fixed It (Wrong Way First)

Tried database locks. Tried Redis locks. Tried leader election. All complex, all brittle.
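
For context, the database-lock version looked roughly like this. A sketch, not the actual code - it assumes Postgres advisory locks and the node-postgres client, and names like LOCK_KEY and runWithLock are illustrative:

const { Pool } = require('pg');
const pool = new Pool();
const LOCK_KEY = 42; // arbitrary app-wide lock id

async function runWithLock() {
  const client = await pool.connect(); // advisory locks live on one connection
  try {
    const { rows } = await client.query(
      'SELECT pg_try_advisory_lock($1) AS locked', [LOCK_KEY]
    );
    if (!rows[0].locked) return; // another pod got the lock - skip this tick
    try {
      await checkPendingJobs();
    } finally {
      await client.query('SELECT pg_advisory_unlock($1)', [LOCK_KEY]);
    }
  } finally {
    client.release();
  }
}

It works, but every pod still wakes up and races for the lock, the lock is tied to a single database connection you now have to manage, and the Redis and leader-election versions need just as much ceremony.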

Then realized something obvious: Why am I trying to coordinate 3 workers when the job only needs to run once?

The Simple Solution

Deleted the setInterval. Made it an HTTP endpoint:

const express = require('express');
const app = express();

// Called by the external scheduler; processes whatever is pending, once.
app.post('/jobs/process', async (req, res) => {
  await checkPendingJobs();
  res.json({ status: 'done' });
});

Added an external scheduler (GCP Cloud Scheduler):

gcloud scheduler jobs create http my-job \
  --schedule="*/5 * * * *" \
  --uri="https://myapp.com/jobs/process"

That's it. Scheduler hits the endpoint, load balancer picks one pod, job runs once.

Why This Beats Locking

  • No race conditions possible - only one request
  • Way less code - no coordination logic
  • Better reliability - managed scheduler vs app timers
  • Easier debugging - clear request logs
  • Works everywhere - AWS EventBridge, cron, whatever

The Pattern

Stop asking "How do I coordinate multiple instances?"

Start asking "Do I need multiple instances doing this?"

Most background jobs are perfect for external triggers:

  • File processing → storage events (see the sketch after this list)
  • Cache warming → deployment hooks
  • Cleanup tasks → schedulers
  • Health checks → external monitoring
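
The "file processing → storage events" case, for example, is the same shape as the job endpoint above: instead of pods polling a bucket, the storage notification calls an HTTP route. A rough sketch, assuming Express with express.json() enabled - the route, payload fields, and processUploadedFile are hypothetical:

// Hypothetical webhook for storage notifications - the exact payload
// shape depends on your provider (e.g. a GCS Pub/Sub push subscription).
app.post('/events/file-uploaded', async (req, res) => {
  const { bucket, name } = req.body;
  await processUploadedFile(bucket, name);
  res.json({ status: 'done' });
});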

Been running this for months. Zero duplicate jobs. Zero 2 AM alerts.

Sometimes the best distributed systems solution is... not distributing the work.


Similar war stories? Drop them below.
