Vitaly Bicov

Kubernetes CronJob + Sidecar: A Love Story Gone Wrong (And How to Fix It)

I work at a large product company with a sprawling Kubernetes infrastructure. We run thousands of workloads, process massive amounts of data, and rely on automation to keep things running smoothly. So when we needed to execute a scheduled task in Kubernetes, using a CronJob seemed like a no-brainer.

At first, everything worked perfectly. Our CronJob fired up a Job, the task ran, completed, and exited cleanly.

But then, as always, the requirements changed:

• The script was opening too many database connections, so we added an SQL proxy to optimize connection pooling.

• The task became mission-critical, meaning we needed real-time monitoring to ensure failures wouldn’t go unnoticed.

• We added sidecar containers for these enhancements… and that’s when everything broke.

The Problem: CronJob Stopped Running

Kubernetes CronJobs work by creating Jobs, which spin up Pods to execute the actual work. A Job is considered complete only when its Pod reaches the Succeeded phase – which requires every container in the Pod to terminate successfully.
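
For context, here is a minimal sketch of the kind of CronJob we started with (the name, schedule, and image are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-scheduled-task
spec:
  schedule: "0 * * * *"   # hypothetical: run hourly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: main-job
              image: my-job-image:latest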

Our main container was completing successfully, transitioning to Succeeded.

But our sidecar containers – SQL Proxy and Monitoring – were running indefinitely.

Since they never exited, the Job never finished, and the CronJob never scheduled the next execution.

Oops.

Why We Needed These Sidecars in the First Place

  1. SQL Proxy: Our script was making hundreds of direct DB connections, overwhelming the database. Adding a SQL proxy helped pool connections, reducing the load.

  2. Monitoring: The job wasn’t just some background task – it was mission-critical. If it failed silently, key business processes would break. We needed real-time logs and metrics to ensure it was running correctly.

So removing the sidecars wasn’t an option. Instead, we needed to teach them when to exit.

The Fix: Graceful Shutdown via File Signaling

We needed a way to tell the sidecars:

“Hey, the main job is done. Time to shut down.”

Here’s the new strategy:

  1. The main container runs the task and, on exit, creates a status file in a shared volume: /pod/terminated on success, /pod/error on failure.

  2. The sidecars poll the shared volume for these files in a loop (see the sketch just after this list).

  3. When a file appears, each sidecar sends SIGTERM to its wrapped process, letting it terminate gracefully.

  4. The sidecar then exits with a status code matching the outcome.

  5. The Job completes, and the CronJob can schedule the next run.
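
Distilled to its essence, the handshake is just two shell fragments (a sketch – the real containers below add error handling):

# Main container: write a status file no matter how the script exits
trap '[ $? -eq 0 ] && touch /pod/terminated || touch /pod/error' EXIT

# Sidecar: poll for a status file, then SIGTERM the wrapped process
while [ ! -f /pod/terminated ] && [ ! -f /pod/error ]; do sleep 1; done
kill "$CHILD_PID"   # kill sends SIGTERM by default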

This Problem Isn’t New

This issue has been around for a while, and various workarounds have been proposed. There are even specialized projects like K8S Job Sidecar Terminator, which help manage sidecar shutdown for Kubernetes Jobs.

However, our approach is much simpler and doesn’t require any additional components – just a shared volume and a simple script inside the containers.
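
Worth noting: if your cluster runs Kubernetes 1.28 or newer, the native sidecar feature solves this at the platform level – a sidecar declared as an init container with restartPolicy: Always no longer blocks Job completion. A minimal sketch (same placeholder images as above):

apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: proxysql-sidecar
          image: proxysql/proxysql:latest
          restartPolicy: Always   # marks this container as a native sidecar
      containers:
        - name: main-job
          image: my-job-image:latest

The file-signaling approach described below has the advantage of working on any Kubernetes version.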

Implementation: The Helm Chart

Shared Volume
We’ll use an emptyDir volume so all containers in the pod can access the same file.

volumes:
  - name: shared-data
    emptyDir: {}
  1. The Main Job Container

Our main job script will:

• Execute the actual task.

• Create /pod/terminated on success (or /pod/error on failure) when it has finished.

containers:
  - name: main-job
    image: my-job-image:latest
    command:
      - /bin/sh
      - -c
      - |
        # Write a status file no matter how the script exits:
        # /pod/terminated on success, /pod/error on any non-zero exit.
        trap '[ $? -eq 0 ] && touch /pod/terminated || touch /pod/error' EXIT
        # Wait for the ProxySQL sidecar to come up before starting work
        # (this check assumes the socket path is visible to this container).
        while [ ! -S /tmp/proxysql.sock ]; do sleep 1; done
        ./run-my-task.sh
    volumeMounts:
      - name: shared-data
        mountPath: /pod
  2. Sidecar Containers (SQL Proxy or Monitoring)

We’ll use the same graceful shutdown approach for both.

  - name: proxysql-sidecar
    image: proxysql/proxysql:latest
    command:
      - /bin/sh
      - -c
      - |
        # Run ProxySQL in the foreground (-f) as a background child of this shell.
        proxysql -f -c "$CONFIG_PATH" &
        CHILD_PID=$!
        # Watcher: poll for a status file, then SIGTERM the child exactly once.
        (
          while true; do
            if [ -f /pod/terminated ] || [ -f /pod/error ]; then
              kill $CHILD_PID
              echo "Sent SIGTERM to $CHILD_PID because the main container finished."
              break
            fi
            sleep 1
          done
        ) &
        wait $CHILD_PID
        # Propagate the main job's outcome as this container's exit code.
        if [ -f /pod/error ]; then
          echo "Job completed with error. Exiting..."
          exit 1
        elif [ -f /pod/terminated ]; then
          echo "Job completed. Exiting..."
          exit 0
        else
          echo "ProxySQL exited on its own before the job finished."
          exit 1
        fi
    volumeMounts:
      - name: shared-data
        mountPath: /pod

We just need to deploy two instances of this pattern – one as the SQL proxy and one as monitoring – swapping in the appropriate image and command; a monitoring variant is sketched below.
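
The monitoring sidecar can reuse the exact same wrapper, with only the wrapped process swapped out (the image and agent command here are hypothetical placeholders):

  - name: monitoring-sidecar
    image: my-monitoring-agent:latest   # hypothetical image
    command:
      - /bin/sh
      - -c
      - |
        # Hypothetical agent binary – substitute your real monitoring process.
        monitoring-agent --config "$CONFIG_PATH" &
        CHILD_PID=$!
        (while true; do
          if [ -f /pod/terminated ] || [ -f /pod/error ]; then
            kill $CHILD_PID; break
          fi
          sleep 1
        done) &
        wait $CHILD_PID
        [ -f /pod/error ] && exit 1 || exit 0
    volumeMounts:
      - name: shared-data
        mountPath: /pod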

How It Works Now

  1. The main job starts.

  2. Each sidecar starts its main process in the background, along with a watcher loop that polls for the termination file.

  3. The main job finishes, creates /pod/terminated (or /pod/error), and exits.

  4. The sidecars detect the status file, send SIGTERM to their main process, and exit with a matching status code.

  5. The Job completes, and the CronJob schedules the next run.

No more stuck Jobs, no more missing CronJob executions.
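
You can verify the fix with standard kubectl commands (the CronJob name is a placeholder):

# Jobs should now reach Complete instead of hanging in Running
kubectl get jobs --watch

# The CronJob should show a fresh LAST SCHEDULE on every tick
kubectl get cronjob my-scheduled-task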

Mission complete!

Final Thoughts

Adding sidecars to Jobs and CronJobs can be tricky, but with a bit of clever process signaling, it’s totally manageable.

If your CronJob mysteriously stops running, check if your sidecars are stuck in Running state. If they are, they’re the problem.

This approach – file signaling plus a graceful SIGTERM shutdown – is a simple, reliable fix.

For alternative solutions and further discussion, check out these resources:

• Kubernetes GitHub Issue #25908

Hope this helps! Now go forth and deploy with confidence. 🚀
