If you work with HPC clusters, you likely use sbatch every day. You submit a script and expect it to run.
But that single command triggers a full workflow inside Slurm.
Understanding this internal flow helps you debug issues faster, optimize job performance, and better understand how your cluster behaves.
⸻
Step 1: Submitting the Job
When you run:
sbatch job.sh
You are not starting the job. You are submitting a request to Slurm.
The script includes:
- Resource requirements such as CPUs, memory, GPUs
- Job metadata like name and output paths
- The actual commands to execute
At this point, Slurm simply accepts the job.
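As a sketch, a minimal batch script covering these three pieces might look like the following. The resource values and the my_app binary are placeholders, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=demo           # job metadata: name shown in squeue
#SBATCH --output=demo_%j.out      # output path; %j expands to the job ID
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # resource request: CPUs
#SBATCH --mem=8G                  # resource request: memory
#SBATCH --time=00:30:00           # wall-clock limit

srun ./my_app                     # the actual command to execute
```

The #SBATCH lines are ordinary comments to the shell but directives to Slurm, which is why the same file works both as a submission script and as a plain shell script.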
⸻
Step 2: Communication with slurmctld
The sbatch command sends the job to the Slurm controller daemon, slurmctld.
This daemon:
- Assigns a Job ID
- Stores the job details
- Marks the job as PENDING
Nothing is running yet.
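You can see this handoff on the command line. The job ID and the queue output below are illustrative:

```shell
$ sbatch job.sh
Submitted batch job 12345

$ squeue -j 12345
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  12345   compute  demo  alice PD  0:00     1 (Priority)
```

ST shows PD for pending, and the REASON column hints at why the job has not started yet.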
⸻
Step 3: Job Enters the Queue
The job is now placed in the scheduling queue.
The scheduler evaluates:
- Job priority
- Fairshare usage
- Partition limits
- Resource availability
This determines when your job will run.
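If your cluster uses the multifactor priority plugin, sprio breaks a pending job's priority into these components (12345 is a placeholder job ID):

```shell
sprio -j 12345
# Columns typically include FAIRSHARE, AGE, PARTITION, and QOS,
# which combine (with site-configured weights) into the job's overall priority
```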
⸻
Step 4: Scheduling Decision
The scheduler continuously checks:
- Free nodes
- Resource fragmentation
- Backfill opportunities
If your job fits available resources, it gets selected. Otherwise, it stays pending.
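You can ask the scheduler for its current estimate of when a pending job will start. The estimate shifts as other jobs finish early or are cancelled:

```shell
squeue -j 12345 --start
# Shows an estimated START_TIME for the pending job, when Slurm can compute one
```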
⸻
Step 5: Resource Allocation
Once selected, Slurm:
- Assigns specific compute nodes
- Reserves CPUs, memory, and GPUs
- Changes job state to RUNNING
Now your job has allocated resources.
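scontrol exposes the allocation details; the field values below are illustrative:

```shell
scontrol show job 12345
# Relevant fields once the job is selected:
#   JobState=RUNNING
#   NodeList=node[01-02]
#   TRES=cpu=8,mem=16G,node=2
```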
⸻
Step 6: Node-Level Communication
Each compute node runs a daemon called slurmd.
The controller sends job details to these nodes. The nodes prepare the execution environment.
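sinfo gives a quick view of which nodes slurmd reports as usable; nodes whose daemon is unreachable show up in states like down or drained:

```shell
sinfo -N -l        # one line per node, with state and resources
sinfo -R           # nodes that are down or drained, with the reason
```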
⸻
Step 7: Job Execution via slurmstepd
On the compute node, slurmstepd is launched.
This process:
- Starts your application
- Manages job steps
- Handles output and error streams
- Enforces resource limits using cgroups
Your script begins executing here.
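On the compute node itself, you can list the processes slurmstepd is managing for a job. This must be run on a node where the job is executing, and 12345 is again a placeholder:

```shell
scontrol listpids 12345
# Lists the PIDs belonging to each step of the job on this node
```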
⸻
Step 8: Monitoring During Execution
While the job runs:
- Slurm tracks resource usage
- Logs are written to output files
- Accounting data is collected
You can monitor the job using:
squeue
scontrol show job <jobid>
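If accounting is enabled, sstat reports live usage for the running steps of a job. Note the .batch suffix, which selects the batch step:

```shell
sstat -j 12345.batch --format=JobID,MaxRSS,AveCPU
# MaxRSS: peak memory of the step so far; AveCPU: average CPU time per task
```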
⸻
Step 9: Job Completion
When the job finishes:
- slurmstepd exits
- Resources are released
- Temporary processes are cleaned up
The job state becomes COMPLETED, FAILED, TIMEOUT, or CANCELLED.
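The final state and exit code are recorded per step, which is often the fastest way to tell whether a job failed in your own code or was killed by Slurm:

```shell
sacct -j 12345 --format=JobID,State,ExitCode
# ExitCode is reported as returncode:signal, e.g. 0:0 for a clean exit
```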
⸻
Step 10: Accounting and Logs
Finally:
- Job statistics are stored
- Output files remain available
- Usage data is recorded
You can check this using:
sacct
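A slightly richer sacct query pulls timing and memory usage alongside the state (the job ID is a placeholder):

```shell
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,CPUTime,State
```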
⸻
Full Flow Summary
- Submit job using sbatch
- slurmctld receives and queues it
- Scheduler evaluates priority
- Resources are allocated
- slurmd prepares nodes
- slurmstepd runs the job
- Job completes and resources are released
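The whole flow can be sketched as a small script, assuming a working cluster and the job.sh from earlier:

```shell
#!/bin/bash
# Submit, wait for completion, then pull accounting data.
jobid=$(sbatch --parsable job.sh)   # --parsable prints only the job ID
echo "Submitted batch job $jobid"

# Poll until the job leaves the queue (squeue prints no rows for it)
while squeue -j "$jobid" -h 2>/dev/null | grep -q .; do
    sleep 10
done

sacct -j "$jobid" --format=JobID,State,Elapsed
```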
⸻
Common Misconceptions
“sbatch runs the job immediately”
It only submits the job.
“Pending means failure”
It usually means waiting for resources.
“Slurm just runs scripts”
It manages scheduling, allocation, execution, and cleanup.
⸻
Final Thought
sbatch may look simple, but it triggers a complete orchestration pipeline inside Slurm.
Once you understand this flow, debugging becomes easier, performance tuning improves, and cluster behavior becomes predictable.
⸻