Skip to content

Why is my batch job queuing so long?

Queuing is inevitable when there are more jobs than resources. CSC uses the fair share prioritization algorithm of SLURM, which means that the more resources you've used recently, the lower the initial priority of your next jobs will be. The priority of the jobs will increase as they queue, and eventually they will run.

You can check the current situation of running and pending jobs with the squeue command.

In general, if you want your jobs to queue as little as possible, it is a good idea to reserve only those resources that the jobs really need.

Especially requesting too much memory will surely make your job queue for longer. Computing time on the other hand is not so critical here, unless the requested runtime is very short (less than 30 minutes or so), in which case the back-filler might find a slot for your job before it would run due to its actual priority. However, it is not advised to run too short jobs to minimize scheduling overhead.

If you've submitted a job to the longrun partition, and squeue tells you that the reason for your job to be in pending state is QOSGrpCpuLimit, it means that the partition is currently full. You will very likely get resources faster from another partition such as small. See Available batch job partitions.

You might want to check these FAQ entries as well: