slurm - Submitting job with multiple parallel steps - Stack Overflow

I am trying to set up jobs with multiple steps, essentially running many independent copies of the same program, each on a single core. I decided to use this approach instead of job arrays because the cluster I am using limits each user to 20 jobs, while the maximum number of steps per job is the default of 40000. An example batch script looks like:

#!/bin/sh

#SBATCH --partition parallel
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000M
#SBATCH --job-name test
#SBATCH --output test.out

for ((i=0; i<$SLURM_NTASKS; i++))
do
    srun -N1 -n1 --mem-per-cpu=10M --exact -u sleep 1200 &
done

wait

My understanding is that, once the resources are allocated, the script above should launch 100 job steps in parallel, each occupying a single CPU. I have also included an explicit per-step memory request and the --exact flag, as suggested in a similar post (parallel but different Slurm srun job step invocations not working), to prevent the first srun from claiming the entire allocated memory.

Nevertheless, after a few (10-20) job steps start running, I still end up getting the message "Step creation temporarily disabled, retrying (Requested nodes are busy)". Since all the necessary resources are allocated and the memory is evenly distributed (verified with sacct), what could be preventing all the steps from running at the same time?

P.S. I am adding a typical sacct output below. Everything seems to follow the expected behavior; the only things that are not clear to me are:

  1. The fact that the .batch step seems to take up many CPUs and more memory than is actually allocated to it
  2. The fact that even though steps are allocated on 5 nodes, almost all of them are concentrated in only one node. This is the typical behavior of all my tests so far, so I cannot consider it a coincidence.
          JobID     MaxRSS                                AllocTRES             NodeList 
--------------- ---------- ---------------------------------------- -------------------- 
         624245                billing=441,cpu=100,mem=1500M,node=5  ibiscohpc-wn[26-30] 
   624245.batch    408612K                   cpu=22,mem=330M,node=1       ibiscohpc-wn26 
       624245.0       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
       624245.1       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn27 
       624245.2       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn28 
       624245.3       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn29 
       624245.4       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn30 
       624245.5       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
       624245.6       636K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
       624245.7       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
       624245.8       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
       624245.9       636K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.10       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.11       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.12       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.13       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.14       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.15       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.16       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.17       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.18       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.19       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.20       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.21       632K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.22       636K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.23       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.24       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
      624245.25       628K                     cpu=1,mem=10M,node=1       ibiscohpc-wn26 
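
The output above was obtained with a command along the following lines (the exact format fields are an assumption based on the columns shown):

sacct -j 624245 --format=JobID,MaxRSS,AllocTRES,NodeList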

asked Nov 20, 2024 at 16:24 by Christos
  • Can you try replacing --mem=1000M with --mem-per-cpu=15M in the header of the submission script? The --mem option is per job and does not necessarily ensure that each task has access to the same portion of allocated memory on each node. Or, if the nodes have more than 100 CPUs, use --nodes=1 to run all tasks on the same node, in which case --mem does not have that limitation – damienfrancois Commented Nov 21, 2024 at 10:39
  • I have already tried that; it doesn't seem to do the trick. If I keep all the tasks on one node (with a maximum of 96 CPUs) things work as expected, but it is really important for me to be able to spread tasks among nodes to minimize waiting time, since there are quite a lot of people using the cluster. This seems like a very basic feature, so I feel like I am missing something obvious here. – Christos Commented Nov 21, 2024 at 11:04
  • The memory requirements might be too tight, they do not take into account the memory required by the job submission script itself. Can you try with --mem-per-cpu=1G in the submission script? I assume the cluster has more than 1G per CPU available on the nodes? – damienfrancois Commented Nov 22, 2024 at 8:51
  • The cluster has around 4GB of memory per CPU if I'm not mistaken, but increasing the memory allocation didn't seem to change the behavior. – Christos Commented Nov 23, 2024 at 10:26

1 Answer

I have not been able to find a satisfying way to spread tasks across multiple nodes using separate job steps. However, I found that in my case (multiple identical runs) what works really well is to submit a single job step split into many tasks. The batch script then looks like:

#!/bin/sh

#SBATCH --partition parallel
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M
#SBATCH --job-name test
#SBATCH --output test.out

srun -n100 -u exec.sh

with the executable script exec.sh using the variable $SLURM_PROCID to differentiate between the tasks. For example:

#!/bin/sh

echo $SLURM_PROCID
sleep 1200
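
As an illustration (the program and parameter file names below are hypothetical), $SLURM_PROCID can be used to hand each task a different input:

#!/bin/sh

# Pick the line of params.txt corresponding to this task (SLURM_PROCID is 0-based)
PARAM=$(sed -n "$((SLURM_PROCID + 1))p" params.txt)
./my_program "$PARAM"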

This results in the desired behavior, but from what I understand it has some drawbacks compared to submitting separate job steps when it comes to controlling each task independently. However, until a better alternative is found, this is the only approach that seems to work for this use case.
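
For instance, with separate job steps an individual task can be cancelled on its own (the step ID below is illustrative), whereas with a single combined step only the whole step or job can be cancelled:

scancel 624245.17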
