GPU Jobs
The SCF hosts a number of GPUs, available only by submitting a job through our SLURM scheduling software. The GPUs are quite varied in their hardware configurations (different generations of GPUs, with different speeds and amounts of GPU memory). We have documented the GPU servers to guide you in selecting which GPU you may want to use.
- General job submission info
-
To use the GPUs, you need to submit a job via our SLURM scheduling software. In doing so, you need to specifically request that your job use the GPU as follows using the 'gres' flag:
arwen:~/Desktop$ sbatch --partition=gpu --gres=gpu:1 job.sh
Note that the partition to use will vary, as discussed in the other sections below.
Once it starts, your job will have exclusive access to the GPU and its memory. If another user is using the GPU, your job will be queued until the current job finishes.
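The same resource requests can also be placed inside the job script as #SBATCH directives instead of on the command line. The following is only a minimal sketch of what job.sh might contain: the job name and the report_gpus helper are hypothetical, and the comment about CUDA_VISIBLE_DEVICES assumes the usual Slurm GPU gres configuration.

```shell
#!/bin/bash
#SBATCH --partition=gpu      # same as --partition=gpu on the command line
#SBATCH --gres=gpu:1         # request one GPU
#SBATCH --job-name=gpu-test  # hypothetical job name

# Print which GPU(s) the job can see. With the usual gres setup, Slurm sets
# CUDA_VISIBLE_DEVICES for the job; outside a GPU job it is typically unset.
report_gpus() {
    echo "Visible GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
    # Only call nvidia-smi where it is installed (i.e., on a GPU node).
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    fi
}

report_gpus
# Your actual GPU code would be run here.
```

With the directives in the script, the submission reduces to `sbatch job.sh`; flags given on the command line still take precedence over the directives.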
Interactive jobs should use that same gres flag with the usual srun syntax for an interactive job.
arwen:~/Desktop$ srun --pty --partition=gpu --gres=gpu:1 /bin/bash
Given the heterogeneity in the GPUs available, you may want to request use of a specific GPU type. To do so, you can add the type to the 'gres' flag, e.g., requesting an A100 GPU:
arwen:~/Desktop$ sbatch --partition=jsteinhardt --gres=gpu:A100:1 job.sh
If you want to log on interactively to the GPU node to check on the compute or memory use of an sbatch job that uses the GPU, find the job ID of your job using squeue and insert that job ID in place of '<jobid>' in the following command. This will give you an interactive job running in the context of your original job:
arwen:~/Desktop$ srun --pty --partition=gpu --jobid=<jobid> /bin/bash
and then use nvidia-smi commands, e.g.,
scf-sm20:~$ nvidia-smi -q -d UTILIZATION,MEMORY -l 1
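If a compact summary is easier to log than the full -q report, nvidia-smi can also emit CSV. The sketch below assumes the node's nvidia-smi supports the --query-gpu fields shown; the helper names gpu_csv and summarize_gpu are hypothetical.

```shell
# Query utilization and memory as CSV without header or units,
# giving lines such as "35, 1024, 40960" (one per GPU).
gpu_csv() {
    nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits
}

# Turn each "util, used, total" line into a short summary,
# with memory use expressed as a (truncated) percentage.
summarize_gpu() {
    awk -F', *' '{ printf "util=%d%% mem=%d%%\n", $1, 100 * $2 / $3 }'
}

# On a GPU node you would run:  gpu_csv | summarize_gpu
# Demonstration on a made-up line of input:
echo "90, 20480, 40960" | summarize_gpu
```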
There are many ways to set up your code to use the GPU.
- Standard GPUs
-
One GPU is generally available to all SCF users; it is hosted on the server named `roo`. To submit a job that uses that GPU, you should submit to the gpu partition. See the other items below for other GPUs available on the SCF for which access is prioritized for members of certain faculty groups.
The GPUs formerly hosted on scf-sm20 and scf-sm21 have been retired, so there are no longer GPUs available through the high partition.
- High-performance GPUs
-
Additional GPUs have been obtained by the Steinhardt, Song, and Yu lab groups. Most of these GPUs have higher performance (either speed or GPU memory) than our standard GPUs.
Members of each lab group have priority access to the GPUs of their group. Other SCF users can submit jobs that use these GPUs, but those jobs will be preempted (killed) if higher-priority jobs need access to the GPUs. Preempted jobs are cancelled. If you want your job to be restarted automatically (i.e., started again from the beginning) when the node becomes available, add the "--requeue" flag when you submit via sbatch.
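Because a requeued job starts again from the beginning, it can help if the job script detects a restart and resumes from saved state. The following is a sketch under stated assumptions: your own code writes the checkpoint file (checkpoint.pt is a hypothetical name), the start_mode helper is hypothetical, and SLURM_RESTART_COUNT is the variable Slurm sets once a job has been requeued.

```shell
#!/bin/bash
#SBATCH --partition=jsteinhardt
#SBATCH --gres=gpu:1
#SBATCH --requeue            # put the job back in the queue if it is preempted

# Decide how to start. SLURM_RESTART_COUNT is unset on the first run and
# counts restarts afterwards; the checkpoint is a file your own code writes.
start_mode() {
    local checkpoint="$1"
    if [ "${SLURM_RESTART_COUNT:-0}" -gt 0 ] && [ -f "$checkpoint" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}

echo "Start mode: $(start_mode checkpoint.pt)"
# Pass the mode on to your own training script here.
```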
To submit jobs requesting access to these GPUs, you need to specify one of the jsteinhardt, yss, or yugroup partitions. Here's an example:
arwen:~/Desktop$ sbatch --partition=jsteinhardt --gres=gpu:1 job.sh
To use multiple GPUs for a job (only possible when using a server with more than one GPU, namely smaug, shadowfax, balrog, saruman, sunstone, rainbowquartz, smokyquartz, treebeard, and morgoth), simply change the number 1 after --gres=gpu to the number desired.
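Inside a multi-GPU job, it can be worth confirming how many GPUs were actually granted before launching your code. This sketch assumes Slurm exposes the allocated GPUs through CUDA_VISIBLE_DEVICES as a comma-separated list; the count_gpus helper is hypothetical.

```shell
# Count the entries in a comma-separated device list such as "0,1,2,3".
count_gpus() {
    local devices="${1:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    echo "$devices" | awk -F',' '{ print NF }'
}

echo "GPUs visible to this job: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

For a job submitted with `--gres=gpu:4`, the count reported should be 4 if the assumption above holds.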
To request a specific GPU type, you can add that to the gres flag, e.g., here requesting an A100:
sbatch -p jsteinhardt --gres=gpu:A100:1 job.sh
If you need more than one CPU, please request that using the --cpus-per-task flag. The value you specify actually requests that number of hardware threads, but with the caveat that a given job is allocated all the threads on a given core to avoid contention between jobs for a given physical core.
Additionally, there are another 40 A100 GPUs obtained by the Steinhardt lab group at a remote cluster located in Texas. Details are given in the drop-down below.
- Steinhardt group
-
The Steinhardt group has priority access to the balrog, shadowfax, sunstone, rainbowquartz, smokyquartz (8 GPUs each), saruman (8, eventually 10, GPUs), and smaug (2 GPUs) GPU servers.
If you are in the group, simply submit jobs to the jsteinhardt partition and you will automatically preempt jobs by users not in the group if that is needed for your job to run.
Group members can also prioritize their jobs with respect to other jobs by users in the Steinhardt group. By default jobs will run in the `preemptive_high` Slurm QoS. Each user in the group can have at most three such running jobs at a time. Additional jobs will be queued. Group members can also submit to the `preemptive` QoS, with no limit on the number of running jobs (apart from hardware availability), using submission syntax like this:
sbatch -p jsteinhardt -q preemptive --gres=gpu:1 job.sh
Such jobs will still preempt jobs run by non-group members, but the jobs can be preempted by jobs running in the `preemptive_high` QoS.
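Since only three `preemptive_high` jobs per user can run at once, a submission wrapper could fall back to `preemptive` when that limit is already reached. This is a sketch only: the helper names running_high and pick_qos are hypothetical, and the limit of three is taken from the text above.

```shell
# Count my running jobs in the preemptive_high QoS (requires Slurm's squeue;
# -h drops the header so each remaining line is one job).
running_high() {
    squeue -p jsteinhardt -u "$USER" -q preemptive_high -t RUNNING -h | wc -l
}

# Choose a QoS given the number of preemptive_high jobs already running.
pick_qos() {
    local running="$1"
    local limit=3   # per-user limit on running preemptive_high jobs
    if [ "$running" -lt "$limit" ]; then
        echo "preemptive_high"
    else
        echo "preemptive"
    fi
}

# On the cluster:
#   sbatch -p jsteinhardt -q "$(pick_qos "$(running_high)")" --gres=gpu:1 job.sh
```

Jobs over the limit would otherwise simply wait in the queue, so this is a convenience rather than a requirement.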
In addition to the notes below, more details on optimal use of these servers can be obtained from the guide prepared by Steinhardt group members and the SCF and available by contacting one of us.
The smaug, saruman, and balrog GPUs have a lot of GPU memory and are primarily intended for training very large models (e.g., ImageNet, not CIFAR10 or MNIST), but it is fine to use these GPUs for smaller problems if shadowfax, sunstone, rainbowquartz, and smokyquartz are busy.
By default, if you do not specify a GPU type or a particular GPU server, Slurm will try to run the job on shadowfax, sunstone, rainbowquartz, or smokyquartz, unless they are busy.
To request a specific GPU type, you can add that to the gres flag, e.g., here requesting an A100:
sbatch -p jsteinhardt --gres=gpu:A100:1 job.sh
If you need more than one CPU, please request that using the --cpus-per-task flag. The value you specify actually requests that number of hardware threads, but with the caveat that a given job is allocated all the threads on a given core to avoid contention between jobs for a given physical core. So the default of "-c 1" allocates one physical core and two hardware threads. Your CPU usage will be restricted to the number of threads you request.
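The rounding to whole physical cores described above can be made concrete with a little arithmetic. The sketch assumes two hardware threads per core, consistent with the "-c 1" example; the allocated_threads helper is hypothetical.

```shell
# Round a --cpus-per-task request up to whole physical cores.
# Usage: allocated_threads REQUESTED [THREADS_PER_CORE]
allocated_threads() {
    local requested="$1"
    local per_core="${2:-2}"   # assumption: 2 hardware threads per core
    local cores=$(( (requested + per_core - 1) / per_core ))
    echo $(( cores * per_core ))
}

for c in 1 2 3 6; do
    echo "-c $c -> $(allocated_threads "$c") threads allocated"
done
```

Under that assumption, a request of 3 reserves 4 threads (two cores), although, as noted above, the job's CPU usage is still restricted to the number it requested.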
As an example, since shadowfax has 48 CPUs (actually 48 threads and 24 physical cores as discussed above) and 8 GPUs, there are 6 CPUs per GPU. You could request more than 6 CPUs per GPU for your job, but note that if other group members do the same, it's possible that the total number of CPUs may be fully used before all the GPUs are used. Similar considerations hold for balrog (96 CPUs and 8 GPUs), saruman (104 CPUs and 8 GPUs) and smaug (64 CPUs and 2 GPUs) as well as rainbowquartz, smokyquartz and sunstone (all with 64 CPUs and 8 GPUs). That said, that's probably a rather unlikely scenario.
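The proportional CPU share per GPU on each server follows directly from the counts quoted above. A small sketch of that arithmetic, with a hypothetical cpus_per_gpu helper:

```shell
# Integer number of CPUs (hardware threads) per GPU on a server.
cpus_per_gpu() {
    echo $(( $1 / $2 ))
}

# CPU and GPU counts as quoted in the text above.
while read -r server cpus gpus; do
    echo "$server: $(cpus_per_gpu "$cpus" "$gpus") CPUs per GPU"
done <<'EOF'
shadowfax 48 8
balrog 96 8
saruman 104 8
smaug 64 2
sunstone 64 8
EOF
```

Keeping --cpus-per-task at or below the per-GPU figure times the number of GPUs requested leaves a proportional share of CPUs for other jobs on the same server.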
To see which jobs are running on particular machines, and so get a sense of when a job that requests a particular machine might start, use squeue with a custom output format:
arwen:~> squeue -p jsteinhardt -o "%.9i %.20j %.12u %.2t %.11l %.11M %.11V %.5C %.8r %.6D %.20R %.13p %8q %b"
    JOBID                 NAME         USER ST  TIME_LIMIT        TIME SUBMIT_TIME  CPUS   REASON  NODES     NODELIST(REASON)      PRIORITY QOS      TRES_PER_NODE
  1077240                 bash nikhil_ghosh  R 28-00:00:00    12:51:48 2021-11-01T     1     None      1               balrog 0.00196710997 normal   gpu:1
  1077248              jupyter         awei  R 28-00:00:00     1:40:55 2021-11-01T     1     None      1               balrog 0.00092315604 preempti gpu:1
  1077121             train.sh andyzou_jiam  R 28-00:00:00  2-17:32:41 2021-10-29T    48     None      1               balrog 0.00027842330 preempti gpu:2
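To judge how busy one server is, the GPU requests in that output can be summed. The sketch below assumes each entry in squeue's "%b" (TRES_PER_NODE) column ends in ":<count>", as in the sample output above; the sum_gpus helper is hypothetical, and entries in another form would need different parsing.

```shell
# Sum the GPU counts in squeue's "%b" column, read from stdin.
# Takes the last colon-separated field of each line as the count,
# so both "gpu:2" and "gpu:A100:2" contribute 2.
sum_gpus() {
    awk '{ n = split($1, parts, ":"); total += parts[n] } END { print total + 0 }'
}

# On the cluster (-h drops the header, -w limits output to one node):
#   squeue -p jsteinhardt -w balrog -h -o "%b" | sum_gpus

# The three balrog jobs from the sample output above:
printf 'gpu:1\ngpu:1\ngpu:2\n' | sum_gpus
```

For the sample output this prints 4, i.e., four of balrog's eight GPUs were in use at that moment.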
- Steinhardt group - remote cluster
-
In addition to the GPU resources listed above, an additional 40 A100 GPUs (on 5 servers) are available at a remote cluster hosted in a co-location facility in Texas. These resources are operated in a similar fashion to the GPUs in the jsteinhardt partition: they are available on a preemptible basis, with priority access for group members.
To request access, either by a group member or a non-group member, please email consult@stat.berkeley.edu.
To connect to the remote cluster, ssh to lambda.stat.berkeley.edu. Note that one cannot access the cluster via JupyterHub.
The configuration of these servers is the same as other SCF machines, but the filesystem is distinct. Here are the key similarities and differences from running jobs on the SCF (local) cluster.
- You can log in with your SCF credentials.
- All software installed by the SCF is available and should be identical to that on the SCF (local) cluster.
- The job submission process is the same as on the SCF (local) cluster.
- The home directories (and other parts of the filesystem) on this remote cluster are separate from those on the SCF.
- You will need to copy over any data to the filesystem of the remote cluster.
- If you've installed software yourself on the SCF, you'll need to reinstall it on, or copy it over to, the remote cluster.
- The partition on the remote cluster is named lambda. However, you don't need to specify it when submitting a job.
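Because the filesystems are separate, data has to be copied across before a job can use it. One common way is rsync over ssh. The sketch below only prints the command for review; the copy_cmd helper and the directory name are hypothetical, and it assumes the directory sits under your home directory on both clusters.

```shell
# Build an rsync command that mirrors a directory under $HOME on the SCF
# to the same relative location under your home on the remote cluster.
# Usage: copy_cmd DIR [USERNAME]
copy_cmd() {
    local dir="$1"
    local user="${2:-$USER}"
    echo "rsync -av $HOME/$dir/ $user@lambda.stat.berkeley.edu:$dir/"
}

# Print the command for review; run it yourself once it looks right.
copy_cmd "projects/gpu-study"
```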
- Yu group
-
The Yu group has priority access to the GPUs located on the merry (1 GTX GPU), morgoth (2 TITAN GPUs), and treebeard (1 A100 GPU) servers. If you are in the group, simply submit jobs to the yugroup partition and you will automatically preempt jobs by users not in the group if that is needed for your job to run.
To request a specific GPU type, you can add that to the gres flag, e.g., here requesting an A100:
sbatch -p yugroup --gres=gpu:A100:1 job.sh
If you need more than one CPU, please request that using the --cpus-per-task flag, but note that merry only has four CPUs. The value you specify actually requests that number of hardware threads, but with the caveat that a given job is allocated all the threads on a given core to avoid contention between jobs for a given physical core.
Please contact SCF staff or group members for more details.
- Song group
-
The Song group has priority access to the GPUs located on luthien (4 A100 GPUs) and beren (8 A100 GPUs). If you are in the group, simply submit jobs to the yss partition and you will automatically preempt jobs by users not in the group if it is needed for your job to run.
If you need more than one CPU, please request that using the --cpus-per-task flag. The value you specify actually requests that number of hardware threads, but with the caveat that a given job is allocated all the threads on a given core to avoid contention between jobs for a given physical core.
Please contact SCF staff or group members for more details.