GPU Server

The SCF can help with access to several GPU resources:

  • The SCF operates a single GPU hosted on the scf-sm20 node of the 'high' partition of our Linux cluster. The GPU is an NVIDIA Tesla K20Xm with 6 GB memory and 2688 CUDA cores.
    • You need to use the SLURM queueing software (discussed here) to run any job making use of the GPU. You may want to use an interactive session to develop and test your GPU code. That same link also has information on monitoring GPU usage of your job.
  • The SCF also operates (1) an NVIDIA Tesla K80 dual-GPU card, with each of its two GPUs having 12 GB memory and 2496 CUDA cores, (2) an NVIDIA GeForce GTX TITAN X with 12 GB memory, (3) an NVIDIA Tesla K40 with 12 GB memory, (4) an NVIDIA Titan X (Pascal) with 12 GB memory, and (5) an NVIDIA Titan Xp with 12 GB memory. These GPUs are owned by individual faculty members but may in some cases be made available to others in the department; to ask about access, email us at consult [at] stat [dot] berkeley [dot] edu.
  • Priority access for all department members to 8 GPUs on the campus Savio cluster is available through the SCF condo, and access to additional GPUs is available through the Savio faculty computing allowance. Please contact SCF staff for more information.
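
As a rough illustration of submitting a job to the SCF-operated GPU through SLURM, the sketch below assembles an sbatch command in Python. The partition name 'high' comes from the description above; the "--gres=gpu:1" request, the helper name build_gpu_sbatch_command, and the script name job.sh are assumptions made for illustration, so please check the SLURM documentation referenced above for the options the cluster actually expects.

```python
import shlex


def build_gpu_sbatch_command(script_path, partition="high", gpus=1):
    """Return the argument list for submitting a GPU job with sbatch.

    Assumption: GPUs are requested via SLURM's generic-resource flag
    (--gres=gpu:N); the SCF documentation may specify something different.
    """
    return [
        "sbatch",
        "--partition={}".format(partition),
        "--gres=gpu:{}".format(gpus),
        script_path,
    ]


# Hypothetical submission script name, used only for illustration.
command = build_gpu_sbatch_command("job.sh")
# Print the command so it can be inspected before running it in a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

The function only builds and prints the command rather than running it, so it is safe to try on any machine before submitting a real job.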

We provide the following software that will help you in making use of the GPU:

  • CUDA (version 9.0; other versions can be made available)
  • cuDNN (version 7.4, for CUDA 9.0; other versions can be made available)
  • TensorFlow (version 1.12.0 for Python 3.6; as of February 2019, TensorFlow is not yet available for Python 3.7; we can also provide instructions for running TensorFlow through R)
  • Keras (version 2.2.4)
  • PyTorch (version 0.4.1)
  • Theano (version 1.0.4)
  • Caffe (latest version via the BVLC Docker container, with the Python 2.7 interface)
  • PyCUDA (version 2018.1.1)
  • We can install additional software or upgrade the current versions as needed.

We use Linux environment modules to manage the use of GPU-based software, as discussed next. You can insert any of these commands in your .bashrc (after the stanza involving ~skel/std.bashrc) so that they are always in effect, or invoke them as needed in a script (including a cluster submission script) or in a terminal session.

For software that uses the GPU (via CUDA) for back-end computations:

  • TensorFlow: invoke "module load tensorflow". Note that TensorFlow won't work on roo because it has an older CPU, but we can figure out a workaround if needed; just let us know.
  • PyTorch: invoke "module load pytorch".
  • Theano: invoke "module load theano". You will see a warning about a too-recent version of cuDNN. If this seems to cause problems, let us know.
  • Caffe: contact us for instructions.
  • PyCUDA: invoke "module load pycuda".
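
Once a module is loaded, it can be worth confirming that the framework actually sees the GPU. The following is a minimal sketch for PyTorch, assuming "module load pytorch" has been run in the same session. The function name describe_torch_device is hypothetical, and the import is guarded so that the script also runs on machines where PyTorch is unavailable.

```python
import importlib


def describe_torch_device():
    """Return a short description of the device PyTorch would use."""
    try:
        # Import lazily so a missing installation is reported, not fatal.
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch not importable (was the module loaded?)"
    if torch.cuda.is_available():
        # Report the name of the first GPU that PyTorch can see.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


print(describe_torch_device())
```

If this reports "cpu" on a GPU node, the module may not have been loaded in that session, or the job may not have been allocated a GPU.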

To use the software only on the CPU:

  • TensorFlow: simply import tensorflow in Python as with any standard Python package. Note that TensorFlow won't work on arwen, beren, and a few of our other machines with old CPUs, but it should work on the cluster nodes as well as on gandalf and radagast, among others.
  • PyTorch: simply import torch in Python as with any standard Python package.
  • Theano: do not load the theano module.
  • Caffe: contact us for instructions.

To program with CUDA and related packages directly, please see this tutorial for more details. You'll need to load CUDA as follows in order to compile and run your code:

  • CUDA: to use CUDA directly in C or another language, invoke "module load cuda".
  • cuDNN: to make use of cuDNN, invoke "module load cudnn" (this provides version 7.4, for use with CUDA 9.0).
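
A quick way to check that the CUDA tools are reachable after loading the module is to look for them on your PATH. The sketch below assumes that "module load cuda" adds the CUDA compiler nvcc to PATH, which is typical for environment modules but not confirmed here, and that nvidia-smi is present on machines with an NVIDIA driver. The function name find_cuda_tools is hypothetical.

```python
import shutil


def find_cuda_tools(tools=("nvcc", "nvidia-smi")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


# Report where each tool was found, if at all.
for tool, path in find_cuda_tools().items():
    print("{}: {}".format(tool, path or "not found on PATH"))
```

If nvcc is reported as not found, the cuda module is probably not loaded in the current session.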

If you have questions or would like additional GPU-related software installed, please contact consult [at] stat [dot] berkeley [dot] edu.