GPU Server

The SCF can help with access to several GPU resources:

  • The SCF operates a single GPU hosted on the scf-sm20 node of the 'high' partition of our Linux cluster. The GPU is an NVIDIA Tesla K20Xm with 6 GB memory and 2688 CUDA cores.
    • You need to use the SLURM queueing software (discussed here) to run any job making use of the GPU. You may want to use an interactive session to develop and test your GPU code. That same link also has information on monitoring GPU usage of your job.
  • The SCF also operates (1) an NVIDIA Tesla K80 dual-GPU card, in which each of the two GPUs has 12 GB memory and 2496 CUDA cores, (2) an NVIDIA GeForce GTX TITAN X with 12 GB memory, (3) an NVIDIA Tesla K40 with 12 GB memory, and (4) an NVIDIA Titan X (Pascal) with 12 GB memory. These GPUs are owned by individual faculty members but may in some cases be made available to others in the department. To ask about access, email us at consult [at] stat [dot] berkeley [dot] edu.
  • Priority access to 8 GPUs on the campus Savio cluster is available through the SCF condo, and access to additional GPUs is available through the Savio faculty computing allowance. Please contact SCF staff for more information.
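
For the scf-sm20 GPU in the first item above, a SLURM submission script might look roughly like the following sketch. The 'high' partition comes from the description above; the --gres=gpu:1 request is standard SLURM syntax but is an assumption about how this cluster is configured, and the file name gpu_job.sh is hypothetical. Check the SLURM documentation linked above for the exact flags to use.

```shell
# Write a minimal batch script; the file name gpu_job.sh is hypothetical.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
# Partition that hosts the GPU node (from the description above).
#SBATCH --partition=high
# Assumed syntax for requesting one GPU; confirm against the cluster docs.
#SBATCH --gres=gpu:1

# Make the CUDA toolkit available, then report the GPU visible to the job.
module load cuda
nvidia-smi
EOF

# Submit from a cluster login node (not executed here):
#   sbatch gpu_job.sh
```

Replacing the nvidia-smi line with your own program is the usual next step once the test job runs.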

We provide the following software that will help you in making use of the GPU:

  • CUDA (version 8.0 is the default; older versions can be made available)
  • cuDNN (version 5.1, for CUDA 8.0, is the default; older versions can be made available)
  • MAGMA (in progress)
  • PyCUDA (version 2016.1.2)
  • TensorFlow (version 1.1.0)
  • Torch (GitHub as of 2017-03-10), including the packages cutorch, cunn, cudnn, and the Facebook packages in fblualib
  • Theano (version 0.7.0)
  • Caffe (built with CUDA 8.0 and cuDNN 5.1, with both the Python 3 and Python 2 interfaces)
  • We can install additional software or upgrade current software as needed.

As of September 2016 we use Linux environment modules to manage GPU-based software. The commands below set things up so you can use the various packages. You can put any of these commands in your .bashrc (after the stanza involving ~skel/std.bashrc) so they are always in effect, or invoke them as needed in a script (including a cluster submission script) or in a terminal session.

For software that uses the GPU (via CUDA) for back-end computations:

  • TensorFlow: invoke "module load tensorflow" (for the Python 3 interface) or "module load tensorflow/1.1.0-gpu-python2" (for the Python 2 interface).
  • Torch: invoke "module load torch".
  • Theano: invoke "module load theano".
  • Caffe: invoke "module load caffe" (for Caffe with the Python 3 interface) or "module load caffe/2017-02-23-py2" (for Caffe with the Python 2 interface).
  • PyCUDA: invoke "module load pycuda".
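
As a sketch of the .bashrc approach described above, the following helper loads a set of modules only when the module command is defined in the current shell, so the same .bashrc does no harm on machines without environment modules. The function name load_gpu_modules is hypothetical and is not something the SCF provides. The module names are the ones listed above.

```shell
# Hypothetical helper (not provided by the SCF): load each named module,
# but only if the `module` command is defined in this shell.
load_gpu_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available; skipping: $*" >&2
        return 1
    fi
    for m in "$@"; do
        module load "$m" || return 1
    done
}

# Example: make TensorFlow and PyCUDA available in every new session.
load_gpu_modules tensorflow pycuda || true
```

In a .bashrc this would go after the stanza involving ~skel/std.bashrc. In a terminal session or script, the plain "module load" commands are enough.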

To use the software only on the CPU:

  • TensorFlow: simply import tensorflow in Python as with any standard Python package.
  • Torch: Torch is not currently set up for CPU-only use, so it generally won't work on machines without a GPU.
  • Theano: do not load the theano module.
  • Caffe: Caffe is not currently set up for CPU-only use, so it generally won't work on machines without a GPU. However, we have a test CPU-only installation on arwen that you can try.

To program with CUDA and related packages directly, please see this tutorial for more details. You'll need to load CUDA as follows in order to compile and run your code:

  • CUDA: to use CUDA directly in C or another language, invoke "module load cuda".
  • cuDNN: to make use of cuDNN, invoke "module load cudnn" (version 5.1, for use with CUDA 8.0).
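
Once the CUDA module is loaded, a quick sanity check before compiling is to confirm that the nvcc compiler is on your PATH. The helper below is a minimal sketch; the function name check_cuda_toolchain and the file names in the comments are hypothetical placeholders.

```shell
# Hypothetical helper: confirm the CUDA compiler is available in this shell.
check_cuda_toolchain() {
    if command -v nvcc >/dev/null 2>&1; then
        # Print the compiler version so you can confirm which CUDA is in use.
        nvcc --version
    else
        echo "nvcc not found; try 'module load cuda' first" >&2
        return 1
    fi
}

# Typical session on a machine with the CUDA module (file names are placeholders):
#   module load cuda
#   check_cuda_toolchain
#   nvcc -o my_prog my_prog.cu
#   ./my_prog
```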

If you have questions or would like additional GPU-related software installed, please contact consult [at] stat [dot] berkeley [dot] edu.