Modern GPUs (graphics processing units) provide the ability to perform computations in applications traditionally handled by CPUs. Using GPUs is rapidly becoming a new standard for data-parallel heterogeneous computing software in science and engineering. Many existing applications have been adapted to make effective use of multi-threaded GPUs.
- GPU Resources
- Running on the GPU Nodes
- Software with GPU Acceleration
- CPU vs. GPU
- Using Only Your Assigned GPUs – CUDA_VISIBLE_DEVICES
- GPU Consulting
GPU Resources
The Shared Computing Cluster includes nodes with NVIDIA GPU cards, some of which are configured for computational workloads and some for interactive VirtualGL sessions. For more details on the nodes available on the SCC, please visit the Technical Summary page.
Running on the GPU Nodes
Access to GPU-enabled nodes is via the batch system (qsub/qrsh); direct login to these nodes is not permitted. The GPU nodes support all of the standard batch options in addition to the following GPU-specific options. (-l gpus=G is a required option.)
| GPU Batch Option | Description |
|---|---|
| -l gpus=G | G is the number of GPUs requested. This option is required. |
| -l gpu_type=GPU_MODEL | Current options for GPU_MODEL are M2070, K40m, P100, and V100. |
| -l gpu_memory=#G | #G represents the minimum amount of memory required per GPU. The M2070 has 6 GB of memory; the K40m and some of the P100 GPUs have 12 GB; the remaining P100 GPUs and the V100 GPUs have 16 GB. |
| -l gpu_c=#CC | #CC is the minimum GPU compute capability. M2070 NVIDIA cards have a compute capability of 2.0, K40m cards have 3.5, the P100 has 6.0, and the V100 has 7.0. Some GPU-enabled software (such as the popular TensorFlow machine learning package) restricts the compute capabilities it supports and requires 3.5 or higher. |
Below are some examples of requesting GPU resources.
Interactive Batch
To request an interactive session with access to 1 GPU (any type) for 12 hours:
scc1% qrsh -l gpus=1
To request an interactive session with access to 1 GPU with a compute capability of at least 3.5 (which includes the K40m, P100, and V100) and 1 CPU processor:
scc1% qrsh -l gpus=1 -l gpu_c=3.5
Non-interactive Batch Job
To submit a batch job with access to 1 GPU (compute capability of at least 3.5) and 1 CPU processor:
scc1% qsub -l gpus=1 -l gpu_c=3.5 your_batch_script
See an example of a script to submit a batch job.
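A minimal batch script might look like the following sketch. The project name, CUDA module, and program name are placeholders; check `module avail cuda` on the SCC for the versions actually installed.

```shell
#!/bin/bash -l

# Request 1 GPU with a compute capability of at least 3.5
#$ -l gpus=1
#$ -l gpu_c=3.5

# Set a hard runtime limit of 12 hours
#$ -l h_rt=12:00:00

# Load a CUDA module (version availability varies; this is an example)
module load cuda

# Run the GPU application (program name is a placeholder)
./my_gpu_program
```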
Software with GPU Acceleration
As GPU computing remains a fairly new paradigm, it is not yet supported by all programming languages, and application support is still limited. We strive to provide you with the most up-to-date information, but this is an evolving process. We divide languages and packages into the three categories listed below.
Languages and software packages we have successfully tested for GPU support:
- CUDA C/C++
- CUDA Fortran
- OpenACC C/C++
- OpenACC Fortran
- MATLAB (Parallel Computing Toolbox)
- R (various packages)
- Java (requires loading the jcuda module)
CPU vs. GPU
Use the comparison below to judge whether your application is a good candidate for GPU acceleration.
CPUs are great for task parallelism:
- High performance per single thread execution
- Fast caches used for data reuse
- Complex control logic
GPUs are superb for data parallelism:
- High throughput on parallel calculations
- Arithmetic intensity: Lots of processor cores to perform simple math calculations
- Fast access to local and shared memory
Ideal applications for general programming on GPUs:
- Large data sets with minimal dependencies between data elements
- High parallelism in computation
- High number of arithmetic operations
Physical modeling, data analysis, computational engineering, matrix algebra are just a few examples of applications that might greatly benefit from GPU computations.
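As an illustration of the ideal case, an element-wise array operation has exactly this character: each output element depends only on the corresponding input elements, so the work parallelizes trivially. The sketch below runs on the CPU in plain Python; on a GPU the same pattern would map each element to its own thread.

```python
# Element-wise "axpy" (y = a*x + y): there are no dependencies between
# elements, so every output element could be computed by an independent
# GPU thread.
def axpy(a, x, y):
    # Each result touches only index i of x and y -- ideal data parallelism.
    return [a * xi + yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(axpy(2.0, x, y))  # [12.0, 24.0, 36.0]
```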
Using Only Your Assigned GPUs – CUDA_VISIBLE_DEVICES
Please use only the GPUs assigned to you. These are indicated by the environment variable CUDA_VISIBLE_DEVICES.
As many of the SCC compute nodes have multiple GPUs, each job must run only on the GPUs assigned to it by the batch system to avoid interfering with other jobs. To ensure this, the batch system sets CUDA_VISIBLE_DEVICES to a comma-separated list of integers identifying the GPUs assigned to the job. The CUDA runtime library consults this variable when it allocates devices, so unless an application does its own device selection, it will automatically comply with this policy.
Please do NOT manually set this variable to access other GPUs on the same node. For example, many Python codes developed on local computers contain lines like the following, which should be AVOIDED (commented out):
# import os
# os.environ["CUDA_VISIBLE_DEVICES"]="0"
Instead, you can check which GPU(s) the system assigned to your job with:
import os
print(os.getenv("CUDA_VISIBLE_DEVICES"))
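If your code needs the individual device IDs, for example to launch one worker per assigned GPU, the variable can be parsed into a list. This is a sketch; the helper name is our own, and the environment assignment at the bottom only simulates what the scheduler does (it is not something you should set in a real job).

```python
import os

def assigned_gpu_ids():
    # CUDA_VISIBLE_DEVICES is a comma-separated list set by the batch system,
    # e.g. "2,3" on a node where GPUs 2 and 3 were assigned to this job.
    value = os.getenv("CUDA_VISIBLE_DEVICES", "")
    return [v.strip() for v in value.split(",") if v.strip()]

# For illustration only: simulate a job that was assigned GPUs 2 and 3.
# In a real job the scheduler sets this variable; do not set it yourself.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
print(assigned_gpu_ids())  # ['2', '3']
```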
GPU Consulting
RCS staff scientific programmers can help you with your questions concerning GPU programming. Please contact us at help@scc.bu.edu.
