GPGPU Computation

The idea behind general-purpose computing on graphics processing units (GPGPU computation) is to assign computations that are traditionally performed on a CPU (central processing unit) to a graphics processing unit (GPU). The highly parallel nature of the GPU means that a single unit can deliver computational performance comparable to many CPUs working in parallel. GPGPU computation is only supported for Simcenter STAR-CCM+ servers running under Linux.

The GPGPU capabilities of Simcenter STAR-CCM+ allow end-to-end computations on GPGPUs for specific core physics solvers. In particular, the segregated flow solver is supported for GPGPU computation, along with a wide array of turbulence models. Additional models and solvers are expected to become GPGPU-compatible in future releases.

If a solver is GPGPU-compatible, Simcenter STAR-CCM+ runs all heavy computations on the GPGPU. These computations include:
  • the discretization of the partial differential equations that describe the physics (for example, the Navier-Stokes equations)
  • the solution of the resulting linear systems using the Krylov-accelerated algebraic multigrid (AMG) solver
This end-to-end solver capability is essential for high utilization of GPGPU hardware available to Simcenter STAR-CCM+.
To run a Simcenter STAR-CCM+ simulation using GPGPU hardware, you must ensure that:
  1. Simcenter STAR-CCM+ is launched with the GPGPU feature enabled.
  2. Only GPGPU-compatible models and solvers are chosen for the simulation. See Supported Solvers and Models.
When these requirements are satisfied, Simcenter STAR-CCM+ automatically runs the simulation using the GPGPUs available to it. Otherwise, Simcenter STAR-CCM+ issues a warning message and runs the simulation on the CPU, which can result in significantly lower performance than expected.
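As an illustrative sketch, a launch that satisfies both requirements might look as follows. The simulation file name and process count are placeholders, and -gpgpu auto is one of the selection options detailed in the Command Line Reference; consult it for the exact syntax that applies to your installation:

```shell
# Hypothetical launch sketch: enable GPGPU computation with automatic
# device selection. The .sim file name and -np value are placeholders.
starccm+ -gpgpu auto -np 8 mySimulation.sim
```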

Due to architectural differences between CPUs and GPGPUs, digit-by-digit reproducibility cannot be achieved in GPGPU runs. This difference should not affect the overall convergence behavior.

Selection of GPGPUs

Simcenter STAR-CCM+ provides several options for specifying which GPGPUs it is to use. These are detailed in the Command Line Reference, GPGPU Options. Restricting the visible GPGPUs through the environment variables CUDA_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL, or ROCR_VISIBLE_DEVICES is not supported; the only supported way to restrict GPGPUs is the -gpgpu command line argument.

GPGPU Supported Hardware

For GPGPU-enabled simulations, certain card models from NVIDIA and AMD are supported. Models with HBM (High Bandwidth Memory) are preferred as GDDR (graphics double data rate) memory tends to have insufficient bandwidth for CFD applications.

Cards with NVIDIA's Volta, Turing, Ampere, Hopper, and Ada Lovelace architectures are recommended. Drivers supporting CUDA 11.8 or newer are required (CUDA driver version 520.61.05 or newer). Hardware partitioning using NVIDIA Multi-Instance GPU (MIG) is not supported.

Cards from the AMD Instinct MI100 and MI200 series, as well as the Radeon PRO W6800, V620, W7800, and W7900, are recommended. The Instinct MI300 series is also supported, but may not deliver optimum performance. The AMDGPU driver is required; the AMDGPU-PRO driver is not supported.

If you encounter hangs on AMD GPUs and the kernel log (dmesg) reports messages such as [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, set the environment variable HSA_ENABLE_SDMA=0 before launching.
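A minimal sketch of applying this workaround in a launch script; the starccm+ invocation is a placeholder:

```shell
# Workaround from the text: disable SDMA transfers in the ROCm runtime
# before launching, to avoid the "ring sdma0 timeout" hangs.
export HSA_ENABLE_SDMA=0
echo "HSA_ENABLE_SDMA=$HSA_ENABLE_SDMA"
# starccm+ -gpgpu auto mySimulation.sim   # placeholder launch command
```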

GPGPU Licensing

To run a simulation with GPGPU computation enabled, you must use one of the following license options:
  • Simcenter STAR-CCM+ Power Session Plus
  • Simcenter STAR-CCM+ Power on Demand

Using MPS

If more CPU processes than GPGPUs are used on a host, which is common on modern architectures, use the CUDA Multi-Process Service (MPS) to avoid performance penalties from GPGPU over-subscription. Simcenter STAR-CCM+ automatically starts and terminates MPS on a host when all of the following prerequisites are met:
  • More CPU processes than GPGPUs are specified on the host
  • MPS is not already running on the host
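The prerequisites above amount to a simple per-host check. The sketch below is illustrative only; the process and GPGPU counts are placeholder values standing in for the actual launch configuration:

```shell
# Illustrative per-host check for automatic MPS management
# (placeholder values; the real counts come from the launch configuration).
nprocs=16     # CPU processes requested on this host
ngpus=4       # GPGPUs in use on this host
mps_running=no

if [ "$nprocs" -gt "$ngpus" ] && [ "$mps_running" = "no" ]; then
    echo "MPS would be started automatically"
fi
```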

In managed compute environments, such as HPC clusters, MPS can be handled system-wide. If Simcenter STAR-CCM+ takes over MPS management, it prints console messages when MPS is automatically started and terminated. To prevent automatic MPS handling, append the :nomps qualifier to the -gpgpu <...> command line parameter, or use the corresponding setting in the user interface.
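For example, a launch that opts out of automatic MPS handling might look as follows; the simulation file name is a placeholder:

```shell
# Hypothetical launch sketch: enable GPGPUs but leave MPS management
# to the cluster by appending the :nomps qualifier.
starccm+ -gpgpu auto:nomps mySimulation.sim
```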

MPS is not relevant for AMD GPUs: no automatic MPS handling is performed, and the :nomps qualifier is silently ignored.

MPS Limitations
  • On GPGPUs that predate the Volta architecture, only the MPS process itself appears in the GPGPU process report in nvidia-smi. On newer architectures, all Simcenter STAR-CCM+ processes appear directly.
  • When using an external version of Intel MPI, automatic MPS handling is not guaranteed to work. Check the command line output for a warning that MPS is not running when it is expected to be.

GPGPU-aware CPU Binding

When you enable GPGPUs using any of the options available on the -gpgpu parameter, you can change the CPU binding policy from the default "bandwidth" policy to the "gpgpuaware" policy using the command line option -cpubind gpgpuaware. The gpgpuaware binding policy places CPU processes on cores that are physically close to the GPGPU they are driving, which minimizes data transfers over slow paths.
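An illustrative launch combining GPGPU selection with this binding policy; the simulation file name and process count are placeholders:

```shell
# Hypothetical launch sketch: GPGPU-aware CPU binding keeps each CPU
# process close to the GPGPU it drives.
starccm+ -gpgpu auto -cpubind gpgpuaware -np 8 mySimulation.sim
```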

For fully populated hosts, that is, when all available CPU cores are used, all CPU binding policies result in the same binding, so the specific policy passed on the command line has no effect.

Process Distribution on Multiple Hosts

When running on multiple hosts with GPGPU, it can be beneficial to under-subscribe each compute node, for example, so there is only one process per GPGPU on each host. For more information and recommendations regarding the number of compute processes per GPGPU, see Running Simulations with GPGPU.

You can achieve the required CPU to GPGPU count by specifying the correct number of processes per host in the machinefile, or by setting the relevant attributes when submitting a job via a batch system. For more information, see Defining a Machine File for Parallel Hosts and Using Batch Management Systems.
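As an illustrative sketch, a machinefile that under-subscribes two hosts with four GPGPUs each, giving one process per GPGPU, might look as follows. The host names are placeholders, and the host:count form is assumed here; see Defining a Machine File for Parallel Hosts for the exact format:

```shell
# Hypothetical machinefile contents: one process per GPGPU on each of
# two four-GPGPU hosts (host names are placeholders).
node01:4
node02:4
```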

When running with -gpgpu auto, Simcenter STAR-CCM+ automatically distributes the processes evenly across all hosts to improve GPGPU utilization. To return to the default process distribution, use the -gpgpu auto:oversubscribe option (see the Command Line Reference).