GPU Operator Setup
Kubernetes provides access to special hardware resources such as NVIDIA GPUs and other devices through the device plugin framework. However, provisioning and managing nodes with such hardware requires configuring multiple software components (drivers, container runtimes, and other libraries), which is tedious and error-prone.
The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.
Installation
We recommend installing the NVIDIA GPU Operator into the Kubernetes environment via the publicly available GPU Operator Helm chart.
Detailed instructions on the installation and upgrade process, as well as possible and recommended configurations, can be found at Installing the NVIDIA GPU Operator.
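As a sketch, the Helm-based installation typically follows the pattern below; the release name and namespace are example choices and should be adapted to the target environment:

```shell
# Add the NVIDIA Helm repository and refresh the local chart index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator into a dedicated namespace
# (release name "gpu-operator" and the namespace are arbitrary examples)
helm install --wait gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
```

Refer to the official installation guide for chart version pinning and platform-specific prerequisites.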
Configuration
The NVIDIA GPU Operator Helm-based installation can be configured via Helm value overrides.
An overview of the available Helm configuration options can be found in the NVIDIA GPU Operator Helm values.yaml file.
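Individual values can be overridden on the command line with `--set`, or collected in a values file passed via `-f`; the file name below is an example:

```shell
# Override a single Helm value directly on install
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  --set mig.strategy=mixed

# Or keep all overrides in a file, e.g. custom-values.yaml (example name)
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  -f custom-values.yaml
```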
GPU Sharing with Multi-Instance GPU (MIG)
The NVIDIA Multi-Instance GPU (MIG) feature allows GPUs based on the NVIDIA Ampere (and later) architecture to be securely partitioned into up to seven separate GPU instances for CUDA applications, providing multiple users with separate GPU resources for optimal GPU utilization.
For details on the concept, we refer to the general documentation at NVIDIA Multi-Instance GPU User Guide documentation.
A physical GPU can be divided into multiple virtual MIG GPUs, where the size (memory and processing units) of each is defined by a specific MIG profile. Further details on the profiles available for different GPU types and models can be found in the MIG User Guide.
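On a MIG-capable node, the GPU instance profiles supported by the installed GPUs can also be listed directly with `nvidia-smi` (output varies by GPU model and driver version):

```shell
# List all GPU instance profiles supported by the GPUs on this node
nvidia-smi mig -lgip
```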
Profile Definition
For different NVIDIA GPU types, specific default MIG profiles are available for defining MIG-sliced virtual GPUs. However, to have more control over the individual partitioning of one or more GPUs on a Kubernetes node, custom MIG profiles can be defined.
The custom profiles are defined and maintained as part of the NVIDIA GPU Operator Helm Chart value overrides.
The following example illustrates a custom setup of configuration options for the partitioning of an NVIDIA A100 80GB GPU:
| Custom MIG Profile for A100 | Partitioning | Number of (virtual) GPU Devices | Allocatable Kubernetes Resources |
|---|---|---|---|
| custom-mig-a100-80gb-2 | 2 x 3g.40gb (3 Compute Instances (GPC) / 40 GB memory each) | 2 | nvidia.com/mig-3g.40gb: 2 |
| custom-mig-a100-80gb-3 | 1 x 3g.40gb (3 GPC / 40 GB memory)<br>2 x 2g.20gb (2 GPC / 20 GB memory each) | 3 | nvidia.com/mig-3g.40gb: 1<br>nvidia.com/mig-2g.20gb: 2 |
| custom-mig-a100-80gb-4 | 3 x 2g.20gb (2 GPC / 20 GB memory each)<br>1 x 1g.10gb (1 GPC / 10 GB memory) | 4 | nvidia.com/mig-2g.20gb: 3<br>nvidia.com/mig-1g.10gb: 1 |
| custom-mig-a100-80gb-5 | 2 x 2g.20gb (2 GPC / 20 GB memory each)<br>3 x 1g.10gb (1 GPC / 10 GB memory each) | 5 | nvidia.com/mig-2g.20gb: 2<br>nvidia.com/mig-1g.10gb: 3 |
The following YAML shows an example of defining the custom profiles above as a Helm value section for the NVIDIA GPU Operator Helm installation:
```yaml
mig:
  strategy: mixed
migManager:
  config:
    default: "all-disabled"
    name: custom-mig-parted-configs
    create: true
    data:
      config.yaml: |-
        version: v1
        mig-configs:
          all-disabled:
            - devices: all
              mig-enabled: false
          custom-mig-a100-80gb-2:
            - devices: all
              mig-enabled: true
              mig-devices:
                "3g.40gb": 2
          custom-mig-a100-80gb-3:
            - devices: all
              mig-enabled: true
              mig-devices:
                "3g.40gb": 1
                "2g.20gb": 2
          custom-mig-a100-80gb-4:
            - devices: all
              mig-enabled: true
              mig-devices:
                "2g.20gb": 3
                "1g.10gb": 1
          custom-mig-a100-80gb-5:
            - devices: all
              mig-enabled: true
              mig-devices:
                "2g.20gb": 2
                "1g.10gb": 3
```
The MIG strategy mixed was selected to allow dedicated / separate MIG profiles for the individual GPUs of a single Kubernetes node (cf. MIG Support in Kubernetes).
With the config entry devices: all, as in the example above, each profile applies the configured device slicing to all GPUs on a node.
If an individual slicing configuration should be applied per physical GPU on a multi-GPU Kubernetes node, this can also be achieved via the NVIDIA GPU Operator config:
```yaml
mig:
  strategy: mixed
migManager:
  config:
    default: "all-disabled"
    name: custom-mig-parted-configs
    create: true
    data:
      config.yaml: |-
        version: v1
        mig-configs:
          all-disabled:
            - devices: all
              mig-enabled: false
          custom-mig-4-a100-80gb-0-3-4-5:
            - devices: [0]
              mig-enabled: false
            - devices: [1]
              mig-enabled: true
              mig-devices:
                "3g.40gb": 1
                "2g.20gb": 2
            - devices: [2]
              mig-enabled: true
              mig-devices:
                "2g.20gb": 3
                "1g.10gb": 1
            - devices: [3]
              mig-enabled: true
              mig-devices:
                "2g.20gb": 2
                "1g.10gb": 3
```
The example above defines a MIG profile custom-mig-4-a100-80gb-0-3-4-5, which applies the slicing of the profiles custom-mig-a100-80gb-3, custom-mig-a100-80gb-4 and custom-mig-a100-80gb-5 to three of the physical GPUs individually on a 4x A100 GPU Kubernetes node, while leaving one physical A100 without any partitioning.
Profile Deployment
The respective custom profile is attached to a Kubernetes GPU node by applying the corresponding Kubernetes label:
```yaml
nvidia.com/mig.config: <custom-profile>
```
Although the label can be attached manually to a node via

```shell
kubectl label nodes <node-name> nvidia.com/mig.config=<custom-profile> --overwrite
```
the label can also be assigned automatically to new nodes joining a Kubernetes cluster, e.g. via node labels defined in the node pool settings of a production environment.
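Whether a profile has been applied successfully can be verified via the node labels and the node's allocatable resources; the MIG manager reflects the configuration status in the nvidia.com/mig.config.state label (the node name below is a placeholder):

```shell
# Show the MIG-related labels of the node
# (mig.config.state becomes "success" once the profile is applied)
kubectl get node <node-name> --show-labels | tr ',' '\n' | grep 'nvidia.com/mig'

# Inspect the resulting allocatable MIG resources on the node
kubectl describe node <node-name> | grep -A 10 'Allocatable'
```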
Kubernetes Pod scheduling onto the virtual MIG devices is controlled via the resources section of the Kubernetes Deployment (or Pod) spec. The following example requests one virtual GPU device for a Pod, for a selected A100 profile:
```yaml
resources:
  limits:
    nvidia.com/mig-<profile>: 1
  requests:
    nvidia.com/mig-<profile>: 1
```
The resource limit / request settings can be used to coordinate the scheduling of application Pods with respect to the profile-dependent partitioning (see the table with examples above).
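As an illustrative sketch, a complete Pod manifest requesting one 2g.20gb MIG slice could look as follows; the Pod name, image, and the concrete profile are placeholders chosen for this example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-mig-example        # example name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      # example CUDA base image; replace with the actual application image
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-2g.20gb: 1   # one 2g.20gb MIG slice
```

The Pod is scheduled only onto a node that exposes the requested MIG resource, i.e. a node labeled with a matching MIG profile.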