NVIDIA GPU Operator setup
This article describes the setup of the NVIDIA GPU Operator in a Kubernetes cluster. We discuss configuration options for activating Multi-Instance GPU (MIG) on supported NVIDIA GPUs to allow GPU resources to be shared between workloads.
Introduction
Kubernetes provides access to special hardware resources such as NVIDIA GPUs and other devices through the device plugin framework.
However, configuring and managing nodes with these hardware resources requires the configuration of multiple software components such as drivers, container runtimes, and other libraries, which is difficult and prone to errors.
The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.
Installation
We recommend installing the NVIDIA GPU Operator to the Kubernetes environment via the externally available NVIDIA GPU Operator Helm chart.
Detailed instructions on the installation and upgrade process, as well as possible and recommended configuration options, can be found in Installing the NVIDIA GPU Operator.
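For reference, a minimal installation using the public Helm chart could look like the following sketch; the release name and the gpu-operator namespace are only examples and can be adapted to the environment:
# Add the NVIDIA Helm repository and refresh the local chart index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# Install the GPU Operator; release name and namespace are examples
helm install --wait gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace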
Configuration
The NVIDIA GPU Operator Helm-based installation can be configured using Helm value overrides.
An overview of available Helm configuration options can be found in the NVIDIA GPU Operator Helm values.yaml file.
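For example, individual values can be overridden on the command line with --set, or a custom values file can be passed with -f; the file name custom-values.yaml below is only a placeholder:
# Override a single value directly ...
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set mig.strategy=mixed
# ... or provide a custom values file with all overrides
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  -f custom-values.yaml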
GPU sharing with multi-instance GPU (MIG)
The NVIDIA Multi-Instance GPU (MIG) feature allows GPUs (starting with the NVIDIA Ampere architecture) to be securely partitioned into up to seven separate GPU instances for CUDA applications, providing multiple users with separate GPU resources for optimal GPU utilisation.
For details on the concepts, see the NVIDIA Multi-Instance GPU User Guide.
A physical GPU can be divided into multiple virtual MIG GPUs, where the size (memory and processing units) is defined by a specific MIG profile.
Further details on available profiles for different GPU types / models can be found in the MIG User Guide.
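On a node where the NVIDIA driver is already available (for example, deployed by the GPU Operator), the MIG profiles supported by the installed GPUs can also be listed directly with nvidia-smi:
# List the GPU instance profiles supported by the GPUs on this node
nvidia-smi mig -lgip
# List the currently configured GPU instances (if any)
nvidia-smi mig -lgi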
Profile definition
For different NVIDIA GPU types, specific default MIG profiles are available for defining MIG-sliced virtual GPUs. However, to have more control over the individual partitioning of one or more GPUs on a Kubernetes node, custom MIG profiles can be defined.
The custom profiles are defined and maintained with NVIDIA GPU Operator Helm chart value overrides.
The following example illustrates a custom setup of configuration options for the partitioning of an NVIDIA A100 80GB GPU:
| Custom MIG profile for A100 | Partitioning | Number of (virtual) GPU devices | Allocatable Kubernetes resources |
|---|---|---|---|
| custom-mig-a100-80gb-2 | 2 x 3g.40gb = 3 Compute Instances (GPC) / 40GB Memory each | 2 | nvidia.com/mig-3g.40gb: 2 |
| custom-mig-a100-80gb-3 | 1 x 3g.40gb = 3 Compute Instances (GPC) / 40GB Memory; 2 x 2g.20gb = 2 Compute Instances (GPC) / 20GB Memory each | 3 | nvidia.com/mig-3g.40gb: 1, nvidia.com/mig-2g.20gb: 2 |
| custom-mig-a100-80gb-4 | 3 x 2g.20gb = 2 Compute Instances (GPC) / 20GB Memory each; 1 x 1g.10gb = 1 Compute Instance (GPC) / 10GB Memory | 4 | nvidia.com/mig-2g.20gb: 3, nvidia.com/mig-1g.10gb: 1 |
| custom-mig-a100-80gb-5 | 2 x 2g.20gb = 2 Compute Instances (GPC) / 20GB Memory each; 3 x 1g.10gb = 1 Compute Instance (GPC) / 10GB Memory each | 5 | nvidia.com/mig-2g.20gb: 2, nvidia.com/mig-1g.10gb: 3 |
The following YAML shows an example of defining the custom profiles described above as a Helm value section for the NVIDIA GPU Operator Helm installation:
mig:
  strategy: mixed
migManager:
  config:
    default: "all-disabled"
    name: custom-mig-parted-configs
    create: true
    data:
      config.yaml: |-
        version: v1
        mig-configs:
          all-disabled:
            - devices: all
              mig-enabled: false
          custom-mig-a100-80gb-2:
            - devices: all
              mig-enabled: true
              mig-devices:
                "3g.40gb": 2
          custom-mig-a100-80gb-3:
            - devices: all
              mig-enabled: true
              mig-devices:
                "3g.40gb": 1
                "2g.20gb": 2
          custom-mig-a100-80gb-4:
            - devices: all
              mig-enabled: true
              mig-devices:
                "2g.20gb": 3
                "1g.10gb": 1
          custom-mig-a100-80gb-5:
            - devices: all
              mig-enabled: true
              mig-devices:
                "2g.20gb": 2
                "1g.10gb": 3
The MIG strategy mixed was selected so that different MIG profiles can coexist across the GPUs of a single Kubernetes node, with each MIG device type exposed as its own Kubernetes resource (see also MIG Support in Kubernetes).
With the example above and the config entry devices: all, the respective profile applies the configured device slicing to all GPUs on a single Kubernetes node.
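As an illustration of the mixed strategy, a node configured with the custom-mig-a100-80gb-3 profile above would advertise each MIG device type as a separate resource, roughly along these lines (illustrative excerpt of the node's allocatable resources):
# Illustrative excerpt of the node status after applying custom-mig-a100-80gb-3
allocatable:
  nvidia.com/mig-3g.40gb: "1"
  nvidia.com/mig-2g.20gb: "2"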
If different GPU-slicing configurations should be applied to individual physical GPUs on a multi-GPU Kubernetes node, this can also be achieved using the NVIDIA GPU Operator configuration:
mig:
  strategy: mixed
migManager:
  config:
    default: "all-disabled"
    name: custom-mig-parted-configs
    create: true
    data:
      config.yaml: |-
        version: v1
        mig-configs:
          all-disabled:
            - devices: all
              mig-enabled: false
          custom-mig-4-a100-80gb-0-3-4-5:
            - devices: [0]
              mig-enabled: false
            - devices: [1]
              mig-enabled: true
              mig-devices:
                "3g.40gb": 1
                "2g.20gb": 2
            - devices: [2]
              mig-enabled: true
              mig-devices:
                "2g.20gb": 3
                "1g.10gb": 1
            - devices: [3]
              mig-enabled: true
              mig-devices:
                "2g.20gb": 2
                "1g.10gb": 3
The example above defines a MIG configuration custom-mig-4-a100-80gb-0-3-4-5, which applies the custom profile settings custom-mig-a100-80gb-3, custom-mig-a100-80gb-4, and custom-mig-a100-80gb-5 individually to three of the physical GPUs on a 4xA100 GPU Kubernetes node, while leaving one physical A100 GPU without any partitioning.
Profile deployment
The custom profile is attached to a Kubernetes GPU node by applying the relevant Kubernetes label:
nvidia.com/mig.config: <custom-profile>
The label can also be attached manually to a node:
kubectl label nodes <node-name> nvidia.com/mig.config=<custom-profile> --overwrite
However, in a production environment the label can also be assigned automatically to new nodes joining the Kubernetes cluster, for example via node labels defined in the node pool settings.
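Regardless of how the label is assigned, the MIG manager reconfigures the node asynchronously and reports its progress via the nvidia.com/mig.config.state node label. One way to check both labels is via the label columns of kubectl get nodes; the node name column output is truncated here for brevity:
# Show the requested MIG configuration and its current state for all nodes
kubectl get nodes -L nvidia.com/mig.config -L nvidia.com/mig.config.state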
Kubernetes Pod scheduling onto the virtual MIG devices is controlled via the resources section of the Kubernetes deployment. The following example shows the usage of one virtual GPU device for one Pod for a selected A100 profile:
resources:
  limits:
    nvidia.com/mig-<profile>: 1
  requests:
    nvidia.com/mig-<profile>: 1
The resource limit and request settings can be used to coordinate the application Pod scheduling in the context of the profile-dependent partitioning (see the table with examples above).
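As a concrete illustration, the following Pod manifest requests one 3g.40gb MIG slice as created, for example, by the custom-mig-a100-80gb-2 profile above; the Pod name and the container image are placeholders and can be replaced by any CUDA-enabled workload:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      # Placeholder image; any CUDA-enabled image can be used here
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-3g.40gb: 1
        requests:
          nvidia.com/mig-3g.40gb: 1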