Using GPU-accelerated instances

To leverage GPU-accelerated instances in Kubernetes, there are some prerequisites and steps to follow. These cover both the configuration of the cluster and the application's Docker image.

Infrastructure prerequisites

  1. A Karpenter NodePool that can provision GPU-accelerated instances, paired with an appropriate EC2NodeClass whose AMI uses the correct container runtime. ref: nvidia container runtime. Amazon EKS optimized accelerated AMIs come with the NVIDIA container runtime pre-configured; you can check the available EKS optimized AMIs here. A minimal NodePool sketch follows this list.
  2. A device plugin for the specific GPU type, e.g. the NVIDIA GPU device plugin or the AWS Neuron device plugin. This runs as a DaemonSet on the cluster and exposes the GPU resources to the Kubernetes scheduler.
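
The following is a minimal NodePool sketch, assuming Karpenter's v1 API; the names and the instance-family values are placeholders, and the referenced EC2NodeClass (which selects the GPU-capable AMI) is not shown:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu  # placeholder: EC2NodeClass with a GPU-capable AMI (not shown)
      requirements:
        # placeholder instance families; adjust to the GPUs you need
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "p4d"]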

To test the setup, you can run a simple test pod; the nvidia.com/gpu resource request ensures it is scheduled onto a GPU node.

ref: NVIDIA CUDA sample

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/cuda-sample
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1
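
Apply the manifest and check the pod logs (kubectl logs gpu-test); the sample prints Test PASSED once the vector addition has run on the GPU.

For Neuron-based instances, an analogous test pod requests the aws.amazon.com/neuron resource exposed by the Neuron device plugin. A minimal sketch, where the image name is a placeholder for an image with the Neuron tools installed:

apiVersion: v1
kind: Pod
metadata:
  name: neuron-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: neuron-check
      # placeholder: any image with the Neuron tools installed
      image: "your-registry/neuron-app:latest"
      command: ["neuron-ls"]  # lists the Neuron devices visible to the container
      resources:
        limits:
          aws.amazon.com/neuron: 1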

Application prerequisites

The application image must be built with CUDA libraries that match the GPU type and the driver on the node (the driver itself is provided by the node's AMI). The easiest approach is to start from a base image that already has the correct libraries installed and configured. For NVIDIA GPUs, the nvidia/cuda images are a good starting point; for Neuron-based applications, the AWS Neuron dockerfile reference is a good starting point.
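
For example, a minimal Dockerfile sketch for a hypothetical PyTorch app (app.py is a placeholder; pick a CUDA tag that matches the libraries your application needs):

FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04
# the CUDA base image does not ship Python
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install torch
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]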

Testing GPU availability in the application

To test GPU availability from inside the application, for CUDA you can run a short script that checks whether a GPU device is visible:

import torch
# prints True when a CUDA device is visible to the process
print(torch.cuda.is_available())

For AWS Neuron, you can check that the Neuron devices are visible with the Neuron command-line tools, for example:

# interactive view of Neuron device utilization; neuron-ls lists the visible devices
neuron-top