Using GPU accelerated instances¶
To leverage GPU accelerated instances in Kubernetes, there are some prerequisites and steps to follow. These cover both the configuration of the cluster and the application's Docker image.
Infrastructure prerequisites¶
- A Karpenter NodePool that can provision GPU accelerated instances, paired with an EC2NodeClass whose AMI uses the correct container runtime. ref: nvidia container runtime. Amazon EKS optimized accelerated AMIs ship with the NVIDIA container runtime pre-configured; you can check the available EKS optimized AMIs here. A minimal NodePool sketch follows this list.
- A device plugin for the specific accelerator type, e.g. the NVIDIA GPU device plugin or the AWS Neuron device plugin. The plugin runs as a DaemonSet on the cluster and exposes the accelerator resources (such as nvidia.com/gpu or aws.amazon.com/neuron) to the scheduler.
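As an illustration, a minimal GPU-capable NodePool could look like the sketch below. This is only a sketch, assuming the Karpenter v1beta1 API and a pre-existing EC2NodeClass named `gpu` that selects an accelerated AMI; the names and instance types are placeholders to adapt to your cluster.

```yaml
# Minimal sketch (Karpenter v1beta1 API); names and instance types are placeholders.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      # Reference an EC2NodeClass whose AMI has the NVIDIA container runtime configured.
      nodeClassRef:
        name: gpu
      requirements:
        # Restrict this pool to GPU accelerated instance types.
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge", "g5.2xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```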
To test the setup, you can use a simple test pod that will run on a GPU node.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/cuda-sample
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1
```
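Applying the manifest should cause Karpenter to provision a GPU node (if none is available) and the pod to run to completion. Assuming the manifest above is saved as gpu-test.yaml (a placeholder file name):

```bash
kubectl apply -f gpu-test.yaml
# Wait for the pod to be scheduled onto a GPU node and finish.
kubectl get pod gpu-test
# The vectoradd sample prints "Test PASSED" in its logs on success.
kubectl logs gpu-test
```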
Application prerequisites¶
The application image must be built with CUDA libraries that match the GPU type and driver. The easiest approach is to use a base image that already has the correct libraries installed and configured. For NVIDIA GPUs, the nvidia/cuda images are a good starting point; for Neuron based applications, the AWS Neuron Dockerfile reference is a good starting point. A sketch of such a Dockerfile is shown below.
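The following is a minimal sketch of a Dockerfile built on an nvidia/cuda base image; the CUDA tag, requirements file, application code, and entrypoint are placeholders, not values from this guide.

```dockerfile
# Minimal sketch; pick a CUDA tag that matches your framework's requirements
# and the driver shipped with the node AMI. File names below are placeholders.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04

WORKDIR /app

# Install Python and the application's dependencies.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the application code.
COPY . .

CMD ["python3", "main.py"]
```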
Testing the GPU availability in the application¶
For CUDA, you can run a script inside the application container that checks whether the GPU device is available:
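For example, if the application uses PyTorch (an assumption; any CUDA aware framework exposes an equivalent check), a minimal sketch looks like this:

```python
# check_gpu.py - minimal sketch, assumes PyTorch is installed in the image.
import torch

if torch.cuda.is_available():
    # Report each CUDA device that the container can see.
    for i in range(torch.cuda.device_count()):
        print(f"Found GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    raise SystemExit("No CUDA device visible to the application")
```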
For AWS Neuron, you can run a script that checks the availability of the Neuron device:
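A minimal sketch, assuming the AWS Neuron tools are installed in the image; it relies on the Neuron devices being exposed as /dev/neuron* device files and on the neuron-ls tool from the Neuron tools package:

```python
# check_neuron.py - minimal sketch, assumes the Neuron tools are installed.
import glob
import subprocess

# Neuron devices are exposed to the container as /dev/neuron0, /dev/neuron1, ...
devices = glob.glob("/dev/neuron*")
if not devices:
    raise SystemExit("No Neuron device visible to the application")
print(f"Found Neuron devices: {devices}")

# neuron-ls prints per-device details for the visible Neuron devices.
subprocess.run(["neuron-ls"], check=True)
```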