Deploy ROSA + NVIDIA GPU + RHOAI with Automation
This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.
Getting Red Hat OpenShift AI up and running with NVIDIA GPUs on a Red Hat OpenShift Service on AWS (ROSA) cluster can involve a series of detailed steps, from installing various operators to managing dependencies. While manageable, this process can be time-consuming when you’re eager to start leveraging OpenShift AI for your projects.
This guide and its accompanying Git repository are designed to streamline your setup significantly. We focus on getting you productive faster by using Terraform to deploy a ROSA cluster with GPUs from the start. From there, Ansible scripts take over, automating the deployment and configuration of all necessary operators for both NVIDIA GPUs and Red Hat OpenShift AI. This means less manual configuration for you and more time spent on what matters: innovating with AI.
Prerequisites
- Terraform
- Git
- Ansible CLI
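Before proceeding, you can confirm these tools are on your `PATH` with a quick check. This is a minimal sketch; the `check_tools` helper below is illustrative and not part of the accompanying repository:

```shell
# Illustrative helper: report which required CLIs are available on PATH.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return $status
}

# Check the tools this guide relies on.
check_tools terraform git ansible || echo "Install the missing tools before continuing."
```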
Create a Red Hat Hybrid Cloud Console Service Account
Refer to this guide to create a service account that will be used to provision the cluster.
Note: Make sure to add the service account to a group that has ‘OCM cluster provisioner’ access. Refer to this guide on adding a service account to a group.
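Once the service account exists, keep its client ID and secret handy for the cluster-creation step. A minimal sketch; the variable names below are illustrative placeholders, not names the repository requires:

```shell
# Hypothetical placeholder names -- substitute your service account's values.
export OCM_CLIENT_ID="<service-account-client-id>"
export OCM_CLIENT_SECRET="<service-account-client-secret>"

# Recent rosa CLI releases accept service-account credentials directly
# (flag names may differ on older versions):
#   rosa login --client-id "$OCM_CLIENT_ID" --client-secret "$OCM_CLIENT_SECRET"
```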
Set Environment Variables
Set and adjust the following environment variables to meet your requirements.
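Terraform reads any environment variable prefixed with `TF_VAR_` as an input variable. The names and values below are illustrative examples only; use the variable names the repository's Terraform configuration actually declares:

```shell
# Illustrative examples only -- adjust names and values to match the
# repository's declared Terraform variables.
export TF_VAR_cluster_name="rosa-rhoai"       # name for the ROSA cluster
export TF_VAR_region="us-east-2"              # AWS region for the cluster
export TF_VAR_gpu_machine_type="g5.2xlarge"   # instance type for the GPU machine pool
```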
Create the ROSA cluster
Running the automation will:
- Create a second machine pool with NVIDIA GPU worker nodes
- Deploy the Node Feature Discovery (NFD) Operator
- Deploy the NVIDIA GPU Operator
- Deploy the OpenShift Serverless Operator
- Deploy the Service Mesh Operator
- Deploy the Authorino Operator
- Deploy and configure Red Hat OpenShift AI
- Deploy and configure an Accelerator Profile
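Cluster creation follows the standard Terraform workflow. The sketch below assumes you run it from the root of the cloned repository with your credentials and variables already set; after the cluster is up, the repository's Ansible automation takes over and installs the operators listed above:

```shell
# Sketch of the standard Terraform workflow for this deployment.
run_rosa_terraform() {
  terraform init                  # download the required providers and modules
  terraform plan -out=rosa.plan   # preview the cluster, machine pools, and operators
  terraform apply rosa.plan       # create the cluster; Ansible then configures GPUs and RHOAI
}

# Invoke manually once everything above is in place:
# run_rosa_terraform
```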