Run:ai Documentation Library
Reference
For a full reference of the YAML API parameters, see the YAML Reference document.