docs(operations): add containerized GPU workloads guide #555
Aleksei Sviridkin (lexfrei) wants to merge 1 commit into
📝 Walkthrough
This PR adds a new operations documentation page explaining how to deploy and use the [...]
Changes: GPU Container Workloads Documentation
Code Review
This pull request adds a new documentation page detailing how to run containerized GPU workloads using the container variant of the cozystack.gpu-operator package. The review feedback suggests specifying the cozy-system namespace in both the kubectl patch command and the Package resource manifest to ensure they are applied to the correct namespace.
    kubectl patch packages.cozystack.io cozystack.cozystack-platform --type=json \
      -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'
In Cozystack, the Package resources (including cozystack.cozystack-platform) are typically located in the cozy-system namespace. Running kubectl patch without specifying the namespace will fail if the user's current context is set to another namespace (like default). Adding -n cozy-system ensures the command runs successfully.
Suggested change:

    - kubectl patch packages.cozystack.io cozystack.cozystack-platform --type=json \
    -   -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'
    + kubectl patch packages.cozystack.io cozystack.cozystack-platform -n cozy-system --type=json \
    +   -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'
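Before adding `-n cozy-system`, it may be worth confirming whether the `packages.cozystack.io` resource is namespaced on the target cluster. The following is a minimal sketch, not part of the PR: `build_patch` and the `APPLY=1` opt-in guard are hypothetical names introduced here for illustration. It builds the same JSON Patch payload, then (only when asked to) lists the API group's resources so the NAMESPACED column can be checked, and runs the patch as a server-side dry run.

```shell
#!/bin/sh
# Hypothetical helper: prints the JSON Patch payload from the command under review.
# $1 = package name to append to enabledPackages.
build_patch() {
  printf '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "%s"}]' "$1"
}

PATCH=$(build_patch "cozystack.gpu-operator")

# APPLY=1 is an assumed opt-in guard so the script never touches a cluster by accident.
if [ "${APPLY:-0}" = "1" ] && command -v kubectl >/dev/null 2>&1; then
  # The NAMESPACED column shows whether -n is meaningful for this resource.
  kubectl api-resources --api-group=cozystack.io
  # Server-side dry run: validates the patch without persisting it.
  kubectl patch packages.cozystack.io cozystack.cozystack-platform \
    --type=json -p "$PATCH" --dry-run=server
else
  # Without a cluster, just show the payload that would be sent.
  echo "$PATCH"
fi
```

If the resource turns out to be cluster-scoped, the namespace flag has no effect on which object is patched; if it is namespaced, the reviewer's `-n cozy-system` suggestion applies.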
    apiVersion: cozystack.io/v1alpha1
    kind: Package
    metadata:
      name: cozystack.gpu-operator
    spec:
      variant: container
The Package resource needs to be created in the cozy-system namespace for the Cozystack operator to detect and reconcile it. Adding namespace: cozy-system to the metadata ensures it is applied to the correct namespace.
Suggested change:

    - apiVersion: cozystack.io/v1alpha1
    - kind: Package
    - metadata:
    -   name: cozystack.gpu-operator
    - spec:
    -   variant: container
    + apiVersion: cozystack.io/v1alpha1
    + kind: Package
    + metadata:
    +   name: cozystack.gpu-operator
    +   namespace: cozy-system
    + spec:
    +   variant: container
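One way to try the suggested manifest without committing to it is a dry-run apply. This is a rough sketch under assumptions: the temporary file and the `APPLY=1` opt-in guard are illustrative choices, and whether `namespace: cozy-system` is required depends on how the `Package` resource is scoped on the cluster.

```shell
#!/bin/sh
# Write the suggested manifest to a temporary file (illustrative location).
MANIFEST=$(mktemp)
cat > "$MANIFEST" <<'EOF'
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: cozystack.gpu-operator
  namespace: cozy-system
spec:
  variant: container
EOF

# APPLY=1 is an assumed opt-in guard; without it the manifest is only printed.
if [ "${APPLY:-0}" = "1" ] && command -v kubectl >/dev/null 2>&1; then
  # Server-side dry run: the API server validates the object but stores nothing.
  kubectl apply --dry-run=server -f "$MANIFEST"
else
  cat "$MANIFEST"
fi
```

The dry run surfaces schema or scoping problems early, before the operator has anything to reconcile.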
Document the new container variant of cozystack.gpu-operator, paired with cozystack/cozystack#2766.

Covers the apt-installed-driver-and-toolkit Linux shape that the variant targets:
- when to pick it over the passthrough and vGPU variants
- prerequisites (host driver + host nvidia-container-toolkit, validated via nvidia-smi over kubectl debug)
- the operator-validator host-driver auto-detect path (/host/usr/bin/nvidia-smi)
- Talos caveat with a pointer to the values-native-talos.yaml reference
- install steps
- a sample CUDA pod for verification
- the variant comparison matrix
- a cross-reference to the HAMi sharing guide for tenant Kubernetes clusters

Lands under operations/, symmetric with virtualization/gpu.md (VM passthrough on management cluster) and kubernetes/gpu-sharing.md (HAMi in tenant Kubernetes addons).

Assisted-By: Claude <[email protected]>
Signed-off-by: Aleksei Sviridkin <[email protected]>
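The prerequisite check mentioned in the commit message (nvidia-smi over kubectl debug) could look roughly like the sketch below. It is not taken from the guide itself: the node name, the debug image, the `check_cmd` helper, and the `APPLY=1` guard are all placeholders. It relies on `kubectl debug node/...` mounting the node's root filesystem at `/host`, which is also why the validator path reads `/host/usr/bin/nvidia-smi`.

```shell
#!/bin/sh
# NODE and DEBUG_IMAGE are placeholders (assumptions), not values from the guide.
NODE="${NODE:-gpu-node-1}"
DEBUG_IMAGE="${DEBUG_IMAGE:-busybox}"

# Path named in the commit message for the host-driver auto-detect.
HOST_SMI="/host/usr/bin/nvidia-smi"
# Inside a chroot to /host the same binary is addressed without the prefix.
SMI_IN_CHROOT="${HOST_SMI#/host}"

# Hypothetical helper: prints the check command for a given node and image.
check_cmd() {
  printf 'kubectl debug node/%s -it --image=%s -- chroot /host %s\n' "$1" "$2" "$SMI_IN_CHROOT"
}

# APPLY=1 is an assumed opt-in guard; without it the command is only printed.
if [ "${APPLY:-0}" = "1" ] && command -v kubectl >/dev/null 2>&1; then
  # Depending on cluster policy, access to the GPU device nodes may need a
  # more privileged debug profile (for example --profile=sysadmin).
  kubectl debug "node/$NODE" -it --image="$DEBUG_IMAGE" -- chroot /host "$SMI_IN_CHROOT"
else
  check_cmd "$NODE" "$DEBUG_IMAGE"
fi
```

If `nvidia-smi` lists the GPUs from inside the chroot, the host driver is present at the path the validator is described as probing.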
Force-pushed: 3170d45 to 8b83e54
Actionable comments posted: 0
What this PR does
Add a new operations guide describing the `container` variant of `cozystack.gpu-operator`: the architectural mode for containerized GPU workloads (CUDA pods, ML training, inference) on Linux GPU nodes that already ship the NVIDIA driver and `nvidia-container-toolkit` via the distro package manager.

The new page lands at `content/en/docs/next/operations/gpu-container-workloads.md` and rounds out the GPU documentation surface:
- [...] (`default` variant).
- [...] (`container` variant).

Content covers when to pick the variant (host driver + host toolkit prerequisite), the operator-validator host-driver auto-detect mechanism (`/host/usr/bin/nvidia-smi`), the Talos caveat with a pointer to the `examples/values-native-talos.yaml` reference, install steps with `Package` CR `variant: container`, a sample CUDA pod for verification, and a three-row variant comparison matrix.

Companion to cozystack/cozystack#2766, which adds the `container` variant itself.

Release note
Summary by CodeRabbit