Kubeflow Trainer
- Designed for LLM fine-tuning
- Enables scalable, distributed training
- Supports various frameworks (PyTorch, JAX, TensorFlow)
You can develop your LLMs with:
- Python SDK
- Kubernetes Custom Resources API -> Kubernetes Training Runtimes
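
As a rough sketch of the Custom Resources route, a TrainJob manifest that points at a training runtime could be assembled in plain Python. The API version `trainer.kubeflow.org/v1alpha1`, the fields `runtimeRef` and `trainer.numNodes`, and the runtime name `torch-distributed` are assumptions about the Kubeflow Trainer v2 API and should be checked against the CRDs installed in your cluster. The helper `build_trainjob` and the job name are hypothetical.

```python
import json


def build_trainjob(name: str, runtime: str, num_nodes: int) -> dict:
    """Build a TrainJob manifest as a dict (field names are assumed, see above)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return {
        "apiVersion": "trainer.kubeflow.org/v1alpha1",  # assumed API group/version
        "kind": "TrainJob",
        "metadata": {"name": name},
        "spec": {
            # Reference to a training runtime that defines the job template.
            "runtimeRef": {"name": runtime},
            # Number of training nodes requested for distributed training.
            "trainer": {"numNodes": num_nodes},
        },
    }


if __name__ == "__main__":
    # Hypothetical job name; the runtime name is an assumption.
    manifest = build_trainjob("llm-fine-tune", "torch-distributed", 2)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML.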
Optimize GPU utilization and gang-scheduling for ML workloads by leveraging Kubernetes projects like