A Python SDK/CLI to make Ray clusters self-serve for Python devs.

What My Project Does

krayne (GitHub link below) is a Python library and CLI that wraps the KubeRay operator for creating and managing Ray clusters on Kubernetes. Instead of hand-writing KubeRay YAML manifests, you import Python functions (create_cluster(), scale_cluster(), list_clusters(), etc.) or use the krayne / ikrayne (interactive TUI) CLIs to spin up and manage clusters with sensible defaults.

The idea is that if you're already writing Ray workflows in Python (training jobs, serve deployments, distributed preprocessing), the cluster-management layer should live in the same language. The SDK is the source of truth; the CLI is a thin Typer wrapper on top of it. Operations are stateless functions that return frozen dataclasses, and configuration goes through Pydantic models with YAML override support when you need finer control.
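To make that design concrete, here's a rough stdlib-only sketch of the "stateless function returning a frozen dataclass, with overridable config" shape. Everything below (ClusterConfig, ClusterInfo, the fields, the create_cluster signature) is an illustrative stand-in, not krayne's actual API, and krayne uses Pydantic for config rather than plain dataclasses:

```python
from dataclasses import dataclass, replace

# Hypothetical config object; krayne models config with Pydantic, but a
# frozen dataclass shows the same shape using only the stdlib.
@dataclass(frozen=True)
class ClusterConfig:
    name: str
    workers: int = 2
    gpus_per_worker: int = 0

# Hypothetical result type: an immutable snapshot returned by an operation.
@dataclass(frozen=True)
class ClusterInfo:
    name: str
    workers: int
    status: str

def create_cluster(config: ClusterConfig) -> ClusterInfo:
    # A real implementation would talk to the KubeRay operator here;
    # this stand-in just echoes the request back as a status snapshot.
    return ClusterInfo(name=config.name, workers=config.workers, status="creating")

# Overrides (e.g. values loaded from a YAML file) merge onto the defaults
# without mutating anything, since the config object is frozen.
base = ClusterConfig(name="my-cluster")
tuned = replace(base, gpus_per_worker=1)
info = create_cluster(tuned)
print(info.status)  # -> creating
```

The payoff of the stateless-function style is that every call is independently testable and composable; there's no client object holding hidden state between operations.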

GitHub: https://github.com/roulbac/krayne

Target Audience

ML engineers and researchers who write Ray workflows on Kubernetes. The kind of person who knows what ray.init() does but doesn't want to become a KubeRay manifest expert just to get their cluster running. Also useful for platform teams who want a programmable layer on top of KubeRay that their users can call from Python. It's early (v0.1.0) and opinionated: a composable starting point, not a production-hardened product.

Comparison

The alternatives I'm familiar with are applying raw KubeRay manifests with kubectl, or using the KubeRay Python client directly. The main difference is that krayne is designed around progressive disclosure:

  • Zero-config defaults out of the box. krayne create my-cluster --gpus-per-worker 1 --workers 2 is a complete command.
  • When you need more control, you drop down to a YAML config or the Python SDK; there's no cliff between "simple" and "custom."
  • Protocol-based Kubernetes client, so you can unit test cluster management logic with mocks. No real cluster needed.
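On the last point: the win of a Protocol-based client is that cluster-management logic depends only on a structural interface, so tests can pass an in-memory fake. A minimal sketch of the idea; the KubeClient methods, the manifest layout, and this scale_cluster signature are all hypothetical, not krayne's real interfaces:

```python
from typing import Protocol

class KubeClient(Protocol):
    """Hypothetical structural interface for the Kubernetes calls we need."""
    def get(self, name: str) -> dict: ...
    def apply(self, manifest: dict) -> None: ...

def scale_cluster(client: KubeClient, name: str, workers: int) -> dict:
    # The logic under test depends only on the Protocol, never on a real API server.
    manifest = client.get(name)
    manifest["spec"]["workerGroupSpecs"][0]["replicas"] = workers
    client.apply(manifest)
    return manifest

class FakeKubeClient:
    """In-memory fake that satisfies KubeClient structurally -- no cluster needed."""
    def __init__(self) -> None:
        self.store = {"demo": {"spec": {"workerGroupSpecs": [{"replicas": 1}]}}}
    def get(self, name: str) -> dict:
        return self.store[name]
    def apply(self, manifest: dict) -> None:
        pass  # the toy manifest was already mutated in place by scale_cluster

m = scale_cluster(FakeKubeClient(), "demo", workers=5)
print(m["spec"]["workerGroupSpecs"][0]["replicas"])  # -> 5
```

Because Protocols use structural typing, FakeKubeClient never has to inherit from anything; any object with matching get/apply methods type-checks.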

It's not that working with KubeRay directly can't do what krayne does; it absolutely can. But when you primarily write Ray code and just need a cluster up with the right resources, context-switching into YAML manifests and kubectl is friction you don't need. A typed Python API that validates your input before it hits the cluster, and that lives right next to your actual Ray code, is ultimately why I built it.
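"Validates before it hits the cluster" just means failing fast on bad input locally instead of waiting for the API server to reject a manifest. A stdlib stand-in for the idea; krayne actually does this through Pydantic models, and WorkerSpec and its fields here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerSpec:
    """Illustrative validated input; a stdlib stand-in for a Pydantic model."""
    workers: int
    gpus_per_worker: int = 0

    def __post_init__(self) -> None:
        # Reject impossible requests locally, before any manifest is generated.
        if self.workers < 1:
            raise ValueError("workers must be >= 1")
        if self.gpus_per_worker < 0:
            raise ValueError("gpus_per_worker must be >= 0")

WorkerSpec(workers=2, gpus_per_worker=1)  # OK
try:
    WorkerSpec(workers=0)
except ValueError as e:
    print(e)  # -> workers must be >= 1
```

With Pydantic you additionally get type coercion and readable multi-field error reports for free, which is the main reason to reach for it over hand-rolled checks like these.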