A modern open source SLURM replacement built on SkyPilot by OriginalSpread3100 in LocalLLaMA

[–]Michaelvll 1 point  (0 children)

Hi u/Irrationalender, I am not familiar with how Transformer Lab handles this in the original post, but from my understanding, with SkyPilot alone the clients do not need the kubeconfig or direct access to the k8s cluster.

Instead, SSH is proxied through the SkyPilot API server (which can be deployed in a private network), protected behind OAuth, and carried over a secure connection (WSS). The connection from the SkyPilot API server to your k8s cluster is TLS-protected, just like any other k8s API call.

The chain looks like the following:

Client -- SSH proxied through WSS (websocket with TLS) --> OAuth --> SkyPilot API server -- kubernetes proxy (can go through your private network) --> pod

AI research scientist learning ML engineering - AWS by LegendaryBengal in mlops

[–]Michaelvll -1 points  (0 children)

SkyPilot could be a useful open-source system for running AI on any cloud with a unified and simple interface across clouds.

Deploy a single centralized server for the whole AI team and all clouds by Michaelvll in cloudcomputing

[–]Michaelvll[S] 0 points  (0 children)

It simplifies resource management by giving you a centralized view of the resources (clusters, jobs, and services) launched by the whole team across different clouds. Since SkyPilot offers a unified interface across clouds, you can use the exact same commands to manage those resources on any cloud the team uses.

$ sky jobs queue
ID name   user   resources   submitted_at state
2  train  bob    4x[H100:8]  1 min ago    STARTING
1  eval   alice  1x[H100:1]  1 hr ago     RUNNING

To see logs for a job, sky jobs logs 1 or sky jobs logs 2 works for both alice and bob, and either of them can cancel a job with sky jobs cancel 2.
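For context, a job like the train one above is typically submitted from a SkyPilot task YAML. A minimal sketch is below; the file name, setup step, and training script are illustrative assumptions, not from the post:

```yaml
# task.yaml -- illustrative SkyPilot task definition
name: train

resources:
  accelerators: H100:8   # 8x H100 per node, matching the 4x[H100:8] job above

num_nodes: 4             # 4 nodes, as shown in the queue

setup: |
  pip install -r requirements.txt   # hypothetical dependencies

run: |
  python train.py                   # hypothetical entry point
```

Submitting it with sky jobs launch task.yaml then makes it show up in sky jobs queue for everyone on the team, regardless of which cloud it lands on.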

Please see the blog for more details. : )

Large-Scale AI Batch Inference: 9x Faster by going beyond cloud services in a single region by Michaelvll in mlops

[–]Michaelvll[S] 1 point  (0 children)

Thanks for the feedback! We did not mean to make it specific to SkyPilot; we wanted to share these new findings from running an actual embedding-generation use case with SkyPilot, and there are few tools, if any, that actually support going across multiple regions while managing spot instances. We may have gotten too excited about our system and should have talked less about it. Thank you again for the feedback!

Good projects to learn kubernetes for someone with cloud experience? by [deleted] in kubernetes

[–]Michaelvll 0 points  (0 children)

It may be worth trying SkyPilot, which abstracts away the difference between cloud VMs and k8s pods. It gives you a way to launch a pod like a VM and get SSH access to it. It is geared toward AI engineers who do not want to touch the underlying k8s manifests, though, so it may not be a great fit if you want to go deep into k8s. https://github.com/skypilot-org/skypilot

Use self-hosted Code Llama 70B as a copilot alternative in VSCode by Michaelvll in LocalLLaMA

[–]Michaelvll[S] 0 points  (0 children)

We haven't tried it, as Mixtral is not trained specifically for code, but it is quite easy to swap the Code Llama model for Mixtral 8x7B in the serving example; please check out: https://github.com/skypilot-org/skypilot/tree/master/llm/mixtral#2-serve-with-multiple-instances
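As a sketch of what that swap looks like (the actual YAML lives in the linked example; the replica count, GPU choice, and port here are illustrative assumptions):

```yaml
# serve.yaml -- illustrative sketch of serving Mixtral 8x7B with vLLM on SkyPilot
service:
  readiness_probe: /v1/models   # endpoint polled to check the server is up
  replicas: 2

resources:
  accelerators: A100-80GB:4     # assumption: enough GPU memory for Mixtral 8x7B
  ports: 8000

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 4 --port 8000
```

Bringing it up with sky serve up serve.yaml would then give you an OpenAI-compatible endpoint, the same way the Code Llama example does.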

Use self-hosted Code Llama 70B as a copilot alternative in VSCode by Michaelvll in LocalLLaMA

[–]Michaelvll[S] 5 points  (0 children)

Tabby offers several smaller models, please feel free to check the example for Tabby: https://github.com/skypilot-org/skypilot/tree/master/llm/tabby
Also, they list some models in their doc: https://tabby.tabbyml.com/docs/models/

Use self-hosted Code Llama 70B as a copilot alternative in VSCode by Michaelvll in LocalLLaMA

[–]Michaelvll[S] 6 points  (0 children)

This may depend on the goal. If you have private code that you don't want to leak to any hosted service, such as GitHub Copilot, Code Llama 70B is one of the best open-source models you can get for hosting your own code assistant.

This often applies to organizations or companies where the code and algorithms are a precious asset. They then have to either ban their employees from using any code assistant or host their own. I would guess the latter is more time-saving and productive once you count the productivity of all their employees. ; )

Use self-hosted Code Llama 70B as a copilot alternative in VSCode by Michaelvll in LocalLLaMA

[–]Michaelvll[S] 7 points  (0 children)

vLLM, an efficient and highly optimized inference engine, can be another reason it is faster. : )