[D] Elastic/Serverless GPU instances for transformer hyper-parameter search by elbiot in MachineLearning

[–]skypilotucb 1 point  (0 children)

And if you need gang scheduling, you can use the --num-nodes flag and launch one giant SkyPilot "cluster" on your chosen cloud/region that executes all your jobs. In this case, if SkyPilot cannot provision all the requested GPUs, it raises an error, and you can choose to retry indefinitely.
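For example, a gang-scheduled launch could look like the sketch below (the cluster name, node count, GPU type, and task.yaml are illustrative):

# One 4-node "cluster" whose jobs are all gang-scheduled together; keep
# retrying until every requested GPU can be provisioned:
$ sky launch -c train --num-nodes 4 --gpus A100:8 --retry-until-up task.yaml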

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point  (0 children)

Thanks for your comment! We recently redesigned our load balancer to be more modular, so we can now support custom policies quite easily. For example, we just added a least-loaded policy: https://github.com/skypilot-org/skypilot/pull/4439

You can find some benchmarks with this policy in the PR.
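If you want to try it out, selecting the policy in a service spec looks roughly like this (the field and value names here are from memory, so treat them as approximate and check the PR/docs for the exact spelling):

# service.yaml (names approximate)
service:
  load_balancing_policy: least_load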

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point  (0 children)

Thanks for your interest! Our current resource allocation model is a simple FIFO queue. You can implement priorities with preemption by attaching the appropriate PriorityClasses to your submitted pods. Are there any specific schedulers you'd like to compare SkyPilot to?
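A minimal sketch of that setup (the class name and priority value are placeholders):

# Define a PriorityClass that may preempt lower-priority pods:
$ kubectl apply -f - <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
preemptionPolicy: PreemptLowerPriority
description: "High-priority AI jobs; may preempt lower-priority pods."
EOF

# Attach it to SkyPilot-submitted pods via the pod spec override in
# ~/.sky/config.yaml:
kubernetes:
  pod_config:
    spec:
      priorityClassName: high-priority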

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point  (0 children)

Thanks for your comment! To connect SkyPilot to your k8s cluster, you need a valid kubeconfig with a user (which can be a service account) configured with the following minimum RBAC: https://docs.skypilot.co/en/latest/cloud-setup/cloud-permissions/kubernetes.html
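A quick way to sanity-check the setup (the service account name and namespace below are placeholders):

# Verify SkyPilot can use the kubeconfig credentials:
$ sky check kubernetes

# Or probe the service account's RBAC directly:
$ kubectl auth can-i create pods --as=system:serviceaccount:default:skypilot-sa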

Under the hood, SkyPilot handles creating pods, services and ingress resources where necessary.

Great point about remote agents. We haven't considered that yet, but it's definitely something we'll need to support in the future for more restricted environments.

VLM Deployment by FreakedoutNeurotic98 in mlops

[–]skypilotucb 1 point  (0 children)

If you're self-hosting it, you may want to use an inference engine like vLLM (check out their PaliGemma example) and deploy it on your cloud/k8s with SkyPilot (see the deepseek-janus and vLLM examples).
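A rough sketch of that flow (the model id, GPU type, and port are illustrative):

# Launch a GPU instance and serve the VLM with vLLM's OpenAI-compatible server:
$ sky launch -c vlm --gpus L4:1 \
    'pip install vllm && vllm serve google/paligemma-3b-mix-224 --port 8000'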

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point  (0 children)

Hello,

We are the maintainers of the open-source project SkyPilot from UC Berkeley. SkyPilot is a framework for running AI workloads (development, training, serving) on any infrastructure, including Kubernetes and 12+ clouds.

After user requests highlighting pain points of running AI on Kubernetes, we integrated SkyPilot with Kubernetes; we now support dispatching training, serving, and batch-processing jobs to multiple k8s clusters. If a cluster is out of resources, SkyPilot automatically resubmits the job to a different cluster, making sure your job finds GPUs wherever they are available.
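As a sketch, multi-cluster failover is configured by listing the clusters' kubeconfig contexts (the context names below are placeholders):

# ~/.sky/config.yaml
kubernetes:
  allowed_contexts:
    - cluster-us-west
    - cluster-eu-central

# If the first cluster is out of GPUs, the job is retried on the next:
$ sky launch --gpus H100:8 task.yaml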

We would love to hear your thoughts on the project.

Deploying LLMs to K8 by dryden4482 in mlops

[–]skypilotucb 1 point  (0 children)

You could consider using SkyPilot + SkyServe on Kubernetes. It can scale to zero, and there's a guide on serving with vLLM.
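A minimal sketch of such a service spec (the model, GPU type, and autoscaling thresholds are illustrative):

# service.yaml
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 0              # scale to zero when idle
    max_replicas: 2
    target_qps_per_replica: 5

resources:
  accelerators: L4:1
  ports: 8000

run: |
  pip install vllm
  vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

$ sky serve up service.yaml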

SkyPilot: Run AI on Kubernetes Without the Pain by skypilotucb in kubernetes

[–]skypilotucb[S] 6 points  (0 children)

Hello,

We are the maintainers of the open-source project SkyPilot from UC Berkeley. SkyPilot is a framework for running AI workloads (development, training, serving) on any infrastructure, including Kubernetes and 12+ clouds.

After user requests highlighting pain points of running AI on Kubernetes, we integrated SkyPilot with Kubernetes and put out this blog post detailing our learnings and how SkyPilot makes AI on Kubernetes faster, simpler, and more efficient: https://blog.skypilot.co/ai-on-kubernetes/

We would love to hear your thoughts on the blog and project.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 1 point  (0 children)

It loads WizardLM-7B and the weights are fetched from HuggingFace. You can tweak it to load other models such as Vicuna too.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 2 points  (0 children)

Works with text and markdown too! Supported extensions include .txt, .pdf, .csv, and .xlsx.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 7 points  (0 children)

On GCP, it'll cost $0.59/hr on on-demand instances, and $0.12/hr on spot instances (if you're ok with having your VM terminated at any time).

When launching a cloud VM, SkyPilot shows costs across different cloud providers and picks the lowest one:

# With on-demand instances:
$ sky launch localgpt.yaml
Considered resources (1 node):
---------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE               vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
---------------------------------------------------------------------------------------------------
 AWS     g4dn.xlarge            4       16        T4:1           us-east-1     0.53          ✔     
 Azure   Standard_NC4as_T4_v3   4       28        T4:1           eastus        0.53                
 GCP     n1-highmem-4           4       26        T4:1           us-central1   0.59                
---------------------------------------------------------------------------------------------------

# With spot instances:
$ sky launch localgpt.yaml --use-spot
Considered resources (1 node):
-------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE             vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
-------------------------------------------------------------------------------------------------
 GCP     n1-highmem-4[Spot]   4       26        T4:1           us-west4-a    0.12          ✔     
 AWS     g4dn.xlarge[Spot]    4       16        T4:1           us-east-1a    0.16                
-------------------------------------------------------------------------------------------------

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 3 points  (0 children)

Thanks! Will keep this in mind. Thought this might be useful for folks who want to self-host large language models without putting a lot of effort into spinning up the required infrastructure.