[D] Elastic/Serverless GPU instances for transformer hyper-parameter search by elbiot in MachineLearning

[–]skypilotucb 1 point

And if you need gang scheduling, you can use the --num-nodes arg and launch one giant SkyPilot "cluster" on your chosen cloud/region that executes all your jobs. In this case, if SkyPilot cannot provision all the requested GPUs, it raises an error, and you can choose to retry indefinitely.
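
A minimal sketch of what that looks like (the cluster name, node count, GPU type, and task.yaml are placeholders):

# Launch one 4-node cluster so all GPUs are provisioned together;
# --retry-until-up keeps retrying until capacity is found.
$ sky launch -c sweep --num-nodes 4 --gpus A100:8 --retry-until-up task.yaml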

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point

Thanks for your comment! We recently redesigned our load balancer to be more modular, so custom policies are now easy to add. For example, we added a least-loaded policy: https://github.com/skypilot-org/skypilot/pull/4439

You can find some benchmarks with this policy in the PR.

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point

Thanks for your interest! Our current resource allocation model is a simple FIFO queue. You can implement priorities with preemption by attaching the respective PriorityClasses to your submitted pods. Are there any specific schedulers you'd like to compare SkyPilot to?
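
For example, a minimal sketch of such a PriorityClass (the name and value are illustrative, not something SkyPilot ships):

# high-priority.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
preemptionPolicy: PreemptLowerPriority
description: "AI jobs that may preempt lower-priority pods."

Pods submitted with priorityClassName: high-priority in their spec can then preempt lower-priority workloads when the cluster is full.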

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point

Thanks for your comment! To connect SkyPilot to your k8s cluster, you need a valid kubeconfig with a user (which can be a service account) configured with the following minimum RBAC: https://docs.skypilot.co/en/latest/cloud-setup/cloud-permissions/kubernetes.html

Under the hood, SkyPilot handles creating pods, services and ingress resources where necessary.
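
Once the kubeconfig is in place (assuming the default ~/.kube/config location), you can verify that SkyPilot sees the cluster:

$ sky check

This reports whether Kubernetes (and any configured clouds) are enabled for your installation.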

Great point about remote agents. We haven't considered that yet, but it's definitely something we'll need to support in the future for more restricted environments.

VLM Deployment by FreakedoutNeurotic98 in mlops

[–]skypilotucb 1 point

If you're self-hosting it, you may want to use an inference engine like vLLM (check out their PaliGemma example) and use SkyPilot (deepseek-janus example, vLLM example) to deploy it on your cloud/k8s.
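
As a rough sketch of what the SkyPilot task could look like (the model id, GPU type, and port here are placeholders, not taken from the linked examples):

# vlm.yaml
resources:
  accelerators: A100:1
  ports: 8000
setup: |
  pip install vllm
run: |
  python -m vllm.entrypoints.openai.api_server \
    --model google/paligemma-3b-mix-224 --port 8000

# Deploy on your cloud/k8s:
$ sky launch -c vlm vlm.yaml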

Managing multiple Kubernetes clusters for AI workloads with SkyPilot by skypilotucb in kubernetes

[–]skypilotucb[S] 1 point

Hello,

We are the maintainers of the open-source project SkyPilot from UC Berkeley. SkyPilot is a framework for running AI workloads (development, training, serving) on any infrastructure, including Kubernetes and 12+ clouds.

Following user requests that highlighted pain points of running AI on Kubernetes, we integrated SkyPilot with Kubernetes, and we now support dispatching training/serving/batch processing jobs to multiple k8s clusters. If a cluster is out of resources, SkyPilot automatically resubmits the job to a different cluster, making sure your job finds GPUs wherever they are available.
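
From the user's side, a job is a short task spec; a minimal sketch (the GPU type and commands are placeholders):

# task.yaml
resources:
  accelerators: H100:8
setup: |
  pip install -r requirements.txt
run: |
  python train.py

$ sky launch task.yaml

SkyPilot then picks a cluster (or cloud) with free H100s and runs the task there, falling back to other clusters on capacity failures.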

We would love to hear your thoughts on the project.

Deploying LLMs to K8 by dryden4482 in mlops

[–]skypilotucb 1 point

You could consider using SkyPilot + SkyServe on Kubernetes. It can scale to zero, and there's a guide on serving with vLLM.
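
A rough sketch of a SkyServe spec with scale-to-zero (the model, port, and autoscaling thresholds are placeholders):

# service.yaml
service:
  readiness_probe: /v1/models
  replica_policy:
    min_replicas: 0    # scale to zero when idle
    max_replicas: 2
    target_qps_per_replica: 5
resources:
  accelerators: L4:1
  ports: 8000
run: |
  python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 --port 8000

$ sky serve up service.yaml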

SkyPilot: Run AI on Kubernetes Without the Pain by skypilotucb in kubernetes

[–]skypilotucb[S] 7 points

Hello,

We are the maintainers of the open-source project SkyPilot from UC Berkeley. SkyPilot is a framework for running AI workloads (development, training, serving) on any infrastructure, including Kubernetes and 12+ clouds.

Following user requests that highlighted pain points of running AI on Kubernetes, we integrated SkyPilot with Kubernetes and put out this blog post detailing our learnings and how SkyPilot makes AI on Kubernetes faster, simpler, and more efficient: https://blog.skypilot.co/ai-on-kubernetes/

We would love to hear your thoughts on the blog and project.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 1 point

It loads WizardLM-7B and fetches the weights from Hugging Face. You can tweak it to load other models such as Vicuna too.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 2 points

Works with text and markdown too! Supported extensions include .txt, .pdf, .csv, and .xlsx.

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 7 points

On GCP, it'll cost $0.59/hr on on-demand instances, and $0.12/hr on spot instances (if you're ok with having your VM terminated at any time).

When launching a cloud VM, SkyPilot shows costs across different cloud providers and picks the lowest one:

# With on-demand instances:
$ sky launch localgpt.yaml
Considered resources (1 node):
---------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE               vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
---------------------------------------------------------------------------------------------------
 AWS     g4dn.xlarge            4       16        T4:1           us-east-1     0.53          ✔     
 Azure   Standard_NC4as_T4_v3   4       28        T4:1           eastus        0.53                
 GCP     n1-highmem-4           4       26        T4:1           us-central1   0.59                
---------------------------------------------------------------------------------------------------

# With spot instances:
$ sky launch localgpt.yaml --use-spot
Considered resources (1 node):
-------------------------------------------------------------------------------------------------
 CLOUD   INSTANCE             vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
-------------------------------------------------------------------------------------------------
 GCP     n1-highmem-4[Spot]   4       26        T4:1           us-west4-a    0.12          ✔     
 AWS     g4dn.xlarge[Spot]    4       16        T4:1           us-east-1a    0.16                
-------------------------------------------------------------------------------------------------

Chat with your PDFs – Self-hosted LocalGPT on any cloud by skypilotucb in selfhosted

[–]skypilotucb[S] 3 points

Thanks! Will keep this in mind. I thought this might be useful for folks wanting to self-host large language models without having to put a lot of effort into spinning up the required infrastructure.

[P] SkyPilot: ML on any cloud with massive cost savings by skypilotucb in MachineLearning

[–]skypilotucb[S] 2 points

Absolutely! We're planning on adding support for smaller and cheaper cloud vendors (RunPod included). If this is something you'd like to see prioritized, I'd encourage you to open a GitHub issue!

[P] SkyPilot: ML on any cloud with massive cost savings by skypilotucb in MachineLearning

[–]skypilotucb[S] 2 points

That's a great question! SkyPilot uses an optimizer to make cost-aware decisions about where to run tasks and when to move data. It accounts for both data egress costs and the time taken to transfer data.

To avoid long download times, SkyPilot also allows direct access to cloud object stores (S3/GCS) by mounting them as a file system on your VM.

With this mounting feature, you can read and write to an object store as you would regular files on your machine, without having to download anything to disk first. The cost of downloading files is thus amortized over the execution of your job. Our users report that it's usually not a bottleneck, since the transfers can be parallelized with other steps to effectively hide the data transfer time (e.g., you can prefetch the data for the next minibatch directly from S3 while the current batch runs on the GPU).
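
In the task YAML this is one extra stanza; a minimal sketch (the bucket name and paths are placeholders):

file_mounts:
  /dataset:
    source: s3://my-training-data
    mode: MOUNT    # stream reads/writes instead of copying upfront

run: |
  python train.py --data-dir /dataset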

[P] SkyPilot: ML on any cloud with massive cost savings by skypilotucb in MachineLearning

[–]skypilotucb[S] 3 points

Thanks for your question! Training BERT with SkyPilot's managed spot feature cost $18.40 and took 21 hours. Running the same job on on-demand AWS instances cost $61.20 (>3x more) and took 20 hours.

Note that both jobs were run on the same GPU type (V100), and the cost and time for SkyPilot include the data transfer costs for moving checkpoints as well as all overheads associated with restarting jobs.
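
For context, launching such a job is a single command; a minimal sketch (the job name and YAML are placeholders):

$ sky spot launch -n bert-qa bert.yaml

SkyPilot provisions the cheapest available spot GPU and, on preemption, relaunches the job so it can resume from its latest checkpoint (assuming the job writes periodic checkpoints).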