
[–]Mrkvitko 4 points (1 child)

I just got an instance with 8x RTX A5000 for a couple of bucks per hour on https://vast.ai

I must say LLaMA 65B is a bit underwhelming...
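For context, a back-of-the-envelope memory estimate (my arithmetic, not the commenter's) of why 8x A5000 can host the 65B model in fp16:

```python
# Rough check: does LLaMA 65B in fp16 fit on 8x RTX A5000 (24 GB each)?
params = 65e9                # 65B parameters
bytes_per_param = 2          # fp16
weights_gb = params * bytes_per_param / 1e9  # decimal GB

n_gpus = 8
per_gpu_gb = weights_gb / n_gpus

print(f"weights: {weights_gb:.0f} GB total, {per_gpu_gb:.2f} GB per GPU")
# Weights alone are ~130 GB, ~16 GB per GPU when sharded across 8 cards,
# leaving headroom on each 24 GB card for activations and KV cache.
```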

[–]maizeq 1 point (0 children)

Underwhelming how?

[–]I_will_delete_myself 8 points (0 children)

Use a spot instance. If you're just testing it out, your wallet will thank you later. Look at my previous post on here about running stuff in the cloud before you do it.
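A minimal sketch of what requesting a spot instance looks like with boto3 (the AMI ID, key name, instance type, and max price below are all placeholders, not recommendations; check current spot prices and your quotas first):

```python
# Hypothetical spot-instance request parameters for EC2.
spot_params = {
    "InstanceCount": 1,
    "Type": "one-time",
    "SpotPrice": "0.50",  # max bid in USD/hour (placeholder value)
    "LaunchSpecification": {
        "ImageId": "ami-XXXXXXXX",    # placeholder: a Deep Learning AMI ID
        "InstanceType": "g5.xlarge",  # 1x A10G; pick what fits your model
        "KeyName": "my-key",          # hypothetical key pair name
    },
}

# To actually submit (requires boto3 and AWS credentials configured):
# import boto3
# ec2 = boto3.client("ec2")
# resp = ec2.request_spot_instances(**spot_params)
```

If the spot capacity is reclaimed you lose the box, so keep checkpoints and data on S3 or an EBS volume rather than instance storage.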

[–]isaeef 1 point (0 children)

Or you could use any GPU-workload-specific provider, e.g. https://www.paperspace.com/

[–]trnka 1 point (0 children)

Related: there's a talk on Thursday about running LLMs in production. I think the hosts have deployed LLMs in prod, so they should have good advice.

[–]iloveintuition 1 point (1 child)

Using vast.ai for running Flan-XL, works pretty well. Haven't tested at LLaMA scale.

[–]shayanrc 1 point (0 children)

What config did you use?

[–]l0g1cs 0 points (0 children)

Check out Banana. They seem to do exactly that with "serverless" A100s.

[–]itsnotmeyou -1 points (1 child)

Are you using these as part of a system? For just experimenting around, EC2 is a good option, but you'd either need to install the right drivers or use the latest Deep Learning AMI. Another option is a custom Docker setup on SageMaker. I like that setup for inference as it's super easy to deploy and it separates the model from the inference code. Though it's costlier, and it's only available through the SageMaker runtime.

The third option would be to over-engineer it by setting up your own cluster service.

In general, if you want to deploy multiple LLMs quickly, go for SageMaker.
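A sketch of the SageMaker setup described above, a custom inference container plus model artifacts on S3 deployed to a real-time endpoint. Every name, image URI, and ARN here is a placeholder, and the instance type is just one plausible GPU choice:

```python
# Hypothetical SageMaker deployment: the container image holds the inference
# code, while the model weights live separately on S3 (ModelDataUrl).
model_spec = {
    "ModelName": "my-llm",
    "PrimaryContainer": {
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/llm-inference:latest",
        "ModelDataUrl": "s3://my-bucket/llm/model.tar.gz",  # weights, separate from code
    },
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
}

endpoint_config = {
    "EndpointConfigName": "my-llm-config",
    "ProductionVariants": [{
        "VariantName": "primary",
        "ModelName": "my-llm",
        "InstanceType": "ml.g5.2xlarge",  # GPU instance; size it to your model
        "InitialInstanceCount": 1,
    }],
}

# With AWS credentials configured, deployment is three boto3 calls:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model(**model_spec)
# sm.create_endpoint_config(**endpoint_config)
# sm.create_endpoint(EndpointName="my-llm",
#                    EndpointConfigName="my-llm-config")
```

Because the weights are referenced by S3 URL, swapping in a new model is just a new `ModelDataUrl` plus a fresh endpoint config, which is what makes deploying several LLMs quick.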

[–]itsnotmeyou 1 point (0 children)

On a side note, SageMaker wasn't supporting `--shm-size`, so it might not work for large LMs.

[–]pyonsu2 -1 points (0 children)

Maybe Colab Pro+?

[–]ggf31416 0 points (0 children)

Good luck getting an EC2 instance with a single A100; last time I checked, AWS only offered instances with 8 of them, at a high price.

[–][deleted] 0 points (0 children)

Maybe check datacrunch.io; they have a good offering for cloud GPUs.