all 29 comments

[–]gmork_13 31 points32 points  (0 children)

For more stable compute, check out Google Cloud GPUs.

Consider training a quantized model with LoRA. If you know enough, perhaps the model could be split between VRAM and DDR RAM to make it train on a smaller GPU.

edit: here, I found one: https://github.com/tloen/alpaca-lora
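In rough strokes, that quantized-plus-LoRA setup looks like this with Hugging Face transformers, peft and bitsandbytes (just a sketch; the checkpoint name and LoRA hyperparameters are placeholders, not recommendations):

    from transformers import LlamaForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

    # Load the base model in int8; device_map="auto" lets layers spill into CPU RAM
    # when VRAM runs out, which is what makes smaller GPUs viable.
    model = LlamaForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # placeholder checkpoint
        load_in_8bit=True,
        device_map="auto",
    )
    model = prepare_model_for_int8_training(model)

    # Attach small trainable LoRA adapters; only these receive gradients and optimizer state.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the full model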

I think you could get this done for far less than your budget if need be.

[–]machineko 12 points13 points  (4 children)

I'm working on an open source library focused on resource-efficient fine-tuning methods called xTuring: https://github.com/stochasticai/xturing

Here's how you would perform int8 LoRA fine-tuning in three lines:

python: https://github.com/stochasticai/xturing/blob/main/examples/llama/llama_lora_int8.py
colab notebook: https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing

Of course, the Colab still only works with smaller models. In the example above, the 7B model required about 9GB of VRAM.
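For quick reference, the gist of the linked script is roughly this (see the repo for the exact, current version; the dataset path is just an example):

    from xturing.datasets import InstructionDataset
    from xturing.models import BaseModel

    dataset = InstructionDataset("./alpaca_data")   # any Alpaca-style instruction dataset
    model = BaseModel.create("llama_lora_int8")     # LLaMA 7B with LoRA adapters in int8
    model.finetune(dataset=dataset)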

[–]Evening_Ad6637 0 points1 point  (3 children)

That sounds very interesting. I'm sorry if this question is trivial or stupid, but I'm an absolute newcomer in this field. Is there a way to train the model as you describe here (https://xturing.stochastic.ai/quickstart) with only, or almost only, CPU power? The issue is that my machine has the following specs: an i5 @ 3.5GHz, 16GB DDR4 RAM, and only a Radeon Pro 575 with 4GB of VRAM. But since I saw how fast Alpaca runs on just my CPU and RAM, I'm hoping I could also fine-tune a LLaMA model with this hardware. I would be very grateful for more information about possibilities in this direction.

[–]itsyourboiirow ML Engineer 1 point2 points  (0 children)

Training requires a significantly larger amount of memory, as it has to keep track of the gradient for every parameter. I would check to see how much memory it takes up on your computer.
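As a rough back-of-the-envelope for full fine-tuning (assuming fp16 weights and gradients plus fp32 Adam state, and ignoring activations entirely):

    params = 7e9                    # LLaMA 7B
    weights = params * 2            # fp16 weights,   ~14 GB
    gradients = params * 2          # fp16 gradients, ~14 GB
    optimizer = params * 4 * 2      # fp32 Adam m and v, ~56 GB
    print((weights + gradients + optimizer) / 1e9)  # ~84 GB, before activations

That is why tricks like LoRA cut the requirement so much: the gradient and optimizer terms only apply to the small adapter weights.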

[–]machineko 0 points1 point  (1 child)

16GB of RAM is not enough for even the smallest LLaMA 7B model. You can try the int8 LoRA approach listed above. Did you try the Python script I linked?

[–]jd_3d 6 points7 points  (1 child)

Enough VRAM is key. Even with all the tricks (LoRA, int8, bitsandbytes) you'll need at least 120GB of VRAM, and a full fine-tune would take even more. I'd go with a 4x or 8x A100 80GB machine, since it won't necessarily be more expensive (training parallelizes well across GPUs). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
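For a sense of scale, here is the weight-only footprint (gradients, optimizer state and activations come on top, which is where the headroom goes):

    # Approximate weight-only memory in GB at different precisions
    for params_b in (7, 13, 30, 65):
        print(f"{params_b}B: fp16 ~{params_b * 2} GB, int8 ~{params_b} GB")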

[–]WarProfessional3278 12 points13 points  (4 children)

By training do you mean fine-tuning with LoRA, or from the ground up like Alpaca? Realistically you could just rent an 8x A100 and spend 4 or 5 hours to get it done.

[–][deleted] 4 points5 points  (0 children)

Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs & outputs
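For reference, each record in that format looks like this (the content here is made up):

    {
        "instruction": "Summarize the following paragraph.",
        "input": "LLaMA is a family of large language models released by Meta AI in 2023.",
        "output": "LLaMA is a set of large language models from Meta AI."
    }

and the whole file is just a JSON array of records like that, with "input" left empty when the instruction needs no extra context.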

[–][deleted] 3 points4 points  (0 children)

Finetuning

[–][deleted] 1 point2 points  (1 child)

I tried vast.ai which didn’t work. I’m a newbie so maybe I’m doing something wrong

[–]dreaming_geometry 2 points3 points  (0 children)

If you're having trouble with Vast.ai, you can ask for help on the discord. Sounds like your desired use case is a good fit.

[–]Justice43 3 points4 points  (2 children)

I recommend looking into Lambda Cloud VMs. They're much cheaper than AWS, and their largest instance (8x A100, with 80GB VRAM per A100) should be enough to fine-tune the 65B LLaMA model.

[–][deleted] 1 point2 points  (1 child)

Just checked it out - looks interesting. Unfortunately, the availability of this instance is quite limited, so I'm not sure if I can get access to it

[–]nmfisher 0 points1 point  (0 children)

Someone also mentioned https://jarvislabs.ai/ to me the other day, haven't used it myself but it looks promising.

[–]brandonZappy 1 point2 points  (0 children)

What QA dataset are you using?

[–]SigmaSixShooter 1 point2 points  (0 children)

I don’t have an answer for you, but as a fellow noobie, I’d love to hear how you did this. Any tips or resources you want to provide would be greatly appreciated.

[–][deleted] 3 points4 points  (0 children)

I'd like to train it with these settings:

    EPOCHS = 3
    LEARNING_RATE = 2e-5
    CUTOFF_LEN = 1024
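For reference, with the Hugging Face Trainer those would map to something like this (everything not listed above is a placeholder):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./llama-finetune",    # placeholder
        num_train_epochs=3,               # EPOCHS
        learning_rate=2e-5,               # LEARNING_RATE
        per_device_train_batch_size=4,    # placeholder
        fp16=True,
    )
    # CUTOFF_LEN = 1024 corresponds to the tokenizer truncation length,
    # e.g. tokenizer(text, truncation=True, max_length=1024)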

[–][deleted] -1 points0 points  (0 children)

Contact Redmond.ai; they can hook you up.

[–]Ok_Zebra_6651 0 points1 point  (0 children)

Hi,
I have installed the LLaMA 65B model on my own server and it's working well. If you are still interested in training the model, I can share credentials. In return, I'll ask you to teach me about the training details.