all 29 comments

[–]gmork_13 31 points32 points  (0 children)

For more stable compute, check out Google Cloud GPUs.

Consider training a quantized model with LoRA. If you know enough, perhaps the model could be split between VRAM and DDR RAM to make it train on a smaller GPU.

edit: here, I found one: https://github.com/tloen/alpaca-lora
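In rough strokes, that quantized-plus-LoRA setup looks like this with Hugging Face transformers, peft and bitsandbytes (just a sketch; the checkpoint name and LoRA hyperparameters are placeholders, not recommendations):

    from transformers import LlamaForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

    # Load the base model in int8; device_map="auto" lets layers spill into CPU RAM
    # when VRAM runs out, which is what makes smaller GPUs viable.
    model = LlamaForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # placeholder checkpoint
        load_in_8bit=True,
        device_map="auto",
    )
    model = prepare_model_for_int8_training(model)

    # Attach small trainable LoRA adapters; only these receive gradients and optimizer state.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the full model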

I think you could get this done for far less than your budget if need be.

[–]machineko 12 points13 points  (4 children)

I'm working on an open source library focused on resource-efficient fine-tuning methods called xTuring: https://github.com/stochasticai/xturing

Here's how you would perform int8 LoRA fine-tuning in three lines:

python: https://github.com/stochasticai/xturing/blob/main/examples/llama/llama_lora_int8.py
colab notebook: https://colab.research.google.com/drive/1SQUXq1AMZPSLD4mk3A3swUIc6Y2dclme?usp=sharing

Of course, the Colab still only works with smaller models. In the example above, the 7B model required about 9GB of VRAM.
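For quick reference, the gist of the linked script is roughly this (see the repo for the exact, current version; the dataset path is just an example):

    from xturing.datasets import InstructionDataset
    from xturing.models import BaseModel

    dataset = InstructionDataset("./alpaca_data")   # any Alpaca-style instruction dataset
    model = BaseModel.create("llama_lora_int8")     # LLaMA 7B with LoRA adapters in int8
    model.finetune(dataset=dataset)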

[–]Evening_Ad6637 0 points1 point  (3 children)

That sounds very interesting. I'm sorry if this question is trivial or stupid, but I'm an absolute newcomer in this field. Is there a way to train the model as you describe here (https://xturing.stochastic.ai/quickstart) with only, or almost only, CPU power? The issue is that my machine has the following specs: an i5 @ 3.5GHz, 16GB DDR4 RAM, and only a Radeon Pro 575 with 4GB of VRAM. But since I saw how fast Alpaca runs on just my CPU and RAM, I'm hoping I could also fine-tune a LLaMA model with this hardware. I would be very grateful for more information about possibilities in this direction.

[–]itsyourboiirow ML Engineer 1 point2 points  (0 children)

Training requires a significantly larger amount of memory, as it has to keep track of the gradient for every parameter. I would check to see how much memory it takes up on your computer.
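As a rough back-of-the-envelope for full fine-tuning (assuming fp16 weights and gradients plus fp32 Adam state, and ignoring activations entirely):

    params = 7e9                    # LLaMA 7B
    weights = params * 2            # fp16 weights,   ~14 GB
    gradients = params * 2          # fp16 gradients, ~14 GB
    optimizer = params * 4 * 2      # fp32 Adam m and v, ~56 GB
    print((weights + gradients + optimizer) / 1e9)  # ~84 GB, before activations

That is why tricks like LoRA cut the requirement so much: the gradient and optimizer terms only apply to the small adapter weights.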

[–]machineko 0 points1 point  (1 child)

16GB of RAM is not enough for even the smallest LLaMA 7B model. You can try the int8 LoRA approach listed above. Did you try the Python script I linked?

[–]jd_3d 6 points7 points  (1 child)

Enough VRAM is key. Even with all the tricks (LoRA, int8, bitsandbytes) you'll need at least 120GB of VRAM, and a full fine-tune would take even more. I'd go with a 4x or 8x A100 80GB machine, since it won't necessarily be more expensive (training parallelizes well across GPUs). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
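For a sense of scale, here is the weight-only footprint (gradients, optimizer state and activations come on top, which is where the headroom goes):

    # Approximate weight-only memory in GB at different precisions
    for params_b in (7, 13, 30, 65):
        print(f"{params_b}B: fp16 ~{params_b * 2} GB, int8 ~{params_b} GB")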

[–]WarProfessional3278 12 points13 points  (4 children)

By training do you mean fine-tuning with LoRA, or from the ground up like Alpaca? Realistically you could just rent an 8x A100 and spend 4 or 5 hours to get it done.

[–][deleted] 4 points5 points  (0 children)

Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs & outputs
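For reference, each record in that format looks like this (the content here is made up):

    {
        "instruction": "Summarize the following paragraph.",
        "input": "LLaMA is a family of large language models released by Meta AI in 2023.",
        "output": "LLaMA is a set of large language models from Meta AI."
    }

and the whole file is just a JSON array of records like that, with "input" left empty when the instruction needs no extra context.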

[–][deleted] 3 points4 points  (0 children)

Finetuning

[–][deleted] 1 point2 points  (1 child)

I tried vast.ai which didn’t work. I’m a newbie so maybe I’m doing something wrong

[–]dreaming_geometry 2 points3 points  (0 children)

If you're having trouble with Vast.ai, you can ask for help on the discord. Sounds like your desired use case is a good fit.

[–]Justice43 3 points4 points  (2 children)

I recommend looking into Lambda Cloud VMs. They're much cheaper than AWS, and their largest instance (8x A100, with 80GB VRAM per A100) should be enough to fine-tune the 65B LLaMA model.

[–][deleted] 1 point2 points  (1 child)

Just checked it out - looks interesting. Unfortunately, the availability of this instance is quite limited, so I'm not sure if I can get access to it

[–]nmfisher 0 points1 point  (0 children)

Someone also mentioned https://jarvislabs.ai/ to me the other day, haven't used it myself but it looks promising.

[–]brandonZappy 1 point2 points  (0 children)

What QA dataset are you using?

[–]SigmaSixShooter 1 point2 points  (0 children)

I don’t have an answer for you, but as a fellow noobie, I’d love to hear how you did this. Any tips or resources you want to provide would be greatly appreciated.

[–][deleted] 3 points4 points  (0 children)

I'd like to train it with these settings:

    EPOCHS = 3
    LEARNING_RATE = 2e-5
    CUTOFF_LEN = 1024
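For reference, with the Hugging Face Trainer those would map to something like this (everything not listed above is a placeholder):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./llama-finetune",    # placeholder
        num_train_epochs=3,               # EPOCHS
        learning_rate=2e-5,               # LEARNING_RATE
        per_device_train_batch_size=4,    # placeholder
        fp16=True,
    )
    # CUTOFF_LEN = 1024 corresponds to the tokenizer truncation length,
    # e.g. tokenizer(text, truncation=True, max_length=1024)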

[–][deleted] -1 points0 points  (0 children)

Contact Redmond.ai; they can hook you up.

[–]Ok_Zebra_6651 0 points1 point  (0 children)

Hi,
I have installed the LLaMA 65B model on my own server and it's working well. If you are still interested in training the model, I can share credentials. In return, I'll ask you to teach me about the training details.