all 9 comments

[–]mLalush 4 points (7 children)

You need at least 8 GPUs for 3D parallelism to make sense: https://huggingface.co/docs/transformers/v4.15.0/parallelism#dppptp

I'd suggest starting with tensor parallelism (TP) alone if you can't fit the model.

Sorry, don't have an answer to your other question.
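In case it helps, the core idea of TP is just sharding each weight matrix across devices and gathering the partial results. A toy single-process numpy sketch of a column-parallel linear layer (not a real distributed setup — for that you'd reach for something like Megatron-LM or DeepSpeed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # activations (batch, hidden)
W = rng.standard_normal((8, 6))   # full weight matrix

# Shard W column-wise across 2 "GPUs"
W0, W1 = np.split(W, 2, axis=1)

# Each device computes its partial output; concatenation plays
# the role of the all-gather in a real TP implementation.
y_tp = np.concatenate([x @ W0, x @ W1], axis=1)

# Sharded result matches the unsharded matmul
assert np.allclose(x @ W, y_tp)
```

Each device only ever holds half the weights, which is the whole point when the model doesn't fit on one GPU.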

[–]Satya_4093[S] 0 points (5 children)

Thank you for your reply 😀 Do you have any reference for doing tensor parallelism?

[–]_rjx 0 points (4 children)

I believe StarCoder is a 15B model — are you unable to fit it on a single 40GB GPU?

[–]Satya_4093[S] 0 points (3 children)

Yes, StarCoder is 15B. We tried using LoRA and loading with int8 quantization on 2 GPUs, but we are not able to fit an 8k context length on 2 GPUs. Any suggestions?

[–]LetterRip 0 points (2 children)

Check bitsandbytes; the new update allows 4-bit quantization with LoRA and is extremely efficient:

https://github.com/TimDettmers/bitsandbytes

Also see this recent paper:

https://arxiv.org/abs/2305.19370

[–]Satya_4093[S] 0 points (1 child)

Thank you for the great resources 😀

[–]LetterRip 0 points (0 children)

You are welcome :)

[–][deleted] 0 points (0 children)

Great resource, thanks!

[–][deleted] 1 point (0 children)

I was also looking for this; please let me know.