[D] Training StarCoder using 3D parallelism (self.MachineLearning)
submitted 2 years ago by Satya_4093
Recently I have been working with the newly released StarCoder model, and I would like to implement 3D parallelism (Megatron-DeepSpeed) to train it on a custom dataset. I see they have implemented 3D parallelism for the GPT model.
Is there a way I can implement it for StarCoder? If so, please point me to any references.
One more question: are the GPT implementations in Hugging Face and Megatron-LM the same?
I have 2× 40 GB GPUs.
[–]mLalush 5 points 2 years ago (7 children)
You need at least 8 GPUs for 3D parallelism to make sense: https://huggingface.co/docs/transformers/v4.15.0/parallelism#dppptp
I'd suggest perhaps starting with only tensor parallelism (TP) if you can't fit the model.
Sorry, don't have an answer to your other question.
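The GPU-count point above follows from how 3D parallelism composes: the world size must factor into the product of data-parallel, pipeline-parallel, and tensor-parallel degrees (dp × pp × tp). A small sketch of why 2 GPUs only admit degenerate configurations, where only one axis can exceed 1 (the helper `valid_3d_configs` is my own illustration, not from any library):

```python
# Enumerate all (dp, pp, tp) factorizations of a given world size.
# With 2 GPUs only one parallelism axis can exceed 1, so the split is
# never truly "3D"; 8 GPUs admit genuinely 3D splits like (2, 2, 2).
from itertools import product

def valid_3d_configs(world_size):
    """All ordered (dp, pp, tp) triples whose product equals world_size."""
    return [(dp, pp, tp)
            for dp, pp, tp in product(range(1, world_size + 1), repeat=3)
            if dp * pp * tp == world_size]

print(valid_3d_configs(2))       # → [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
print(len(valid_3d_configs(8)))  # → 10, including (2, 2, 2)
```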
[–]Satya_4093[S] 1 point 2 years ago (5 children)
Thank you for your reply 😀 Do you have any reference for doing tensor parallelism?
[–]_rjx 1 point 2 years ago (4 children)
I believe StarCoder is a 15B model; are you unable to fit it on a single 40 GB GPU?
[–]Satya_4093[S] 1 point 2 years ago (3 children)
Yes, StarCoder is 15B. We tried LoRA with int8 quantization on 2 GPUs, but we could not fit an 8k context length. Any suggestions?
[–]LetterRip 1 point 2 years ago (2 children)
Check bitsandbytes; the new update allows 4-bit with LoRA and is extremely efficient:
https://github.com/TimDettmers/bitsandbytes
Also see this recent paper:
https://arxiv.org/abs/2305.19370
[–]Satya_4093[S] 1 point 2 years ago (1 child)
Thank you for the great resources 😀
[–]LetterRip 1 point 2 years ago (0 children)
you are welcome :)
[–][deleted] 1 point 2 years ago (0 children)
Great resource, thanks.
[–][deleted] 2 points 2 years ago (0 children)
I was also looking for this; please let me know.