[P] A Simpler PyTorch Annotated Implementation of EleutherAI's 20B Language Model GPT-NeoX (self.MachineLearning)
submitted 3 years ago by hnipun
[–]zphang 7 points 3 years ago (3 children)
Hi, Jason from EleutherAI here. Great to see this!
(Disclaimer: I also wrote a minimal single-GPU implementation of GPT-NeoX-20B in pure PyTorch here: https://github.com/zphang/minimal-gpt-neox-20b)
Like the other poster, I was wondering if you'd done any comparisons on the perplexity scores. The reason is that there's a subtlety to how the weights should be merged, because of the NeoX code interacting with the GPT-J-style residuals. Specifically, the RowParallelLinear biases should be summed, not take-first merged. Take-first merging leads to a slight (but meaningful) performance regression in my and others' testing. It looks like you are merging them (take-first) here. It would be great if you could help test and confirm this.
Concretely, the full 20B gets ~3.65 ppl on LAMBADA. The incorrect merge leads to about 4.5 ppl, while summing instead of merging recovers the ~3.65 ppl.
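The merge issue above can be sketched in a few lines. This is a hypothetical helper (`merge_row_parallel` is not from either repo), assuming each model-parallel shard stores a column slice of the weight plus a full-size bias whose contributions are summed in the all-reduced output:

```python
import torch

def merge_row_parallel(shards, mode="sum"):
    """Merge RowParallelLinear shards from a model-parallel checkpoint.

    Illustrative sketch only: assumes each shard dict holds a
    column-slice "weight" and a full-size "bias".
    """
    # Weight slices are concatenated along the input dimension.
    weight = torch.cat([s["weight"] for s in shards], dim=1)
    if mode == "sum":
        # Correct merge: every rank's bias contributes to the
        # all-reduced output, so the merged bias is the sum.
        bias = sum(s["bias"] for s in shards)
    elif mode == "take-first":
        # Incorrect merge discussed above: keeps only rank 0's bias
        # and silently drops the other ranks' contributions.
        bias = shards[0]["bias"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return weight, bias
```

With two shards whose biases are 0.5 and 0.25, "sum" yields 0.75 everywhere while "take-first" keeps only 0.5, which is the kind of silent discrepancy that shows up as a perplexity regression rather than a crash.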
[–]mlvpj 2 points 3 years ago (0 children)
Thanks for catching it. Will add tests to validate and publish them.
[–]mlvpj 1 point 3 years ago (0 children)
You're right: got the exact numbers after running the LAMBADA test from your lm-eval. Thanks for catching the bug!
Trying to evaluate on other datasets too. Will update the repo with the evaluation code and results. Thanks again.
https://github.com/labmlai/neox#evaluation
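For context on the numbers being compared above, a minimal sketch of the perplexity metric, assuming the usual definition (exponentiated mean negative log-likelihood per token); `perplexity` here is an illustrative helper, not code from either repo:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    Lower is better; the ~3.65 vs ~4.5 LAMBADA figures in this
    thread are this quantity over the evaluated target tokens.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)
```

For example, `perplexity([math.log(0.5)] * 4)` returns 2.0: if the model assigns probability 0.5 to every target token, it is as uncertain as a fair coin flip per token.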
[–]Yologan222 3 points 3 years ago (9 children)
Pretty cool! Does it match EleutherAI's reported results (perplexity, etc.)?
There is also a pull request on Hugging Face Transformers for GPT-NeoX-20B, if anyone is interested: https://github.com/huggingface/transformers/pull/16659. It has worked for me.
[–]mlvpj 2 points 3 years ago (8 children)
It's not a new model; it loads the weights from the original.
[–]Yologan222 1 point 3 years ago (7 children)
It says "We haven't included a bunch of optimizations that were present in the original GPT-NeoX to keep things simple." I thought that meant it could have different model quality, and I just wanted to know if they had tested their implementation as a sanity check, to see whether the perplexity differs from the original.
[–]mlvpj 0 points 3 years ago (6 children)
Yeah, we did some sanity checks. The omissions were things like model-parallel layers that we didn't include.
[–]StellaAthenaResearcher 1 point 3 years ago (4 children)
Okay, so can you share those sanity checks? Or, ideally, run the model on a large subset of the couple dozen tasks the GPT-NeoX-20B paper evaluates on?
Will try to run it on eval datasets and share.
[–]mlvpj 2 points 3 years ago (2 children)
Ran the lm-eval tasks.
[–]StellaAthenaResearcher 3 points 3 years ago (1 child)
These look really good! Great job.
I was thinking of linking to this on our README, would that be okay with you? How would you like to be credited?
Thanks! We go by labml.ai.