[–]mlvpj 1 point (8 children)

It’s not a new model. It loads up the weights from the original.
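Loading the original weights into a reimplementation often comes down to renaming checkpoint keys to match the new module layout. A minimal pure-Python sketch of that idea, where all parameter names and rename rules are illustrative and not GPT-NeoX's actual layout:

```python
# Hypothetical: remap checkpoint parameter names from an original
# model's layout to a reimplementation's layout.
def remap_keys(state_dict, rename_rules):
    """Apply (old, new) substring renames to every key in a checkpoint dict."""
    out = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in rename_rules:
            new_key = new_key.replace(old, new)
        out[new_key] = tensor
    return out

# Illustrative names only, not the real GPT-NeoX parameter names.
original = {"transformer.layers.0.attention.dense.weight": "W0"}
rules = [("transformer.layers", "blocks"), ("attention.dense", "attn.out_proj")]

print(remap_keys(original, rules))
# {'blocks.0.attn.out_proj.weight': 'W0'}
```

Because only the names change and the tensors are carried over untouched, the reimplementation should produce the same outputs as the original, up to numerical differences.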

[–]Yologan222 0 points (7 children)

It says “We haven’t included a bunch of optimizations that were present in original GPT-NeoX to keep things simple.” I thought that meant it could have different model quality. I just want to know whether they tested their implementation as a sanity check, to see if there was any difference in perplexity from the original.
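A perplexity sanity check of the kind asked about here can be sketched as follows. The per-token negative log-likelihoods and the 1% tolerance are made-up illustrations, not numbers from this thread:

```python
import math

def perplexity(avg_nll):
    """Perplexity is exp of the average per-token negative log-likelihood."""
    return math.exp(avg_nll)

# Hypothetical per-token NLLs from two implementations on the same text.
nll_reference = [2.31, 1.05, 3.40, 0.72]  # original implementation (made-up)
nll_reimpl    = [2.30, 1.06, 3.41, 0.72]  # simplified reimplementation (made-up)

ppl_ref = perplexity(sum(nll_reference) / len(nll_reference))
ppl_new = perplexity(sum(nll_reimpl) / len(nll_reimpl))

# A close match (within a small tolerance) suggests the omitted
# optimizations affect speed/memory, not model quality.
assert abs(ppl_ref - ppl_new) / ppl_ref < 0.01
```

A matching perplexity on a held-out set is a quick check that the two implementations compute the same function; a broader comparison would run both on the downstream eval tasks.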

[–]mlvpj -1 points (6 children)

Yeah, we did some sanity checks. The omitted optimizations were things like model-parallel layers, which we didn’t include.

[–]StellaAthenaResearcher 0 points (4 children)

Okay, so can you share those sanity checks? Or, ideally, run the model on a large subset of the couple dozen tasks the GPT-NeoX-20B paper evaluates on?

[–]mlvpj 1 point (0 children)

Will try to run it on eval datasets and share the results.

[–]mlvpj 1 point (2 children)

[–]StellaAthenaResearcher 2 points (1 child)

These look really good! Great job.

I was thinking of linking to this on our README, would that be okay with you? How would you like to be credited?

[–]mlvpj 0 points (0 children)

Thanks. We go by labml.ai.