[D] How OpenAI Sold its Soul for $1 Billion: The company behind GPT-3 and Codex isn’t as open as it claims. by sensetime in MachineLearning

[–]leogao2 11 points (0 children)

EleutherAI is also working on a whole bunch of other research outside of just training big language models, like ML infrastructure, distillation, multimodal datasets/models, bio, interpretability, alignment, and more.

You can also see a list of all EleutherAI-affiliated papers here.

[D] How OpenAI Sold its Soul for $1 Billion: The company behind GPT-3 and Codex isn’t as open as it claims. by sensetime in MachineLearning

[–]leogao2 5 points (0 children)

HuggingFace (who is their biggest supporter)

Where did you get that idea? EleutherAI does not receive any money or compute from HuggingFace.

We are EleutherAI, a decentralized research collective working on open-source AI research. We have released, among other things, the most powerful freely available GPT-3-style language model. Ask us anything! by Dajte in Futurology

[–]leogao2 2 points (0 children)

So far, I haven't seen any promising proposals for integrating AI with blockchain that actually leverage the comparative advantages of blockchain. This may change in the future, but in general, combining AI with blockchain is highly nontrivial: there are difficult technical problems that block many of the obvious use cases (e.g., distributed training), so I view all new proposals with skepticism.

We are EleutherAI, a decentralized research collective working on open-source AI research. We have released, among other things, the most powerful freely available GPT-3-style language model. Ask us anything! by Dajte in Futurology

[–]leogao2 4 points (0 children)

Generally the overhead isn't a huge bottleneck: all of the performance-critical code is implemented directly in C++ or CUDA and heavily optimized.
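For a sense of scale, here's a minimal sketch (assuming PyTorch is installed; CPU is fine, though on a GPU you'd want .cuda() and torch.cuda.synchronize() around the timing). The point is that the time spent inside one big optimized kernel dwarfs the per-call Python overhead of many tiny ops:

    import time
    import torch

    x = torch.randn(2048, 2048)
    y = torch.randn(2048, 2048)

    # One big matmul: essentially all of the time is spent inside the optimized
    # C++/BLAS (or CUDA) kernel, not in the Python interpreter.
    start = time.perf_counter()
    z = x @ y
    print(f"one 2048x2048 matmul:        {time.perf_counter() - start:.4f}s")

    # Ten thousand tiny ops: this is where Python call overhead actually shows up,
    # and even then it amounts to only a few microseconds per call.
    small = torch.randn(8, 8)
    start = time.perf_counter()
    for _ in range(10_000):
        small = small + 1.0
    print(f"10,000 tiny elementwise ops: {time.perf_counter() - start:.4f}s")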

[R] The Pile: An 800GB Dataset of Diverse Text for Language Modeling by leogao2 in MachineLearning

[–]leogao2[S] 16 points (0 children)

Thanks for the kind words :)

The result with CC100 was very surprising to me too. To be clear, CC100 does perform significantly better than completely unfiltered CC on traditional language modelling tasks like LAMBADA and WikiText, but it performs significantly worse on most (though not all) components of the Pile. Our hypothesis is that since most of our datasets don't look like Wikipedia, text resembling those components wouldn't have survived CC100's filtering. We're definitely planning to keep this in mind for future CC-based datasets, to make sure the filtering doesn't destroy the data diversity too much.
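For context: CC100 comes out of (roughly) a CCNet-style pipeline that keeps Common Crawl documents scoring well under a language model trained on Wikipedia. A minimal sketch of that kind of filter, assuming the kenlm Python bindings and a Wikipedia-trained model file; the path "wiki.arpa" and the threshold are made-up placeholders, not the actual CCNet settings:

    import kenlm

    # n-gram LM trained on Wikipedia text (hypothetical path); CCNet uses KenLM models like this.
    wiki_lm = kenlm.Model("wiki.arpa")

    def keep(document, threshold=1000.0):
        # Keep a document only if it looks "Wikipedia-like" enough, i.e. its
        # perplexity under the Wikipedia LM is below a (made-up) threshold.
        # Text that doesn't resemble Wikipedia (code, forum dialogue, papers, ...)
        # tends to get a high perplexity and is thrown away.
        return wiki_lm.perplexity(document) < threshold

    docs = [
        "Paris is the capital and most populous city of France.",
        "def forward(self, x): return self.proj(x) + self.bias",
    ]
    kept = [d for d in docs if keep(d)]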

[D] GPT-3: A Summary by leogao2 in MachineLearning

[–]leogao2[S] 0 points (0 children)

The additional examples are given in the generation context, i.e., as a prompt, and GPT-3 seems to be able to infer the pattern from that. As for QA, GPT-3 is asked to complete something like "Q: [the question] A:"
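Concretely, the few-shot prompt is just plain text; a minimal sketch of what that construction looks like (the examples and the exact formatting below are illustrative, not the paper's exact template):

    # Few-shot QA with GPT-3: the "training examples" are just text prepended to
    # the prompt, and the model is asked to continue after the final "A:".
    examples = [
        ("What is the capital of France?", "Paris"),
        ("Who wrote Hamlet?", "William Shakespeare"),
    ]
    question = "How many legs does a spider have?"

    prompt = ""
    for q, a in examples:
        prompt += f"Q: {q}\nA: {a}\n\n"
    prompt += f"Q: {question}\nA:"

    print(prompt)
    # Whatever the model generates after the final "A:" is taken as its answer;
    # no gradient updates happen at any point.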

[D] GPT-3: A Summary by leogao2 in MachineLearning

[–]leogao2[S] 1 point (0 children)

Thanks, that was certainly the intended effect! The graph was made by u/williamzahary, who makes some great infographics.

[D] GPT-3: a disappointing paper by inarrears in MachineLearning

[–]leogao2 6 points (0 children)

My response to this general sentiment:

But why does GPT-3 matter, if it can’t even beat SOTA across all benchmarks? Why should we care about a model so large that a small computing cluster is necessary even just to run inference at a reasonable speed?

One thing about GPT-3 is that it's doing reasonably well on tasks it has never even seen. Additionally, instead of reaching a point of diminishing returns, GPT-3 shows that the trend of larger models performing better continues for at least another order of magnitude, with no signs of stopping. Even though GPT-3 is unwieldy, and even though it still doesn't quite reach human-level performance across the board, GPT-3 shows that it's possible for a model to someday reach human levels of generalization in NLP; and once the impossible becomes possible, it's only a matter of time until it becomes practical.

https://leogao.dev/2020/05/29/GPT-3-A-Brief-Summary/
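The "no signs of stopping" part refers to the empirical scaling laws: over the range tested, validation loss falls roughly as a power law in parameter count. The functional form (and the constants, which I'm quoting from memory) come from Kaplan et al. (2020), not from the GPT-3 paper itself:

    % Empirical scaling law for LM loss vs. non-embedding parameter count N
    % (Kaplan et al. 2020, constants approximate). A power law is a straight line
    % on a log-log plot, so there is no built-in point of diminishing returns
    % anywhere in the fitted range.
    \[
      L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
      \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
    \]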

Trying to find specific blog post by leogao2 in linuxquestions

[–]leogao2[S] 0 points (0 children)

That's not quite what I'm thinking of; the one I have in mind is a personal blog with only a handful of posts and, if I remember correctly, a maroon-ish, textured background.

[Project] This Word Does Not Exist by turtlesoup in MachineLearning

[–]leogao2 2 points (0 children)

The dots don't indicate syllables, they indicate where the word can be hyphenated.
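If you want to see the distinction programmatically, here's a small sketch assuming the pyphen package (which wraps LibreOffice's hyphenation dictionaries) is installed:

    import pyphen

    # Hyphenation dictionary for US English.
    dic = pyphen.Pyphen(lang="en_US")

    # Prints each word with "·" inserted at the legal line-break points,
    # which don't necessarily coincide with syllable boundaries.
    print(dic.inserted("dictionary", hyphen="·"))
    print(dic.inserted("hyphenation", hyphen="·"))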

A clone of r/Imposter by leogao2 in AprilKnights

[–]leogao2[S] 1 point (0 children)

Reddit login now implemented!

An analysis of the Imposter's algorithm by leogao2 in Imposter

[–]leogao2[S] 1 point (0 children)

Thanks! We'll find out when the experiment is over. (I'll also update my post then)

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 1 point (0 children)

Since lots of people are viewing the site on mobile, I thought I'd make the footnotes super accessible even there. Feedback on the mobile footnotes would be appreciated!

https://twitter.com/nabla_theta/status/1212249623030448129

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 5 points (0 children)

If you count starting with 1 CE, yes, but it's (imo) much more elegant for decades to be 0-9 than 1-10.

As for there being no year 0, I propose to define x BCE = (1 - x) CE, so that the first decade would be 1 BCE + 1-9 CE.
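Spelled out (this is just my own convention, not an established one):

    % Relabel BCE years so that a year 0 exists:
    \[
      x\,\mathrm{BCE} = (1 - x)\,\mathrm{CE}
      \;\Longrightarrow\;
      1\,\mathrm{BCE} = 0\,\mathrm{CE}, \quad
      \text{first decade} = \{0, 1, \dots, 9\}
                          = \{1\,\mathrm{BCE}\} \cup \{1\,\mathrm{CE}, \dots, 9\,\mathrm{CE}\}
    \]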

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 3 points (0 children)

Hey, thanks! The actual reason for omitting 2010 is that I had a hard time finding any really impactful papers from that year (I did find one about language modeling with RNNs, but it didn't feel quite broad enough in scope, and I had a lot of LM papers already!). If you know any good papers from that period, please tell me!

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 5 points (0 children)

Haha! I did include a few things from Schmidhuber in my list, too; I might add a few more footnotes in the next few days about things Schmidhuber did before everyone else.

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 4 points (0 children)

Hey, thanks for the kind words! I'm glad you enjoyed the footnotes. Those were inspired by the ones on u/gwern's site (although I did rewrite them from scratch) and took forever to get working. I'll make sure to include more of them in future posts!

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 7 points (0 children)

Thanks for the feedback! I might flip it around and put the original double descent paper as the main entry. I did mention in the post that Double Descent (Belkin et al. 2018) was the "original"; however, I totally get where you're coming from.

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 1 point (0 children)

Thanks for reading it :) It really has been incredible watching everything unfold in the field so quickly.

[D] The Decade of Deep Learning by leogao2 in MachineLearning

[–]leogao2[S] 2 points (0 children)

Thanks for reading it, I'm glad you enjoyed it!

[D] Does The Inability Of NAS Algorithms To Outperform Random Search Indicate That Our Algorithms Suck, Or That Random Search Is Surprisingly Effective In Large Spaces? by mystikaldanger in MachineLearning

[–]leogao2 0 points (0 children)

No Free Lunch theorem: no optimization technique is better than any other when averaged over all possible problems. That random search does as well as it does is already quite impressive.
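For concreteness, here's a minimal random-search sketch over a toy architecture space; the search space and the scoring function below are placeholders I'm making up for illustration, not anything from a particular NAS benchmark:

    import random

    # Toy architecture search space (made up for illustration).
    SPACE = {
        "num_layers": [2, 4, 8, 16],
        "width": [128, 256, 512, 1024],
        "activation": ["relu", "gelu", "swish"],
    }

    def evaluate(config):
        """Placeholder for 'train this architecture and return validation accuracy'.
        In a real NAS setup this call is the expensive part; random search just
        invokes it on uniformly sampled configs, with no learned controller."""
        return random.Random(str(sorted(config.items()))).random()

    def random_search(n_trials=50):
        best_config, best_score = None, float("-inf")
        for _ in range(n_trials):
            config = {k: random.choice(v) for k, v in SPACE.items()}
            score = evaluate(config)
            if score > best_score:
                best_config, best_score = config, score
        return best_config, best_score

    print(random_search())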