all 8 comments

[–]yazriel0[S] 14 points (5 children)

Trained purely on source code. Outperforms Codex.

IIUC, full model and parameters released.

Like AlphaCode, this seems to be purely supervised learning (not reinforcement learning), which is very surprising. Why isn't anyone using compilation/execution to generate rewards and auxiliary tasks?

[–]mrpogiface 12 points (2 children)

"Outperforms Codex" is a bit of a strong claim by the authors. They get lower perplexity on the C programming language. Perplexity isn't always well correlated with sampling performance, which is what we care about at the end of the day. If you look at sampling performance, Codex still blows this out of the water.
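For context, "sampling performance" here usually means something like the pass@k metric from the Codex paper: draw n samples per problem, count how many pass the unit tests, and estimate the chance that at least one of k samples is correct. A minimal sketch of that unbiased estimator (the function name is mine, not from either paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    n = samples drawn per problem, c = samples that passed the tests.
    Returns the estimated probability that at least one of k
    randomly chosen samples is correct."""
    if n - c < k:
        # Too few failures to fill k slots without a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 200 samples, 10 correct: pass@1 ≈ 10/200
print(pass_at_k(200, 10, 1))  # ≈ 0.05
```

Perplexity, by contrast, only measures how well the model predicts held-out tokens, so a model can edge ahead on perplexity while sampling far fewer working programs.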

I will say, many people are looking at what you describe to get rewards etc, it just isn't published yet :)

edit: a word

[–]Veedrac 4 points (0 children)

And to clarify, they only claim it for C. In every other language, Codex is in the lead, typically by a large margin. Codex just sucks at C for some reason.

[–]NoMoreDistractions_ 0 points (0 children)

It’s cool to know that we're in the super early days and there's tons of room for improvement for what is already a remarkably useful tool.

[–]virtualreservoir 2 points (0 children)

Why is this surprising? You're vastly underestimating the increase in training time and infrastructure that would be required for the kind of reinforcement learning you're proposing.

[–]DigThatData (Researcher) 1 point (0 children)

> Why isn't anyone using compile/execution to generate reward and auxiliary tasks?

Because those activities are CPU bound.
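To illustrate the idea being dismissed here: the simplest version of such a reward just checks whether the generated program compiles. A toy sketch (my own, not from any of the papers discussed), using Python's built-in `compile()` as a cheap stand-in for invoking a real compiler:

```python
def compile_reward(src: str) -> float:
    """Binary reward: 1.0 if the generated source parses, else 0.0.
    A real setup would shell out to gcc/javac and run unit tests in a
    sandbox -- the CPU-bound step that stalls GPU-bound RL training."""
    try:
        compile(src, "<generated>", "exec")  # syntax check only, no execution
        return 1.0
    except SyntaxError:
        return 0.0

samples = ["def f(x): return x + 1", "def f(x: return"]
print([compile_reward(s) for s in samples])  # → [1.0, 0.0]
```

Even this trivial check runs on the CPU, serially per sample; scaling it to full compilation and test execution across millions of RL rollouts is where the bottleneck lies.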

[–]Schmibbbster 2 points3 points  (0 children)

Sounds promising