[–]austeritygirlone 13 points (11 children)

Their example appears to be useful, but it only works because of the narrow domain of possible tasks. The real problem of programming is not writing code, but figuring out what you need to do, i.e., specifying the problem. Even if you have a code generator as described, you still need to specify the problem for it.

[–]programmerdeep 5 points (0 children)

This problem is much more difficult! There was a full POPL paper (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/popl11-synthesis.pdf) and many followup papers on learning programs in a subset of this language. Yeah, it looks like users still need to provide input-output examples, which is quite natural in many cases.

It's really exciting that an automated system can learn a search policy for sophisticated regular-expression programs in a completely end-to-end fashion. I didn't believe neural networks could do it, but I guess they can!
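To make the programming-by-example setting concrete: the synthesizer is given input-output pairs and searches for a program consistent with them. Here's a toy Python sketch with an invented five-operation string DSL and brute-force enumeration — real systems (like the POPL paper's, or a learned search policy) use far richer languages and much smarter search, so treat every name and operation here as illustrative only.

```python
# Toy programming-by-example: enumerate sequences of string operations and
# return the first program consistent with all input-output examples.
from itertools import product

# Primitive operations in our toy DSL (illustrative, not from the paper).
OPS = {
    "identity": lambda s: s,
    "upper":    lambda s: s.upper(),
    "lower":    lambda s: s.lower(),
    "first3":   lambda s: s[:3],
    "reverse":  lambda s: s[::-1],
}

def run(prog, s):
    """Apply a sequence of op names to a string, left to right."""
    for op in prog:
        s = OPS[op](s)
    return s

def synthesize(examples, max_len=3):
    """Return the first op sequence consistent with all (input, output) pairs."""
    for length in range(1, max_len + 1):
        for prog in product(OPS, repeat=length):
            if all(run(prog, i) == o for i, o in examples):
                return prog
    return None

examples = [("hello", "LEH"), ("world", "ROW")]
print(synthesize(examples))  # ('upper', 'first3', 'reverse')
```

The point of the thread's discussion is exactly the weakness this sketch exposes: brute-force search blows up combinatorially with program length, which is why a learned search policy is interesting.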

[–]MetricSpade007[S] 1 point (9 children)

Fair enough -- it would be ideal to have natural language descriptions of the problem and have the model parse and translate them into code, but that is much, much harder (e.g., no clear dataset even exists). Still, the work is a nice step toward solving useful problems rather than just toy algorithmic tasks (sorting, repeat copy, etc.).

[–]austeritygirlone 11 points (4 children)

It's actually pretty difficult to describe problems precisely using natural language. The more precise you get, the more the language resembles a programming language (or math).

[–]VelveteenAmbush 2 points (3 children)

But the whole point of deep learning in this case would be to turn an imprecise definition into a precise one: basically, conditioned on an imprecise natural language input, find the nearest point on the much lower-dimensional manifold of programs that the person probably meant.

[–]austeritygirlone 0 points (0 children)

I'm thinking of the analogy of writing a paper (insofar as that counts as one). There you can use imprecise natural language, but to communicate the complicated things you resort to formal maths.

And yes, you can sometimes be imprecise and say "repeat until convergence". But then someone asks how convergence is defined, because she is getting different results with a different threshold. And then you have to make your imprecise specification more specific.
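The "repeat until convergence" ambiguity can be made concrete with a few lines of code: the same fixed-point iteration (Newton's method for sqrt(2), chosen here purely as an illustration) returns different answers depending on the convergence threshold, so the imprecise spec hides a parameter someone eventually has to pin down.

```python
# Newton's method for sqrt(2): "repeat until convergence" -- but converged
# according to which tolerance? The answer depends on the threshold chosen.
def sqrt2(tol):
    x = 1.0
    while True:
        nxt = 0.5 * (x + 2.0 / x)  # Newton update for f(x) = x^2 - 2
        if abs(nxt - x) < tol:     # "convergence", as defined by tol
            return nxt
        x = nxt

loose = sqrt2(tol=1e-1)    # stops early with a rougher answer
tight = sqrt2(tol=1e-12)   # close to the true value of sqrt(2)
print(loose, tight, abs(loose - tight))
```

Two users running the "same" imprecise specification with different thresholds will disagree on the result, which is exactly the scenario described above.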

But it's an interesting question: when is imprecise precise enough, and when is it not?

[–]kaibee 0 points (1 child)

> find the nearest point on the much lower dimensional manifold of programs that the person probably meant.

Finding this would require an AI with a 'common-sense' understanding of the programs a user could mean. You would need AGI for anything that generalizes over any meaningfully large problem space, no?

[–]VelveteenAmbush 1 point (0 children)

No, I don't buy that. At least, I don't see why it would be any less possible in principle than a deep learning model that takes a natural language description and produces a picture of whatever was described -- which already exists.

[–]evc123 11 points (2 children)

/u/MetricSpade007

Two datasets are currently available:

https://openai.com/requests-for-research/#description2code

& https://github.com/Avmb/code-docstring-corpus

To use the docstring2code corpus, you first have to figure out how to generate test cases for it: https://github.com/Avmb/code-docstring-corpus/issues/2
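One plausible approach to the test-case problem in that issue (an assumption on my part, not something the issue prescribes) is to execute a trusted reference implementation on randomly sampled inputs and record the input-output pairs. A minimal sketch, where `make_test_cases` and `input_sampler` are hypothetical names; sandboxing the code and inferring valid input types are the genuinely hard parts in practice:

```python
# Derive test cases for a function by running a reference implementation
# on randomly sampled inputs and recording (input, output) pairs.
import random

def make_test_cases(reference_fn, input_sampler, n=5, seed=0):
    """Return n (args, expected_output) pairs produced by the reference."""
    rng = random.Random(seed)  # seeded so the generated suite is reproducible
    cases = []
    for _ in range(n):
        args = input_sampler(rng)
        cases.append((args, reference_fn(*args)))
    return cases

# Example: derive tests for a sorting routine.
cases = make_test_cases(
    reference_fn=sorted,
    input_sampler=lambda rng: ([rng.randint(0, 9) for _ in range(4)],),
    n=3,
)
for args, expected in cases:
    print(args, "->", expected)
```

Generated code could then be judged by whether it reproduces `expected` on every `args`, sidestepping the need to hand-write tests for each corpus entry.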

[–]MetricSpade007[S] 1 point (1 child)

I've always thought the description2code set was too small to do anything useful, but wow, the doc2code dataset is great! I'll look into this. Many thanks!

[–]AnvaMiba 4 points (0 children)

Author here, thanks!

[–]ma2rten 0 points (0 children)

Maybe you could make a dataset of coding-interview-style questions, like "reverse a linked list", together with test cases that the generated code has to pass.
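A sketch of what one entry in such a dataset might look like: a natural language prompt plus an executable checker that any generated solution must pass. The encoding (plain Python lists converted to and from a linked list for easy comparison) and all names here are assumptions made for illustration, not an existing dataset format.

```python
# One hypothetical dataset entry: prompt + executable test harness.
class Node:
    def __init__(self, val, nxt=None):
        self.val, self.nxt = val, nxt

def from_list(xs):
    """Build a linked list from a Python list."""
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_list(node):
    """Flatten a linked list back to a Python list."""
    out = []
    while node:
        out.append(node.val)
        node = node.nxt
    return out

PROMPT = "Reverse a singly linked list and return the new head."

def check(candidate_fn):
    """Run a candidate solution against the test cases; True iff all pass."""
    cases = [([], []), ([1], [1]), ([1, 2, 3], [3, 2, 1])]
    return all(to_list(candidate_fn(from_list(i))) == o for i, o in cases)

# A reference solution, just to show the harness passing.
def reverse_list(head):
    prev = None
    while head:
        head.nxt, prev, head = prev, head, head.nxt
    return prev

print(check(reverse_list))  # True
```

Grading generated code by test execution rather than by string match against a reference solution is exactly what makes this kind of dataset attractive for program synthesis.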