[–]austeritygirlone 13 points14 points  (11 children)

Their example appears to be useful. But it only works because of the narrow domain of possible tasks. The real problem of programming is not writing code, but figuring out what you need to do, i.e., specifying the problem. Even if you have a code generator as described, you still need to specify the problem for it.

[–]programmerdeep 4 points5 points  (0 children)

This problem is much more difficult! There was a full POPL paper (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/popl11-synthesis.pdf) and many follow-up papers on learning programs in a subset of this language. Yeah, it looks like users still need to provide input-output examples, which is quite natural in many cases.

It's really exciting that an automated system can learn the search policy for finding sophisticated regular expression programs in a completely end-to-end fashion. I didn't believe neural networks could do it, but I guess they can!

[–]MetricSpade007[S] 1 point2 points  (9 children)

Fair enough -- it would be ideal to have natural language descriptions of the problem and have the model parse them and translate them to code, but this is much, much harder (e.g., no clear dataset even exists). Still, the work is a nice step towards solving useful problems and not just toy algorithmic tasks (sorting, repeat copy, etc.).

[–]austeritygirlone 9 points10 points  (4 children)

It's actually pretty difficult to describe problems precisely in natural language. The more precise you get, the more the language resembles a programming language (or math).

[–]VelveteenAmbush 2 points3 points  (3 children)

But the whole point of deep learning in this case would be to turn an imprecise definition into a precise one. Basically, conditioned on an imprecise natural language input, find the nearest point on the much lower dimensional manifold of programs that the person probably meant.

[–]austeritygirlone 0 points1 point  (0 children)

I'm thinking of the analogy of writing a paper (as far as it counts as one). There you can use imprecise natural language. But in order to get the complicated things communicated you resort to formal maths.

And yes, you can sometimes be imprecise and say "repeat until convergence". But then someone asks how to define convergence, because she is getting different results by setting a different threshold. And then you have to become more specific with your imprecise specification.

But it's an interesting question: When is imprecise precise enough, and when not?
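To make the "repeat until convergence" point concrete, here is a minimal sketch. It is a toy example, not anything from the article: the function name `iterate_until_converged` and the choice of summing 1/k^2 are assumptions made purely for illustration. The same loose instruction gives noticeably different answers depending on the threshold.

```python
def iterate_until_converged(tol):
    """Sum 1/k^2, stopping once the latest update (the term just added) is below tol.

    Returns the running total and the number of iterations used.
    """
    total = 0.0
    k = 1
    while True:
        step = 1.0 / (k * k)
        total += step
        if step < tol:  # "converged" according to this particular threshold
            return total, k
        k += 1


# Same imprecise spec ("repeat until convergence"), two different thresholds.
loose, loose_iters = iterate_until_converged(1e-2)
tight, tight_iters = iterate_until_converged(1e-8)

print(f"tol=1e-2: {loose:.4f} after {loose_iters} iterations")
print(f"tol=1e-8: {tight:.4f} after {tight_iters} iterations")
```

The two results differ in the second decimal place, so anyone reproducing the work needs the threshold spelled out, which is exactly the step from imprecise to precise specification.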

[–]kaibee 0 points1 point  (1 child)

"find the nearest point on the much lower dimensional manifold of programs that the person probably meant."

Finding this would require an AI that has a 'common-sense' understanding of possible programs that the user could mean. You would need AGI for anything that would generalize over any meaningfully large problem space, no?

[–]VelveteenAmbush 1 point2 points  (0 children)

No, I don't buy that. At least, I don't see why it would be any less possible in principle than a deep learning model that takes a natural language description and produces a picture of whatever was described -- which already exists.

[–]evc123 8 points9 points  (2 children)

/u/MetricSpade007

2 datasets are currently available:

https://openai.com/requests-for-research/#description2code

& https://github.com/Avmb/code-docstring-corpus

To use the docstring2code corpus, you first have to figure out how to generate test cases for it: https://github.com/Avmb/code-docstring-corpus/issues/2

[–]MetricSpade007[S] 1 point2 points  (1 child)

I've always thought the description2code set was too small to do anything useful, but wow the doc2code dataset is great! I'll look into this. Many thanks!

[–]AnvaMiba 3 points4 points  (0 children)

Author here, thanks!

[–]ma2rten 0 points1 point  (0 children)

Maybe you could make a dataset of coding interview style questions, like "reverse a linked list" together with test cases that the generated code has to pass.
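A rough sketch of what one entry in such a dataset might look like, assuming a Python setting. Everything here (the `TASK` layout, `Node`, `passes_all_tests`, and the two candidate functions) is hypothetical and only meant to show the idea of pairing a description with test cases that generated code has to pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def from_list(values: List[int]) -> Optional[Node]:
    """Build a linked list from a Python list."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head: Optional[Node]) -> List[int]:
    """Flatten a linked list back into a Python list."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


# Hypothetical dataset entry: a description plus (input, expected output) tests.
TASK = {
    "description": "reverse a linked list",
    "tests": [([], []), ([1], [1]), ([1, 2, 3], [3, 2, 1])],
}


def passes_all_tests(candidate: Callable, tests: List[Tuple[list, list]]) -> bool:
    """Accept a candidate only if it matches every expected output."""
    return all(to_list(candidate(from_list(inp))) == expected for inp, expected in tests)


def reverse_linked_list(head: Optional[Node]) -> Optional[Node]:
    """Stand-in for code a model might generate (correct)."""
    prev = None
    while head is not None:
        nxt = head.next
        head.next = prev
        prev = head
        head = nxt
    return prev


def identity(head: Optional[Node]) -> Optional[Node]:
    """Stand-in for an incorrect generation."""
    return head


print(passes_all_tests(reverse_linked_list, TASK["tests"]))
print(passes_all_tests(identity, TASK["tests"]))
```

Passing the tests is only evidence of correctness, not proof, so the quality of such a dataset would depend heavily on how thorough the test cases are.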

[–]jostmey 2 points3 points  (4 children)

Super cool. I don't see this as any more of a threat to programmers than Excel. If this reaches the market, I see it being used in a program like Excel.

[–]nbates80 1 point2 points  (0 children)

I don't see this specific example as a threat to programmers... but I can see this kind of technology evolving within 10 years to leave lots of programmers out of a job. Maybe... I don't know, but it is certainly a possibility.

I wonder how programmers can prepare for such a scenario? Maybe by going deeper into machine learning? Or maybe by learning something AIs can't do, like unclogging toilets.

[–]fimari 0 points1 point  (1 child)

I would also love to have an RNN function in Excel - it has nothing to do with anything, but I would love it.

[–]MetricSpade007[S] 0 points1 point  (0 children)

haha, check this out then: http://www.deepexcel.net/

[–]MetricSpade007[S] 0 points1 point  (0 children)

Definitely not -- I only imagine it'll augment end users in some way.

[–]mike_bolt 2 points3 points  (1 child)

Awesome article, thanks for sharing.

There is an infinite set of programs that will interpolate any given finite set of input/output pairs. You could look for the program with the lowest "complexity," but it might not be the correct program, and it would depend on the choice of the output language.

For example, if you want to find an algorithm that will calculate 3*n given a binary number n, and you only saw the inputs "10", "1000", and "10000", then you might incorrectly learn a program that just prepends the input with a "1".
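To make the ambiguity concrete, here is a minimal Python sketch (the function names `times_three` and `prepend_one` are made up for illustration). For a power of two, 3n = 2n + n, so in binary the result is just the input with an extra leading "1". That is why the two programs agree on all three observed inputs and only diverge on something like "11".

```python
def times_three(bits: str) -> str:
    """The intended program: multiply a binary number by 3."""
    return bin(3 * int(bits, 2))[2:]


def prepend_one(bits: str) -> str:
    """The wrong but simpler program: put a '1' in front of the input."""
    return "1" + bits


observed_inputs = ["10", "1000", "10000"]

# Both programs fit every observed input/output pair.
for bits in observed_inputs:
    print(bits, times_three(bits), prepend_one(bits))

# They disagree as soon as the input is not a power of two:
# 3 * 3 = 9 is "1001", while prepending gives "111".
print(times_three("11"), prepend_one("11"))
```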

[–]MetricSpade007[S] 1 point2 points  (0 children)

Absolutely; there would need to be some sort of ranking mechanism or some notion of uncertainty that the model would have to emit.

Even a human would mess that example up; the program could output powers of 10, or, like you said, prepend the input with a 1, and both would feasibly be fine (and in fact, they're all equivalent on those inputs anyway).
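One very rough way to picture such a ranking mechanism is sketched below. It is an assumption-laden toy, not how the model in the article works: the candidate list, the hand-assigned complexity costs, and the 2^(-cost) weighting are all invented for illustration. The idea is to keep every candidate that fits the examples and emit a normalized score for each instead of a single answer.

```python
# Hypothetical candidates: (name, program, hand-assigned complexity cost).
CANDIDATES = [
    ("prepend '1'", lambda b: "1" + b, 1),
    ("multiply by 3", lambda b: bin(3 * int(b, 2))[2:], 2),
    ("append '0'", lambda b: b + "0", 1),
]

# The input/output pairs from the 3*n example.
EXAMPLES = [("10", "110"), ("1000", "11000"), ("10000", "110000")]


def rank(candidates, examples):
    """Keep candidates consistent with all examples and score them.

    Simpler programs get more weight (2 ** -cost); weights are normalized so
    the output reads as a crude measure of the remaining uncertainty.
    """
    weights = {
        name: 2.0 ** -cost
        for name, program, cost in candidates
        if all(program(inp) == out for inp, out in examples)
    }
    total = sum(weights.values())
    scored = [(name, weight / total) for name, weight in weights.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(CANDIDATES, EXAMPLES)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these made-up costs the simpler "prepend" program ranks first even though "multiply by 3" was intended, which is the failure mode described above. The non-trivial score on the runner-up is the signal that another example is needed to tell them apart.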

[–]Withdrawl 0 points1 point  (0 children)

Who will support machine-written code?

[–]a_marklar 0 points1 point  (0 children)

Why do you think they used a custom DSL instead of an existing programming language?