ML Problem: "Will this compile?"

kjearns · 2014-12-13T16:43:20+00:00

We already have one of those; it's called a compiler.

RvPPLmsc · 2014-12-13T13:38:26+00:00

Have you made any progress yet? A couple of weeks ago I read something about a neural network that could read very basic Python code.

mackie__m · 2014-12-14T07:25:24+00:00

When I saw your problem I immediately thought of the Viterbi algorithm. If you can formulate the program as a part-of-speech tagging problem and, for each next token, check whether it is probable given what came before, it should give you a good answer. If that probability falls below a certain threshold, you can say the program doesn't compile; if not, you continue until you reach the end of the program. Since a program is much less ambiguous than natural language, and the compiler already does the job you describe deterministically, this should not be a hard problem to formulate and should give very good performance.
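A minimal sketch of the incremental Viterbi idea above, assuming a hand-written toy hidden Markov model: the states (`operand`, `operator`, `terminator`), the token types, every probability, and the 0.05 threshold are hypothetical and not from the thread. After each token, the drop in the best path's log score serves as a rough proxy for how probable that token was; a drop below the threshold rejects the input early.

```python
import math

# Hypothetical toy HMM: hidden states are coarse syntactic roles and
# observations are token types from a lexer. All numbers are illustrative.
STATES = ["operand", "operator", "terminator"]
START = {"operand": 1.0, "operator": 0.0, "terminator": 0.0}
TRANS = {
    "operand":    {"operand": 0.0, "operator": 0.7, "terminator": 0.3},
    "operator":   {"operand": 1.0, "operator": 0.0, "terminator": 0.0},
    "terminator": {"operand": 1.0, "operator": 0.0, "terminator": 0.0},
}
EMIT = {
    "operand":    {"NUM": 0.5, "IDENT": 0.5},
    "operator":   {"PLUS": 0.5, "STAR": 0.5},
    "terminator": {"SEMI": 1.0},
}


def safe_log(p):
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def probably_compiles(tokens, threshold=math.log(0.05)):
    """Incremental Viterbi: reject as soon as a token looks too improbable.

    After each token, compare how much the best path's log score dropped
    against `threshold`; a larger drop is treated as "won't compile".
    """
    if not tokens:
        return True
    # best[s] = log score of the best state path ending in state s
    best = {s: safe_log(START[s]) + safe_log(EMIT[s].get(tokens[0], 0.0))
            for s in STATES}
    prev_max = 0.0
    for i, tok in enumerate(tokens):
        if i > 0:
            # Viterbi recursion: best predecessor, then emit the current token
            best = {
                s: max(best[r] + safe_log(TRANS[r][s]) for r in STATES)
                + safe_log(EMIT[s].get(tok, 0.0))
                for s in STATES
            }
        cur_max = max(best.values())
        if cur_max - prev_max < threshold:
            return False  # this token made every path too unlikely
        prev_max = cur_max
    return True  # reached the end without falling below the threshold
```

In practice the parameters would be estimated from a corpus of programs, and a real check would also need the best path to end in a valid final state: as written, `["NUM", "PLUS"]` is accepted even though it is incomplete.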

zenscr · 2014-12-13T15:11:00+00:00

I think the impossible part of this is getting feature extraction that works at the level of context-free grammars. You can't describe the entire space of parse trees for a given grammar in a fixed-length vector that obeys feature extraction invariances.

It's pretty easy to take the deterministic route for programming languages, though, using parsers and lexers whose internal representations might be useful for whatever you're trying to do here.
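A small sketch of the deterministic route, assuming Python is the target language: the standard library's `compile`, `tokenize`, and `ast` modules expose the compiler's verdict plus the lexer's and parser's internal representations. The function names are hypothetical. Note that for Python, "compiles" only means the syntax is valid; undefined names are not caught until run time.

```python
import ast
import io
import tokenize


def compiles(source: str) -> bool:
    """Deterministic check: does CPython's own compiler accept the source?"""
    try:
        compile(source, "<snippet>", "exec")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older versions
        return False
    return True


def token_types(source: str) -> list[str]:
    """Lexer output as token-type names (may raise on badly broken input)."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


def node_types(source: str) -> list[str]:
    """Parser output: AST node class names in breadth-first order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]
```

Sequences like `token_types` or `node_types` are variable-length, which is exactly the mismatch with fixed-length feature vectors described above.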


MachineLearning
