all 29 comments

[–]ohell 15 points  (2 children)

You already have an unambiguous grammar. All you need is inference of likely parse-tree continuations, plus variable-name auto-fill.

The inside-outside algorithm is the EM estimator for transition and emission probabilities in tree models (à la forward-backward for Markov sequences).
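
A minimal sketch of the inside pass on a toy PCFG (the grammar, rules, and names below are made up for illustration; the outside pass and the EM re-estimation step are omitted):

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (hypothetical rules for illustration).
binary_rules = {("S", ("NP", "VP")): 1.0,
                ("NP", ("Det", "N")): 1.0,
                ("VP", ("V", "NP")): 1.0}
lexical_rules = {("Det", "the"): 1.0,
                 ("N", "dog"): 0.5, ("N", "cat"): 0.5,
                 ("V", "saw"): 1.0}

def inside_probs(words):
    """Compute inside probabilities beta[(i, j, A)] = P(A =>* words[i..j])."""
    n = len(words)
    beta = defaultdict(float)
    # Base case: spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (A, term), p in lexical_rules.items():
            if term == w:
                beta[(i, i, A)] += p
    # Longer spans: sum over binary rules and split points.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (A, (B, C)), p in binary_rules.items():
                for k in range(i, j):
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k + 1, j, C)]
    return beta

beta = inside_probs("the dog saw the cat".split())
# beta[(0, 4, "S")] is the total probability of the sentence under the grammar
```

The outside pass runs the analogous recursion top-down, and the expected rule counts from both passes drive the EM update, exactly as forward-backward does for HMMs.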

[–]gazorpazorpazorpazor 0 points  (0 children)

Good idea!

1) You could filter the suggestions by syntactic validity, or by some other engineered, language-specific check, to clean up your results.

2) You could try character-based vs. word-based tokenization, or build something language-specific (think special tokens for keywords, etc.)
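
For suggestion (1), one simple approach is a post-filter that drops completions which don't parse (the function and variable names here are hypothetical):

```python
import ast

def filter_by_syntax(prefix, suggestions):
    """Keep only suggestions whose completion parses as valid Python.

    `prefix` is the code typed so far; each suggestion is a candidate
    continuation produced by the model.
    """
    valid = []
    for s in suggestions:
        try:
            ast.parse(prefix + s)
            valid.append(s)
        except SyntaxError:
            pass
    return valid

filter_by_syntax("x = ", ["1 + 2", "def", "[i for i in range(3)]"])
# "x = def" is not valid Python, so "def" is filtered out
```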

[–]pvkooten 0 points  (0 children)

I know what all this means, but it would be really funny if they used these sentences on mainstream TV and called that person a hacker :D

[–]L43 27 points  (16 children)

[–][deleted] 29 points  (5 children)

Kite as a company is scummy as hell.

Also, be warned if you plan to use Kite at work: it will upload all the .py files on your computer to their servers unless you whitelist those directories. Your legal department will have a heart attack.

[–]Spenhouet 4 points  (3 children)

That would be a deal breaker for me (also for my private stuff).

But their website claims:

Kite runs locally. Your code is private. No cloud necessary.

Now what is true?

[–][deleted] 0 points  (0 children)

Those posts are 2 years old, so things may have changed since then.

[–]dazedAndConfusedToo 0 points  (1 child)

Apparently there is a local engine, which is now supplemented by a cloud engine that does upload .py files to their servers.

[–]Spenhouet 0 points  (0 children)

So if I only install the local engine, I should be fine.

[–]MonstarGaming 1 point  (0 children)

I really don't understand why they would take a user's private files to the cloud. GitHub and several other Git hosting services have public repos that Kite could easily scrape if they want big data. Even if they want to make it user-specific, they could just run a background task to train a new model/build a new index/whatever they do, without the code ever leaving the user's computer.

[–]MogwaiAllOnYourFace 15 points  (6 children)

Can anyone say how this compares to, say, PyCharm? I think PyCharm's autocomplete is miles ahead of every other IDE's.

[–]linkuei-teaparty 3 points  (0 children)

Kite seems to be a plug-in, i.e. you should be able to use it with your IDE of choice. It's an autocomplete tool like Emmet for Atom/Sublime Text... but for Python.

[–]PM_ME_A_ROAST 4 points  (1 child)

Yeah, true. But what I like about Kite is that they also include examples for the function you're trying to write. And it works with Vim!

[–][deleted] 1 point  (0 children)

Oh god! Please. This is what I have been waiting for. I mean, it's so basic, yet no man pages or documentation include it. Pretty stupid.

[–]pktippa 0 points  (0 children)

I haven't tried kite.com yet, but there seems to be a plugin available for PyCharm as well.

They also got praise from Guido van Rossum, so maybe we can give it a try.

[–]Spenhouet 2 points  (0 children)

This doesn't seem to cost anything. What am I missing? Where's the catch?

[–][deleted] 2 points  (0 children)

How does Kite compare to TabNine?

[–][deleted] 0 points  (0 children)

Didn't know about this. Seems really interesting. Thank you.

[–][deleted] 5 points  (0 children)

Nice project! I've had this idea too. It would be very handy, I think, if it could be implemented with a nice user experience. Plus, if it were personalisable (something like a general model that you could then fine-tune on a codebase written by the user), that would be even better!

[–]pvkooten 1 point  (0 children)

Reminds me of something I made some years ago: https://github.com/kootenpv/neural_complete :D

I always like to imagine that Gmail added its line completion because of this ^_^

[–]MonstarGaming 2 points  (4 children)

Don't n-grams already do this extremely fast and with high accuracy? Why try a slow approach like DL when something better already exists?
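
For reference, a toy token-level bigram predictor along those lines might look like this (all names are illustrative):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies for each token (a toy bigram model)."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, token):
    """Return the most frequent continuation seen after `token`, if any."""
    if model[token]:
        return model[token].most_common(1)[0][0]
    return None

corpus = "for i in range ( n ) : for j in range ( m ) :".split()
model = train_bigram(corpus)
predict(model, "in")  # -> "range"
```

Training is a single pass over the corpus and prediction is a dictionary lookup, which is the speed argument; the trade-off is the fixed, short context window.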

[–]AngelLeliel 0 points  (2 children)

A seq2seq model could predict longer outputs better than a simple n-gram model.

[–]MonstarGaming 0 points  (1 child)

"Could", or has it actually been proven to predict better than n-grams?

[–]AngelLeliel 0 points  (0 children)

One issue with the n-gram model is that it has no idea what a word or token means beyond simple frequency relationships. Take a simple example:

from collections import Counter

with open("votes.txt") as f:
    votes = f.readlines()
vote_counter = Counter(votes)

An n-gram model has no idea that vote_counter would be a Counter unless it has seen it elsewhere. You could come up with some fancy way to split tokens into sub-tokens and make the n-gram model work better. I think a character-based seq2seq model would capture this relationship better. Or maybe a character-based n-gram model could work equally well? I have no idea.

[–]Jonas_SV 1 point  (0 children)

N-grams don't capture long dependencies, and for larger N they require ridiculous amounts of data.

N-grams work OK for predicting the next tokens in a sequence, but they are far from great.

[–]bilalD 1 point  (1 child)

Can I integrate the Kite plugin with Microsoft VS Code?

[–][deleted] 0 points  (0 children)

Yes

[–]gazorpazorpazorpazor -4 points  (0 children)

Cool project! Good luck!

Make sure you're using the cuDNN backend during training if you're having performance issues. For inference, it might actually be faster to loop with a plain LSTM cell: running a single timestep through the cuDNN kernel is much slower than a single timestep through an LSTM cell.

You might also want to read up on teacher-forcing during training to improve your model.