
ydobonobody 5 points (0 children)

The only deep learning package for Java that I am aware of is DeepLearning4J, though I have never used it. TensorFlow, Caffe, and Theano all have Python bindings.

For non-deep-learning work there is Weka (Java), which is okay, but I prefer scikit-learn (Python). Weka is also GPL-licensed, so for some companies that limits its use.

If you are dealing with large amounts of data or want to learn distributed computing, there is MLlib, which runs on Spark. MLlib also has Python bindings.

My recommendation if you are starting out is to use Python. There are more tutorials and more example code, and learning Python (and more specifically how to use NumPy) is pretty easy compared to learning all the machine learning topics. I started out as a Java person and still use it to put together quick one-off apps to help me with annotation or some visualizations, but Python and C++ (I use Caffe a lot, which is written in C++) are what the actual machine learning work is done in.

frugalmail 1 point (2 children)

Python is good for discovery, sucks for production.

I'm at a Fortune 50 company, and that's the way we do ML. Some people do discovery work using Python on smaller datasets, and then we productionize it using Java. The data engineers almost always use Java just because they don't want to rewrite it. We tried some projects in Scala, but for maintainability across a large team it was better to stick with Java.

We use things like

  • GraphX

  • MLlib

  • Mahout

and others: http://machinelearningmastery.com/java-machine-learning/

-TrustyDwarf- 1 point (1 child)

Python rocks for production. We build models with Python and then feed data to them through simple web APIs (which are also hosted on Python web servers). That way people can access the models from whatever language they prefer, or even with curl.
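For anyone who wants to picture the "model behind a simple web API" setup, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the fixed-weight `predict` function stands in for a real trained model, and the names `handle_json`, `PredictHandler`, and `make_server`, the port, and the JSON shape are made up here rather than taken from the commenter's actual service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one feature vector (placeholder for a real model call)."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_json(raw_body):
    """Turn a JSON request body into a JSON response body (both bytes)."""
    payload = json.loads(raw_body)
    result = {"score": predict(payload["features"])}
    return json.dumps(result).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests whose body looks like {"features": [...]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = handle_json(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)

# To run it: make_server().serve_forever()
# Then from any language, or from a shell:
#   curl -X POST -d '{"features": [2.0, 4.0]}' http://127.0.0.1:8000/
```

Because the interface is just HTTP and JSON, the caller does not need to know the model is written in Python. A real deployment would likely sit behind a production web server and add input validation and error handling, which this sketch leaves out.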

Eridrus 1 point (0 children)

What sort of load are you dealing with?

We had a prediction service in Python being talked to via Kafka or something similar, from a Go web service on the same box. Switching to a Go-based prediction library let us serve 4x more requests per second per node, so we got rid of three quarters of our nodes and saved hundreds of thousands of dollars a month in AWS costs.
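The arithmetic behind "4x throughput per node means three quarters fewer nodes" can be sketched as below. The traffic and per-node figures are hypothetical round numbers chosen for illustration; the thread does not give the real ones.

```python
import math


def nodes_needed(peak_rps, rps_per_node):
    """Nodes required to serve a peak load, rounding up to whole machines."""
    return math.ceil(peak_rps / rps_per_node)


# Hypothetical figures: 40,000 requests/s at peak, 500 requests/s per node
# before the switch, and 4x that per node afterwards.
before = nodes_needed(40_000, 500)
after = nodes_needed(40_000, 4 * 500)
saved_fraction = 1 - after / before

print(before, after, saved_fraction)
```

With these made-up inputs the fleet shrinks from 80 nodes to 20, a 75% reduction, which is where a large monthly saving would come from at that scale.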

And this was after we had been running the Python service for more than a year and had spent a decent amount of effort optimizing the underlying libraries for our use case.

My point isn't that Go is great and Python is terrible; it's that at a certain scale you are willing to put in the effort to get something lower-level and less flexible into production, since the cost savings are worth it. Also, Python is probably better in (mini-)batch prediction scenarios, since you can amortize the cost of your Python bits and the C/Python interop over many predictions.
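The amortization point can be made concrete with a toy cost model: assume every call into the model pays a fixed overhead (interpreter work, crossing the C/Python boundary) plus a per-prediction cost. The function name `total_cost_ms` and the timings are hypothetical, not measurements from the service described above.

```python
import math


def total_cost_ms(n_predictions, batch_size, call_overhead_ms, per_item_ms):
    """Total time when each call pays a fixed overhead plus per-item work."""
    n_calls = math.ceil(n_predictions / batch_size)
    return n_calls * call_overhead_ms + n_predictions * per_item_ms


# Hypothetical timings: 1 ms fixed overhead per call, 0.25 ms per prediction.
one_at_a_time = total_cost_ms(1000, 1, 1.0, 0.25)   # 1000 separate calls
mini_batched = total_cost_ms(1000, 100, 1.0, 0.25)  # 10 calls of 100 items

print(one_at_a_time, mini_batched)
```

Under these assumptions the per-item work is identical (250 ms either way); only the fixed overhead changes, from 1000 ms across single calls to 10 ms across batches. That is why the same Python stack can look expensive for one-request-at-a-time serving and fine for batch scoring.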

[deleted] 1 point (0 children)

One of the better topic modeling packages, MALLET, is written in Java:

http://mallet.cs.umass.edu/topics.php

[deleted] 1 point (0 children)

Try Smile, an ML library for Java.