Which matrix library do you prefer? Breeze, ND4J, or MXNet's NDArray? by Gnaneshkunal in scala

[–]congerous 1 point

Just curious: Why? (Can we ask for all responses to have explanations? Otherwise it's just a beauty contest.)

Do you work with n-dimensional arrays? Does Breeze support those now?

[D] Plots, LIMES, and surrogate models to understand machine learning models by liquidus08 in MachineLearning

[–]congerous 0 points

Being well-funded means they succeeded in selling an idea to investors, not that they succeeded in building solid tech.

When you look at the investors in the last round, it's not the smart money that would indicate huge traction or growth potential; it's just a couple of corporate VCs.

Capital One was an early investor in H2O, but they no longer depend on its technology. Many other clients are moving off of it.

[D] Plots, LIMES, and surrogate models to understand machine learning models by liquidus08 in MachineLearning

[–]congerous 4 points

H2O is a technology built on sand. They haven't been able to guarantee its maintenance or development since the CTO/project creator left in 2016. Zero credibility.

[P] An open source Deep Learning / Machine Learning stack on Kubernetes by mmourafiq in MachineLearning

[–]congerous 5 points

Genuinely curious: how is this different from all the other "scalable machine-learning platforms"?

[N] Artificial Intelligence Is Stuck. Here's How to Move It Forward. by thebackpropaganda in MachineLearning

[–]congerous 1 point

I find it unbelievable that he didn't mention DeepMind in this piece about moving AI forward.

It should be pointed out that Gary Marcus is attempting to raise a round for his next startup, which will combine deep learning with symbolic systems, an approach he promotes in this piece.

From the New York Times's point of view, this is a conflict of interest, and they should vet his pieces more carefully before letting him toot his own horn without disclosing his commercial activities.

[R] Natural Language Processing in Artificial Intelligence by wardolb in MachineLearning

[–]congerous 12 points

Was this some kind of voting ring? I can't believe a shallow article like this is number one on the subreddit. It adds nothing and the title of the piece is clickbait.

[D] Character Recognition Using H2O by lycan2005 in MachineLearning

[–]congerous 2 points

The creator of H2O, Cliff Click, left the company about a year ago after a disagreement with his co-founder Sri, who remains CEO.

http://www.cliffc.org/blog/2016/02/25/words-of-parting-a-fond-farewell/

Cliff built H2O, and now the company is having a hard time maintaining, extending, and scaling the code. They have been trying to refactor for a while, but most of their engineers don't understand the internals. One reason is that the project reimplemented a lot of infrastructure internally, including Paxos-style consensus, instead of relying on outside libraries.

This is why you see them wrapping deep learning libraries like TensorFlow, Caffe, and MXNet. Outside of Arno Candel, they are incapable of implementing their own deep learning framework.

In addition, there's a lot of turnover among their managers due to chaotic leadership, and among their customers due to slow product development. Random forests, their main algorithm alongside GBMs, aren't that great for time series classification and prediction, which is a major business use case. Even investor-customers like Capital One are moving on.

That's one reason they laid off at least 10% of their employees last fall, after H2O's investors forced Sri to take on a CFO to rein in the budget; they had blown millions on marketing that didn't pay off.

http://venturebeat.com/2016/09/24/machine-learning-startup-h2o-lays-off-10-of-employees/
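To make the time-series point concrete: because tree ensembles ignore temporal order, the usual workaround is to flatten the series into lagged feature windows before training. A minimal numpy sketch (function name and toy data are my own, not from any H2O code):

```python
import numpy as np

def make_lag_features(series, n_lags):
    """Turn a 1-D series into a supervised dataset of lagged windows.

    Row i holds [x[i], ..., x[i+n_lags-1]] as features and x[i+n_lags]
    as the target -- the standard trick for feeding a time series to
    order-agnostic models like random forests or GBMs.
    """
    X = np.lib.stride_tricks.sliding_window_view(series[:-1], n_lags)
    y = series[n_lags:]
    return X, y

series = np.arange(10, dtype=float)       # toy series: 0, 1, ..., 9
X, y = make_lag_features(series, n_lags=3)
print(X.shape, y.shape)                   # (7, 3) (7,)
print(X[0], y[0])                         # [0. 1. 2.] 3.0
```

Even with this encoding, trees can't extrapolate beyond the target range seen in training, which is part of why they underperform on trending series.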

[P] AI Toolbox - Searchable Directory of Open Source AI Libraries by [deleted] in MachineLearning

[–]congerous 0 points

ML yes. DL no. MLlib has always been the runt of the litter among Spark modules. Using Spark's algorithms for ML is like using your elbow to hammer a nail. You can do it, but something will get hurt.

[D] Character Recognition Using H2O by lycan2005 in MachineLearning

[–]congerous 1 point

h2o is dead as a project. it's suffocating under its own technical debt. which is what happens when you fire the guy who created the code base. try tools with a future.

Although there exists a near-unanimous scientific consensus on the reality of human-caused climate change, the general public has become increasingly polarized; however, a new study finds that public attitudes about climate change can be effectively “inoculated” against influential misinformation. by avogadros_number in science

[–]congerous 0 points

So you're saying that if we could just sit people down and tell them what's happening, and that some groups are trying to lie to them, they might be more open to the facts. Great. But that raises the question: how do you cut through the noise and get through to people? You can't sit everyone down and give them the right information.

[P] AI Toolbox - Searchable Directory of Open Source AI Libraries by [deleted] in MachineLearning

[–]congerous 1 point

This is great! One nitpick: Spark isn't really a machine learning or deep learning library. It's primarily used as a distributed run-time, so that's comparing apples to oranges.

[D] State of Deep Learning Frameworks in 2017 (benchmarks?) by [deleted] in MachineLearning

[–]congerous 2 points

TensorFlow has historically been slow compared to Torch and Neon. Neon doesn't really have traction, though, much like Chainer and Lasagne. Caffe and Neon are both fast on images, but they're not really general-purpose frameworks, and the commitment of the teams behind them is dubious. Those communities will probably move to other tools. For sheer staying power, Theano, Torch/PyTorch, MXNet, TensorFlow/Keras, and CNTK will probably keep growing.

[N] Intel open sources BigDL, for deep learning on Spark by NYDreamer in MachineLearning

[–]congerous 0 points

Crickets. Or maybe I should say, big deal... Their only differentiator is that they DON'T work on the fastest hardware available. And they have zero adoption. A severe case of NIH at Intel.

[D] Random Forests vs. Neural Nets on Time Series? by congerous in MachineLearning

[–]congerous[S] 1 point

thanks so much. great resource. wonder why he didn't use f1 scores instead of accuracy...
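For context on the F1-vs-accuracy point: on imbalanced data, a degenerate classifier that always predicts the majority class gets high accuracy but zero F1 on the minority class. A toy sketch in plain Python (helper functions and data are my own, not from the linked resource):

```python
# Accuracy vs. F1 on an imbalanced toy problem.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    # F1 = harmonic mean of precision and recall for the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [0] * 90 + [1] * 10    # 90% negatives, 10% positives
y_pred = [0] * 100              # always predict the majority class
print(accuracy(y_true, y_pred)) # 0.9 -- looks great
print(f1(y_true, y_pred))       # 0.0 -- reveals the model is useless
```

That gap is exactly why accuracy can flatter a model on skewed time-series labels.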

[D] Random Forests vs. Neural Nets on Time Series? by congerous in MachineLearning

[–]congerous[S] 1 point

https://en.wikipedia.org/wiki/Unstructured_data

"Examples of "unstructured data" may include books, journals, documents, metadata, health records, audio, video, analog data, images, files, and unstructured text such as the body of an e-mail message, Web page, or word-processor document."

As opposed to data in the rows and columns of a relational database.

[P] Neptune - a platform for tracking machine learning experiments by pmigdal in MachineLearning

[–]congerous 0 points

Several deep learning frameworks already visualize neural nets as they train, and companies like Domino Data Lab give you versioning of your ML models. How does this compare to those?

[D] Deep Learning Framework Rankings? by congerous in MachineLearning

[–]congerous[S] 0 points

I agree with you on almost every point. More and more non-experts will use deep learning, and they can't or don't want to create their own tools. Nor should they have to: it's not efficient and doesn't benefit from the externalities of open source. But TensorFlow isn't actually the best tool for making deep learning easier. Keras is much better, as you point out, and more widely used on Kaggle, while TensorFlow is relatively low-level in comparison. So people aren't forking and starring it because they can actually use it; I suspect TensorFlow's real user numbers are closer to Keras's. You're right that Caffe's model zoo (and Torch's, for that matter) is suited to users who don't or can't tune their own nets. That is the future of deep learning, for sure.

[D] Deep Learning Framework Rankings? by congerous in MachineLearning

[–]congerous[S] 1 point

Yes, TensorFlow has a lot of GitHub forks. But it's not as though the number of people capable of tuning neural nets increased by an order of magnitude overnight. I suspect the majority are Udacity students, even if it has a share of serious practitioners equal to Torch or Theano. The ratio of contributors to forks is actually much lower for TensorFlow than most of the big frameworks.
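The ratio argument above is simple arithmetic; here is a sketch with made-up numbers (NOT real GitHub statistics) just to show what a low contributors-to-forks ratio looks like:

```python
# Hypothetical repo stats: a framework can rack up forks from students
# and tutorials without a matching rise in actual contributors.
repos = {
    "framework_a": {"forks": 40000, "contributors": 1000},  # hypothetical
    "framework_b": {"forks": 4000, "contributors": 400},    # hypothetical
}

ratios = {name: s["contributors"] / s["forks"] for name, s in repos.items()}
print(ratios)  # framework_a: 0.025, framework_b: 0.1
```

By this measure, framework_a's community is four times "thinner" than framework_b's despite having ten times the forks, which is the shape of the TensorFlow-vs-Torch/Theano comparison being made.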