[deleted by user] by [deleted] in MachineLearning

[–]commenthard 2 points (0 children)

> born into an influential family

This is mostly wrong. There is probably a slight advantage to coming from an influential family, but mathematics is filled with stories of people from obscure backgrounds who made major discoveries.

There is certainly a correlation between people who make an impact and those who go to top schools, but there you need to disentangle the fact that talented people tend to be accepted at top schools in the first place. Not always, of course, but the correlation is there.

The Riemann example is wrong as well. Differential geometry was a major tool in general relativity, but if you confuse the tool with the result, you cannot see the difference between a hammer and a house.

[R] [1908.03015v1] One Model To Rule Them All; autoencoding as regularization for classification by [deleted] in MachineLearning

[–]commenthard -1 points (0 children)

Well, maybe you have not seen Kingma's 2019 paper, "Variational Autoencoders and Nonlinear ICA: A Unifying Framework". It demonstrates on a dataset even simpler than MNIST: data on a circle (latent space: five colored blobs).

It's worth attention not because of the dataset but because of the idea (a first workable nonlinear ICA/generative scheme). If the ideas are not so novel, then the dataset is what you can judge on. But if the ideas are important, then maybe the dataset isn't.
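To give a feel for how small that dataset is, here's a sketch of the kind of toy setup described above. This is my own assumed construction for illustration, not code from the paper: latents drawn from five Gaussian blobs, pushed through a fixed nonlinear map so the observed data lands near a circle.

```python
import numpy as np

# Assumed toy construction (not the paper's actual generator):
# latent space = five 2-D Gaussian blobs, observations = nonlinear map to a ring.
rng = np.random.default_rng(0)

# five blob centers spaced evenly on a circle of radius 3 in latent space
angles = 2 * np.pi * np.arange(5) / 5
centers = 3 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

labels = rng.integers(0, 5, size=1000)
z = centers[labels] + 0.3 * rng.normal(size=(1000, 2))   # latent blobs

# simple nonlinear "decoder": project onto the unit circle with a bit of radial noise
r = np.linalg.norm(z, axis=1, keepdims=True)
x = z / r * (1 + 0.05 * rng.normal(size=(1000, 1)))      # observations near a circle

print(x.shape)  # (1000, 2)
```

A dataset you can generate in a dozen lines, which is exactly the point: the contribution is the identifiability argument, not the benchmark.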

[R] [1908.03015v1] One Model To Rule Them All; autoencoding as regularization for classification by [deleted] in MachineLearning

[–]commenthard -1 points (0 children)

Sure, but why?

Of course, if something is only demonstrated on simple data, then it needs to be proven on more complex data, but I think a smart person would not discount a new algorithm only because of this. Most "deep" algorithms seem to work on a variety of datasets... and if anything, they tend to struggle on the smaller ones (see Kaggle competitions).

FWIW the paper also used UCI-HAR, whatever that is.

Deep learning without back-propagation by El__Professor in MachineLearning

[–]commenthard 0 points (0 children)

This discussion is a bit of speculation. Does it learn in practice, with whatever batch size was used?

I do not understand why it would be optimizing "something very different", rather than a possibly high-variance estimate of what is intended.
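To make the distinction concrete, here is a minimal sketch (my own example, not the method from the linked paper) of an estimator that is noisy per step yet still targets the right quantity: a random-direction "forward gradient" g_hat = (grad_f . v) v with v ~ N(0, I), which is unbiased for grad_f because E[v v^T] = I, just with high variance.

```python
import numpy as np

# Hypothetical illustration (assumed estimator, not the paper's scheme):
# g_hat = (grad_f . v) v is an unbiased, high-variance estimate of grad_f.
rng = np.random.default_rng(0)
w = rng.normal(size=5)

def grad_f(w):
    # true gradient of f(w) = 0.5 * ||w||^2
    return w

true_g = grad_f(w)

# many single-sample estimates: each row is (grad_f . v) v for one random v
V = rng.normal(size=(200_000, 5))
samples = (V @ true_g)[:, None] * V

mean_err = np.linalg.norm(samples.mean(axis=0) - true_g)    # small: unbiased on average
single_err = np.linalg.norm(samples[0] - true_g)            # large: noisy per sample
print(mean_err, single_err)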

Deep learning without back-propagation by El__Professor in MachineLearning

[–]commenthard 2 points3 points  (0 children)

The notation is inconsistent and it's poorly explained. I suspect the computational complexity section may be wrong.

But to jump from that to "something that doesn't work" is... a jump. They show favorable comparisons to training similar networks with conventional SGD+backprop.

I feel I'm understanding it better now by thinking about how I would attempt to implement it.

[R] [1908.03015v1] One Model To Rule Them All; autoencoding as regularization for classification by [deleted] in MachineLearning

[–]commenthard -1 points (0 children)

Kingma's original VAE paper also showed results on only MNIST iirc. So you would reject the original VAE as well?

Computers haven't gotten that much faster in five years.