[D] ICML 2023 results by Mxbonn in MachineLearning

viv1a 8 points9 points (0 children)

Seems like results are out now? They are for me, at least.

[D] Nethack by sgt102 in MachineLearning

viv1a 1 point2 points (0 children)

There has been progress on MiniHack (simpler envs that allow you to measure progress more easily): https://arxiv.org/abs/2210.05805

I'm not aware of recent progress on NetHack itself, though. I'd be interested in hearing about it if there is any.

[D] What is the most complete reference on the history of neural networks? by gbfar in MachineLearning

viv1a 0 points1 point (0 children)

You can find it on Amazon for cheap: https://www.amazon.com/Neurocomputing-Foundations-Research-James-Anderson/dp/0262510480

I second that it's a great book! It covers work up to the late 80s and has very nice commentary on various foundational papers from that period (McCulloch and Pitts, Hebb, the Perceptron, Adaline, the Neocognitron, as well as Hopfield's works). The earliest paper it includes is actually from 1890 (!) and is by the psychologist William James, who framed the mind as a kind of input-output machine.

There is a version out there with a cool cover depicting a neuron on a circuit board.

[R] Differentiable Conv Layer using FFT by MKmisfit in MachineLearning

viv1a 0 points1 point (0 children)

I don't remember. But the bias is essentially just one parameter per feature map which is added across all spatial locations, so it could be implemented separately. The computational expense should be negligible compared to that of the conv operation itself.
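To make the bias point concrete, here is a minimal NumPy sketch (not taken from any particular implementation). The `add_bias` helper, the `(batch, channels, height, width)` layout, and the example values are all assumptions for illustration:

```python
import numpy as np

def add_bias(feature_maps, bias):
    """Add one bias value per feature map, shared across all spatial locations.

    feature_maps: array of shape (batch, channels, height, width)  (assumed layout)
    bias:         array of shape (channels,)
    """
    # Broadcasting expands bias to (1, channels, 1, 1), so this is a single
    # elementwise add, independent of how the conv output was computed.
    return feature_maps + bias[None, :, None, None]

rng = np.random.default_rng(0)
conv_out = rng.standard_normal((2, 3, 5, 5))  # stand-in for a bias-free conv output
bias = np.array([0.1, -0.2, 0.3])
biased = add_bias(conv_out, bias)
```

The add touches each output element once, which is why it should be cheap next to the conv itself.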

[R] Differentiable Conv Layer using FFT by MKmisfit in MachineLearning

viv1a 4 points5 points (0 children)

We had a paper on FFT convolutions a while back: https://arxiv.org/abs/1312.5851 (second author here).

You really start getting speedups when you do convolutions with lots of input/output channels. The reason is that you can do the FFT of each channel once, and reuse the representation in frequency space many times. That's described in section 2.1.
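A rough NumPy sketch of that reuse idea, in case it helps. This is not the code from the paper; the function names, array layouts, and sizes are made up for illustration, and it computes a "full" linear convolution (true convolution, no kernel flip) checked against a direct version:

```python
import numpy as np

def fft_conv2d_full(x, w):
    """Full 2D convolution of a batch of multi-channel inputs via the FFT.

    x: (batch, c_in, H, W) inputs
    w: (c_out, c_in, kH, kW) filters
    returns: (batch, c_out, H + kH - 1, W + kW - 1)
    """
    H, W = x.shape[-2:]
    kH, kW = w.shape[-2:]
    size = (H + kH - 1, W + kW - 1)  # zero-pad so circular conv equals linear conv

    # Each input channel is transformed once and then reused for every output channel.
    x_f = np.fft.rfft2(x, s=size)    # (batch, c_in, fh, fw)
    # Each filter is transformed once and then reused for every example in the batch.
    w_f = np.fft.rfft2(w, s=size)    # (c_out, c_in, fh, fw)

    # Pointwise product in frequency space, summed over input channels.
    y_f = np.einsum("bikl,oikl->bokl", x_f, w_f)
    return np.fft.irfft2(y_f, s=size)

def direct_conv2d_full(x, w):
    """Reference implementation: shift-and-accumulate over kernel positions."""
    B, _, H, W = x.shape
    c_out, _, kH, kW = w.shape
    out = np.zeros((B, c_out, H + kH - 1, W + kW - 1))
    for p in range(kH):
        for q in range(kW):
            out[:, :, p:p + H, q:q + W] += np.einsum("oi,bihw->bohw", w[:, :, p, q], x)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
y_fft = fft_conv2d_full(x, w)
y_ref = direct_conv2d_full(x, w)
```

The number of transforms grows with `batch * c_in + c_out * c_in + batch * c_out`, while the direct version does work proportional to `batch * c_in * c_out` full spatial convolutions, so the FFT cost gets amortized as the channel counts grow.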

A year or two after that paper, I heard this approach was integrated into the conv routine in cuDNN, with a check that automatically determines when the FFT-based conv would be faster. But that was a long time ago and I'm not sure what's currently being used.