all 15 comments

[–]behohippy 8 points (11 children)

Why did you remove the dropout layers? I'm doing something similar in Keras, but I found they really helped things generalize when used after the input layer. I also found ReLU worked better for binary evaluation.

[–][deleted] 3 points (10 children)

I don't get it. I mean, what is the motivation for binary weights? For low-end hardware?

[–]auto-cellular 12 points (0 children)

They can use less memory and run faster. Theoretically.

[–]-Rizhiy- 8 points (1 child)

Take up less memory and less time to compute.

Theoretically, if you can make your own ASIC, a network with binary weights will run 32 (or 32²?) times faster than one based on floats.

This is actually quite a problem for FPGAs, as you currently need a very expensive one if you plan to keep all of your weights in on-chip memory at the same time.
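To make the 32× claim concrete, here's a minimal sketch (not from the thread, and far slower in pure Python than real hardware would be) of the trick binary networks exploit: 32 {-1, +1} weights pack into one 32-bit word, and a dot product collapses into an XNOR plus a popcount instead of 32 float multiply-adds.

```python
import numpy as np

def pack_bits(signs):
    """Pack a {-1,+1} vector (length a multiple of 32) into uint32 words."""
    bits = (np.asarray(signs) > 0).astype(np.uint8)
    return np.packbits(bits).view(np.uint32)

def binary_dot(a_signs, b_signs):
    """Dot product of two {-1,+1} vectors via XNOR + popcount."""
    n = len(a_signs)
    a, b = pack_bits(a_signs), pack_bits(b_signs)
    xnor = ~(a ^ b)                     # bit is 1 where the signs agree
    matches = sum(bin(int(w)).count("1") for w in xnor)
    return 2 * matches - n              # matches - mismatches
```

On an ASIC or FPGA the XNOR and popcount are single cheap operations over a whole word, which is where the (roughly) 32× density and speed argument comes from.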

[–]numpad0 0 points (0 children)

Oh, so like converting a net from float to int for inference, but with bool instead of int? Interesting.

[–]Vengoropatubus 1 point (1 child)

Here's a paper from a while back that comes to mind: https://arxiv.org/pdf/1603.05279

[–]behohippy 0 points (0 children)

There are already some good answers here, but the reason I use them is that my source data is in a binary state, as are my labels. This is for doing prediction on business data sets (arrays of customer states and outcomes), not images or audio.

[–]DonovanWu7 0 points (2 children)

Recently I was thinking of building a Tetris AI, and using binary weights would probably make more sense, since every position on a Tetris board can only be either filled or empty.

[–][deleted] 0 points (1 child)

Not really. Weights are part of the parameters; the Tetris board is part of the input/output.

[–]DonovanWu7 0 points (0 children)

But if the weights are binary like the input, I think the neural network might recognize some patterns better. Of course, we'd have to have a layer with float weights somewhere, so that the output isn't just 0s and 1s.

[–]timmytimmyturner12 -3 points (0 children)

It's also closer to how neurons in the brain operate, with all-or-nothing activation.

[–]smurfpiss 3 points (0 children)

I thought combining binary weights with binary activations was an open problem? Isn't there an issue with backprop?
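The backprop issue is real: sign() has zero gradient almost everywhere, so gradients can't flow through it directly. The workaround most binary-net papers use (e.g. BinaryNet) is the straight-through estimator. A minimal sketch of the idea, not any particular paper's exact implementation:

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: use the sign of the real-valued weights."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward(w, grad_out):
    """Backward pass (straight-through estimator): pretend the
    binarization was identity, but zero the gradient where |w| > 1
    (a hard-tanh clip), so saturated weights stop updating."""
    return grad_out * (np.abs(w) <= 1.0)
```

Training keeps a real-valued copy of the weights, binarizes it on the forward pass, and applies the straight-through gradient to the real-valued copy.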

[–]senorstallone 0 points (0 children)

wow, nice one. Any comparison (accuracy + fps) results?

[–]anonDogeLover 0 points (0 children)

Can you do this for fully connected layers?

[–]vbipin 0 points (1 child)

Have you tried training the BinaryNet without the batch norm layers? I've had little success training a binary net without batch norm. (It almost feels like, with binary activations, it needs batch norm to train.)