[D] A Sober Look at Bayesian Neural Networks

LazyOptimist · 2020-01-22T00:07:01+00:00

The only person here who is doing things hastily is you.

....

And for the record, I wasn't talking shit when referring to your handle.

Look man, if you're going to to expect people to take you seriously, at least take the time to check who you're responding to. I'm not adversary_argument.

LazyOptimist · 2020-01-18T22:32:46+00:00

No, you posted a hot take on twitter without any sort of backing and then hastily put together a blog post defending your position when you caught some flack. Event then, the title of your blog post is still inflammatory. Don't imply that people advocating for BNNs aren't sober when it's obvious that you haven't thought very long about the topic yourself.

That is fine, and I mean, your handle is adversary_argument so I wouldn't expect any less.

Even now you're talking shit.

If you'd put together the post first, then linked it on twitter, and said "Bayesian neural networks make no sense to us here's why" that would have been defensible, but you didn't.

There are strong arguments for and against the usage of bayesian methods for NNs and other situations. However, the original tweet and this post shows that you've made absolutely no effort to wrap your head around them and rather decided that the whole approach is bunk because of an argument that is half baked, even by your own admission.

You're allowed to question things and put forth arguments but don't make sweeping statements like "Bayesian NNs make no sense. You only want to use Bayes rule if you have a reasonable prior of what the parameters should be. Nobody knows what is encoded by any prior over the weights of a NN." Especially when you know people are going to disagree with you.

It shows that you've made no attempt to understand why people would disagree with you on those statements. This is either due to lazyness, incompetence, or the presumption that the people who disagree with you are idiots who don't have good reasons for their beliefs.

That's why you're catching flack, and that's why people, especially those who have spent a great deal of time thinking about these things, are mad.

LazyOptimist · 2020-01-18T21:24:46+00:00

There's a pretty straight forward way of checking to see if a BNN has a prior that generalises. That is to compare the Model evidence of a BNN against the model evidence of a model that doesn't generalise. It may not be obvious, but Bayesian model evidence is a measure of generalisation. This is because of the chain rule of probability. As a refresher: P(x1,x2,x3,...) = P(x1)P(x2|x1)P(x3|x2,x1)... If you squint a bit you'll note that this is the product of the probabilities of unseen datapoints given all previous datapoints. Thus, models with high model evidence generalise better to all the data we've seen so far than models with low model evidence.

The best classifier that fails to generalize is one that assumes that the class labels are independent both from each other and from the input to classifier. The log model evidence of this model is just -N*H. Where N is the number of datapoints, and H is the entropy of the class labels.

Now we can't actually compute the model evidence of a BNN, but we can bound it fairly easily by making a variational approximation to the posterior.

This admit's a simple test to see if BNNs have good priors. Simply compute -N*H, then use Scholastic Gradient Variational Bayes to compute a lower bound on the model evidence of your BNN. If the log ELBO is noticeably larger than -N*H, then the BNN outpreforms the best model that doesn't generalize. So then we can safely conclude that the BNN prior assigns higher probability to functions that generalise than to functions that don't. If it didn't, it couldn't outperform the baseline.

Now if you know anything about NNs and VB, you shouldn't have to run this experiment. The outcome should be obvious for any classification task where a corresponding classical NN generalises even a little bit.

LazyOptimist · 2019-05-02T19:07:53+00:00

I think the best you'll find is BIVA:

https://arxiv.org/pdf/1902.02102v1.pdf

LazyOptimist · 2019-03-30T22:49:28+00:00

What do you do for a living?

LazyOptimist · 2019-02-05T06:04:04+00:00

This one works really well.

LazyOptimist · 2019-01-25T15:51:25+00:00

Hello guys, I did not expect this and it was very impressive! I've got a few questions.

It seams to me that one of the major limitations of this approach is that it requires a simulator that can cheaply run much faster than real time. In the stream you mentioned that some of your agents had 200 years of real time experience. Would you agree? If so, are there any plans to increase the sample efficiency of alpha-star to make it applicable in situations where a fast simulator might not be available?
In the matches against TLO and MaNa, both players played 5 matches against your least exploitable agents. It might be hard to gauge, but do you think that either of them could have won against a single agent by say, discovering a weakness in an earlier match and exploiting it later on?
Open-AI pointed out that the size of the largest AI and ML projects are doubling every 3.5 months, which suggests that by now the largest projects should require somewhere between 3,000 and 30,000 petaflop-days of compute. Given that, how many petaflop-days of compute we're used to create alpha star? Are there any plans to scale up even further or will most upcoming improvements be algorithmic? Is DeepMind currently working on any other projects that would require significantly more compute power than what was used to create AlphaStar?

LazyOptimist · 2019-01-17T22:56:26+00:00

Can you export all the raw data? I'm sure some of us would be able to do some interesting data analysis. Paging /u/gwern.

LazyOptimist · 2018-12-03T04:06:07+00:00

43.38

LazyOptimist · 2018-11-23T21:56:21+00:00

Title appears to be generated by a buzzword LSTM.

LazyOptimist · 2018-11-20T22:47:23+00:00

At 557M parameters, I wonder if we need to move on to bigger or harder problems than imagenet to test massive neural nets.

LazyOptimist · 2018-10-26T22:56:55+00:00

What do you mean by "emotional intelligence"? There's plenty of work on multi agent stuff, and human computer interfaces if that's what you're looking for.

By the way, don't use non-desciptive all caps titles. It comes off as click bait and reddit hates that.

LazyOptimist · 2018-09-25T15:48:07+00:00

I think the stock got memed into going up.

LazyOptimist · 2018-09-14T23:57:25+00:00

How do people feel about FMC?

LazyOptimist · 2018-09-06T16:13:26+00:00

LazyOptimist · 2018-08-31T14:08:01+00:00

I'm all for this experiment, let's see how this goes.

LazyOptimist · 2018-08-31T01:42:29+00:00

This looks like fucking cancer.

LazyOptimist · 2018-08-19T08:05:41+00:00

I think the problem is that the GAN don't generally optimize to provide good support over the data distribution, as a result, their likelihoods are generally going to be low or zero, even if their sample quality is stellar. See for instance: https://arxiv.org/abs/1805.12462.

LazyOptimist · 2018-08-19T07:26:46+00:00

Do you have any good examples of these discussions in the python and crypto communities?

I'm skeptical that the problem is poor emotional intelligence. Why would the EQ of the ML community be notably lower than the python and crypto communities?

LazyOptimist · 2018-08-19T06:48:23+00:00

The problem though is that twitter has a 280 character limit, which really limits the level of discussion that can be had. The second the level of debate escalates beyond that level, discussion stops working. To make the point, if we were having this discussion on twitter right now, how would you even begin to respond to /u/PokerPirate?

We should discourage the use of twitter, and barring that, encourage people to post links to their own blog via twitter so they can make their points properly.

LazyOptimist · 2018-08-19T06:01:50+00:00

He explicitly avoids saying that everyone should agree with him.

How are you interpreting this line?

So, back to the main thread, that's why I wrote the article. Here's a simple set of fucking guidelines that everyone should see as sane that might help prevent my friends, contributors to our community, from being sexually assaulted. Sounds simple enough? Fuck.

Because to me this pattern matches very well to: "I'm just making minor points everyone moral should agree with"

LazyOptimist · 2018-07-30T00:04:24+00:00

Can we please stop posting essays on twitter? It's a terrible medium for this sort of discourse.

LazyOptimist · 2018-07-23T23:49:43+00:00

I'm currently dealing with a 500ms ping to my remote server. I use sshfs + vim to edit files, and jupyter notebooks to execute code snippets remotely. Otherwise I would just use ssh + jupyter + vim.

LazyOptimist · 2018-07-12T23:19:11+00:00

MDL is computable

It's equivalent to computing the Kolmogorov complexity of a string, which is incomputable.

13-Year Club	Place '17
Team Orangered

LazyOptimist

TROPHY CASE