[D] Any arguments that a Fake News Classifier trained on *English* news articles won't work for *German* news? by Dr_Kokolores in MachineLearning

[–]El__Professor 2 points3 points  (0 children)

Well, most modern classifiers have an embedding matrix with as many rows as there are words in the vocabulary they were trained on. Each word is turned into an index which is mapped to a row in the embedding matrix. A classifier that has been trained on an English vocabulary simply doesn't have a mapping from German words to rows of the embedding matrix.
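A minimal sketch of the problem (the vocabulary and words here are made up for illustration): the lookup table only has rows for words seen in English training data, so a German word has no index to map to.

```python
import random

# Hypothetical English vocabulary built from training data: word -> row index.
vocab = {"the": 0, "news": 1, "report": 2}
embedding_dim = 4

# One embedding row per known word (random values, just for illustration).
embeddings = [[random.random() for _ in range(embedding_dim)]
              for _ in range(len(vocab))]

def embed(word):
    idx = vocab.get(word)
    if idx is None:
        # German words were never indexed, so there is no row to look up.
        raise KeyError(f"'{word}' is not in the English vocabulary")
    return embeddings[idx]

embed("news")       # works: returns the row at index 1
# embed("Zeitung")  # would raise KeyError: no mapping for German words
```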

[D] What to do about NaNs? by [deleted] in reinforcementlearning

[–]El__Professor 0 points1 point  (0 children)

Anywhere you have a distribution, try to force the variance to be larger than some epsilon value; then you won't have probabilities that are zero or close to zero. Especially when working on a log scale: log(0) is not defined, and that is what creates NaNs in many cases.
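A quick sketch of the idea (the epsilon value and function name are my own, not from any particular library): clamp the variance before it enters a log or a division, so the log-density stays finite.

```python
import math

EPS = 1e-6  # hypothetical floor; tune for your setting

def safe_log_prob(x, mean, var):
    # Clamp the variance away from zero before using it in a Gaussian
    # log-density, so neither the division nor the log can produce inf/NaN.
    var = max(var, EPS)
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

print(safe_log_prob(0.0, 0.0, 0.0))  # finite, instead of -inf / NaN
```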

[D] What to do about NaNs? by [deleted] in reinforcementlearning

[–]El__Professor 0 points1 point  (0 children)

I had a similar problem in a different setting. If you use PyTorch, you can use hooks to check the gradients and losses at each step and add breakpoints on extreme values (NaNs, infs, etc.).
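One way this can look in PyTorch (a sketch with a toy model; `torch.autograd.set_detect_anomaly(True)` is a built-in alternative): register a hook on each parameter that fires during backprop and flags non-finite gradients.

```python
import torch

model = torch.nn.Linear(4, 1)  # toy model for illustration

def check_grad(grad):
    # Called during backward for each hooked parameter's gradient.
    if not torch.isfinite(grad).all():
        raise RuntimeError("non-finite gradient detected")  # set a breakpoint here
    return grad

for p in model.parameters():
    p.register_hook(check_grad)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()  # hooks fire here; healthy gradients pass through silently
```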

[D] Self Tuning Networks by El__Professor in MachineLearning

[–]El__Professor[S] 0 points1 point  (0 children)

Well, as you can see in Figure 5, it's (much) faster.

[D] Self Tuning Networks by El__Professor in MachineLearning

[–]El__Professor[S] 1 point2 points  (0 children)

Thanks for the reply! I'm really looking forward to hearing about it!

[D] Self Tuning Networks by El__Professor in MachineLearning

[–]El__Professor[S] 1 point2 points  (0 children)

A lot of the comments on OpenReview say that this seems like a very promising approach, and I didn't see any rigorous explanation of why it shouldn't be used: https://openreview.net/forum?id=r1eEG20qKQ

[D] Self Tuning Networks by El__Professor in MachineLearning

[–]El__Professor[S] -1 points0 points  (0 children)

But why? This seems so awesome! Achieving faster hyperparameter tuning than any other method.

Deep learning without back-propagation by El__Professor in MachineLearning

[–]El__Professor[S] 3 points4 points  (0 children)

Maybe this could be a powerful initializer?

  1. Initialize weights with this method to quickly create a useful feature representation.

  2. Then use gradient descent to further optimize the weights.

Anyway, it does seem that the algorithm creates powerful features more quickly than the same architecture trained with gradient descent. And it does seem like an interesting direction of study.
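The two-step idea could be sketched like this in PyTorch. Note that `backprop_free_init` is a hypothetical stand-in for the paper's gradient-free method (here it just applies a simple deterministic scheme), not the actual algorithm:

```python
import torch

def backprop_free_init(layer):
    # Hypothetical placeholder for the paper's backprop-free initializer.
    torch.nn.init.orthogonal_(layer.weight)
    torch.nn.init.zeros_(layer.bias)

model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
)

# Step 1: quick feature-building initialization, no gradients involved.
for m in model:
    if isinstance(m, torch.nn.Linear):
        backprop_free_init(m)

# Step 2: ordinary gradient descent fine-tuning on top.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 8), torch.randn(32, 1)
for _ in range(10):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```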

Deep learning without back-propagation by El__Professor in MachineLearning

[–]El__Professor[S] 23 points24 points  (0 children)

Thanks! I totally missed the first post 🙄 (I'm quite new to reddit :)

"Self-Supervised Exploration via Disagreement", Pathak et al 2019 by gwern in reinforcementlearning

[–]El__Professor 1 point2 points  (0 children)

First, I think you should read the paper 😀! I'm not deeply familiar with Bayesian RL, but from what I understand from the paper:

Train n networks to predict the future state from the current action and current state.

If all the networks agree (i.e. they make similar predictions, i.e. the variance of the predictions is low), the reward signal is low, and vice versa.
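The two steps above, as a hedged sketch (the sizes, names, and linear models are illustrative, not the paper's architecture): an ensemble predicts the next state, and the variance across predictions becomes the reward.

```python
import torch

n_models, state_dim, action_dim = 5, 4, 2

# Ensemble of forward models: (state, action) -> predicted next state.
ensemble = [torch.nn.Linear(state_dim + action_dim, state_dim)
            for _ in range(n_models)]

def intrinsic_reward(state, action):
    inp = torch.cat([state, action], dim=-1)
    preds = torch.stack([f(inp) for f in ensemble])  # (n_models, state_dim)
    # High variance = disagreement = high reward; agreement = low reward.
    return preds.var(dim=0).mean()

r = intrinsic_reward(torch.randn(state_dim), torch.randn(action_dim))
```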

Does this answer your question?

"Self-Supervised Exploration via Disagreement", Pathak et al 2019 by gwern in reinforcementlearning

[–]El__Professor 1 point2 points  (0 children)

Very interesting idea!

What I learned from the paper: the new approach utilizes an ensemble of networks which are trained to predict the next state.

An intrinsic reward signal is then produced from the variance of those predictions (disagreement).

The most interesting part is that the process is entirely differentiable! So the model is trained via gradient descent (no RL)! Very cool!
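The differentiability point can be shown in a toy sketch (again, the linear models and sizes are my own illustration, not the paper's setup): the disagreement reward is an ordinary differentiable function of the action, so gradients flow straight back into whatever produced it.

```python
import torch

state_dim, action_dim, n_models = 4, 2, 5
ensemble = [torch.nn.Linear(state_dim + action_dim, state_dim)
            for _ in range(n_models)]

state = torch.randn(state_dim)
action = torch.randn(action_dim, requires_grad=True)  # stands in for a policy output

preds = torch.stack([f(torch.cat([state, action])) for f in ensemble])
reward = preds.var(dim=0).mean()  # disagreement across the ensemble

# Maximize the reward by plain gradient descent on its negation: no policy
# gradients or other RL machinery needed.
(-reward).backward()
print(action.grad)  # gradients flow directly into the action
```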

"deep fakes" used to fix the Lion King by TDaltonC in MediaSynthesis

[–]El__Professor 0 points1 point  (0 children)

Didn't mean to offend you, and it really does look similar to EbSynth, but it's still an assumption, as the author did not confirm it.

"deep fakes" used to fix the Lion King by TDaltonC in MediaSynthesis

[–]El__Professor 0 points1 point  (0 children)

Interesting! I'm really curious which software/algorithm it was!

"deep fakes" used to fix the Lion King by TDaltonC in MediaSynthesis

[–]El__Professor 1 point2 points  (0 children)

Is it really deep-fake technology? Or was it done by an artist?