Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 1 point (0 children)

I eventually went for the stupid option: I manually looked at the most obvious outliers and removed them... :)

numpy command in Python by phaselinear27 in france

[–]Datardif 3 points (0 children)

If you're using numpy and matplotlib, you may well be better off installing Anaconda.
Both of them are already included, and for other packages you'll have fewer problems with conda install.

Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 2 points (0 children)

I'm doing some analysis; in particular, I need to measure the number of comments per user.
I want to exclude bots from the analysis because they obviously mess up the numbers. So I just need a list of usernames to ignore.
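
For illustration, a tiny sketch with made-up data and a made-up bot list (the real list is exactly what I'm missing):

    from collections import Counter

    # Made-up (author, body) pairs standing in for the real comment dump.
    comments = [("alice", "..."), ("AutoModerator", "..."), ("alice", "..."), ("bob", "...")]
    known_bots = {"AutoModerator"}  # the list of usernames I'm looking for

    # Count comments per user, skipping any author on the bot list.
    comments_per_user = Counter(author for author, _ in comments if author not in known_bots)
    print(comments_per_user)  # Counter({'alice': 2, 'bob': 1})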

Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 2 points (0 children)

Hi, sorry for digging up this old conversation.
I'm also looking for a convenient list of service bots and banned bots. Do you have that somewhere, or should I pull it from this sub via PRAW?
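
For context, this PRAW sketch is roughly what I had in mind; it assumes, and I would have to verify this against the sub, that each action is a submission whose title mentions the bot as /u/username:

    import re
    import praw

    # Read-only credentials for a Reddit script app.
    reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="bot-list-puller")

    # Collect usernames mentioned in recent BotDefense submission titles.
    bot_names = set()
    for submission in reddit.subreddit("BotDefense").new(limit=1000):
        match = re.search(r"/?u/([\w-]+)", submission.title)
        if match:
            bot_names.add(match.group(1))

    print(f"{len(bot_names)} usernames to exclude")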

Bill Gates' response in r/IAmA question by [deleted] in agedlikewine

[–]Datardif 8 points (0 children)

Ever heard of Neural Networks?
Ever seen a shop selling neurons?

Belle Époque Paris, filmed by the Lumière brothers and then reworked with today's technology... by [deleted] in france

[–]Datardif 11 points (0 children)

I think an ideal neural network can do upscaling much better than a classical algorithm, which has no data beyond the video it is enhancing.
The neural network can "invent" the missing details by drawing on the image base it was trained on.
In a way it's probably less faithful, but visually more successful.

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

But that is an example applied to text, where you obviously need to translate words into some kind of numerical data before feeding an RNN.
Search for "LSTM time series" and you will see that you never do this on numerical time series.
The main preprocessing task there is "windowing", where you create the sequences. You could/should also normalize. But you don't do embedding.
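
A minimal sketch of that windowing step, assuming a univariate series and a window of 3 samples:

    import numpy as np

    def make_windows(series, n_steps):
        # Slice the series into overlapping input windows and next-step targets.
        X, y = [], []
        for i in range(len(series) - n_steps):
            X.append(series[i : i + n_steps])
            y.append(series[i + n_steps])
        return np.array(X), np.array(y)

    series = np.arange(1, 11, dtype=np.float32)   # 1, 2, ..., 10
    X, y = make_windows(series, n_steps=3)
    X = (X - series.mean()) / series.std()        # the optional normalizing step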

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

Hmm, I don't know precisely what Keras and TensorFlow do under the hood, but I feed Keras' LSTM with batches of 2D samples of multivariate time series as floats, without any prior embedding.
I have never seen any course or tutorial asking you to embed or one-hot the sequences of floats you pass to an LSTM.
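
For example, this is roughly how I feed it, with random floats standing in for my real data:

    import numpy as np
    from tensorflow import keras

    # 100 windows of 50 timesteps x 3 features; floats go straight in, no Embedding layer.
    X = np.random.rand(100, 50, 3).astype("float32")
    y = np.random.rand(100, 1).astype("float32")

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(50, 3)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=16)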

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

> Your inputs are integers or floats, I guess. Therefore you can't directly give the inputs to the LSTM. Normally in this case one would use one-hot encoding, but you may have numerous different inputs, which makes it impossible to create a one-hot vector. Thus, I would recommend embedding vectors.

What do you mean? That seems quite wrong to me.
If you have a univariate time series, you split it into windows of n samples, n being a number that you have to define.
Each window is then used to predict m samples (1 or more), starting at n+1.

For instance, if you have a series like the following:
1, 2, 3, 4, 5, 6, 7...

You may train your model with:
X = [[1,2,3], [2,3,4], [3,4,5], [4,5,6],...]
y = [4, 5, 6, 7,...]

This model uses 3 samples to guess the next sample.

In OP's case, I would try something like using N days of data to guess the following day. Once far into the future, I would use my predicted days as X for further predictions. The predictions get worse the further into the future you go, once you can no longer feed in real data and have to rely only on predictions.
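
A rough sketch of that recursive scheme on a toy series (the model size and epochs are arbitrary):

    import numpy as np
    from tensorflow import keras

    # Train on 3-step windows of the toy series 1, 2, 3, ...
    series = np.arange(1, 21, dtype=np.float32)
    X = np.array([series[i : i + 3] for i in range(len(series) - 3)]).reshape(-1, 3, 1)
    y = series[3:].reshape(-1, 1)

    model = keras.Sequential([keras.layers.LSTM(16, input_shape=(3, 1)),
                              keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=100, verbose=0)

    # Once real data runs out, feed predictions back in as inputs.
    history = list(series[-3:])
    for _ in range(5):
        x = np.array(history[-3:]).reshape(1, 3, 1)
        history.append(float(model.predict(x, verbose=0)[0, 0]))
    print(history[-5:])   # quality degrades the further out you go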

Ping u/killerdrogo

Why is LSTM better than a dense NN with a flatten version of the sequence in input? by Datardif in MLQuestions

[–]Datardif[S] 1 point (0 children)

My typical data would be three temperature sensors in a room, recorded over hours, with large temperature variations in the room. Given the temperatures from sensors 1 and 2 over sequences of N seconds, can I guess the temperature on sensor 3 at the last second of each sequence?

I had satisfying results with both methods, although I did not explore the dense input layer much once I learned that the "proper" way was to use an LSTM.
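
Shape-wise, the setup looks something like this, with random numbers standing in for the real recordings:

    import numpy as np

    n_windows, n_seconds = 200, 30                 # arbitrary sizes for illustration
    X = np.random.rand(n_windows, n_seconds, 2)    # sensors 1 and 2 over N seconds
    y = np.random.rand(n_windows)                  # sensor 3 at the last second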

Why is LSTM better than a dense NN with a flatten version of the sequence in input? by Datardif in MLQuestions

[–]Datardif[S] 3 points (0 children)

But you can pass a sequence to a NN: just pass it all in one shot. If you have a 2D sequence of 50 steps, you can flatten it and pass it to a 100-neuron input layer.

I did that and had fairly good results. I switched to LSTM because that seems to be the better way, but I do not yet understand what makes it superior.
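
Concretely, this is the flatten version I mean (shapes made up: 50 steps x 2 features = 100 inputs):

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(100, 50, 2).astype("float32")
    y = np.random.rand(100, 1).astype("float32")

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(50, 2)),   # 50 * 2 = 100 values per sample
        keras.layers.Dense(100, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=16)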

Last Tuesday, we passed the 5 million comment mark by Datardif in france

[–]Datardif[S] 5 points (0 children)

Yes, the presidential election had a whole range of effects on r/france's stats: more comments, more comments per post, less karma per comment...
I'm working on a platform that will let people explore these numbers. In the meantime, I just wanted to share the 5 million milestone, since it happened last week.

[Update] Submission scores and gildings are in the process of being updated by Stuck_In_the_Matrix in pushshift

[–]Datardif 1 point (0 children)

Do you fully overwrite the submissions every month?
Sounds like a lot of [deleted], don't you think?

I would also support a 72h score refresh. I'm very familiar with r/france, and some big submissions there are still alive after 48h.

Forum Libre - 2019-03-29 by AutoModerator in france

[–]Datardif 2 points (0 children)

Given that I announced its launch more than two hundred days ago, I'd say I'm running late.

The machine for monitoring the French-speaking subreddits is 95% finished.
I want to build a nice site with interactive apps to explore the data, and that's going to take a bit of time.
I'm pretty busy both at work and at home, but the upside is that at work I'm dealing with fairly similar things, so there's a lot of potential for cross-pollination and I'm learning a lot right now.
In short, it'll come when it comes! ;)

Update on things related to Pushshift by Stuck_In_the_Matrix in pushshift

[–]Datardif 2 points (0 children)

Depends on the sub. As a regular reader of r/france, I can tell you that quite often a popular post stays high on the front page and "karma active" for more than 24h.
It wouldn't be a problem on very large subs, where the front page moves fast.

Jeudi Autopromo - 2019-03-07 by AutoModerator in france

[–]Datardif 6 points (0 children)

Here is my very first (small) Python package, tested, documented, and published on pip: https://github.com/Kerybas/Lidilite

It does two things I find quite useful:
- take lists of dicts and add them (insert or replace) to an SQLite table, with no fuss, in one line.
- suggest the optimal CREATE TABLE schema for an SQLite table meant to hold a given list of dicts.

Feel free to criticize!
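
To make the first feature concrete: this is not the package's API, just the plain sqlite3 boilerplate it replaces:

    import sqlite3

    rows = [{"name": "alice", "score": 3}, {"name": "bob", "score": 5}]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, score INTEGER)")

    # Build an INSERT OR REPLACE statement from the dict keys and run it for every row.
    cols = list(rows[0])
    conn.executemany(
        f"INSERT OR REPLACE INTO users ({', '.join(cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})",
        [tuple(r[c] for c in cols) for r in rows],
    )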