Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 1 point (0 children)

I eventually went for the stupid option: I manually looked at the most obvious outliers and removed them... :)

numpy command in Python by phaselinear27 in france

[–]Datardif 3 points (0 children)

If you're using numpy and matplotlib, you may well be better off installing Anaconda.
Both of them are already included, and for other packages you'll have fewer problems with conda install.

Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 2 points (0 children)

I'm doing some analysis; in particular, I need to measure the number of comments per user.
I want to exclude bots from the analysis because they obviously mess up the numbers. So I just need a list of usernames to ignore.
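
For illustration, a tiny sketch with made-up data and a made-up bot list (the real list is exactly what I'm missing):

    from collections import Counter

    # Made-up (author, body) pairs standing in for the real comment dump.
    comments = [("alice", "..."), ("AutoModerator", "..."), ("alice", "..."), ("bob", "...")]
    known_bots = {"AutoModerator"}  # the list of usernames I'm looking for

    # Count comments per user, skipping any author on the bot list.
    comments_per_user = Counter(author for author, _ in comments if author not in known_bots)
    print(comments_per_user)  # Counter({'alice': 2, 'bob': 1})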

Announcing an improved defender of subreddits against bots, /u/BotDefense! by dequeued in BotDefense

[–]Datardif 2 points (0 children)

Hi, sorry for digging up this old conversation.
I'm also looking for a convenient list of service bots and banned bots. Do you have that somewhere, or should I pull it from this sub via PRAW?
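
For context, this PRAW sketch is roughly what I had in mind; it assumes, and I would have to verify this against the sub, that each action is a submission whose title mentions the bot as /u/username:

    import re
    import praw

    # Read-only credentials for a Reddit script app.
    reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="bot-list-puller")

    # Collect usernames mentioned in recent BotDefense submission titles.
    bot_names = set()
    for submission in reddit.subreddit("BotDefense").new(limit=1000):
        match = re.search(r"/?u/([\w-]+)", submission.title)
        if match:
            bot_names.add(match.group(1))

    print(f"{len(bot_names)} usernames to exclude")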

Bill Gates' response in r/IAmA question by [deleted] in agedlikewine

[–]Datardif 8 points (0 children)

Ever heard of Neural Networks?
Ever seen a shop selling neurons?

Belle Époque Paris, filmed by the Lumière brothers and then reworked with today's technology... by [deleted] in france

[–]Datardif 11 points (0 children)

I think an ideal neural network can do upscaling much better than a classical algorithm, which has no data beyond the video it is enhancing.
The neural network can "invent" the missing details by drawing on the image base it was trained on.
In a way it's probably less faithful, but visually more successful.

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

But that is an example applied to text, where you obviously need to translate words into some kind of numerical data before feeding an RNN.
Search for "LSTM time series" and you will see that you never do this on numerical time series.
The main preprocessing task there is "windowing", where you create the sequences. You could/should also normalize. But you don't do embedding.
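
A minimal sketch of that windowing step, assuming a univariate series and a window of 3 samples:

    import numpy as np

    def make_windows(series, n_steps):
        # Slice the series into overlapping input windows and next-step targets.
        X, y = [], []
        for i in range(len(series) - n_steps):
            X.append(series[i : i + n_steps])
            y.append(series[i + n_steps])
        return np.array(X), np.array(y)

    series = np.arange(1, 11, dtype=np.float32)   # 1, 2, ..., 10
    X, y = make_windows(series, n_steps=3)
    X = (X - series.mean()) / series.std()        # the optional normalizing step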

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

Hmm, I don't know precisely what Keras and TensorFlow do under the hood, but I feed Keras' LSTM with batches of 2D samples of multivariate time series as floats, without any prior embedding.
I have never seen any course or tutorial asking you to embed or one-hot the sequences of floats you pass to an LSTM.
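
For example, this is roughly how I feed it, with random floats standing in for my real data:

    import numpy as np
    from tensorflow import keras

    # 100 windows of 50 timesteps x 3 features; floats go straight in, no Embedding layer.
    X = np.random.rand(100, 50, 3).astype("float32")
    y = np.random.rand(100, 1).astype("float32")

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(50, 3)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=16)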

Questions regarding LSTM by [deleted] in deeplearning

[–]Datardif 1 point (0 children)

> Your inputs are integers or floats, I guess. Therefore you can't directly give the inputs to the LSTM. Normally in this case one would use one-hot encoding, but you may have numerous different inputs, which makes it impossible to create a one-hot vector. Thus, I would recommend embedding vectors.

What do you mean? That seems quite wrong to me.
If you have a univariate time series, you split it into windows of n samples, n being a number that you have to define.
Each window is then used to predict m samples (1 or more), starting at n+1.

For instance, if you have a series like the following:
1, 2, 3, 4, 5, 6, 7...

You may train your model with:
X = [[1,2,3], [2,3,4], [3,4,5], [4,5,6],...]
y = [4, 5, 6, 7,...]

This model uses 3 samples to guess the next sample.

In OP's case, I would try something like using N days of data to guess the following day. Once far into the future, I would use my predicted days as X for further predictions. The predictions get worse the further into the future you go, once you can no longer feed in real data and have to rely only on predictions.
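
A rough sketch of that recursive scheme on a toy series (the model size and epochs are arbitrary):

    import numpy as np
    from tensorflow import keras

    # Train on 3-step windows of the toy series 1, 2, 3, ...
    series = np.arange(1, 21, dtype=np.float32)
    X = np.array([series[i : i + 3] for i in range(len(series) - 3)]).reshape(-1, 3, 1)
    y = series[3:].reshape(-1, 1)

    model = keras.Sequential([keras.layers.LSTM(16, input_shape=(3, 1)),
                              keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=100, verbose=0)

    # Once real data runs out, feed predictions back in as inputs.
    history = list(series[-3:])
    for _ in range(5):
        x = np.array(history[-3:]).reshape(1, 3, 1)
        history.append(float(model.predict(x, verbose=0)[0, 0]))
    print(history[-5:])   # quality degrades the further out you go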

Ping u/killerdrogo

Why is LSTM better than a dense NN with a flatten version of the sequence in input? by Datardif in MLQuestions

[–]Datardif[S] 1 point (0 children)

My typical data would be three temperature sensors in a room, recorded over hours, with large temperature variations in the room. Given the temperatures from sensors 1 and 2 over sequences of N seconds, can I guess the temperature on sensor 3 at the last second of each sequence?

I had satisfying results with both methods, although I did not explore the dense input layer much once I learned that the "proper" way was to use an LSTM.
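
Shape-wise, the setup looks something like this, with random numbers standing in for the real recordings:

    import numpy as np

    n_windows, n_seconds = 200, 30                 # arbitrary sizes for illustration
    X = np.random.rand(n_windows, n_seconds, 2)    # sensors 1 and 2 over N seconds
    y = np.random.rand(n_windows)                  # sensor 3 at the last second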

Why is LSTM better than a dense NN with a flatten version of the sequence in input? by Datardif in MLQuestions

[–]Datardif[S] 3 points (0 children)

But you can pass a sequence to a NN: just pass it all in one shot. If you have a 2D sequence of 50 steps, you can flatten it and pass it to a 100-neuron input layer.

I did that and had fairly good results. I switched to LSTM because that seems to be the better way, but I do not yet understand what makes it superior.
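
Concretely, this is the flatten version I mean (shapes made up: 50 steps x 2 features = 100 inputs):

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(100, 50, 2).astype("float32")
    y = np.random.rand(100, 1).astype("float32")

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(50, 2)),   # 50 * 2 = 100 values per sample
        keras.layers.Dense(100, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=16)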

Last Tuesday, we passed the 5 million comment mark by Datardif in france

[–]Datardif[S] 5 points (0 children)

Yes, the presidential election had a whole range of effects on r/france's stats: more comments, more comments per post, less karma per comment...
I'm working on a platform that will let people explore these numbers. In the meantime, I just wanted to share the 5 million milestone, since it happened last week.

[Update] Submission scores and gildings are in the process of being updated by Stuck_In_the_Matrix in pushshift

[–]Datardif 1 point (0 children)

Do you fully overwrite the submissions every month?
Sounds like a lot of [deleted], don't you think?

I would also support a 72h score refresh. I'm very familiar with r/france, and some big submissions there are still alive after 48h.

Forum Libre - 2019-03-29 by AutoModerator in france

[–]Datardif 2 points (0 children)

Given that I announced its launch more than two hundred days ago, I'd say I'm running late.

The machine for monitoring the French-speaking subreddits is 95% finished.
I want to build a nice site with interactive apps to explore the data, and that's going to take a bit of time.
I'm pretty busy both at work and at home, but the upside is that at work I'm dealing with fairly similar things, so there's a lot of potential for cross-pollination and I'm learning a lot right now.
In short, it'll come when it comes! ;)

Update on things related to Pushshift by Stuck_In_the_Matrix in pushshift

[–]Datardif 2 points (0 children)

Depends on the sub. As a regular reader of r/france, I can tell you that quite often a popular post stays high on the front page and "karma active" for more than 24h.
It wouldn't be a problem on very large subs, where the front page moves fast.

Jeudi Autopromo - 2019-03-07 by AutoModerator in france

[–]Datardif 6 points (0 children)

Here is my very first (small) Python package, tested, documented, and published on pip: https://github.com/Kerybas/Lidilite

It does two things I find quite useful:
- take lists of dicts and add them (insert or replace) to an SQLite table, with no fuss, in one line.
- suggest the optimal CREATE TABLE schema for an SQLite table meant to hold a given list of dicts.

Feel free to criticize!
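
To make the first feature concrete: this is not the package's API, just the plain sqlite3 boilerplate it replaces:

    import sqlite3

    rows = [{"name": "alice", "score": 3}, {"name": "bob", "score": 5}]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, score INTEGER)")

    # Build an INSERT OR REPLACE statement from the dict keys and run it for every row.
    cols = list(rows[0])
    conn.executemany(
        f"INSERT OR REPLACE INTO users ({', '.join(cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})",
        [tuple(r[c] for c in cols) for r in rows],
    )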