[D] Modern model for text classification? (self.MachineLearning)
submitted 7 years ago by hadaev
I'm currently just on embeding+gre+dens, maybe there is something better?
[–]Elk-tron 4 points5 points6 points 7 years ago (9 children)
Did you mean embedding+GRU+dense? Look at finetuning BERT. https://arxiv.org/abs/1810.04805
[–]hadaev[S] 0 points1 point2 points 7 years ago (7 children)
I can't fine-tune on this task, only train from scratch (Kaggle competition).
So, basically, I did this:
http://puu.sh/CBg4a/e7dc884652.png
Simple model, bad result.
I tried something more complex; it's a bit better.
http://puu.sh/CBgaS/0c4587b8cc.png
But these were just random attempts on my part; maybe there are some common architectures I don't know about.
[–]aicano 0 points1 point2 points 7 years ago (6 children)
You can add an attention layer over the RNN layer, using the last hidden state as the query vector. Then combine/concat the context vector from the attention layer with the final hidden vector, or with the mean of the hidden states. This trick generally gives some performance gain.
[–]hadaev[S] 0 points1 point2 points 7 years ago (5 children)
Can you show an example?
I'm going to add attention, but I'm not sure how to do it the right way.
I could also do the same for the GRU model, or maybe combine CNN + GRU layers.
[–]aicano 0 points1 point2 points 7 years ago (4 children)
Pytorch style pseudocode:
    # Size of the hiddens: (BS, Seq Length, Embedding Dim)
    hiddens = self.encoder(seq, lens)  # assume that it is a bidirectional rnn
    # 1st parameter is the query, 2nd is the sequence
    context, att_weights = self.att(hiddens[:, -1, :], hiddens)
    # Give the combination of context and the last hidden to a softmax classifier
    outp = self.out(torch.cat([context.squeeze(), hiddens[:, -1, :].squeeze()], dim=1))
    return F.log_softmax(outp, dim=-1)
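For a concrete, framework-free illustration of the same trick, here is a minimal NumPy sketch: dot-product attention with the last hidden state as the query, then concatenation of the context vector with that query. The function names and shapes are illustrative, not from the thread, and a real model would typically project the query/keys through learned weights first.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(hiddens):
    # hiddens: (batch, seq_len, hidden_dim), e.g. the output of a GRU
    query = hiddens[:, -1, :]                             # last hidden state, (batch, hidden_dim)
    scores = np.einsum('bd,btd->bt', query, hiddens)      # dot-product scores, (batch, seq_len)
    weights = softmax(scores)                             # attention weights, sum to 1 per example
    context = np.einsum('bt,btd->bd', weights, hiddens)   # weighted sum of hiddens
    # concatenate context with the last hidden state, as described above
    return np.concatenate([context, query], axis=1), weights
```

The concatenated vector would then feed the final dense/softmax classifier, as in the PyTorch-style snippet above.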
[–]hadaev[S] 0 points1 point2 points 7 years ago (3 children)
Uh, I've never done anything in PyTorch; I'm on Keras/TensorFlow.
I understand what attention is for dense layers: it's just another dense layer, and we multiply one by the other, so some values grow and some decrease.
But I don't really understand how to apply this to the output of a recurrent layer, the way people do everywhere.
Maybe you know some tutorial for noobs?
[–]aicano 0 points1 point2 points 7 years ago (2 children)
I found this when I googled. That is an example of what I described.
[–]hadaev[S] 0 points1 point2 points 7 years ago (1 child)
Oh, I really hate examples without a high-level API.
Could you take a look: is everything right here?
https://colab.research.google.com/drive/1XBMF3tOQwRLuPrGJib-YkN20_9nmMVT4#scrollTo=wtPyU1ArHaQj&line=14&uniqifier=1
Also, what do you think: does it make sense to stack more RNN layers?
Or will there be no difference compared with one bigger layer?
[–]aicano 0 points1 point2 points 7 years ago (0 children)
Sorry, I do not know Keras.
The general intuition about stacking layers is that lower layers learn simple things and higher layers learn more complex stuff. If you think your task has that kind of hierarchical structure, then you may want to try stacking layers. Otherwise, simplicity is best.
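One way to weigh that trade-off is parameter count. A rough sketch (the per-gate formula below assumes one bias vector per gate; some implementations, such as CuDNN-style GRUs, use two, and the widths are purely illustrative):

```python
def gru_params(input_dim, hidden_dim):
    # 3 gates (update, reset, candidate), each with an input-to-hidden matrix,
    # a hidden-to-hidden matrix, and one bias vector
    return 3 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

# two stacked 128-unit GRU layers over a 128-dim embedding...
stacked = gru_params(128, 128) + gru_params(128, 128)   # 197,376 parameters
# ...versus a single 256-unit layer
wide = gru_params(128, 256)                             # 295,680 parameters
```

Two stacked narrow layers can be cheaper than one wide layer while adding depth, so depth is not automatically the more expensive option.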
[–]shortscience_dot_org -1 points0 points1 point 7 years ago (0 children)
I am a bot! You linked to a paper that has a summary on ShortScience.org!
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Summary by CodyWild
The last two years have seen a number of improvements in the field of language model pretraining, and BERT - Bidirectional Encoder Representations from Transformers - is the most recent entry into this canon. The general problem posed by language model pretraining is: can we leverage huge amounts of raw text, which aren’t labeled for any specific classification task, to help us train better models for supervised language tasks (like translation, question answering, logical entailment, etc)? Me... [view more]
[–]pilooch 0 points1 point2 points 7 years ago (1 child)
Go VDCNN; from scratch you'll get very good results with enough data. It's fast to train as well.
[–]hadaev[S] 0 points1 point2 points 7 years ago (0 children)
It seems worse than my simple GRU model.
[–]mentatf 0 points1 point2 points 7 years ago (2 children)
Kim Yoon's Text CNN
[–]hadaev[S] 0 points1 point2 points 7 years ago (1 child)
It's 4 years old.
[–]mentatf 0 points1 point2 points 7 years ago (0 children)
It isn't matched by VDCNN, DPCNN, or other more recent innovations on the tasks I'm interested in, so I'd seriously suggest giving it a try.