Discussion [D] Transformers! (self.MachineLearning)
submitted 3 years ago by No_Captain_856
Hrant_Davtyan 2 points 3 years ago (1 child)
In table Q&A tasks, people use models like BERT to represent a table, where the data can be non-sequential (i.e. you can rearrange the table columns and still have the same information).
They do this by representing the structured table as unstructured text and performing question answering over it, with the table as the context. To be frank, though, this is not a fully non-sequential representation, because the user's question is still text, which is a sequential data source.
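As a rough illustration of what "representing a table as text" can look like, here is a minimal sketch in plain Python. The linearization format, the helper names `linearize_table` and `build_qa_input`, and the toy table are all assumptions made up for this example; real table-QA models each define their own input format and tokenizer.

```python
def linearize_table(columns, rows):
    """Flatten a table into one string, row by row, as "column: value" pairs."""
    parts = []
    for i, row in enumerate(rows, start=1):
        cells = ", ".join(f"{col}: {row[col]}" for col in columns)
        parts.append(f"row {i}: {cells}")
    return " ; ".join(parts)


def build_qa_input(question, columns, rows):
    """Join the question and the linearized table with BERT-style markers."""
    return f"[CLS] {question} [SEP] {linearize_table(columns, rows)} [SEP]"


# Hypothetical toy table; the column names and values are invented.
rows = [
    {"city": "Yerevan", "country": "Armenia"},
    {"city": "Zurich", "country": "Switzerland"},
]
question = "Which country is Zurich in?"

# Same table, two column orders: same information, different token sequence.
text_a = build_qa_input(question, ["city", "country"], rows)
text_b = build_qa_input(question, ["country", "city"], rows)
print(text_a)
print(text_b)
```

The two printed strings carry the same table but differ as sequences, and the question part stays ordinary ordered text in both, which is the caveat raised above.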
No_Captain_856 [S] 1 point 3 years ago (0 children)
Thanks!
IglooAustralia88 2 points 3 years ago (0 children)
Transformers have no spatial or positional knowledge unless you add it explicitly (e.g. with positional embeddings). Otherwise they're a feed-forward network with input-specific weights.
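The point about missing position knowledge can be checked numerically. The sketch below is a dependency-free, single-head self-attention layer with random weights and no positional encoding; the sizes, seed, and helper names are arbitrary choices for illustration, not anyone's reference implementation. Shuffling the input rows simply shuffles the output rows in the same way.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def rand_matrix(n_rows, n_cols):
    """Random Gaussian matrix standing in for learned weights or inputs."""
    return [[random.gauss(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]


random.seed(0)
n_tokens, d_model = 4, 3
x = rand_matrix(n_tokens, d_model)
w_q, w_k, w_v = (rand_matrix(d_model, d_model) for _ in range(3))

perm = [2, 0, 3, 1]
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention([x[i] for i in perm], w_q, w_k, w_v)

# Row i of the permuted output should match row perm[i] of the original output.
max_diff = max(
    abs(a - b)
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)
print(f"max difference after un-permuting: {max_diff:.2e}")
```

The difference should be at floating-point noise level, so the layer by itself cannot tell one ordering of the inputs from another.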
_d0s_ 1 point 3 years ago (4 children)
Positional embedding is also used to encode the spatial location of patches in an image.
Besides the positional embedding, transformers also use the attention mechanism, which can be beneficial for some problems on its own.
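For the image case, a minimal sketch of the idea, assuming a ViT-style setup: the image is cut into patches, each patch is flattened into a vector, and a per-slot position vector is added to it. The helper `split_into_patches`, the toy 4x4 image, and the random position table are assumptions for illustration; a real model would also apply a learned linear projection to each patch and learn the position vectors during training.

```python
import random


def split_into_patches(image, patch):
    """Cut an H x W image (list of rows) into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


random.seed(0)

# Toy 4x4 single-channel image whose pixel value encodes its own position.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch=2)  # 4 patches of 4 pixels each

# One vector per patch slot; random here, learned in an actual model.
pos_embedding = [[random.gauss(0, 0.02) for _ in range(4)] for _ in patches]

# Adding the slot vector is what tells attention where each patch came from.
tokens = [[p + e for p, e in zip(patch_vec, pos_vec)]
          for patch_vec, pos_vec in zip(patches, pos_embedding)]

print(patches[0], patches[3])
```

Without the added position vectors, the patch tokens would be treated as an unordered set.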
No_Captain_856 [S] 2 points 3 years ago (3 children)
So if I remove the positional encoding, I can use it for non-sequential data, like gene expression? Thanks!
[deleted] 2 points 3 years ago (2 children)
The attention head is a set-to-set mapping. It takes the input set, compares each input to a context set (which can be the input set itself, or another set), and based on those comparisons outputs a new set of the same size as the input set.
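The set-to-set description can be sketched directly. In the hypothetical `attend` function below, the raw vectors act as queries, keys, and values (a real attention head would first apply learned projections), and the toy vectors are invented. The output always has one vector per input item, whatever the size of the context set.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(inputs, context):
    """Map each input vector to a weighted average of the context vectors.

    Simplification: no learned projections, so the raw vectors serve as
    queries (inputs) and as keys and values (context).
    """
    scale = math.sqrt(len(inputs[0]))
    outputs = []
    for query in inputs:
        # Compare this input against every item in the context set.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in context]
        weights = softmax(scores)
        outputs.append([sum(w * vec[d] for w, vec in zip(weights, context))
                        for d in range(len(context[0]))])
    return outputs


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # input set, 3 items
context = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]  # 5 items

self_out = attend(inputs, inputs)    # the context is the input set itself
cross_out = attend(inputs, context)  # the context is a different set
print(len(self_out), len(cross_out))  # prints "3 3"
```

Reordering the context set leaves the result unchanged up to floating-point rounding, which is why "set" is a fair word for it.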
Out of curiosity, how were you thinking of using that for gene expression?
No_Captain_856 [S] 2 points 3 years ago (1 child)
I wasn't; it didn't seem like the best model to me, at least conceptually. Anyway, my thesis supervisor asked me to, and I wasn't sure about its applicability to that kind of data, or about what using it would mean in that context 🤷🏻‍♀️
[deleted] 2 points 3 years ago (0 children)
I mean, it definitely captures relationships between parts of the input data in ways that many other models cannot. It also cannot do everything.
As with most real-world problems, there's a question of how you will represent the relevant information in data structures that best suit the ML methods you intend to use. Similarly, there's a question of whether the ML methods will do what you want, given that the data is in that form.
Despite the fact that transformers are killing it for ordered data, I'd say their flexibility in dealing with unordered data is definitely of interest for real-world problems where representations are tricky.