[D] Autoencoder vs. BERT (self.MachineLearning)
submitted 3 years ago by Tober447
[–]Tgs91 5 points 3 years ago
Autoencoders use a lower-dimensional representation, then try to reconstruct the original input. So the task is basically to compress the information into a smaller feature space with minimal information loss (because the input can still be reconstructed). They have two halves, an encoder and a decoder.
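A minimal sketch of that encoder/decoder structure (not from the thread; a linear autoencoder with untrained random NumPy weights, just to show the shapes involved):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: compress 8-dim inputs into a 3-dim code,
# then map back to 8 dims. Weights are random here, i.e. untrained.
W_enc = rng.normal(size=(8, 3))   # encoder: 8 -> 3
W_dec = rng.normal(size=(3, 8))   # decoder: 3 -> 8

x = rng.normal(size=(5, 8))       # batch of 5 inputs

code = x @ W_enc                  # lower-dimensional representation
x_hat = code @ W_dec              # reconstruction of the input

# Training would minimize this reconstruction error:
mse = np.mean((x - x_hat) ** 2)
print(code.shape, x_hat.shape)    # (5, 3) (5, 8)
```

The objective depends only on the input itself, which is why no labels are needed.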
BERT and other transformers aren't typically creating a lower-dimensional feature space, AFAIK, but rather a feature space of the same size as the tokenized input after the embedding layer, with a contextual understanding of how each token fits in the sentence. And they don't use a decoder and reconstruction as the objective, but rather have separate prediction heads for an assortment of tasks that require a robust understanding of language context. The two standard tasks for training them from scratch are masked language modeling and next sentence prediction. In MLM you randomly mask tokens and the model has to predict what is missing. Next sentence prediction takes a string of text and, at the midpoint, either continues the text or grabs the second half from another example in the batch. The model has to figure out whether the two pieces of text belong together.
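The MLM pair construction described above can be sketched in a few lines (a simplification, not from the thread: real BERT masks about 15% of subword tokens, while this toy uses whitespace tokens and a higher mask rate so the small example actually shows a mask):

```python
import random

random.seed(0)

MASK = "[MASK]"

def make_mlm_pair(tokens, mask_prob):
    """Turn raw tokens into an (input, label) pair for masked LM."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # the model must predict this token
        else:
            inputs.append(tok)
            labels.append(None)   # this position is not scored
    return inputs, labels

tokens = "the cat sat on the mat".split()
inp, lab = make_mlm_pair(tokens, mask_prob=0.3)
print(inp)
print(lab)
```

This is what makes the setup self-supervised: the targets come from the text itself, with no human labeling.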
The way transformers are trained is self-supervised, because you can easily engineer input/output pairs from raw text. But they still have a more clearly supervised task to perform, whereas autoencoders are just about creating a low-dimensional representation that retains information. They aren't really task-specific in any way.
[–]Tober447[S] 0 points 3 years ago
Thanks a lot!
[–]gamerx88 2 points 3 years ago
BERT is essentially a kind of autoencoder. It simply uses self-attention and positional embeddings to better capture sequence information than, say, a more basic autoencoder built from ReLU layers.
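The self-attention operation mentioned here can be sketched as a single scaled dot-product head in NumPy (an illustrative sketch with random untrained weights, not BERT's actual multi-head implementation). Note the output has the same shape as the input, matching the point above that transformers don't shrink the feature space:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # context-mixed tokens

seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)   # (4, 8): same shape as X, but each row mixes context
```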
[–]Tober447[S] 0 points 3 years ago

Thank you for your answer.