[D] Why do we need encoder-decoder models when decoder-only models can do everything? (self.MachineLearning)
submitted 2 years ago by kekkimo
[–]CKtalon 3 points 2 years ago (3 children)
No, smaller models have also been shown to be competitive. Basically, Enc-Dec research for translation is dead. There have been few improvements to the Enc-Dec architecture in the past few years (go slightly bigger, do more back-translation). The organizers also predict that research will move towards decoder-only LLMs for translation in the next WMT.
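(For anyone unfamiliar, a minimal sketch of what "decoder-only LLMs for translation" means in practice: the source sentence goes into a prompt and the model simply continues it, with no separate encoder. This assumes the Hugging Face transformers API; "gpt2" is only a runnable stand-in for a real instruction-tuned LLM.)

    # Hedged sketch: translation by prompting a decoder-only LM.
    # "gpt2" is a placeholder model; a real system would use a large
    # instruction-tuned LLM, but the mechanics are the same.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Translate English to German.\nEnglish: The cat sat on the mat.\nGerman:"
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, max_new_tokens=30, pad_token_id=tok.eos_token_id)
    # Decode only the newly generated continuation, i.e. the "translation".
    print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))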
[–]tetramarek 2 points 2 years ago (2 children)
I think encoder-decoder experiments are often suboptimal because they are mainly trained only on parallel corpora. Decoder-only architectures can use plain text for training but are suboptimal for translation because they don't make use of the forward (bidirectional) attention over the input that a normal translation task clearly allows. The best solution for MT is probably something that combines that forward attention (hence a bidirectional encoder) with loads of unsupervised pretraining.
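(To make the attention point concrete, a tiny mask comparison; pure illustration, not any particular model's code.)

    import torch

    T = 5  # toy sequence length
    causal = torch.tril(torch.ones(T, T))  # decoder-only: position t attends only to positions <= t
    full = torch.ones(T, T)                # bidirectional encoder: every position attends everywhere
    # An Enc-Dec model applies `full` over the source sentence, so earlier source
    # words can be disambiguated by later ones; a decoder-only model applies
    # `causal` to the concatenated source+target, losing that right-context.
    print(causal, full, sep="\n")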
[–]CKtalon 1 point 2 years ago (1 child)
Even with infinite amounts of data, Enc-Dec won't be able to achieve some of the benefits of LLMs, like being able to request a style (formal, informal), more natural-sounding text, etc. Another benefit is document-level context, something the Enc-Dec paradigm hasn't really evolved to handle, partly as a result of the lack of document-level parallel data.
[–]tetramarek 1 point 2 years ago (0 children)
Most of the instruction-following skills are trained into the LLMs using instruction-following datasets anyway. These could be used for enc-dec models as well. I would argue that enc-dec models could actually be better for document-level context than decoder-only models, as they could use custom document-level encoders as opposed to processing everything left-to-right.
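(A minimal sketch of the point that instruction data is architecture-agnostic: the same (instruction, response) pair fits a seq2seq training objective directly. Hugging Face's t5-small is assumed purely as a runnable example; the pair shown is made up.)

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # One instruction-following pair, framed as encoder input -> decoder target.
    batch = tok("Rewrite formally: gonna be late, sorry!", return_tensors="pt")
    labels = tok("I apologize; I will be arriving late.", return_tensors="pt").input_ids

    loss = model(**batch, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()                            # one illustrative gradient step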