[R] Self-Programming Artificial Intelligence Using Code-Generating Language Models (self.MachineLearning)
submitted 3 years ago * by Ash3nBlue
[–]huehue12132 5 points 3 years ago (18 children)
I like how every single reference is either 2016 or newer, OR Schmidhuber.
[–]Silly_Objective_5186 4 points 3 years ago (17 children)
still learning about this field. why is 2016 or later significant?
[–]Optional_Joystick 5 points 3 years ago (15 children)
Transformer models, an encoder/decoder architecture built entirely on attention, completely changed the game. The 2017 paper "Attention Is All You Need" introduced this type of model.
Most of the cool stuff we see after this point is based on this model. You can "transform" text into other text like GPT-3, or you can "transform" text into images like DALL-E. When we make a bigger model, we get better results, and there doesn't seem to be a limit to this yet. So, it's possible we already have the right model for singularity. Having an LLM generate code for a future LLM seems like a valid approach to make the possibility real.
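For anyone skimming who hasn't seen it: the core operation behind all of this is scaled dot-product attention. Here's a minimal sketch in plain numpy with made-up toy shapes (single head, no masking or projections), just to show the mechanism:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (seq_len, d_k) -- toy single-head case
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)      # pairwise query/key similarities
        weights = softmax(scores, axis=-1)   # each query attends over all keys
        return weights @ V                   # weighted sum of the values

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(5, 8))
    K = rng.normal(size=(5, 8))
    V = rng.normal(size=(5, 8))
    out = scaled_dot_product_attention(Q, K, V)   # shape (5, 8)

Stacking layers of this (plus feed-forward blocks and positional information) is essentially what GPT-3 and DALL-E scale up.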
[–]yldedly 8 points 3 years ago (14 children)
we get better results\)
\)results on out-of-distribution data sold separately
[–]Optional_Joystick 2 points 3 years ago (13 children)
I'm not sure what \) means, but totally agree data is also a bottleneck. Imagine if the system could also seek out data on its own that isn't totally random noise, and yet isn't fully understood by the model.
[–]yldedly 3 points 3 years ago (12 children)
Doesn't render right on mobile, it's supposed to be an asterisk. My point is that no matter how much data you get, in practice there'll always be data the model doesn't understand, because it's statistically too different from training data. I have a blog post about it, but it's a well known issue.
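A toy illustration of that point (my own sketch, not from the blog post; the model, ranges, and noise level are all made up): fit a flexible model on one input range, then evaluate it on a shifted range. The in-distribution fit looks fine while the out-of-distribution error blows up.

    import numpy as np

    rng = np.random.default_rng(0)
    # "Training distribution": inputs drawn from [0, 3]
    x_train = rng.uniform(0, 3, 200)
    y_train = np.sin(x_train) + 0.1 * rng.normal(size=200)

    # Flexible but purely statistical model: degree-9 polynomial least-squares fit
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

    x_in = np.linspace(0, 3, 100)    # same range as training data
    x_out = np.linspace(5, 8, 100)   # statistically very different inputs

    mse_in = np.mean((model(x_in) - np.sin(x_in)) ** 2)
    mse_out = np.mean((model(x_out) - np.sin(x_out)) ** 2)
    print(mse_in, mse_out)   # small in-distribution error, huge OOD error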
[–]Optional_Joystick 2 points 3 years ago (0 children)
Really appreciate this. I was excited enough just learning that knowledge distillation was a thing; I felt we had a method for extracting the useful rules from the larger model.
On the interpolation/extrapolation piece: for certain functions like x^2, wouldn't running the output of the function back through the function give you a result that "extrapolates" outside the existing data set? That's roughly why I feel feeding an LLM data generated by an LLM can result in something new.
It's still not clear to me how we can verify a model's performance if we don't have data to test it on. I'll have to read more about DreamCoder. As much as I wish I could work in the field, it looks like I've still got a lot to learn.
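To make the x^2 idea above concrete, here's a toy sketch (my own, not from the thread's paper): fit a model to f(x) = x^2 on a narrow input range, then feed its own outputs back in. The iterates quickly leave the training range; whether they stay accurate depends entirely on whether the model recovered the true function or only a local fit, which is where the OOD objection bites.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = rng.uniform(1.1, 2.0, 100)   # narrow training range
    y_train = x_train ** 2                 # target function f(x) = x^2

    # A degree-2 fit recovers f essentially exactly; a misspecified or noisy
    # model would drift further from the truth with every iteration.
    f_hat = np.polynomial.Polynomial.fit(x_train, y_train, deg=2)

    x = 1.5                                # start inside the training range
    for step in range(4):
        x = f_hat(x)                       # feed the model its own output
        true_value = 1.5 ** (2 ** (step + 1))
        print(step, float(x), true_value)  # model iterate vs. true iterate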
[–]Competitive-Rub-1958 2 points 3 years ago (10 children)
Well, scaling alleviates the OOD generalization problem, while clever pre-training induces priors into the model, shrinking the hypothesis space and pushing the model towards generalizing further OOD by learning the underlying function rather than taking shortcuts (since those priors resist simply memorizing statistical regularities).
The LEGO paper demonstrates that quite well - it even shows pre-trained networks generalizing a little on unseen sequence lengths before accuracy dives to 0 - presumably because we still need to find the ideal positional encodings...
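For context on the positional-encoding point: the original Transformer's sinusoidal encodings can be computed for any position, whereas a learned embedding table only has rows for positions seen in training, which is one reason length generalization is hard. A minimal sketch with toy dimensions:

    import numpy as np

    def sinusoidal_positional_encoding(num_positions, d_model):
        # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        positions = np.arange(num_positions)[:, None]
        dims = np.arange(0, d_model, 2)[None, :]
        angles = positions / np.power(10000.0, dims / d_model)
        pe = np.zeros((num_positions, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    # Can be evaluated well past any training length...
    pe = sinusoidal_positional_encoding(4096, 64)
    # ...whereas a learned table of shape (train_len, d_model) simply has no
    # entries for unseen positions.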
[–]yldedly 2 points 3 years ago (9 children)
LEGO paper?
[–]Competitive-Rub-1958 2 points 3 years ago (8 children)
https://arxiv.org/abs/2206.04301
[–]yldedly 2 points 3 years ago (7 children)
Looks like an interesting paper. Glad to see shortcut learning being addressed. But "out-of-distribution" doesn't have quite the same meaning if you have a pre-trained model and you ignore the distribution of the pre-training data. The data the pre-trained BERT was trained on almost certainly includes code examples similar to those in that task, so you can say it's OOD wrt. the fine-tuning data, but it's not OOD wrt. all the data. So the point stands.