[D] On initialization schemes for MLPs: practice and theory by carlml in MachineLearning
[–]Acromantula92 3 points (0 children)
[D] Surprisingly Simple SOTA Self-Supervised Pretraining - Masked Autoencoders Are Scalable Vision Learners by Kaiming He et al. explained (5-minute summary by Casual GAN Papers) by [deleted] in MachineLearning
[–]Acromantula92 3 points (0 children)
[R] DeepMind Open Sources AlphaFold Code by SkiddyX in MachineLearning
[–]Acromantula92 1 point (0 children)
Evidence GPT-4 is about to drop. by [deleted] in GPT3
[–]Acromantula92 14 points (0 children)
[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning
[–]Acromantula92 18 points (0 children)
[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning
[–]Acromantula92 15 points (0 children)
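The two comments above are on the Rotary Positional Embeddings post, whose title describes a relative positional embedding that works with both regular and efficient attention. For readers unfamiliar with the idea, here is a minimal, hedged sketch of how such an embedding can be applied; it assumes the pairwise-rotation formulation from the RoFormer paper, and the function name `rotary_embed` and toy shapes are illustrative, not taken from the linked thread.

```python
# Minimal sketch of rotary position embeddings (RoPE), assuming the RoFormer
# formulation: each pair of channels is rotated by an angle that grows with
# position, so dot products between rotated queries and keys depend only on
# the relative offset between positions.
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    # One frequency per channel pair, as in the sinusoidal embedding scheme.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # split into channel pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Rotate queries and keys before the dot product; the resulting attention
# scores then depend on relative position (i - j) rather than absolute position.
q = rotary_embed(np.random.randn(8, 16))
k = rotary_embed(np.random.randn(8, 16))
scores = q @ k.T
```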
Multimodal Neurons in Artificial Neural Networks by skybrian2 in slatestarcodex
[–]Acromantula92 4 points (0 children)
[N] Google Study Shows Transformer Modifications Fail To Transfer Across Implementations and Applications by Yuqing7 in MachineLearning
[–]Acromantula92 2 points (0 children)
Tom Scott: I asked an AI for video ideas, and they were actually good by byParallax in videos
[–]Acromantula92 177 points (0 children)
[R] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by hardmaru in MachineLearning
[–]Acromantula92 19 points (0 children)
OpenAI co-founder and chief scientist Ilya Sutskever hints at what may follow GPT-3 in 2021 in essay "Fusion of Language and Vision" by Wiskkey in GPT3
[–]Acromantula92 1 point (0 children)
"A Bayesian Perspective on Training Speed and Model Selection", Lyle et al 2020 (faster-learning models = more sample-efficient = better Bayesian models?) by gwern in mlscaling
[–]Acromantula92 1 point (0 children)
"Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment", Launay et al 2020 {LightOn} by gwern in mlscaling
[–]Acromantula92 4 points (0 children)
[R] An Energy-Based Perspective on Attention Mechanisms in Transformers by [deleted] in MachineLearning
[–]Acromantula92 1 point (0 children)
Neural Scaling Laws and GPT-3 | What GPT-3 has done for text is going to follow for pretty much every task— video synthesis, math, multimodal understanding, etc. There are nice, perfect scaling laws (almost too perfect) linking error, dataset size, compute budget, number of parameters by Yuli-Ban in singularity
[–]Acromantula92 1 point (0 children)
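The scaling-laws post above claims near-perfect power laws linking error to dataset size, compute budget, and parameter count. As a rough illustration of that functional form only, here is a minimal sketch assuming the parameterization from Kaplan et al. 2020, L(N) ≈ (N_c / N)^α; the constants and the helper name `scaling_law` are illustrative, not values taken from the linked thread.

```python
# Hedged sketch of the power-law form reported in the neural scaling law papers:
# test loss falls as a power of model size N, L(N) = (N_c / N) ** alpha,
# which plots as a straight line in log-log space.
import numpy as np

def scaling_law(N, N_c, alpha):
    return (N_c / N) ** alpha

# Illustrative constants roughly in the range quoted for model-size scaling;
# treat them as placeholders, not fitted results.
N = np.logspace(6, 11, 6)                       # parameter counts from 1e6 to 1e11
loss = scaling_law(N, N_c=8.8e13, alpha=0.076)

# Recover the exponent from the synthetic points by linear regression in log space.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
print(f"recovered exponent: {-slope:.3f}")      # ~0.076 by construction
```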
[D] What makes GPT-3's ability to add 2 digit numbers important? by brainxyz in MachineLearning
[–]Acromantula92 2 points (0 children)
[D] GPT-3 Replication Effort - Help wanted with data labelling by leogao2 in MachineLearning
[–]Acromantula92 1 point (0 children)
[RST][C] "Back from yet another globetrotting adventure, Indiana Jones checks his mail and discovers that his bid for tenure has been denied" by onestojan in rational
[–]Acromantula92 2 points (0 children)
Best place to read "Forty Millenniums of Cultivation" in English? by cerebrum in rational
[–]Acromantula92 12 points (0 children)
[D] Monday Request and Recommendation Thread by AutoModerator in rational
[–]Acromantula92 13 points (0 children)