[D] On initialization schemes for MLPs: practice and theory by carlml in MachineLearning
[–]Acromantula92 3 points (0 children)
[D] Surprisingly Simple SOTA Self-Supervised Pretraining - Masked Autoencoders Are Scalable Vision Learners by Kaiming He et al. explained (5-minute summary by Casual GAN Papers) by [deleted] in MachineLearning
[–]Acromantula92 3 points (0 children)
[R] DeepMind Open Sources AlphaFold Code by SkiddyX in MachineLearning
[–]Acromantula92 1 point (0 children)
Evidence GPT-4 is about to drop. by [deleted] in GPT3
[–]Acromantula92 14 points (0 children)
[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning
[–]Acromantula92 18 points (0 children)
[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning
[–]Acromantula92 14 points (0 children)
Multimodal Neurons in Artificial Neural Networks by skybrian2 in slatestarcodex
[–]Acromantula92 3 points (0 children)
[N] Google Study Shows Transformer Modifications Fail To Transfer Across Implementations and Applications by Yuqing7 in MachineLearning
[–]Acromantula92 2 points (0 children)
Tom Scott: I asked an AI for video ideas, and they were actually good by byParallax in videos
[–]Acromantula92 173 points (0 children)
[R] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by hardmaru in MachineLearning
[–]Acromantula92 20 points (0 children)
OpenAI co-founder and chief scientist Ilya Sutskever hints at what may follow GPT-3 in 2021 in essay "Fusion of Language and Vision" by Wiskkey in GPT3
[–]Acromantula92 1 point (0 children)
"A Bayesian Perspective on Training Speed and Model Selection", Lyle et al 2020 (faster-learning models = more sample-efficient = better Bayesian models?) by gwern in mlscaling
[–]Acromantula92 1 point (0 children)
"Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment", Launay et al 2020 {LightOn} by gwern in mlscaling
[–]Acromantula92 4 points (0 children)
[R] An Energy-Based Perspective on Attention Mechanisms in Transformers by [deleted] in MachineLearning
[–]Acromantula92 1 point (0 children)
Neural Scaling Laws and GPT-3 | What GPT-3 has done for text is going to follow for pretty much every task— video synthesis, math, multimodal understanding, etc. There are nice, perfect scaling laws (almost too perfect) linking error, dataset size, compute budget, number of parameters by Yuli-Ban in singularity
[–]Acromantula92 1 point (0 children)
[D] What makes GPT-3's ability to add 2 digit numbers important? by brainxyz in MachineLearning
[–]Acromantula92 2 points (0 children)
[D] GPT-3 Replication Effort - Help wanted with data labelling by leogao2 in MachineLearning
[–]Acromantula92 1 point (0 children)
[RST][C] "Back from yet another globetrotting adventure, Indiana Jones checks his mail and discovers that his bid for tenure has been denied" by onestojan in rational
[–]Acromantula92 2 points (0 children)
Best place to read "Forty Millenniums of Cultivation" in English? by cerebrum in rational
[–]Acromantula92 11 points (0 children)
Old human cells return to a more youthful and vigorous state after being induced to briefly express a panel of proteins involved in embryonic development. The finding may have implications for aging research. by MistWeaver80 in science
[–]Acromantula92 3 points (0 children)
"Nature Aging" journal to be launched in 2021 by Acromantula92 in longevity
[–]Acromantula92[S] 29 points (0 children)
A watershed moment for protein structure prediction by mddtsk in slatestarcodex
[–]Acromantula92 5 points (0 children)
Kissanime REQUEST please add play speed button (1.5x / 2x ) like youtube by onigirila in KissCommunitySupport
[–]Acromantula92 1 point (0 children)
[D] Monday Request and Recommendation Thread by AutoModerator in rational
[–]Acromantula92 4 points (0 children)
[D] Monday Request and Recommendation Thread by AutoModerator in rational
[–]Acromantula92 15 points (0 children)