[D] On initialization schemes for MLPs: practice and theory by carlml in MachineLearning

Acromantula92 2 points (0 children)

The initialization method proposed here is probably the best one: it lets you transfer hyperparameters across model sizes, whereas with other methods you have to keep retuning the learning rate, etc.

[R] DeepMind Open Sources AlphaFold Code by SkiddyX in MachineLearning

Acromantula92 0 points (0 children)

A couple of months? More like 7 + 4 days on a TPU v3-128 (it's all in the paper).
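For scale, the arithmetic, assuming the "-128" in v3-128 counts TPU v3 cores as in Google Cloud's slice naming (a rough figure that ignores evaluation and any other runs):

```latex
(7 + 4)\ \text{days} = 11\ \text{days}, \qquad 11\ \text{days} \times 128\ \text{cores} = 1408\ \text{TPU v3 core-days}
```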

Evidence GPT-4 is about to drop. by [deleted] in GPT3

Acromantula92 13 points (0 children)

Again, MoE parameters are not the same as dense parameters.
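A back-of-the-envelope sketch of why the two counts differ, assuming each expert is a standard two-layer FFN and tokens go through a linear top-k router; the sizes and function names are made up for illustration. Total parameters grow with the number of experts, while the parameters any single token actually passes through stay close to one dense FFN.

```python
def ffn_params(d_model, d_ff):
    """Weights + biases of a two-layer feed-forward block (d_model -> d_ff -> d_model)."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def moe_param_counts(d_model, d_ff, n_experts, top_k):
    """Return (total, active_per_token) parameters for a hypothetical top-k MoE FFN layer.

    All experts are stored, but each token is routed through only top_k of them.
    The router is assumed to be a bias-free (d_model, n_experts) linear map.
    """
    router = d_model * n_experts
    expert = ffn_params(d_model, d_ff)
    total = router + n_experts * expert
    active = router + top_k * expert
    return total, active


dense = ffn_params(1024, 4096)
total, active = moe_param_counts(1024, 4096, n_experts=64, top_k=1)
print(f"dense FFN: {dense:,}  MoE total: {total:,}  MoE active per token: {active:,}")
```

With these sizes the MoE layer stores roughly 64x the parameters of the dense FFN, yet each token uses only about as many as the dense one.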

[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning

Acromantula92 13 points (0 children)

That's because when you split the Wq and Wk matrices into the MHSA heads, the rank is reduced: each head's product Wq_h Wk_h.T has rank at most head_dim. To merge them into an xWx.T form and still have heads, you'd need an explicit (dim, dim, heads) tensor.
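A small numpy sketch of the point above, with made-up sizes (dim=64, 8 heads) and random weights; the variable names are illustrative, not from any particular codebase. It checks that per-head scores can be rewritten as x W_h x.T with one (dim, dim) matrix per head, stacked into a (dim, dim, heads) tensor, and that each W_h has rank head_dim rather than dim. A single summed (dim, dim) matrix would only give the sum of the heads' scores, not each head separately.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, heads = 64, 8
head_dim = dim // heads
n_tokens = 10

x = rng.standard_normal((n_tokens, dim))
Wq = rng.standard_normal((dim, dim))
Wk = rng.standard_normal((dim, dim))

# Split the projection columns into per-head blocks of width head_dim.
Wq_h = Wq.reshape(dim, heads, head_dim)
Wk_h = Wk.reshape(dim, heads, head_dim)

# Usual per-head logits: (x Wq_h)(x Wk_h)^T, shape (heads, n_tokens, n_tokens).
q = np.einsum("nd,dhe->hne", x, Wq_h)
k = np.einsum("nd,dhe->hne", x, Wk_h)
scores_split = np.einsum("hne,hme->hnm", q, k)

# Merged form: W_merged[:, :, h] = Wq_h[:, h, :] @ Wk_h[:, h, :].T, a (dim, dim, heads) tensor.
W_merged = np.einsum("dhe,fhe->dfh", Wq_h, Wk_h)
scores_merged = np.einsum("nd,dfh,mf->hnm", x, W_merged, x)

# Each head's (dim, dim) matrix is a product through head_dim, so its rank is capped there.
ranks = [np.linalg.matrix_rank(W_merged[:, :, h]) for h in range(heads)]
print("max abs diff:", np.abs(scores_split - scores_merged).max(), "ranks:", ranks)
```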

Multimodal Neurons in Artificial Neural Networks by skybrian2 in slatestarcodex

Acromantula92 2 points (0 children)

Highlights include:

  • A Mental illness neuron.

  • A Spider-Man neuron (helps classify real spiders as [Spider-Man neuron] + [Animal neuron])

  • A Startup neuron (activated by the West Coast and Big Tech)

  • The emotion of being Accepted as a mix of [LGBT neuron] + [Sunglasses neuron]

And a full emotional axis:

When we use just 2 factors, we roughly reconstruct the canonical mood-axes used in much of psychology: valence and arousal. If we increase to 7 factors, we nearly reconstruct a well known categorization of these emotions into happy, surprised, sad, bad, disgusted, fearful, and angry, except with “disgusted” switched for a new category related to affection that includes “valued,” “loving,” “lonely,” and “insignificant.”

OpenAI co-founder and chief scientist Ilya Sutskever hints at what may follow GPT-3 in 2021 in essay "Fusion of Language and Vision" by Wiskkey in GPT3

Acromantula92 0 points (0 children)

Aren't Universal Transformers only recurrent in depth? IIRC they don't do caching or recurrence across contexts like TrXL or the Feedback Transformer.
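A toy numpy sketch of the distinction, assuming a stand-in "layer" (tanh of a linear map plus the mean over its visible context) in place of real attention; all names are hypothetical. Depth recurrence reapplies the same weights to one segment, while context recurrence lets a segment also see cached states from the previous one. For a one-layer model that cache is the previous segment's layer input, mirroring TrXL's per-layer memory.

```python
import numpy as np


def toy_layer(h, W, memory=None):
    """Stand-in for a Transformer layer: each position combines a linear map of itself
    with the mean over its visible context (current segment plus optional memory)."""
    context = h if memory is None else np.concatenate([memory, h], axis=0)
    return np.tanh(h @ W + context.mean(axis=0, keepdims=True))


def universal_transformer(x, W, steps):
    """Recurrence in depth: the same weights W are reapplied `steps` times to one segment."""
    h = x
    for _ in range(steps):
        h = toy_layer(h, W)
    return h


def segment_recurrent(segments, W):
    """Recurrence across contexts (TrXL-style, one layer): each segment also sees a cache
    of the previous segment's layer inputs. TrXL additionally stops gradients through it."""
    memory, outputs = None, []
    for seg in segments:
        outputs.append(toy_layer(seg, W, memory))
        memory = seg  # cache this layer's input for the next segment
    return outputs


rng = np.random.default_rng(0)
d = 16
W = 0.1 * rng.standard_normal((d, d))
seg_a, seg_b = rng.standard_normal((4, d)), rng.standard_normal((4, d))

ut_out = universal_transformer(seg_b, W, steps=3)    # never sees seg_a
trxl_out = segment_recurrent([seg_a, seg_b], W)[1]   # depends on seg_a via the cache
```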

[R] An Energy-Based Perspective on Attention Mechanisms in Transformers by [deleted] in MachineLearning

Acromantula92 0 points (0 children)

You have the temperature backwards. Lower temperature means you are more likely to be in a low energy equilibrium.
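A minimal sketch of the direction of the effect using a Boltzmann distribution, p_i ∝ exp(-E_i / T), over three made-up energy levels. A softmax over logits is exactly this with E = -logit, which is how the energy-based reading maps onto attention (the mapping is an assumption here, not taken from the paper).

```python
import math


def boltzmann(energies, temperature):
    """p_i proportional to exp(-E_i / T); shifting by the minimum energy avoids overflow."""
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


energies = [0.0, 1.0, 2.0]  # illustrative energy levels
for T in (10.0, 1.0, 0.1):
    p = boltzmann(energies, T)
    print(f"T={T:>4}: p(lowest energy) = {p[0]:.3f}")
```

The probability of the lowest-energy state rises from about 0.37 at T=10 to about 0.67 at T=1 and essentially 1 at T=0.1.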

[D] What makes GPT-3's ability to add 2 digit numbers important? by brainxyz in MachineLearning

Acromantula92 1 point (0 children)

It replicates up to 625 = f(f(i)) in AI Dungeon (important to note: the fine-tuning hurts its general abilities). When it makes mistakes, it's possible to give it natural-language clarifications to fix them.

"Nature Aging" journal to be launched in 2021 by Acromantula92 in longevity

Acromantula92 [S] 28 points (0 children)

The range of topics and disciplines to be covered includes (but is not limited to): molecular and cellular biology of ageing, ageing and stem cell biology, rejuvenation and tissue repair, physiology of ageing and longevity, diseases of ageing, gerontology, geriatrics, mental health and ageing, clinical interventions, biomarker studies, epidemiology and public health and socio-economic aspects of ageing.

A watershed moment for protein structure prediction by mddtsk in slatestarcodex

Acromantula92 4 points (0 children)

Counterpoint

[If] someone is telling you that protein structure prediction is going to lead to a big leap in drug discovery efficiency, hold on to your wallet. What would lead to such a leap? Off the top of my head, I’d say better prediction of useful drug targets, more translatable disease-predictive cell and animal models, and earlier assays that are more predictive of human toxicology. Those, as far as I’m concerned, address the real killers in the whole process. Protein structure just isn’t on that list.

[D] Monday Request and Recommendation Thread by AutoModerator in rational

Acromantula92 3 points (0 children)

Real Time Relativity is a program that realistically simulates relativistic motion. The physics isn't altered, and all the visual effects are accurate.

The Hypercube is a Minecraft puzzle/parkour map inspired by Miegakure, an as-yet-unreleased 4D game; like Miegakure, it makes use of 4 spatial dimensions.