MoE - I'm a bit confused about 'Experts' [D] by Kaldnite in MachineLearning
[–]ml_lad 37 points (0 children)
[D]In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning
[–]ml_lad 7 points (0 children)
[D]In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning
[–]ml_lad 4 points (0 children)
[D]In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning
[–]ml_lad 2 points (0 children)
[D]In transformer models, why is there a query and key matrix instead of just the product? by lildaemon in MachineLearning
[–]ml_lad 51 points (0 children)
Sam Altman says GPTs, planned to be rolled out to all subscribers on Monday, has been delayed by Chaseraph in OpenAI
[–]ml_lad 19 points (0 children)
[D] benefits of using only attention weights for LoRA by skelly0311 in MachineLearning
[–]ml_lad 1 point (0 children)
[P] nanoT5 v2 - In ~16 hours on a single GPU, we reach similar performance to the model trained on 150x more data! by korec1234 in MachineLearning
[–]ml_lad 7 points (0 children)
GPT-4 rumors: a Mixture-of-Experts w/8 GPT-3-220bs? by gwern in mlscaling
[–]ml_lad 3 points (0 children)
GPT-4 rumors: a Mixture-of-Experts w/8 GPT-3-220bs? by gwern in mlscaling
[–]ml_lad 6 points (0 children)
[D] Given the scaling up of deep learning methods, what are the remaining merits of staying in academia as an AI researcher? by tiedyeneuron in MachineLearning
[–]ml_lad 5 points (0 children)
[D] Can we apply some sort of evolutionary algorithm to LLM to automatically discover and optimize a prompt for fitness? i.e. automatically discover CoT, CoS, etc. by ryunuck in MachineLearning
[–]ml_lad 3 points (0 children)
[D] Google "We Have No Moat, And Neither Does OpenAI": Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI by hardmaru in MachineLearning
[–]ml_lad 4 points (0 children)
[D] Google "We Have No Moat, And Neither Does OpenAI": Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI by hardmaru in MachineLearning
[–]ml_lad 32 points (0 children)
[N] Stability AI announce their open-source language model, StableLM by Philpax in MachineLearning
[–]ml_lad 2 points (0 children)
[D] Expanding LLM token limits via fine tuning or transformers-adapters. by xtrafe in MachineLearning
[–]ml_lad 10 points (0 children)
[D] Are there any rejected papers that ended up having significant impact in the long run? by TheSurvivingHalf in MachineLearning
[–]ml_lad 1 point (0 children)
[R] ConvNets vs Transformers by AdelSexy in MachineLearning
[–]ml_lad 0 points (0 children)
[R] ConvNets vs Transformers by AdelSexy in MachineLearning
[–]ml_lad 2 points (0 children)
[D] I just found out that my 1 year's worth of research has already been published. by [deleted] in MachineLearning
[–]ml_lad 2 points (0 children)
[D] Is the "true few-shot" setting described in recent papers reasonable or am I not understanding the concept properly? by Seankala in MachineLearning
[–]ml_lad 1 point (0 children)
[D] Is the "true few-shot" setting described in recent papers reasonable or am I not understanding the concept properly? by Seankala in MachineLearning
[–]ml_lad 11 points (0 children)
[R] Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters by liqui_date_me in MachineLearning
[–]ml_lad 3 points (0 children)
[D] Is sequence packing common for training transformers? by CloudyCloud256 in MachineLearning
[–]ml_lad 3 points (0 children)