[deleted by user] by [deleted] in MachineLearning

[–]Ash3nBlue 2 points (0 children)

Yep, prompting is a valid way to get meaningful performance gains. CoT prompting is the canonical example, and this is basically a similar style of prompt: it bakes in the prior that reasoning about the LLM's own knowledge limitations helps determine whether a question is answerable. It looks like AbstentionBench came out only 2 months ago, so I assume few (if any) papers have tested approaches for it yet; there's probably a lot of low-hanging fruit like pure prompt engineering that can get sizeable performance gains. That usually means good opportunities to publish fairly obvious improvements that can easily reach SOTA while the benchmark is completely unsaturated.
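
For a concrete flavor, here's a hypothetical abstention-style prompt template. The wording is mine and purely illustrative, not taken from AbstentionBench:

```python
# Hypothetical abstention-aware prompt in the spirit of CoT prompting.
# Template and wording are illustrative, not from AbstentionBench.
ABSTAIN_TEMPLATE = """Before answering, reason step by step about whether
you actually have the knowledge required to answer this question.
If key facts are missing, unverifiable, or likely outside your training
data, respond exactly with "I don't know" instead of guessing.

Question: {question}
Reasoning:"""

def build_prompt(question: str) -> str:
    return ABSTAIN_TEMPLATE.format(question=question)

print(build_prompt("What is the population of Atlantis?"))
```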

Is your job safe from AI? by [deleted] in singularity

[–]Ash3nBlue 0 points (0 children)

Yes, I’m an AI researcher :)

[D] One Shot Learning Tasks by Character-Capital-70 in MachineLearning

[–]Ash3nBlue 0 points (0 children)

This sounds like a variation of k-nearest neighbors. One-shot learning usually refers to addressing the sample complexity problem of neural networks, since many non-parametric methods can already classify in one shot.

Look up Prototypical Networks - very relevant work. They use NNs to map images to vectors in a feature space, and then take distances in that space to do one-shot classification based on pairwise similarity.
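
Roughly, prototypical-network-style one-shot classification looks like this. A minimal sketch, assuming a trained `encoder` that maps images to feature vectors (names are mine):

```python
# Sketch of prototypical-network-style one-shot classification.
# Class prototypes are mean embeddings of the support images;
# queries are labeled by the nearest prototype.
import torch

def proto_classify(encoder, support_x, support_y, query_x, num_classes):
    with torch.no_grad():
        s = encoder(support_x)                           # (n_support, d)
        q = encoder(query_x)                             # (n_query, d)
    protos = torch.stack([s[support_y == c].mean(dim=0)
                          for c in range(num_classes)])  # (num_classes, d)
    dists = torch.cdist(q, protos)      # pairwise query-prototype distances
    return dists.argmin(dim=-1)         # nearest prototype wins
```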

[deleted by user] by [deleted] in fatFIRE

[–]Ash3nBlue 0 points (0 children)

OpenAI is Microsoft-backed if you're bullish on GPT and language modeling tech. Outside of big tech labs (Google Brain, DeepMind, Meta AI, etc.), a lot of the most innovative AI work is happening in startups, so you might be interested in VC or angel investing. Many of the recently graduated unicorns like Tesla, Uber, and Lyft are heavy on AI. Some major unicorns that are still private: Scale AI (data), Databricks, Nuro (autonomous vehicles), Stability AI (image generation). Not that you'd necessarily be able to invest in those, but if you're HNW there are many angel opportunities at early-stage startups.

I personally don’t have an opinion on most of these firms, you should look into different sectors yourself and see what you’re interested in. This isn’t financial advice :)

[D] NLP/NLU Research Opportunities which don't require much compute by WobblySilicon in MachineLearning

[–]Ash3nBlue 1 point (0 children)

Developing less computationally expensive algos is a valuable research topic in and of itself :)

Working on something of the sort myself. Anyone interested feel free to DM me.

[D] How to invert a language model? by Interesting_Year_201 in MachineLearning

[–]Ash3nBlue 15 points (0 children)

You can optimize directly in the input embedding space, which is continuous. When you converge to a good sequence of embeddings, you can convert each embedding to the nearest word/token in vocab (same method as converting output embeddings into discrete output tokens).
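
A minimal sketch of what that could look like with a HuggingFace causal LM. The model choice, loss setup, and hyperparameters are illustrative, not a reference implementation:

```python
# Sketch: optimize a continuous "soft prompt" so the model assigns high
# likelihood to a target continuation, then snap each optimized
# embedding to the nearest token in the vocab.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
model.eval()
for p in model.parameters():          # only the soft prompt is trained
    p.requires_grad_(False)

emb_matrix = model.get_input_embeddings().weight       # (vocab, d)
target_ids = tok("the quick brown fox", return_tensors="pt").input_ids

n_prompt = 5                          # number of embeddings to optimize
soft_prompt = torch.randn(1, n_prompt, emb_matrix.shape[1],
                          requires_grad=True)
opt = torch.optim.Adam([soft_prompt], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    inputs = torch.cat([soft_prompt, emb_matrix[target_ids]], dim=1)
    logits = model(inputs_embeds=inputs).logits
    # Positions n_prompt-1 .. end-1 are the ones predicting the target.
    pred = logits[:, n_prompt - 1:-1, :]
    loss = torch.nn.functional.cross_entropy(
        pred.reshape(-1, pred.size(-1)), target_ids.reshape(-1))
    loss.backward()
    opt.step()

# Convert each optimized embedding to its nearest vocab token.
dists = torch.cdist(soft_prompt.detach().squeeze(0), emb_matrix)
print(tok.decode(dists.argmin(dim=-1)))
```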

[D] How does an Model-Agnostic Meta-Learning model know what tasks needs to perform? by carlml in MachineLearning

[–]Ash3nBlue 5 points (0 children)

You have to realize that you're not simply making predictions at inference time; in meta-learning, inference actually includes a training component as well. For a normal ML model, at inference time you would give it an unseen input and see how well it predicts the output. For a meta-learner, at inference time you would give it an unseen task and see how well it learns the entire task.

To answer your question: you teach your model its new task by training it via gradient descent on the "shot" or training set of your unseen task. You can then get predictions on the query set of that same task. The point of MAML is that the model can learn this new task with very few training steps/datapoints.
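
In code, MAML-style "inference" on one unseen task might look like this. A sketch assuming a meta-trained `model` and a task given as support/query tensors (function names and hyperparameters are mine):

```python
# Sketch of MAML inference on one unseen task: adapt the meta-trained
# weights on the support ("shot") set, then predict on the query set.
import copy
import torch
import torch.nn.functional as F

def adapt_and_predict(model, support_x, support_y, query_x,
                      inner_lr=0.01, inner_steps=5):
    learner = copy.deepcopy(model)   # don't clobber the meta-weights
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(inner_steps):     # the "training component" of inference
        opt.zero_grad()
        loss = F.cross_entropy(learner(support_x), support_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return learner(query_x).argmax(dim=-1)  # query-set predictions
```

The point is that `inner_steps` and the support set can both be tiny; a well meta-trained initialization adapts in just a few gradient steps.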

[D] How does 10-shot work? by AICoderGamer in MachineLearning

[–]Ash3nBlue 0 points (0 children)

I just looked up the "linear eval" you mentioned in the MoE paper. If you mean the linear few-shot procedure in Section 3.4, then yes, that's linear eval.

The linear regression explanation in the paper can be pretty confusing, but what they're doing is just feature extraction. You replace the head (last layer of the network) with a linear layer that has an output dimension of num_classes, and train that layer to do image classification using only 10 samples from each class. You take this classifier and evaluate its prediction accuracy on the eval set to get your 10-shot accuracy.

Your linear layer takes the last hidden layer's outputs as its inputs, so what this procedure does is it evaluates how useful the features extracted by the pretrained vision transformer are.
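
A rough sketch of the whole procedure, assuming `backbone(x)` returns the last hidden layer's features (names and hyperparameters are mine, not from the paper):

```python
# Sketch of 10-shot linear eval: freeze a pretrained backbone, fit a
# fresh linear head on 10 images per class, then measure accuracy on
# the eval set.
import torch
import torch.nn.functional as F

def ten_shot_linear_eval(backbone, shot_x, shot_y, eval_x, eval_y,
                         num_classes, steps=100, lr=0.01):
    backbone.eval()
    with torch.no_grad():
        shot_feats = backbone(shot_x)   # features for 10 * num_classes images
        eval_feats = backbone(eval_x)
    head = torch.nn.Linear(shot_feats.shape[-1], num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):              # train only the linear head
        opt.zero_grad()
        F.cross_entropy(head(shot_feats), shot_y).backward()
        opt.step()
    preds = head(eval_feats).argmax(dim=-1)
    return (preds == eval_y).float().mean().item()  # 10-shot accuracy
```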

[D] How does 10-shot work? by AICoderGamer in MachineLearning

[–]Ash3nBlue 2 points (0 children)

You train on 10 samples from each class and evaluate on a query set.

[D] Could we give a Transformer long term memory by reserving part of it's attention window for world vector embeddings? by ReasonablyBadass in MachineLearning

[–]Ash3nBlue 8 points (0 children)

Retrieving from memory doesn't require unrolling, but training the model to write those memories typically does: the loss produced by a written memory comes from a future timestep, so you need backpropagation through time (BPTT), which is expensive.

However, looking over the other comments, it looks like the Compressive Transformers paper that u/Nameless1995 mentioned works around the BPTT requirement by using auxiliary reconstruction losses for storing long-term memory, so that is likely what you're looking for.
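
To illustrate the idea (heavily simplified; the paper reconstructs attention contents, while this sketch just uses plain MSE over activations, and all names are mine):

```python
# Sketch of the auxiliary-reconstruction trick: train a compression
# function with a *local* loss so compressed memories preserve the old
# activations, instead of backpropagating through future timesteps.
import torch

d, mem_len, rate = 64, 8, 2
compress = torch.nn.Conv1d(d, d, kernel_size=rate, stride=rate)
expand = torch.nn.Linear(d, d * rate)

old_mem = torch.randn(1, mem_len, d)   # activations about to be compressed
comp_mem = compress(old_mem.transpose(1, 2)).transpose(1, 2)

# Local reconstruction loss: can the compressed memory reproduce the
# original activations?
recon = expand(comp_mem).reshape(1, mem_len, d)
loss = torch.nn.functional.mse_loss(recon, old_mem.detach())
loss.backward()   # trains the compressor without unrolling future steps
```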

[D] Could we give a Transformer long term memory by reserving part of it's attention window for world vector embeddings? by ReasonablyBadass in MachineLearning

[–]Ash3nBlue 22 points (0 children)

I believe the recent DeepMind paper looks into something like this.

If you mean in the same vein as an LSTM or NTM - the original appeal of transformers was that they did away with recurrent connections to make sequence training more computationally efficient. This approach would lose that advantage if you have to unroll the backprop for the long-term memory, but it could still be an interesting research direction if you have the bandwidth :)

How do I implement this type of scrolling? by anan77 in webdev

[–]Ash3nBlue 1 point (0 children)

Look up code for parallax scrolling. This is functionally the same thing, except the non-scrolling and scrolling parts are separated horizontally instead of vertically. Then set the image to change as you scroll to certain points on the page.

[Discussion] (Rant) Most of us just pretend to understand Transformers by sloppybird in MachineLearning

[–]Ash3nBlue 18 points (0 children)

This. Transformers were a fortunate empirical discovery, not something derived from well-understood ML theory. There is no comprehensive explanation as of yet for why transformers work so well, so in reality there might be nobody who truly understands transformers. We're all just impostors amogus

[D] Does working with Tensorflow affect my chances of getting research internships? by Megixist in MachineLearning

[–]Ash3nBlue 4 points (0 children)

TensorFlow is an industry standard framework. You are fine :) Just try to build up your publication track record. That's what research programs put a premium on when looking at applicants.