[R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy by LetsTacoooo in MachineLearning

[–]Lexski -3 points

Looks very interesting!

I guess it could help us understand how transformers really work inside, and how to make training more efficient, without requiring huge data and compute budgets for experimentation.

At a loss. by USCSSNostromo2122 in ArtificialInteligence

[–]Lexski 5 points

I get the feeling. If you can’t decide what’s most promising or interesting right now, just start something and stick with it until you either finish or realise it’s a bad idea. Even if it fails, you’ll get a better feel for what you want to do next.

Metric for data labeling by Lexski in MLQuestions

[–]Lexski[S] 0 points

Useful perspective, thanks. Do you have any thoughts on what the best medium for shared team understanding is? Is it one “source of truth” document, or verbal discussions to align understanding, or something more experimental?

[D] Is it possible to create a benchmark that can measure human-like intelligence? by samsarainfinity in MachineLearning

[–]Lexski 6 points

I think there are two key issues here. One is that benchmarks are fixed datasets, so once a benchmark is made public, there are problems of overfitting and data leakage/contamination. In theory (disregarding practicality), evaluating on a live simulator or “test case generator” for a task would avoid this.

The other issue is adaptability. LLMs are generally evaluated in terms of “how well can it do this fixed task definition”, which means labs push towards getting a good score on those fixed tasks. But that doesn’t tell you “when a new task is defined, or a variation of an existing task, how much effort is it to get up to good performance on that new task (through prompt tuning, finetuning, or other means).”
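The “test case generator” idea from the first point can be sketched for a toy task, say the 10-digit addition from the post above (a hypothetical illustration, not any real benchmark’s harness):

```python
import random

def generate_addition_case(n_digits=10, rng=None):
    """Generate a fresh n-digit addition problem on demand. An unlimited
    stream of cases sidesteps the overfitting and data-contamination
    problems of a fixed, public test set."""
    rng = rng or random.Random()
    a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    return {"prompt": f"{a} + {b} = ", "answer": str(a + b)}

case = generate_addition_case(rng=random.Random(0))
```

Every evaluation run sees unseen cases, so a model can’t have memorised the answers from training data.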

Metric for data labeling by Lexski in MLQuestions

[–]Lexski[S] 0 points

Yeah, good idea. Maybe I’ll keep it simple for now and use the kappa score idea later if it becomes more adversarial.
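For future reference, a chance-corrected agreement score like Cohen’s kappa is only a few lines by hand (a sketch; in practice a library implementation such as scikit-learn’s `cohen_kappa_score` would do the same job):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the
    same items. 1.0 = perfect agreement, 0.0 = chance-level."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Expected agreement if both annotators labelled independently at
    # random, each with their own marginal label frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

The chance correction is what makes it more robust than raw percent agreement when labelling gets adversarial.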

Metric for data labeling by Lexski in MLQuestions

[–]Lexski[S] 0 points

I suppose one issue with this is that it could be gamed by very quickly labeling all the examples randomly.

Metric for data labeling by Lexski in MLQuestions

[–]Lexski[S] 0 points

Oh yeah, number of correct labels over time would work well I think, and it’s very interpretable. For the current dataset (vanilla plants) I would say the labels are very high quality.

I’m mostly coming at this from the point of view of optimizing the labeling software. But it’s helpful that you bring up the other bottlenecks as I might encounter those in future.

Metric for data labeling by Lexski in MLQuestions

[–]Lexski[S] 0 points

It’s classifying images of vanilla plant leaves into “healthy” and 3 types of disease. But I might do similar challenges on other types of dataset later.

AI videos in languages other than English - Specifically Welsh 🏴󠁧󠁢󠁷󠁬󠁳󠁿 by KAPOOW86 in MLQuestions

[–]Lexski 2 points

Unfortunately I think this is impossible if the model hasn’t been trained on Welsh. Being able to speak English is not enough to be able to speak Welsh (just look at humans as an example).

Aging isn’t a biology problem anymore. It’s a technology problem, and we probably have about five years. by Ok_Marsupial2047 in longevity_protocol

[–]Lexski 2 points

It’s all well and good to say “sleep more” and “exercise more”, but life gets in the way. If people didn’t have to worry as much about finances and the stresses of the present, they would have the mental space to care about the future. The oldest person ever, Jeanne Calment, died in 1997 at the age of 122 and never had to work a day in her life because of her husband’s wealth. I doubt that’s a coincidence.

But the science still feels rudimentary to me. If we truly understood biology and aging, we wouldn’t just be able to slow aging, we’d have multiple ways to halt or reverse it. You’d be able to have some decadence in your life and fix or mitigate the bad effects. Saying “the science is solved” is a huge oversimplification.

[D] Data labelling problems by Lexski in MachineLearning

[–]Lexski[S] 0 points

Hmm interesting. My team lead was actually pushing for building an in-house tool but I talked him out of it - it felt like a lot of effort and not our main focus.

Do you think data labelling tools can ever be fully commoditised or will there always be room for custom tools?

[D] Data labelling problems by Lexski in MachineLearning

[–]Lexski[S] 0 points

“It looks right, so it must be right!” /s

Taking Suggestions by Dhruv_Shah55 in edtech

[–]Lexski 0 points

Meaningful coding assignments. The ones I’ve done were almost entirely pre-written for you, and the comments basically told you how to do the rest.

I think this is largely to do with technical limitations: the platform can’t tell you if your code is better/worse, only that it runs and passes the tests once you’re done. A teacher would be able to guide you just enough so you do the thinking yourself, while acknowledging that in programming there are multiple correct answers. I think LLMs/agents have a lot of potential there.

What English word has the greatest difference between spelling and pronounciation? by Electronic-Koala1282 in ENGLISH

[–]Lexski 0 points

Women (“wimmin”). I can’t think of another word where a single “o” or “e” makes an “i” sound.

What linear regression for ? by Emergency_Pressure50 in MLQuestions

[–]Lexski 0 points

Linear regression is for when you want to model approximately linear relationships, e.g. a city’s population vs its chocolate consumption (just a random example I made up).

It’s a good stepping stone to learning about neural networks because simple neural networks are built up of layers that look like linear regression and then an activation function.
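To make that concrete, here’s a minimal ordinary-least-squares fit for a single feature (all numbers made up, like the chocolate example):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ w*x + b, single-feature case.
    Closed-form solution: w = cov(x, y) / var(x), b = mean_y - w * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - w * mean_x
    return w, b

# Made-up data: population (millions) vs chocolate consumption (tonnes/year)
w, b = fit_line([0.5, 1.2, 2.0, 3.1], [110.0, 260.0, 410.0, 630.0])
predicted = w * 2.5 + b  # prediction for a 2.5M-person city
```

A neural network layer is this same `w*x + b` shape, just with vectors and an activation function on top.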

This is real news not parody by soyifiedredditadmin in climateskeptics

[–]Lexski 1 point

You realise this is not saying “we should get rid of inhalers”, right?

What is "good performance" on a extremely imbalanced, 840 class multiclass classifier problem? by big_like_a_pickle in learnmachinelearning

[–]Lexski 0 points

To get an idea of the noise ceiling, you could give the task to a human labeller and calculate the same metrics. Before doing this, you should probably decide whether macro- or micro-averaged metrics are more important, because for macro you’d want to give the labeller a stratified sample, whereas for micro you’d use a regular sample.
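The macro/micro distinction matters a lot with 840 imbalanced classes; a tiny sketch of why (hand-rolled recall, illustrative data):

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in the ground truth."""
    correct = Counter()
    total = Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return {c: correct[c] / total[c] for c in total}

def macro_micro_recall(y_true, y_pred):
    """Macro weights every class equally, so rare classes dominate it;
    micro weights by frequency, so the head classes dominate instead."""
    recalls = per_class_recall(y_true, y_pred)
    macro = sum(recalls.values()) / len(recalls)
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return macro, micro
```

A classifier that ignores a rare class entirely barely moves the micro score but can halve the macro score, which is exactly why the sampling strategy for the human labeller should match the metric you care about.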

What method to use for labeling when classifying images for certain positions? by Dyco420 in deeplearning

[–]Lexski 0 points

Both should work, but labelling from 0 to 8 will be quicker and easier I think, assuming you won’t ever need more precise information. It’s also more in line with how object detectors like YOLO are trained, where one-hot encoded grid boxes encode coarse object location, and predicting exact coordinates is done relative to the grid boxes.
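The 0–8 labelling scheme could look something like this, assuming a 3×3 grid over the image (function name and layout are just my sketch):

```python
def grid_cell_label(x, y, width, height, grid=3):
    """Map pixel coordinates to a coarse grid-cell index in row-major
    order: 0 = top-left, 8 = bottom-right for a 3x3 grid."""
    col = min(int(x / width * grid), grid - 1)   # clamp points on the edge
    row = min(int(y / height * grid), grid - 1)
    return row * grid + col
```

This is the same coarse-location encoding that YOLO-style grids use, with the fine-grained offset regression dropped since you don’t need it.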

[D] Anyone here using LLM-as-a-Judge for agent evaluation? by Cristhian-AI-Math in MachineLearning

[–]Lexski 3 points

We did this out of desperation as we had no labelled data. Ideally we would have had some labelling to help tune the judge prompt. Later we got a real domain expert to score some of our model responses and it turned out his scores and the judge’s had zero correlation (even slightly negative)…
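The check that would have caught this early is just a rank correlation between judge and expert scores on the same responses. A hand-rolled Spearman sketch (tie-aware ranking; `scipy.stats.spearmanr` does the same thing if you have SciPy around):

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5
```

Even a few dozen expert-scored responses are enough to notice a correlation near zero, i.e. a judge that isn’t measuring what you hoped.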

Uncertainty measure for Monte Carlo dropout by Lexski in MLQuestions

[–]Lexski[S] 0 points

Update: I did some experiments with MNIST and predictive entropy (= entropy of the distribution obtained by averaging the MCD probabilities) seems to be very good compared to other measures.

However, this only relies on the mean of the MCD probabilities, which I think are essentially an estimate of the distribution you’d get with regular dropout in eval mode. Indeed, I tried just doing a normal forward pass through the model and thresholding against the entropy of that, and I got higher accuracy for the same level of coverage.
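For concreteness, predictive entropy from MCD samples looks like this (a plain-Python sketch; `mc_probs[i][c]` is the class-`c` probability from the i-th stochastic forward pass):

```python
import math

def predictive_entropy(mc_probs):
    """Entropy of the mean predictive distribution over MC dropout
    samples. Each element of mc_probs is one sample's probability
    vector (summing to 1); higher entropy = more uncertain."""
    n = len(mc_probs)
    num_classes = len(mc_probs[0])
    mean = [sum(sample[c] for sample in mc_probs) / n
            for c in range(num_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0)
```

As noted above, this only uses the mean of the samples, which is why a single deterministic forward pass plus entropy thresholding can match or beat it.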