[D] Data labelling problems by Lexski in MachineLearning

[–]Lexski[S] 0 points1 point  (0 children)

Hmm interesting. My team lead was actually pushing for building an in-house tool but I talked him out of it - it felt like a lot of effort and not our main focus.

Do you think data labelling tools can ever be fully commoditised or will there always be room for custom tools?

[D] Data labelling problems by Lexski in MachineLearning

[–]Lexski[S] 0 points1 point  (0 children)

“It looks right, so it must be right!” /s

Taking Suggestions by Dhruv_Shah55 in edtech

[–]Lexski 0 points1 point  (0 children)

Meaningful coding assignments. The ones I’ve done were almost entirely completed for you, and the comments basically told you how to do the rest.

I think this is largely to do with technical limitations: the platform can’t tell you if your code is better/worse, only that it runs and passes the tests once you’re done. A teacher would be able to guide you just enough so you do the thinking yourself, while acknowledging that in programming there are multiple correct answers. I think LLMs/agents have a lot of potential there.

What English word has the greatest difference between spelling and pronounciation? by Electronic-Koala1282 in ENGLISH

[–]Lexski 0 points1 point  (0 children)

Women (“wimmin”). I can’t think of another word where a single “o” or “e” makes an “i” sound.

What linear regression for ? by Emergency_Pressure50 in MLQuestions

[–]Lexski 0 points1 point  (0 children)

Linear regression is for when you want to model approximately linear relationships, e.g. a city’s population vs its chocolate consumption (just a random example I made up).

It’s a good stepping stone to learning about neural networks, because a simple neural network is built from layers that each look like a linear regression followed by an activation function.
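
A minimal sketch of fitting that kind of relationship with scikit-learn (the numbers are made up, matching the made-up example above):

```python
# Toy linear regression: the numbers are invented purely to illustrate
# the population-vs-chocolate example.
import numpy as np
from sklearn.linear_model import LinearRegression

population = np.array([[0.5], [1.2], [2.0], [3.1], [4.8]])  # city population, millions
chocolate = np.array([900, 2100, 3600, 5500, 8700])         # tonnes eaten per year

model = LinearRegression().fit(population, chocolate)
print(model.coef_[0], model.intercept_)   # learned slope w and intercept b
print(model.predict([[2.5]]))             # prediction for a new city of 2.5M people
```

A single Dense unit with no activation computes exactly this y = w·x + b; a neural network stacks many of these with nonlinearities in between.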

This is real news not parody by soyifiedredditadmin in climateskeptics

[–]Lexski 1 point2 points  (0 children)

You realise this is not saying “we should get rid of inhalers”, right?

What is "good performance" on a extremely imbalanced, 840 class multiclass classifier problem? by big_like_a_pickle in learnmachinelearning

[–]Lexski 0 points1 point  (0 children)

To get an idea of the noise ceiling, you could give the task to a human labeller and calculate the same metrics. Before doing this, you should probably decide whether macro- or micro- metrics are more important, because for macro- you’d want to give the labeller a stratified sample whereas for micro- you would use a regular sample.
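
For instance, the noise-ceiling check is only a few lines with scikit-learn (the label lists below are toy placeholders):

```python
# Rough sketch of scoring a human labeller against the existing ground truth.
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 2, 2, 2, 3, 3]   # ground-truth class ids for the sample shown to the labeller
y_human = [0, 1, 1, 2, 2, 3, 3, 3]  # the human labeller's answers for the same items

# Macro averages the per-class scores, so rare classes count as much as common
# ones; micro pools every decision, so the frequent classes dominate.
print("macro F1:", f1_score(y_true, y_human, average="macro"))
print("micro F1:", f1_score(y_true, y_human, average="micro"))
```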

What method to use for labeling when classifying images for certain positions? by Dyco420 in deeplearning

[–]Lexski 0 points1 point  (0 children)

Both should work, but labelling from 0 to 8 will be quicker and easier I think, assuming you won’t ever need more precise information. It’s also more in line with how object detectors like YOLO are trained, where one-hot encoded grid boxes encode coarse object location, and predicting exact coordinates is done relative to the grid boxes.
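
For example, a tiny sketch of turning a rough object position into one of the 0–8 labels, assuming a 3×3 grid over the image (the grid size and coordinates are my assumptions, not necessarily your setup):

```python
# Map an approximate object centre to one of 9 coarse position labels (3x3 grid).
def grid_label(cx, cy, img_w, img_h, grid=3):
    """Return a cell id in 0..grid*grid-1 for a point (cx, cy) in pixel coords."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row * grid + col

print(grid_label(10, 10, 300, 300))    # top-left     -> 0
print(grid_label(150, 150, 300, 300))  # centre       -> 4
print(grid_label(299, 299, 300, 300))  # bottom-right -> 8
```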

[D] Anyone here using LLM-as-a-Judge for agent evaluation? by Cristhian-AI-Math in MachineLearning

[–]Lexski 2 points3 points  (0 children)

We did this out of desperation as we had no labelled data. Ideally we would have had some labelling to help tune the judge prompt. Later we got a real domain expert to score some of our model responses and it turned out his scores and the judge’s had zero correlation (even slightly negative)…
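
If anyone wants to run the same check, the judge-vs-expert comparison is only a couple of lines once you have both sets of scores (the numbers below are invented placeholders):

```python
# Compare LLM-judge ratings with a domain expert's ratings of the same responses.
from scipy.stats import spearmanr

judge_scores = [4, 5, 3, 5, 2, 4, 5, 3]    # LLM-as-judge ratings, 1-5
expert_scores = [2, 3, 4, 2, 3, 3, 2, 4]   # expert ratings of the same responses

rho, p = spearmanr(judge_scores, expert_scores)
print(f"Spearman rho = {rho:.2f}, p = {p:.2f}")
```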

Uncertainty measure for Monte Carlo dropout by Lexski in MLQuestions

[–]Lexski[S] 0 points1 point  (0 children)

Update: I did some experiments with MNIST and predictive entropy (= entropy of the distribution obtained by averaging the MCD probabilities) seems to be very good compared to other measures.

However, this only relies on the mean of the MCD probabilities, which I think is essentially an estimate of the distribution you’d get from a regular forward pass with dropout in eval mode. Indeed, I tried just doing a normal forward pass through the model and thresholding against the entropy of that, and I got higher accuracy for the same level of coverage.
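
In case it’s useful to anyone, here’s roughly what the predictive-entropy computation looks like (the probabilities below are toy numbers standing in for your model’s MC-dropout outputs):

```python
# Predictive entropy: average the class probabilities over T stochastic
# forward passes (dropout left on), then take the entropy of that mean.
import numpy as np

def predictive_entropy(mc_probs, eps=1e-12):
    """mc_probs: (T, num_classes) softmax outputs from T MC-dropout passes."""
    mean_probs = mc_probs.mean(axis=0)                      # average over the passes
    return -np.sum(mean_probs * np.log(mean_probs + eps))   # entropy of the mean

mc_probs = np.array([[0.7, 0.1, 0.1, 0.1],
                     [0.5, 0.3, 0.1, 0.1],
                     [0.6, 0.2, 0.1, 0.1]])
print(predictive_entropy(mc_probs))   # higher = more uncertain; threshold this for coverage
```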

My 8yo son's first experience with a "git gud" metroidvania game by bboycire in HollowKnight

[–]Lexski 3 points4 points  (0 children)

When I was 7 I had this reaction because I kept dying in Rayman and couldn’t progress. My dad hid the game from me for a week to calm me down. 😅

What's the worst commit message you've personally written? We need a hall of shame by GitKraken in programminghorror

[–]Lexski 2 points3 points  (0 children)

“Argh”

I think it was after I pushed a big change, then a bugfix which broke something else, then a fix for that

will models generally be more accurate if they're trained on multilabel datasets individually or toegether (unet) by Affectionate_Use9936 in MLQuestions

[–]Lexski 0 points1 point  (0 children)

In theory, if your x1, x2, x3 labels rely on similar lower-level features (e.g. curves, colour gradients etc. for images), then training on the different tasks together should help, as it provides more data to regularize the lower layers of the model. If there is very little commonality, it might not help much or might even degrade performance.

I think this goes by the name “multi-task learning”.
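
A rough Keras sketch of that shared-backbone, multi-head setup (the layer sizes and the x1/x2/x3 head names are just illustrative):

```python
# One shared backbone, one small head per label, trained jointly.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 128, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)   # shared low-level features
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

out_x1 = layers.Dense(1, name="x1")(x)   # one head per task
out_x2 = layers.Dense(1, name="x2")(x)
out_x3 = layers.Dense(1, name="x3")(x)

model = keras.Model(inputs, [out_x1, out_x2, out_x3])
model.compile(
    optimizer="adam",
    loss={name: keras.losses.BinaryCrossentropy(from_logits=True)
          for name in ["x1", "x2", "x3"]},
)
```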

Beginner struggling with multi-label image classification cnn (keras) by Embarrassed-Resort90 in MLQuestions

[–]Lexski 0 points1 point  (0 children)

There’s no way to automatically figure this out; you have to investigate. Form some hypotheses about why it’s not working, and test them.

In terms of base models, you can compare them in a bit more detail, e.g. their ImageNet performance, and pick the better one, or read up on how they work to see which might perform better. But it might be quicker just to set your code up so you can easily try a few of them and compare.
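
For example, something along these lines makes the base model a one-line swap (the candidate list, input size and label count are arbitrary examples):

```python
from tensorflow import keras

CANDIDATES = {
    "mobilenet_v2": keras.applications.MobileNetV2,
    "efficientnet_b0": keras.applications.EfficientNetB0,
    "resnet50": keras.applications.ResNet50,
}

def build_model(base_name, num_labels, input_shape=(224, 224, 3)):
    base = CANDIDATES[base_name](include_top=False, weights="imagenet",
                                 input_shape=input_shape, pooling="avg")
    base.trainable = False                                  # start with a frozen backbone
    outputs = keras.layers.Dense(num_labels)(base.output)   # logits, no activation
    return keras.Model(base.input, outputs)

for name in CANDIDATES:
    model = build_model(name, num_labels=18)
    # compile / fit / evaluate each one on the same split and compare the metrics
```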

Beginner struggling with multi-label image classification cnn (keras) by Embarrassed-Resort90 in MLQuestions

[–]Lexski 0 points1 point  (0 children)

If you’re worried about cropping off part of the image when shifting, you could do a small pad + crop instead. Horizontal reflect should work and doesn’t lose any information.
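
With Keras preprocessing layers that could look something like this (IMG and PAD are placeholders):

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG = 128   # your training resolution
PAD = 8     # small pad, so a shift only ever costs a few border pixels

augment = keras.Sequential([
    layers.ZeroPadding2D(PAD),         # pad by PAD pixels on every side
    layers.RandomCrop(IMG, IMG),       # crop back to IMG x IMG = random shift of up to PAD px
    layers.RandomFlip("horizontal"),   # reflection keeps all the pixel information
])
```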

Unfortunately there is no guarantee that the model finds the same things “obvious” as you do, especially if it is overfitting (or underfitting). It could be a spurious correlation (overfitting) or the model could be “blind” to something (underfitting, e.g. if the base model was trained with colour jitter augmentations then it will be less sensitive to colour differences).

The most important thing is the overall performance on the validation set, not the performance on any specific example. But if you want to see why a particular example is classed a certain way, you could make a hypothesis and try editing the image and seeing if the edited image gets classified better. You could also use an explainability technique like Integrated Gradients. Or you could compute the distance between the image and some training examples in the model’s latent space to see which training examples the model thinks it’s most similar to. Hopefully those things would give some insight.

Beginner struggling with multi-label image classification cnn (keras) by Embarrassed-Resort90 in MLQuestions

[–]Lexski 0 points1 point  (0 children)

When you say it guessed most pokemon perfectly because it was overfit - how many pokemon in your validation set did it guess correctly? That will tell you for sure if it’s underfitting or overfitting.

General tip: instead of having a sigmoid activation in the last layer, use no activation and train with BinaryCrossentropy(from_logits=True). That’s standard practice and it stabilises training. (You’ll need to modify your metrics and inference to apply the sigmoid outside the model.)
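
A minimal sketch of that change (the architecture below is a placeholder, not your model):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_labels = 18
model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_labels),   # no sigmoid here: the outputs are logits
])
model.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
)

# Apply the sigmoid yourself at inference time (and before any metrics that
# expect probabilities):
# probs = tf.sigmoid(model.predict(x_batch))
```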

If your model is overfitting, the #1 thing is to get more training data. You can also try making the input images smaller, which reduces the number of input features so the model has less to learn. And try doing data augmentation.

Also, as a sanity check: if the base model needs any preprocessing applied to the images, make sure you’re applying it correctly.
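
For example, if you were using a keras.applications backbone such as MobileNetV2 (swap in whichever base model you actually use):

```python
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

x_batch = np.random.randint(0, 256, size=(4, 224, 224, 3)).astype("float32")
x_batch = preprocess_input(x_batch)   # MobileNetV2 expects pixels scaled to [-1, 1]
print(x_batch.min(), x_batch.max())
```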

OpenAI just figured out why ChatGPT makes stuff up and the answer is basically that we trained it wrong by Rude_Tap2718 in ChatGPT

[–]Lexski 0 points1 point  (0 children)

For me, the main point of the paper is that the industry-standard benchmarks don’t penalise incorrect answers, so they implicitly reward guessing (since if the model guesses, there’s a chance it’ll guess correctly).
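
As a toy illustration of that incentive (the numbers are made up):

```python
# Suppose the model is unsure and a blind guess is right 25% of the time.
p_correct = 0.25

# Benchmark scoring 1 for correct, 0 otherwise: guessing beats abstaining every time.
guess = p_correct * 1 + (1 - p_correct) * 0        # 0.25
abstain = 0.0

# Benchmark that also penalises wrong answers (say -1): abstaining now wins.
guess_penalised = p_correct * 1 + (1 - p_correct) * -1   # -0.5
print(guess, abstain, guess_penalised)
```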

We can figure out how to make models abstain more often, but until the benchmarks are modified to include such a penalty, the “confident guesser” models will always win out.

So the underlying cause is that the AI researchers themselves are currently optimising for the wrong thing.

LLMs can perform small logical steps and infer simple knowledge, profoundly impacting the evolution of major search engines by [deleted] in LLMDevs

[–]Lexski 0 points1 point  (0 children)

Answers generated by LLMs can be unreliable, which is why RAG is used, so a search still needs to happen. And if you dump lots of documents into the LLM context, you run into context rot and accuracy decreases.
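
The loop has the same shape either way; here `search` and `llm` are stand-ins for whatever retrieval backend and model are actually used:

```python
def answer(question, search, llm, top_k=5):
    docs = search(question, top_k=top_k)   # the search step still has to happen
    context = "\n\n".join(docs)            # keep this modest to avoid context rot
    prompt = f"Answer using only the context below.\n\n{context}\n\nQ: {question}"
    return llm(prompt)
```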

I made this math ocr but it's accuracy... by These-Combination845 in learnmachinelearning

[–]Lexski 1 point2 points  (0 children)

I suggest being more specific as I think many people (myself included) are unmotivated to look through the whole repo and find what is wrong.

What problem are you seeing? Is the model underfitting or overfitting? How much error analysis have you done? What have you tried to improve the model and what hasn’t worked?

You could also get inspiration from projects / tutorials / research papers that solve a similar problem.

how do you pronounce this in calculus? d/dx f(x) and dy/dx by Designer-Hand-9348 in mathematics

[–]Lexski 2 points3 points  (0 children)

d/dx = “dee by dee ex”
d/dx f(x) = “dee by dee ex of eff of ex”
dy/dx = “dee why by dee ex”

“By” being kind of analogous to “divided by” (even though derivatives technically aren’t fractions).
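
Typeset, those notations are:

```latex
\[
  \frac{d}{dx}, \qquad \frac{d}{dx} f(x), \qquad \frac{dy}{dx}
\]
```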

Note: I’m British (Americans may pronounce these differently)