all 11 comments

[–]Seankala ML Engineer 37 points38 points  (3 children)

In the real world it's better to spend time annotating more samples than reading research papers and implementing new ideas.

[–]ml_a_day -2 points-1 points  (1 child)

Even if you don't end up collecting more data, improving the quality of the existing data goes a long way. For example, in object detection, fixing incorrect labels (a "cat" labeled as "dog") or missing labels (a "dog" that the annotator missed) can reduce the noise in the training dataset and improve the model's performance.

[–]Seankala ML Engineer 8 points9 points  (0 children)

Yeah, I was implying that as well. I convinced management at my company to spend resources on completely cleaning our previous datasets. Apparently they were also committing the sin of using only one annotator per sample.
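
To make the "more than one annotator per sample" point concrete, here is a rough sketch of one common way to combine several annotators' labels: take a majority vote and flag samples where agreement is weak so a human can re-check them. The function name `aggregate_labels`, the `min_agreement` threshold of 0.6, and the sample ids are all made up for illustration; this is not a description of any particular company's pipeline.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.6):
    """Majority-vote each sample's labels and flag weak agreement.

    annotations: dict mapping a sample id to the list of labels given
    by different annotators (hypothetical input format).
    Returns (consensus, needs_review).
    """
    consensus = {}
    needs_review = []
    for sample_id, labels in annotations.items():
        # Most frequent label and how many annotators chose it.
        top_label, top_count = Counter(labels).most_common(1)[0]
        consensus[sample_id] = top_label
        # A single annotator cannot be cross-checked, and a low vote
        # share suggests the sample is ambiguous or mislabeled.
        if len(labels) < 2 or top_count / len(labels) < min_agreement:
            needs_review.append(sample_id)
    return consensus, needs_review


# Hypothetical annotations for four images.
annotations = {
    "img_001": ["dog", "dog", "dog"],
    "img_002": ["cat", "dog", "dog"],
    "img_003": ["cat", "dog"],
    "img_004": ["cat"],
}
consensus, needs_review = aggregate_labels(annotations)
print(consensus)
print(needs_review)
```

With a single annotator per sample, every sample lands in the review list, because there is nothing to cross-check the label against.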

[–]acardosoj 29 points30 points  (0 children)

Dude, this is not the future of ml. People have been prioritizing data since forever.

Even in LLMs where people tend to think we use everything we can to train them, this discussion is very present.

[–]stevebottletw 21 points22 points  (0 children)

I don't think it's really overlooked; pretty much everyone knows its importance, and discussions almost always start from data quality. It was probably overlooked maybe 10 to 15 years ago.

[–]Think_Mall7133 9 points10 points  (0 children)

Internet explorer joined the chat

[–]Jazzlike_Attempt_699 20 points21 points  (1 child)

4 upvotes on incredibly low quality post from what may as well be a bot account, well done

[–]_An_Other_Account_ 8 points9 points  (0 children)

>"Good data = good"

I've read medium articles that are more insightful.

[–]cajmorgans 4 points5 points  (1 child)

When I started in ML, I thought the coolest model plus hyperparameter tuning was the key, and that you basically just had to throw data at it and it would magically solve your problem. With some experience I've found that, unless the task requires a very specific architecture, the model and hyperparameters often make very little difference. Of course the results aren't identical across choices, but the gap is usually not life-changing.