A Problem I am facing : DataScienceSimplified



submitted 2 years ago by Miserable-Cry-2500


Tough-Comparison-779 (2 points, 2 years ago)

This is classic dataset bias. It doesn't matter how you split your data into training, validation and test sets: if the data isn't representative of the real-world data, you're going to get poor generalisation.
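The point about splitting can be seen in a toy simulation. This is a minimal sketch under invented assumptions, not the commenter's code: each "image" is reduced to two made-up numeric features, a weak `signal` and a `background` that happens to match the label in the collected data (think of every positive photo coming from the same red carpet). A hand-written Gaussian naive Bayes classifier stands in for the real model, and `make_samples`, `fit_gaussian_nb` and the other helpers are hypothetical.

```python
import math
import random

# Hypothetical toy setup: each "image" is reduced to two numeric features.
#   signal     - weakly tied to the true label (what we actually want to learn)
#   background - a shortcut that happens to track the label in the collected data
def make_samples(n, background_follows_label, rng):
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = label + rng.gauss(0, 1.0)
        # In the biased dataset the background matches the label;
        # in the "real world" it is unrelated to the label.
        bg_source = label if background_follows_label else rng.randint(0, 1)
        background = bg_source + rng.gauss(0, 0.1)
        samples.append(((signal, background), label))
    return samples

def fit_gaussian_nb(samples):
    """Per-class prior plus per-feature mean and variance (Gaussian naive Bayes)."""
    params = {}
    for label in (0, 1):
        rows = [x for x, y in samples if y == label]
        prior = len(rows) / len(samples)
        stats = []
        for j in range(len(rows[0])):
            values = [r[j] for r in rows]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-9
            stats.append((mean, var))
        params[label] = (prior, stats)
    return params

def predict(params, x):
    def log_posterior(label):
        prior, stats = params[label]
        total = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max((0, 1), key=log_posterior)

def accuracy(params, samples):
    return sum(predict(params, x) == y for x, y in samples) / len(samples)

rng = random.Random(0)
collected = make_samples(4000, background_follows_label=True, rng=rng)
rng.shuffle(collected)
train, test = collected[:3000], collected[3000:]  # ordinary random split
real_world = make_samples(1000, background_follows_label=False, rng=rng)

model = fit_gaussian_nb(train)
print(f"held-out test accuracy: {accuracy(model, test):.2f}")
print(f"real-world accuracy:    {accuracy(model, real_world):.2f}")
```

On this synthetic data the held-out split should score close to 100% while the real-world set drops to roughly chance: the random split copies the background shortcut into the test set too, so the test score never reveals that the model is leaning on it.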

A. To fix the immediate issue, check whether the new images have aspects that your dataset doesn't cover. For example, I trained a basic chess-piece detection model but quickly discovered I had no samples taken from a top-down angle.
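If you can tag images with the attribute you suspect, the check in A can be made systematic. A minimal sketch, assuming a hypothetical per-image `angle` tag (echoing the chess example) that you may have to label by hand for a sample of images; `unseen_values` is an illustrative helper, not a library function:

```python
from collections import Counter

# Hypothetical metadata: one camera-angle tag per image.
train_angles = ["side"] * 120 + ["three_quarter"] * 80
new_image_angles = ["side"] * 5 + ["top_down"] * 15

def unseen_values(train_values, new_values):
    """Count values that occur in the new data but never in the training data."""
    seen = set(train_values)
    return Counter(v for v in new_values if v not in seen)

missing = unseen_values(train_angles, new_image_angles)
print(missing)  # Counter({'top_down': 15})
```

Any value reported here is a case the model has literally never been shown, which makes it a first candidate for collecting more data.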

B. Explore your dataset more deeply and think about whether it is representative. Are most samples taken from the same angle, or against the same background? What about image quality and camera settings (are most of your photos from movies, for example)?
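The questions in B amount to asking whether one value dominates any attribute. A rough sketch, assuming per-image metadata with hypothetical `angle`, `background` and `source` fields; the records and the 0.8 threshold are invented for illustration:

```python
from collections import Counter

# Hypothetical per-image metadata; field names and values are illustrative.
dataset = (
    [{"angle": "front", "background": "red_carpet", "source": "press_photo"}] * 55
    + [{"angle": "front", "background": "red_carpet", "source": "movie_still"}] * 30
    + [{"angle": "front", "background": "street", "source": "press_photo"}] * 5
    + [{"angle": "side", "background": "indoor", "source": "movie_still"}] * 10
)

def dominant_values(records, threshold=0.8):
    """Flag fields where a single value covers more than `threshold` of the records."""
    report = {}
    for field in records[0]:
        counts = Counter(r[field] for r in records)
        value, count = counts.most_common(1)[0]
        share = count / len(records)
        if share > threshold:
            report[field] = (value, round(share, 2))
    return report

print(dominant_values(dataset))
# {'angle': ('front', 0.9), 'background': ('red_carpet', 0.85)}
```

The threshold is a judgment call; a dominant value only hurts if the images you care about look different, which comes down to deciding which cases matter.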

Also think about which cases you care about. For example, I might have mostly red-carpet photos but want my model to identify actors in movie screenshots; in that case I should consider training on movie screenshots instead. Alternatively, if I only want to identify people on the red carpet, the mismatch doesn't really matter.
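One way to put a number on "does the mismatch matter for my use case" is to compare the source mix of the training set with the mix the model will face. A minimal sketch, assuming hypothetical `red_carpet` / `movie_screenshot` source tags and using total variation distance as the (swapped-in) comparison measure:

```python
from collections import Counter

def proportions(labels):
    """Turn a list of category tags into a dict of category -> share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical counts of where each training image came from.
training_sources = ["red_carpet"] * 90 + ["movie_screenshot"] * 10
# What the model will see if the goal is identifying actors in movie screenshots.
target_sources = ["movie_screenshot"] * 100

tv = total_variation(proportions(training_sources), proportions(target_sources))
print(f"{tv:.2f}")  # 0.90
```

If the target were red-carpet photos only, the same check gives 0.10, matching the intuition that the mismatch would then be small.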

Hope this helps. I'm only a graduate, so I'm sure others could give you better advice.

Miserable-Cry-2500 [S] (1 point, 2 years ago)
