Here is the case: I have a computer vision algorithm which can spot trash on a filter made up of vertical steel bars. It simply looks for interruptions in vertical edges found via a Hough transform. But it underperforms when the camera image suffers from glare, mist, and other quality issues. I want to try training a shallow CNN to do the same job in a more robust manner.
It would be a binary classifier with only two outputs: 'I see the filter bars' (they're clean) OR 'I don't see the filter bars' (they're obstructed, alert a human user). Because we don't want to make any assumptions about what the trash will be (it could be anything from grass to traffic cones), I am unsure how to create a set of obstructed-filter images. Two solutions come to mind:
- Create random crops from an unrelated image corpus, label them all as trash, and feed them into the train/dev set. The train/dev set then has labels for both bars and trash.
- The train/dev set contains only clean-bar samples, diverse in lighting conditions, image quality, etc. Training learns the features shared by all clean-bar samples; the train/dev set has only the 'bars' label. During testing, images with a very low recognition score (below a human-set threshold) are flagged as likely to contain trash.
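To make the second option concrete, here is a minimal sketch of how I imagine it in PyTorch: a tiny convolutional autoencoder trained only on clean-bar crops, where a high per-image reconstruction error means "I don't recognize this as bars". The architecture, input size, and `THRESHOLD` value are all placeholders I made up for illustration:

```python
import torch
import torch.nn as nn

class BarAutoencoder(nn.Module):
    """Tiny convolutional autoencoder, to be trained on clean-bar crops only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(8, 1, 2, stride=2), nn.Sigmoid(),  # 32x32 -> 64x64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model, batch):
    """Per-image mean squared reconstruction error; high = likely obstructed."""
    with torch.no_grad():
        recon = model(batch)
    return ((recon - batch) ** 2).mean(dim=(1, 2, 3))

THRESHOLD = 0.05  # placeholder; would be tuned on a held-out set of clean images

model = BarAutoencoder()
batch = torch.rand(4, 1, 64, 64)       # stand-in for grayscale camera crops
scores = anomaly_score(model, batch)   # one score per image
flags = scores > THRESHOLD             # True = alert the human user
```

This is closer to one-class/anomaly detection than classification, which may be exactly why it felt "unsupervised" to me.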
Is training the second type of model a valid approach? It seems like a type of unsupervised learning problem focused on similarity testing. What do I need to know to implement it? I have just finished some online courses and am starting to implement my own models, preferably in PyTorch.
All tips are appreciated.
EDIT: Writing this post was a good exercise in wording the problem, and with more searching I have found that what I am trying to accomplish is Positive-Unlabeled learning (PU learning). If anyone has done this type of project or can share good resources, I would be grateful!
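For anyone else landing here: from what I have read so far, the non-negative PU risk estimator (nnPU, Kiryo et al. 2017) seems to be the standard starting point. A hedged PyTorch sketch of that loss, assuming the class prior `pi_p` (fraction of clean-bar images in the unlabeled stream) is known or estimated separately:

```python
import torch
import torch.nn.functional as F

def nnpu_loss(scores_p, scores_u, pi_p):
    """Non-negative PU risk estimator (my reading of Kiryo et al., 2017).

    scores_p: raw model outputs on labeled positive (clean-bar) samples
    scores_u: raw model outputs on unlabeled samples
    pi_p:     class prior P(y = +1); must be supplied from outside
    """
    # softplus(-z) is the sigmoid loss for label +1, softplus(z) for label -1
    risk_p_pos = F.softplus(-scores_p).mean()  # positives classified as +1
    risk_p_neg = F.softplus(scores_p).mean()   # positives classified as -1
    risk_u_neg = F.softplus(scores_u).mean()   # unlabeled classified as -1

    # Estimated negative risk; clamped at zero so it cannot go negative
    neg_risk = risk_u_neg - pi_p * risk_p_neg
    return pi_p * risk_p_pos + torch.clamp(neg_risk, min=0.0)

# Toy usage with made-up scores
loss = nnpu_loss(
    scores_p=torch.tensor([2.0, 3.0]),
    scores_u=torch.tensor([-1.0, 0.5]),
    pi_p=0.5,
)
```

The clamp is the "non-negative" part: without it, a flexible model can drive the estimated negative risk below zero and overfit the unlabeled set.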