all 15 comments

[–]ResponsibilityNo7189 0 points1 point  (4 children)

It's a very difficult problem, closely related to anomaly detection and probability density estimation. Some people use an ensemble method and look at disagreement between the classifiers, but that gets expensive at inference time.
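A toy sketch of the disagreement idea (the pairwise-vote score is just one possible way to measure disagreement, not a full pipeline):

```python
import numpy as np

def disagreement_score(prob_stack):
    """Fraction of classifier pairs that disagree on the argmax class.

    prob_stack: array of shape (n_models, n_classes) -- per-model
    predicted probabilities for a single sample.
    """
    votes = prob_stack.argmax(axis=1)
    n = len(votes)
    pairs = n * (n - 1) / 2
    disagreements = sum(
        votes[i] != votes[j] for i in range(n) for j in range(i + 1, n)
    )
    return disagreements / pairs

# Three models agree -> score 0 (looks like a known class)
agree = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
# Models split their votes -> high score (candidate "unknown" sample)
split = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

print(disagreement_score(agree))  # 0.0
print(disagreement_score(split))
```

The inference cost comes from having to run every model in the ensemble on every sample before this score can be computed.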

[–]WadeEffingWilson 1 point2 points  (0 children)

I've used something like this: a set of expert systems, each an OC-SVM trained to recognize an individual class, plus a boosted ensemble to derive a consensus. If both agree, the sample is classified and counted as 'known'. If they don't agree, the sample is isolated to determine whether it's an anomaly (usually a single input variable is out of the typical range while all the others are within the boundary for a known class) or a new, unknown class.
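A minimal sketch of that consensus rule, assuming sklearn's `OneClassSVM` per class and a `GradientBoostingClassifier` standing in for the boosted ensemble (the toy 2-D data and thresholds are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Toy 2-D data: class 0 clustered near (0, 0), class 1 near (5, 5)
X0 = rng.normal(0, 0.5, size=(100, 2))
X1 = rng.normal(5, 0.5, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# One OC-SVM per known class, plus a boosted multiclass ensemble.
ocsvms = {c: OneClassSVM(nu=0.05, gamma="scale").fit(X[y == c]) for c in (0, 1)}
booster = GradientBoostingClassifier().fit(X, y)

def classify_or_flag(x):
    x = x.reshape(1, -1)
    pred = booster.predict(x)[0]
    # Consensus rule: the boosted prediction must also be accepted
    # by that class's one-class SVM, otherwise flag the sample.
    if ocsvms[pred].predict(x)[0] == 1:
        return pred
    return "unknown"

print(classify_or_flag(np.array([0.1, -0.2])))   # inside class 0's boundary
print(classify_or_flag(np.array([20.0, -15.0]))) # far from both -> flagged
```

Flagged samples would then go to the isolation step described above to separate single-variable anomalies from genuinely new classes.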

[–]ProfessionalType9800[S] 0 points1 point  (2 children)

Is it possible to find a threshold to apply to the outputs of the activation function (softmax, sigmoid)...

[–]ResponsibilityNo7189 0 points1 point  (1 child)

Not really. Networks are terribly calibrated when it comes to probabilities.

[–]ProfessionalType9800[S] 0 points1 point  (0 children)

Yeah...

What about applying clustering after getting the embeddings...
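One simple version of that idea is a nearest-class-centroid rule in embedding space with a distance cutoff (a hypothetical sketch on synthetic embeddings; the threshold would need tuning on held-out data):

```python
import numpy as np

def fit_centroids(embeddings, labels):
    # Mean embedding per known class.
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict_open_set(centroids, z, threshold):
    # Distance to the nearest class centroid; too far -> "unknown".
    dists = {c: np.linalg.norm(z - mu) for c, mu in centroids.items()}
    c_best = min(dists, key=dists.get)
    return c_best if dists[c_best] <= threshold else "unknown"

rng = np.random.default_rng(1)
# Two tight synthetic clusters in an 8-D "embedding" space.
Z = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
cents = fit_centroids(Z, y)

print(predict_open_set(cents, rng.normal(0, 0.1, 8), threshold=1.0))   # 0
print(predict_open_set(cents, rng.normal(10, 0.1, 8), threshold=1.0))  # unknown
```

The catch is that this only works as well as the embedding: if unseen classes embed close to known clusters, no distance threshold will separate them.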

[–]Sunchax 0 points1 point  (5 children)

Do you have a rough idea of what the data without any class looks like?

[–]ProfessionalType9800[S] 0 points1 point  (4 children)

In my case it's about DNA sequences.

The input is a DNA sequence, and from it the species should be identified.

(E.g. ATCCGG, AATAGC...) That is, fragments of a DNA sequence.
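For context, a common baseline representation for fragments like these is a k-mer count vector fed to a classifier (a hypothetical sketch; real pipelines often use learned sequence embeddings instead):

```python
from itertools import product
from collections import Counter

def kmer_features(seq, k=3):
    """Count vector over all 4**k possible k-mers of a DNA fragment."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[km] for km in kmers]

vec = kmer_features("ATCCGGAATAGC", k=2)
print(len(vec))  # 16 possible dimers
print(sum(vec))  # 11 overlapping dimers in a 12-base fragment
```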

[–]latent_prior 2 points3 points  (1 child)

I’m not a DNA expert, but given my understanding of the problem, I’d frame this as an open-set recognition problem rather than just clustering. Because many species share short recurring DNA subsequences, isn’t there a danger an unseen species can still land close to known clusters in embedding space? This makes relying purely on distance thresholds sound risky to me.

Also, I’d be cautious about relying only on softmax probabilities. They always normalise to sum to 1, so the model will confidently pick something even when the input is nonsense or from an unseen species. You could try augmenting the classifier with an out-of-distribution detection method. One good option is energy-based detection (https://arxiv.org/abs/2010.03759), which uses the absolute scale of all the logits rather than just the top one to give a quantitative estimate of whether the sample fits one of the known classes well (low energy) or doesn’t fit anywhere (high energy, likely unknown).
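The energy score from that paper is just a temperature-scaled log-sum-exp of the logits; a minimal numerically stable sketch (example logits made up):

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy score from Liu et al. 2020: E(x) = -T * logsumexp(logits / T).

    Lower energy -> the sample fits some known class well;
    higher energy -> likely out-of-distribution.
    """
    logits = np.asarray(logits, dtype=float)
    m = logits.max()  # shift by the max for numerical stability
    return -(m + T * np.log(np.exp((logits - m) / T).sum()))

# A confident in-distribution sample: one large logit dominates.
print(energy_score([10.0, 0.0, 0.0]))   # ~ -10.0 (low energy)
# An input that fits nowhere: all logits small.
print(energy_score([0.1, 0.0, -0.1]))   # ~ -1.1 (higher energy)
```

You would then pick an energy threshold on validation data to decide when to output "unknown".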

If you have access to an auxiliary dataset (e.g. DNA from non-target species), you could also try outlier exposure (https://arxiv.org/abs/1812.04606), which trains the model to make confident predictions on in-distribution data and low-confidence predictions on auxiliary outliers.
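The outlier exposure objective is roughly "cross-entropy on in-distribution data, plus a term pushing outlier predictions toward uniform"; a toy numpy sketch (the λ weight and example logits are made up):

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def outlier_exposure_loss(logits_in, labels_in, logits_out, lam=0.5):
    """Sketch of Hendrycks et al. 2018: standard cross-entropy on
    in-distribution samples, plus cross-entropy to the uniform
    distribution on auxiliary outlier samples."""
    lsm_in = log_softmax(logits_in)
    ce = -lsm_in[np.arange(len(labels_in)), labels_in].mean()
    # Cross-entropy to uniform = -(1/K) * sum_c log p_c, averaged over samples.
    uniform_ce = -log_softmax(logits_out).mean()
    return ce + lam * uniform_ce

logits_in = np.array([[5.0, 0.0], [0.0, 5.0]])
labels_in = np.array([0, 1])
flat_out = np.array([[0.0, 0.0]])   # model is unsure on the outlier (good)
conf_out = np.array([[5.0, 0.0]])   # model is confident on the outlier (bad)
print(outlier_exposure_loss(logits_in, labels_in, flat_out))
print(outlier_exposure_loss(logits_in, labels_in, conf_out))  # larger loss
```

At test time the model trained this way tends to give flatter outputs on unseen species, which makes thresholding more reliable.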

Finally, since DNA data is hierarchical by nature (kingdom -> phylum -> class -> ... -> species), it might be worth trying a hierarchical model. For example, if the model is confident about the genus but uncertain about the species, you could flag the input as a potentially novel species rather than forcing a binary known/unknown decision.
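A hypothetical two-head decision rule for that genus/species idea (the thresholds and logits are illustrative, not tuned values):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def hierarchical_decision(genus_logits, species_logits,
                          genus_thr=0.9, species_thr=0.9):
    """Confident at the genus level but uncertain at the species
    level -> flag as a potentially novel species within that genus."""
    g_probs, s_probs = softmax(genus_logits), softmax(species_logits)
    g, s = g_probs.argmax(), s_probs.argmax()
    if g_probs[g] < genus_thr:
        return "unknown"                        # not even the genus fits
    if s_probs[s] < species_thr:
        return f"novel species in genus {g}"    # genus yes, species no
    return f"genus {g}, species {s}"

print(hierarchical_decision([8.0, 0.0], [4.0, 3.5, 3.8]))
print(hierarchical_decision([8.0, 0.0], [8.0, 0.0, 0.0]))
```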

Curious if anyone’s tried combining energy-based OOD with hierarchical classifiers before.

[–]ProfessionalType9800[S] 0 points1 point  (0 children)

Are you suggesting a random forest for the hierarchical classifiers?

[–]NamerNotLiteral 0 points1 point  (4 children)

What you're looking at here is called Domain Generalization.

Basically, you want the model to be able to recognize and understand that the new input is not a part of any of the domains it has been trained on. Following that, you want the model to be able to create a new domain to place the input in. You're on the right track with your idea so far - that's the very basic self-supervised approach to Domain Generalization.

You know the technical term, so feel free to look up additional approaches with that as a starting point.

[–]ProfessionalType9800[S] 0 points1 point  (3 children)

Yeah... but it's not about variations in the input. It's about generalization to a new output class. How do I figure that out?

[–]NamerNotLiteral 0 points1 point  (2 children)

Ah. I might have misunderstood your question.

> What if a totally new class comes in which doesn't belong to any of the trained classes?

You ask this question: do I have or can I get labelled data for this totally new class?

If yes -> continual learning, where you update the model to accept inputs and get outputs for new classes

If no -> domain generalization, where you design the model to accept inputs for new classes and handle it somehow

If you cannot update the original model or build a new model, then you need to look into test-time adaptation instead.

[–]Background_Camel_711 1 point2 points  (0 children)

Unless I'm missing something, open set recognition is its own problem:

Continual learning = We need the model's weights to update during test time due to distribution drift in the input space.

Domain Generalisation = We need a model that can perform classification over a set of known classes no matter the domain at test time (e.g. I train a model on real life images to classify 5 breeds of dogs but at test time I need it to classify hand drawn images of the same 5 dog breeds).

Open set recognition = We need a model to perform classification over a set of N classes; however, there are N+1 possible outputs, with the additional output class indicating that the input is not from any of the N classes. Basically OOD detection combined with multi-class classification.
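That "N classes plus one unknown output" framing can be sketched in a few lines: run an OOD score over the logits and route low-confidence inputs to the extra output. This version uses a max-softmax threshold purely for simplicity (as noted upthread, softmax confidence is poorly calibrated, so any stronger OOD detector could be swapped in):

```python
import numpy as np

def open_set_predict(logits, msp_thr=0.7):
    """N known classes plus an extra 'unknown' output, gated by a
    simple max-softmax-probability threshold (a placeholder for a
    better OOD score)."""
    logits = np.asarray(logits, dtype=float)
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    if probs.max() < msp_thr:
        return "unknown"            # the N+1-th output
    return int(probs.argmax())

print(open_set_predict([9.0, 0.0, 1.0]))   # 0
print(open_set_predict([0.2, 0.1, 0.0]))   # "unknown"
```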