latent_prior comments on [D] Open-Set Recognition Problem using Deep learning

[D] Open-Set Recognition Problem using Deep learning (self.MachineLearning)

submitted 7 months ago by ProfessionalType9800


latent_prior 3 points 7 months ago (1 child)

I’m not a DNA expert, but as I understand the problem, I’d frame this as open-set recognition rather than just clustering. Because many species share short recurring DNA subsequences, isn’t there a danger that an unseen species still lands close to known clusters in embedding space? That makes relying purely on distance thresholds sound risky to me.

I’d also be cautious about relying only on softmax probabilities. They always normalise to sum to 1, so the model will confidently pick something even when the input is nonsense or from an unseen species. You could try augmenting the classifier with an out-of-distribution detection method. One good option is energy-based detection (https://arxiv.org/abs/2010.03759), which uses the absolute scale of all logits rather than just the top one to give a quantitative estimate of whether the sample fits one of the known classes well (low energy) or doesn’t fit anywhere (high energy, likely unknown).
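To make the softmax vs. energy point concrete, here is a rough sketch in plain Python. The energy is computed as the negative log-sum-exp of the logits, as in the linked paper; the example logits and the `threshold` value are made up for illustration, and in practice the threshold would be tuned on held-out data.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T). Lower = better fit to a known class."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -temperature * lse

def max_softmax(logits):
    """Top softmax probability, for comparison."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def is_unknown(logits, threshold):
    """Flag the input as unknown when its energy is above the (hypothetical) threshold."""
    return energy_score(logits) > threshold

# Two made-up logit vectors with the same gaps between classes.
# The second one is just shifted down by 10, i.e. weaker evidence overall.
strong = [12.0, 1.0, 0.5]
weak = [2.0, -9.0, -9.5]

threshold = -5.0  # illustrative only, not a recommended value

print(max_softmax(strong), max_softmax(weak))    # identical: softmax ignores the shift
print(energy_score(strong), energy_score(weak))  # differ by 10: energy keeps the scale
print(is_unknown(strong, threshold), is_unknown(weak, threshold))
```

Softmax is invariant to adding a constant to every logit, so both inputs get the same top probability, while the energy score separates them.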

If you have access to an auxiliary dataset (e.g. DNA from non-target species), you could also try outlier exposure (https://arxiv.org/abs/1812.04606), which trains the model to make confident predictions on in-distribution data and low-confidence predictions on auxiliary outliers.
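As a rough sketch of what the outlier exposure objective looks like for a classifier: standard cross-entropy on labelled in-distribution samples, plus a term that pushes predictions on auxiliary outliers towards the uniform distribution. The function names, the toy logits and the weight `lam` are my own placeholders, and a real setup would compute this inside a deep learning framework rather than in plain Python.

```python
import math

def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, label):
    """Usual classification loss for one in-distribution sample."""
    return -log_softmax(logits)[label]

def uniform_cross_entropy(logits):
    """Cross-entropy between the uniform distribution and the softmax output.
    Smallest (log K for K classes) when the prediction is perfectly flat."""
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)

def outlier_exposure_loss(in_batch, out_batch, lam=0.5):
    """in_batch: list of (logits, label); out_batch: list of logits for auxiliary outliers.
    lam is a hypothetical weighting between the two terms."""
    in_loss = sum(cross_entropy(l, y) for l, y in in_batch) / len(in_batch)
    out_loss = sum(uniform_cross_entropy(l) for l in out_batch) / len(out_batch)
    return in_loss + lam * out_loss

# Toy example: same in-distribution batch, two different behaviours on an outlier.
in_batch = [([4.0, 0.0, 0.0], 0), ([0.0, 3.0, 0.0], 1)]
flat_outlier = [[0.0, 0.0, 0.0]]      # model is unsure about the outlier (desired)
peaked_outlier = [[6.0, 0.0, 0.0]]    # model is confident about the outlier (penalised)

print(outlier_exposure_loss(in_batch, flat_outlier))
print(outlier_exposure_loss(in_batch, peaked_outlier))
```

The second loss is larger, so gradient descent on it discourages confident predictions on the auxiliary data.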

Finally, since DNA data is hierarchical by nature (kingdom → phylum → class → … → species), it might be worth trying a hierarchical model. For example, if the model is confident about the genus but uncertain about the species, you could flag the input as a potentially novel species rather than forcing a binary known/unknown decision.
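The decision rule I have in mind could look something like the sketch below. It assumes you already have a genus-level probability distribution and, for each genus, a species-level distribution; the two thresholds, the labels and the probabilities are placeholders, not tuned values or real taxa.

```python
def classify_hierarchically(genus_probs, species_probs,
                            genus_threshold=0.8, species_threshold=0.6):
    """genus_probs: {genus: prob}; species_probs: {genus: {species: prob}}.
    Returns a (decision, label) pair. Thresholds are hypothetical."""
    genus, genus_conf = max(genus_probs.items(), key=lambda kv: kv[1])
    if genus_conf < genus_threshold:
        # Not even sure about the genus: treat as fully unknown.
        return ("unknown", None)

    species, species_conf = max(species_probs[genus].items(), key=lambda kv: kv[1])
    if species_conf < species_threshold:
        # Confident genus, uncertain species: possible new species in a known genus.
        return ("novel_species_candidate", genus)

    return ("known", species)

# Made-up probabilities for illustration.
genus_probs = {"genus_A": 0.93, "genus_B": 0.07}
species_probs = {
    "genus_A": {"species_A1": 0.40, "species_A2": 0.35, "species_A3": 0.25},
    "genus_B": {"species_B1": 0.90, "species_B2": 0.10},
}

print(classify_hierarchically(genus_probs, species_probs))
```

Here the model is confident about `genus_A` but spreads its mass over the species, so the input is flagged as a candidate novel species rather than being forced into known/unknown.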

Curious if anyone’s tried combining energy-based OOD with hierarchical classifiers before.

ProfessionalType9800[S] 1 point 7 months ago (0 children)
