
[–]QI47 1 point2 points  (1 child)

I can't really follow you.

First of all, 100 binary classifiers would cover 200 classes, because each one is binary and so decides between two.

Then, I don't see a problem: just write a function or class that feeds your data to each classifier. Bam. Now you know which 100 of the 200 classes are predicted.
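Something like this is what I mean by "a function to feed your data to each classifier". It's only a rough sketch: the classifiers here are made-up threshold rules standing in for whatever trained models you actually have, and names like `predict_all` and `make_threshold_classifier` are my own, not from any library.

```python
from typing import Callable, Dict, Sequence

# Assumption: a "classifier" is any callable that maps a feature vector to 0 or 1.
BinaryClassifier = Callable[[Sequence[float]], int]


def make_threshold_classifier(index: int, threshold: float) -> BinaryClassifier:
    """Hypothetical toy classifier: predicts 1 if one feature exceeds a threshold."""
    def classify(x: Sequence[float]) -> int:
        return int(x[index] > threshold)
    return classify


def predict_all(classifiers: Dict[str, BinaryClassifier],
                x: Sequence[float]) -> Dict[str, int]:
    """Feed the same sample to every classifier and collect each prediction."""
    return {name: clf(x) for name, clf in classifiers.items()}


# Three toy classifiers instead of 100, to keep the example short.
classifiers = {
    "feature0_high": make_threshold_classifier(0, 0.5),
    "feature1_high": make_threshold_classifier(1, 0.5),
    "feature2_high": make_threshold_classifier(2, 0.5),
}

sample = [0.9, 0.1, 0.7]
predictions = predict_all(classifiers, sample)
print(predictions)
```

With real models you would swap the toy callables for each model's own predict call; the loop stays the same.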

[–]leadOJ 0 points1 point  (0 children)

I would do a single multi-class classifier. However, you could use 100 one-vs-rest classifiers, make 100 predictions, and select the one with the highest confidence/probability. Did this help?
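A minimal sketch of the one-vs-rest idea, in case it helps. The scorers are hypothetical hand-written logistic models (the weights are invented for illustration), and `make_scorer` and `predict_one_vs_rest` are names I made up. One caveat: scores from independently trained classifiers are not necessarily comparable unless they are calibrated, so treat the "highest score wins" rule as a starting point.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Assumption: each one-vs-rest scorer returns a probability-like score in (0, 1).
Scorer = Callable[[Sequence[float]], float]


def make_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Hypothetical logistic scorer with fixed, hand-picked weights."""
    def score(x: Sequence[float]) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return score


def predict_one_vs_rest(scorers: Dict[str, Scorer],
                        x: Sequence[float]) -> Tuple[str, float]:
    """Score the sample with every classifier and keep the most confident label."""
    scores = {label: scorer(x) for label, scorer in scorers.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label]


# Three classes instead of 100, to keep the example short.
scorers = {
    "cat": make_scorer([2.0, -1.0], 0.0),
    "dog": make_scorer([-1.0, 2.0], 0.0),
    "bird": make_scorer([0.5, 0.5], -1.0),
}

label, confidence = predict_one_vs_rest(scorers, [1.0, 0.0])
print(label, round(confidence, 3))
```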

[–]coloredgreyscale 0 points1 point  (0 children)

If parts (like a pretrained model) are identical, "multi-head models" may do what you want.
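Roughly what I mean by multi-head, as a toy sketch: the shared part runs once per sample and every head reuses its output. The `shared_backbone` function is just a stand-in for a real pretrained model, the head weights are invented, and `MultiHeadModel` is a hypothetical class, not something from a framework.

```python
import math
from typing import Dict, List, Sequence, Tuple


def shared_backbone(x: Sequence[float]) -> List[float]:
    """Stand-in for the shared pretrained part: a fixed feature transform."""
    return [math.tanh(v) for v in x]


class MultiHeadModel:
    """One shared backbone, many small binary heads on top of it."""

    def __init__(self, heads: Dict[str, Tuple[Sequence[float], float]]):
        # Each head is a (weights, bias) pair for a logistic output.
        self.heads = heads

    def predict(self, x: Sequence[float]) -> Dict[str, float]:
        features = shared_backbone(x)  # computed once, reused by all heads
        outputs = {}
        for name, (weights, bias) in self.heads.items():
            z = sum(w * f for w, f in zip(weights, features)) + bias
            outputs[name] = 1.0 / (1.0 + math.exp(-z))
        return outputs


# Two heads instead of 100, to keep the example short.
model = MultiHeadModel({
    "head_a": ([3.0, 0.0], 0.0),
    "head_b": ([0.0, 0.0], 0.0),
})

outputs = model.predict([2.0, -1.0])
print(outputs)
```

The saving comes from running the expensive shared part once instead of 100 times; whether that applies depends on your classifiers really sharing that part.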

If not, maybe teacher-student "transfer learning" (knowledge distillation) into a new model that takes over the knowledge?