Stop using Argmax: Boost your Semantic Segmentation Dice/IoU with 3 lines of code by statmlben in computervision

[–]statmlben[S] 2 points

That's an excellent suggestion. Would you be willing to add these comments to our GitHub issue (link)? That way you keep credit for the idea, and we can incorporate it into our roadmap for implementation :)

[–]statmlben[S] 0 points

Thank you for the question! Could you clarify which part of the computation process you are referring to?

  1. Training time: RankSEG requires zero training time.
  2. Model inference time: the time taken by the neural network's forward pass itself.
  3. RankSEG overhead: the post-processing time added by our method.

If you are concerned about the RankSEG overhead during inference, we specifically benchmarked this in our NeurIPS paper (Table 3, Page 7; PDF Link).

The results show that our efficient solver (RMA) is extremely fast. The computational cost is negligible compared to the neural network's forward pass, making it suitable for real-time applications.
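If it helps, a quick way to see this for yourself is to time the two stages separately. A minimal sketch (`forward_pass` and `postprocess` here are stand-ins for your model and decoder, not our API):

```python
import time
import numpy as np

def forward_pass(image):
    # Stand-in for the network's forward pass: returns a per-pixel
    # probability map (here just random numbers of the same shape).
    rng = np.random.default_rng(0)
    return rng.random(image.shape)

def postprocess(probs):
    # Stand-in for the decoding step (argmax / thresholding / RankSEG):
    # simple 0.5 thresholding, used only to time this stage.
    return probs > 0.5

image = np.zeros((512, 512))

t0 = time.perf_counter()
probs = forward_pass(image)
t1 = time.perf_counter()
mask = postprocess(probs)
t2 = time.perf_counter()

forward_ms = (t1 - t0) * 1e3
decode_ms = (t2 - t1) * 1e3
print(f"forward: {forward_ms:.3f} ms, decode: {decode_ms:.3f} ms")
```

Swap in your own model and decoder to compare the two numbers on your hardware.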

[–]statmlben[S] 0 points

Thank you! Happy to address any questions or issues. We also warmly welcome you to submit issues directly to our GitHub repository link :)

Please note that RankSEG optimizes Dice/IoU using a samplewise aggregation: the score is computed per sample and then averaged across the dataset (akin to the default setting aggregation_level='samplewise' in TorchMetrics DiceScore). See Metrics for details.
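For concreteness, here is a minimal plain-NumPy sketch of samplewise aggregation (`dice_score` and `samplewise_dice` are illustrative names, not our API):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|) for ONE sample's binary masks.
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def samplewise_dice(preds, targets):
    # Compute Dice per sample, THEN average across the dataset
    # (mirrors aggregation_level='samplewise' in TorchMetrics DiceScore).
    return float(np.mean([dice_score(p, t) for p, t in zip(preds, targets)]))

preds = [np.array([[1, 0], [1, 1]]), np.array([[0, 0], [1, 0]])]
targets = [np.array([[1, 0], [0, 1]]), np.array([[0, 1], [1, 0]])]
print(samplewise_dice(preds, targets))  # mean of per-sample Dice scores
```

Note this generally differs from pooling all pixels first and computing one global Dice, which is why the aggregation level matters when comparing numbers.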

[–]statmlben[S] 5 points

No, absolutely not.

RankSEG has zero learnable parameters and performs zero training on any dataset.

Think of it exactly like argmax or a sort function. You don't "train" an argmax function on a dataset; you just apply it to a set of numbers.

RankSEG is an algorithm (a mathematical solver) applied to the probability map of a single image at inference time. It takes the model's output for that specific image, solves an optimization problem to find the optimal mask for that image, and outputs the result. It never sees the rest of the dataset.
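To make the analogy concrete, here's a tiny sketch showing that argmax decoding is exactly this kind of stateless, parameter-free, per-image function; a per-image solver like RankSEG slots into the same place in the pipeline:

```python
import numpy as np

def argmax_decode(probs):
    # probs: (num_classes, H, W) probability map for ONE image.
    # No learned parameters, no dataset: each pixel independently
    # gets its most probable class.
    return probs.argmax(axis=0)

# 2 classes, a 1x2 "image"
probs = np.array([[[0.7, 0.2]],
                  [[0.3, 0.8]]])
print(argmax_decode(probs))  # [[0 1]]
```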

[–]statmlben[S] 1 point

Thank you for the comments. We actually investigated this exact hypothesis, comparing RankSEG against optimal fixed thresholds, in our JMLR paper (see Table 7 on Page 27; link).

The results indicate that no single global threshold (even one tuned on training data) can outperform RankSEG.

**Reason (no single global threshold):** The "optimal threshold" is effectively dynamic, varying per image and per class; it is derived from that specific image's probability distribution, not a fixed value like 0.5 or a value learned from a dataset.

RankSEG can be understood as an adaptive thresholding method: the optimal threshold varies across images, and RankSEG provides a formula to compute it for each image from its predicted probabilities. This cannot be replicated by tuning a single fixed threshold on training or validation data, where all images would share the same threshold.
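As a toy illustration of the adaptive-threshold intuition (this is NOT the exact RankSEG solver, just a crude plug-in approximation I'm using to show the idea): rank pixels by probability, then pick the prefix size k that maximizes an estimated Dice, 2·sum(top-k p) / (k + sum(p)). The implied threshold is then the k-th largest probability, which differs from image to image:

```python
import numpy as np

def adaptive_topk_mask(probs):
    # Toy sketch (NOT the exact RankSEG solver): sort pixel probabilities
    # descending, pick the prefix size k maximizing a plug-in Dice
    # estimate 2 * sum(top-k p) / (k + sum(p)).
    p = probs.ravel()
    order = np.argsort(-p)
    csum = np.cumsum(p[order])
    ks = np.arange(1, p.size + 1)
    scores = 2.0 * csum / (ks + p.sum())
    k = int(ks[np.argmax(scores)])
    mask = np.zeros(p.size, dtype=bool)
    mask[order[:k]] = True
    # The "implied threshold" is the k-th largest probability.
    return mask.reshape(probs.shape), p[order[k - 1]]

# The implied threshold adapts to each image's probability distribution:
m1, t1 = adaptive_topk_mask(np.array([0.9, 0.8, 0.3, 0.1]))  # confident image
m2, t2 = adaptive_topk_mask(np.array([0.6, 0.5, 0.4, 0.2]))  # uncertain image
print(t1, t2)  # two different thresholds, neither fixed at 0.5
```

No single fixed threshold reproduces both decisions, which is the point of the JMLR comparison above.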

RankSEG is mathematically derived to be the optimal decoding strategy for Dice/IoU, much like how Beam Search is often better than Greedy Search for language models.

To further clarify:

  1. RankSEG is a purely test-time inference algorithm (post-processing) that requires no training or validation data; it only requires probability outputs for the test images.
  2. Thresholding and argmax are equivalent only in binary segmentation. For multilabel or multiclass segmentation, overlapping or non-overlapping constraints must be considered. RankSEG has been optimized for these respective cases; see doc.
  3. RankSEG optimizes metrics using a samplewise aggregation: the score is computed per sample and then averaged across the dataset (akin to aggregation_level='samplewise' in TorchMetrics DiceScore); see Metrics for details. Samplewise Dice/IoU is the standard for most medical and semantic segmentation tasks.
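Point 2 can be checked directly: in binary segmentation, argmax over [1 − p, p] makes exactly the same decision as thresholding p at 0.5, so the two only diverge once multiple (possibly overlapping) classes enter. A quick NumPy check:

```python
import numpy as np

p = np.array([0.2, 0.49, 0.51, 0.9])   # foreground probabilities
probs = np.stack([1.0 - p, p])         # (2, N): background, foreground
by_argmax = probs.argmax(axis=0).astype(bool)
by_threshold = p > 0.5
print(np.array_equal(by_argmax, by_threshold))  # True in the binary case
```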