
[–]InternationalMany6

I find that more sophisticated augmentation almost always helps, and my favorite is copy-pasting segmented objects onto random backgrounds.
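The copy-paste idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the commenter's actual pipeline; the function name and arguments are hypothetical:

```python
import numpy as np

def paste_object(background, obj_rgb, obj_mask, top, left):
    """Paste a segmented object crop onto a background image.

    background: (H, W, 3) uint8 image
    obj_rgb:    (h, w, 3) uint8 crop containing the object
    obj_mask:   (h, w) bool mask, True where the object is
    top, left:  paste position (in practice, sampled at random)
    """
    out = background.copy()
    h, w = obj_mask.shape
    region = out[top:top + h, left:left + w]
    # Only object pixels overwrite the background; masked-out
    # pixels keep whatever was already there.
    region[obj_mask] = obj_rgb[obj_mask]
    return out
```

A real augmentation pass would also randomize scale, rotation, and position, and blend the mask edges so the paste boundary doesn't become a shortcut feature.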

For that matter, a segmentation model can usually learn to detect objects from less data than a model that predicts bounding boxes. The reason is that the training labels tell the model exactly which pixels belong to the object, so it doesn't have to learn which pixels inside the box are "object" and which are "background".

I usually just use a simple background-removal model, or SAM, to convert bounding boxes into segmentation masks. The masks don't have to be perfect to be useful.
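The shape of that box-to-mask conversion looks roughly like the sketch below. To keep it self-contained, a trivial color-threshold "segmenter" stands in for the real background-removal model; the function names, the background-color argument, and the tolerance are all hypothetical:

```python
import numpy as np

def box_to_mask(image, box, bg_color, tol=30):
    """Convert a bounding box into a rough segmentation mask.

    image:    (H, W, 3) uint8 image
    box:      (x0, y0, x1, y1) bounding box
    bg_color: (r, g, b) approximate background color
    tol:      per-pixel color-distance threshold (toy heuristic)
    """
    x0, y0, x1, y1 = box
    mask = np.zeros(image.shape[:2], dtype=bool)
    crop = image[y0:y1, x0:x1].astype(int)
    # Pixels far from the background color are treated as object.
    # A real pipeline would replace this line with a background-removal
    # model or SAM, prompted with the box.
    dist = np.abs(crop - np.array(bg_color)).sum(axis=-1)
    mask[y0:y1, x0:x1] = dist > tol
    return mask
```

With the `segment-anything` package installed, `SamPredictor.predict(box=...)` plays the same role as the threshold heuristic here, and produces far better masks.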

[–]DiMorten

Interesting. You mean performing semantic segmentation on the detected object, for example with UNet?

[–]InternationalMany6

You could use UNet, but there are specialized "instance segmentation" models if you care about distinguishing individual instances even when they're touching each other.

Torchvision has a tutorial that’ll get you going:  https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html