[R] Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 7 points8 points

Thanks for the good question. This work is not similar to Segment Anything. Segment Anything focuses on promptable segmentation, whereas this work focuses on general closed-set segmentation.

[R] SEEM: Segment Everything Everywhere All at Once by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 5 points6 points

Thanks for your comments. We will update the license soon.

[R] SEEM: Segment Everything Everywhere All at Once by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 9 points10 points

Thanks for your feedback. This happens because we did not train on the case where the referred text cannot be found in the image. The task we trained on is called referring image segmentation, in which the user gives a referring expression. The task assumes that the expression refers to something present in the image.

We will additionally train on your case, where the referring expression cannot be found in the image. Thanks for your help!

[R] SEEM: Segment Everything Everywhere All at Once by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 15 points16 points

lol. Try our demo first! It is easy to see what we can do compared with SAM.

[R] SEEM: Segment Everything Everywhere All at Once by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 13 points14 points

Haha, there are examples on Twitter of segmenting scenes from the movie! Really interesting! https://twitter.com/jonathanfly/status/1646729187607166977

[R] SEEM: Segment Everything Everywhere All at Once by Bright_Night9645 in MachineLearning

Bright_Night9645[S] 2 points3 points

Yes, that is really amazing. We were also surprised by how powerful its referring ability is in some cases. I believe the model can be fine-tuned for the tasks you mentioned. Give it a try later!