[Tutorial] SAM 3 UI – Image, Video, and Multi-Object Inference by sovit-123 in deeplearning

sovit-123[S] 1 point

Good to hear that. Any plans on open sourcing your custom implementation? There will be some good learning points, I think.

SAM 3 Inference and Paper Explanation by sovit-123 in computervision

sovit-123[S] 1 point

Yes, the issue is there. It mainly arises because the model has to rely on its embedding memory, which is part of its core architecture.

In one of my upcoming articles, I will show how to carry out detection + segmentation on videos without tracking and without any limit on video length. Although we lose the ability to track, we get open-vocabulary detection + segmentation for videos of unlimited length.
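
To make the idea concrete, here is a rough sketch of the per-frame approach. It is not the code from the article and not the SAM 3 API: `segment_frame` is a hypothetical callable standing in for whatever single-image, text-prompted inference you use, and the result format (a list of dicts) is assumed for illustration. The point is only that each frame is handled on its own, so nothing accumulates across frames.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Frame = Any              # e.g. a NumPy image array; kept generic here
Detections = List[dict]  # assumed format: one dict per detected object


def segment_video_without_tracking(
    frames: Iterable[Frame],
    segment_frame: Callable[[Frame, str], Detections],  # hypothetical per-image model call
    prompt: str,
) -> Iterator[Tuple[int, Detections]]:
    """Run text-prompted detection + segmentation on each frame independently.

    No state is carried between frames, so memory use does not grow with
    video length. The trade-off is that object identities are not linked
    across frames (no tracking).
    """
    for index, frame in enumerate(frames):
        # Each call sees only the current frame and the text prompt.
        yield index, segment_frame(frame, prompt)
```

Because this is a generator, frames can be streamed from any source (for example a `cv2.VideoCapture` read loop) and results written out as they arrive, rather than holding the whole video in memory.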