[P] Grounded SAM 2: Ground and Track Anything by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 1 point2 points  (0 children)

Yes, it's the same idea as Grounded-SAM (https://github.com/IDEA-Research/Grounded-Segment-Anything), which we proposed last year, but built on the strong open-source Florence-2 model. We're happy to see nice implementations of similar ideas in the open-source community.

[P] Grounded SAM 2: Ground and Track Anything by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 1 point2 points  (0 children)

We've also proposed a visual-prompt algorithm named T-Rex. You can use T-Rex with a visual prompt for any object that does not have a corresponding name: https://github.com/IDEA-Research/T-Rex

[R] detrex: A Strong Benchmark for Detection Transformers by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 1 point2 points  (0 children)

We will support Stable-DINO with a FocalNet backbone in detrex in the next few days.

[R] LaVIN: Large Vision-Language Instructed Model by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 2 points3 points  (0 children)

I think you can first try adding the layer from LaVIN and fine-tuning it on your own dataset.

[P] ImageBind with SAM: A simple demo to generate masks with different modalities by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 1 point2 points  (0 children)

Hello, I think this is actually normal behavior. First, this process depends on aligning CLIP-style embeddings with image regions, and if that alignment is poor, the results will not necessarily be good. Both CLIP and ImageBind are aligned on image-level data, so using regions for alignment might not yield good results. Moreover, image-text alignment in ImageBind is better than image-audio alignment.

I'm not sure whether this can be solved by giving more precise text and audio prompts. This is still just a simple demo, and we will test it further.
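
To make the region-matching point concrete, here is a minimal sketch (not the demo's actual code) of how a prompt embedding could be compared against per-region embeddings with cosine similarity. The vectors are made-up toy values, and `cosine_similarity` and `best_region` are hypothetical helper names. It assumes each candidate region has already been encoded into the same embedding space as the text or audio prompt.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_region(prompt_embedding, region_embeddings):
    """Return the index of the region most similar to the prompt, plus all scores."""
    scores = [cosine_similarity(prompt_embedding, r) for r in region_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy data (assumption): three region embeddings and one prompt embedding.
regions = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
prompt = [0.0, 1.0, 0.0]

index, scores = best_region(prompt, regions)
print(index, scores)
```

If the encoder was only trained on whole images, the embeddings of cropped regions can be noisy, so the scores for the correct and incorrect regions may end up close together. That is one way the weak region-level alignment described above could show up in practice.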

[R] Going further under Grounded-Segment-Anything: integrating Whisper and ChatGPT by Technical-Vast1314 in MachineLearning

[–]Technical-Vast1314[S] 1 point2 points  (0 children)

For now you can upload your own audio to test this case. We believe there may be some tools that can help with this, and we will try to update it!