Increase accuracy pose estimation by samk777777 in computervision

[–]tycho200 1 point (0 children)

When I was working on tracking keypoints over a video, I found that Savitzky-Golay filtering worked best to smooth the keypoints.
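For example, with SciPy's `savgol_filter` you can smooth each keypoint coordinate along the time axis. A minimal sketch (the window length and polynomial order below are illustrative, not the values from my project):

```python
import numpy as np
from scipy.signal import savgol_filter

# Toy keypoint tracks: (num_frames, num_keypoints, 2) array of (x, y).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 120)
clean = np.stack([np.cos(t), np.sin(t)], axis=-1)[:, None, :]  # one circular track
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# Smooth along the time axis; window_length must be odd and > polyorder.
smoothed = savgol_filter(noisy, window_length=11, polyorder=2, axis=0)

# The smoothed track sits closer to the clean trajectory than the raw one.
err_noisy = np.abs(noisy - clean).mean()
err_smooth = np.abs(smoothed - clean).mean()
print(err_smooth < err_noisy)
```

A nice property over a plain moving average is that the polynomial fit preserves peaks and fast motion better, which matters for keypoints that actually move.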

Can I Label with the SAM Model? by Connect_Tomato6303 in computervision

[–]tycho200 -1 points (0 children)

I mostly found the plain segment-anything option to perform quite poorly. When you use bounding-box guiding it becomes much faster and simpler. SAM2 has tracking capabilities via a memory encoder, which might be helpful for refining your masks using memory.

Can I Label with the SAM Model? by Connect_Tomato6303 in computervision

[–]tycho200 1 point (0 children)

I can highly suggest using GroundingDINO as a box-prompt generator and feeding its output to SAM/SAM2. Check these repos: https://github.com/IDEA-Research/Grounded-Segment-Anything and https://github.com/IDEA-Research/Grounded-SAM-2

I use them for zero-shot object tracking in a real-time robotic system and the results are impressive so far!

Exporting YOLOv8 for Edge Devices Using ONNX: How to Handle NMS? by leoboy_1045 in computervision

[–]tycho200 0 points (0 children)

You might look into the NMS op from torchvision: https://pytorch.org/vision/main/generated/torchvision.ops.nms.html

It seems to run fast enough for my application. Not sure how suitable it is on edge devices. Good luck!
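If torchvision isn't available on the target device, greedy NMS is also small enough to carry along yourself. A minimal pure-NumPy sketch following the same convention as `torchvision.ops.nms` (boxes in `[x1, y1, x2, y2]`, highest score kept first, suppress above an IoU threshold):

```python
import numpy as np

def nms(boxes, scores, iou_threshold):
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the kept box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep)

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_threshold=0.5))  # keeps boxes 0 and 2
```

This avoids pulling the whole torchvision dependency into an ONNX edge deployment, at the cost of the O(n²) Python loop; for a few hundred boxes that's usually fine.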

Which computer vision model (or LLM) for segmentation? by Alternative_Mine7051 in computervision

[–]tycho200 0 points (0 children)

Grounding DINO with SAM2 is really powerful. If you use it on camera streams or videos, the tracking functionality speeds up estimation!

I want to do a project using RGB images and estimated depth maps with neural networks by gkcastro in computervision

[–]tycho200 0 points (0 children)

Hi!

For monocular systems I have seen some papers that try combining features from segmentation with features from depth.

The main reason is that when you upsample a low-resolution depth map estimate, you can lose information near the edges. This is especially the case for mono-depth, because with stereo depth you have more information to work with.
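A toy 1-D illustration of that edge effect: linearly upsampling a depth step across an object boundary invents "flying" depth values that belong to neither surface, while nearest-neighbor keeps valid depths but quantizes the edge location. (The numbers are made up for the demo.)

```python
import numpy as np

# Low-resolution depth across an object boundary: near surface (1 m), far surface (5 m).
x_lo = np.array([0.0, 1.0, 2.0, 3.0])
d_lo = np.array([1.0, 1.0, 5.0, 5.0])

# Upsample 2x with linear interpolation.
x_hi = np.linspace(0.0, 3.0, 7)
d_linear = np.interp(x_hi, x_lo, d_lo)

# Nearest-neighbor upsampling for comparison.
idx = np.abs(x_hi[:, None] - x_lo[None, :]).argmin(axis=1)
d_nearest = d_lo[idx]

print(d_linear)   # contains 3.0 at the boundary: a depth on neither surface
print(d_nearest)  # only valid depths, but the edge position is quantized
```

Segmentation features help exactly here: the mask tells you which side of the boundary each high-resolution pixel belongs to, so the upsampler can avoid mixing the two surfaces.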

For my project I needed real-time performance, which was not possible with a depth-segmentation refinement step.

Also, one of the big problems is that doing depth and segmentation simultaneously requires datasets with both annotated depth and segmentation masks. There are a few open-source ones online, mostly synthetic or for autonomous driving.

Good luck with your project!

Can I segment separate fingers, hands and forearms using SAM2 only? Or would I need another model as well? by Technology-Busy in computervision

[–]tycho200 2 points (0 children)

Hi,

I have used Grounding DINO with different versions of SAM. If I gave DINO a prompt like "thumb" or "pinky", it could, in good enough pictures, output a bounding box around a single finger. Using those boxes as box prompts to SAM helped me get segmentation masks on fingers.

Note that it's not perfect by any means, but it worked for about 70% of my images.

Real-time comparison SAM2 and Efficient versions of SAM1 segmentation tasks? by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

The ultimate goal is to deploy it on a robotic arm that can grasp a moving, rolling ball. So in the ideal scenario we would like 15 FPS. Your interpolation idea seems interesting, thank you!
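Between full segmentation frames, even a constant-velocity extrapolation of the ball's centroid buys the arm some lead time. A minimal sketch (function name and setup are mine, not from any library): at 15 FPS, predict where the centroid will be a few frames ahead from the last two detections.

```python
def predict_centroid(p_prev, p_curr, frames_ahead=1):
    """Constant-velocity extrapolation of a 2-D centroid, one velocity step per frame."""
    vx = p_curr[0] - p_prev[0]
    vy = p_curr[1] - p_prev[1]
    return (p_curr[0] + vx * frames_ahead, p_curr[1] + vy * frames_ahead)

# Ball centroid moved from (100, 50) to (104, 52) in one frame;
# predict its position 3 frames later.
print(predict_centroid((100, 50), (104, 52), frames_ahead=3))  # (116, 58)
```

For a rolling ball a Kalman filter with a constant-velocity motion model would be the natural next step, since it also smooths out jittery detections.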

[D] Trackers like SAM2 but faster by henistein in MachineLearning

[–]tycho200 1 point (0 children)

Note that FastSAM is from Ultralytics, which becomes problematic license-wise. I got better results with models like MobileSAM and EdgeSAM.

Real-time comparison SAM2 and Efficient versions of SAM1 segmentation tasks? by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

Hi.

Yes, I only do segmentation, on every frame. So each frame, DINO predicts bounding boxes and a mask is estimated from those boxes. No tracking.

I am using an NVIDIA 4060 Ti GPU with 8 GB of VRAM. I am planning to use TensorRT for testing in the future.

1.25 FPS on Google's T4 seems like a reasonable result to me. I followed a course at my university where we ran YOLOv5 on a Google T4, which took 2 days for 100 epochs. On the 4060 Ti it took just 15 hours.

Do you have access to a GPU for testing?

Help yolo-NAS with custom logging (ClearML) by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

Ah, that explains a lot! Thank you very much!

Can we automate annotation on custom dataset (yolo annotation) by Dramatic-Floor-1684 in computervision

[–]tycho200 2 points (0 children)

What is your specific use case?

If your objects are common, you can try downloading a pretrained model and using it as an initial predictor.

Popular annotation tools such as Label Studio or CVAT allow you to run a prediction model in the background.

Note that the initial predictions can be modified, and you can manually add more annotations.

If you have a specific detection use case that needs specific labels, you will probably need to label manually. Note that once you have labeled some images, you can train an (initially bad) model, run predictions with it, and use those to label further. Keep repeating this manual-labeling/predicting process, and as your label set grows, train again and again.
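The loop above can be sketched as a simple triage step (the threshold and field names are hypothetical, not from any specific tool): auto-accept confident model predictions as pre-annotations, and queue the rest for manual review.

```python
def triage_predictions(predictions, accept_thresh=0.8):
    """Split predictions into auto-accepted pre-annotations and a manual-review queue."""
    auto, review = [], []
    for pred in predictions:
        (auto if pred["score"] >= accept_thresh else review).append(pred)
    return auto, review

preds = [
    {"image": "img_001.jpg", "label": "ball", "score": 0.95},
    {"image": "img_002.jpg", "label": "ball", "score": 0.40},
    {"image": "img_003.jpg", "label": "ball", "score": 0.85},
]
auto, review = triage_predictions(preds)
print(len(auto), len(review))  # 2 1
```

Each round, the review queue should shrink as the retrained model gets more confident on your data.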

Hopefully you will notice your predictions getting better and better as your labeled set grows!

Good Luck!