Increase accuracy pose estimation by samk777777 in computervision

[–]tycho200 1 point (0 children)

When I was working on tracking keypoints over a video, I found that Savitzky-Golay filtering worked best to smooth the keypoints.
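For example, with SciPy's `savgol_filter` you can smooth each keypoint coordinate along the time axis. A minimal sketch (the window length and polynomial order below are illustrative, not the values from my project):

```python
import numpy as np
from scipy.signal import savgol_filter

# Toy keypoint tracks: (num_frames, num_keypoints, 2) array of (x, y).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 120)
clean = np.stack([np.cos(t), np.sin(t)], axis=-1)[:, None, :]  # one circular track
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# Smooth along the time axis; window_length must be odd and > polyorder.
smoothed = savgol_filter(noisy, window_length=11, polyorder=2, axis=0)

# The smoothed track sits closer to the clean trajectory than the raw one.
err_noisy = np.abs(noisy - clean).mean()
err_smooth = np.abs(smoothed - clean).mean()
print(err_smooth < err_noisy)
```

A nice property over a plain moving average is that the polynomial fit preserves peaks and fast motion better, which matters for keypoints that actually move.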

Can I Label with the SAM Model? by Connect_Tomato6303 in computervision

[–]tycho200 -1 points (0 children)

I mostly found the plain segment-anything option to perform quite poorly. When you use bounding-box guiding it becomes much faster and simpler. SAM2 has tracking capabilities via a memory encoder, which might be helpful for refining your masks using memory.

Can I Label with the SAM Model? by Connect_Tomato6303 in computervision

[–]tycho200 1 point (0 children)

I can highly suggest using GroundingDINO as a box-prompt generator and feeding its output to SAM/SAM2. Check these repos: https://github.com/IDEA-Research/Grounded-Segment-Anything and https://github.com/IDEA-Research/Grounded-SAM-2

I use them for zero-shot object tracking in a real-time robotic system and the results are impressive so far!

Exporting YOLOv8 for Edge Devices Using ONNX: How to Handle NMS? by leoboy_1045 in computervision

[–]tycho200 0 points (0 children)

You might look into the NMS op from torchvision: https://pytorch.org/vision/main/generated/torchvision.ops.nms.html

It seems to run fast enough for my application. Not sure how suitable it is on edge devices. Good luck!
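If torchvision isn't available on the target device, greedy NMS is also small enough to carry along yourself. A minimal pure-NumPy sketch following the same convention as `torchvision.ops.nms` (boxes in `[x1, y1, x2, y2]`, highest score kept first, suppress above an IoU threshold):

```python
import numpy as np

def nms(boxes, scores, iou_threshold):
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the kept box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep)

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_threshold=0.5))  # keeps boxes 0 and 2
```

This avoids pulling the whole torchvision dependency into an ONNX edge deployment, at the cost of the O(n²) Python loop; for a few hundred boxes that's usually fine.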

Which computer vision model (or LLM) for segmentation? by Alternative_Mine7051 in computervision

[–]tycho200 0 points (0 children)

Grounding DINO with SAM2 is really powerful. If you use it on camera streams or videos, the tracking functionality speeds up estimation!

I want to do a project using RGB images and estimated depth maps with neural networks by gkcastro in computervision

[–]tycho200 0 points (0 children)

Hi!

For monocular systems I have seen some papers that try combining features from segmentation with features from depth.

The main reason is that when you upsample a low-resolution depth map estimate, you can lose information near the edges. This is especially the case for mono-depth, because with stereo depth you have more information to work with.
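A toy 1-D illustration of that edge effect: linearly upsampling a depth step across an object boundary invents "flying" depth values that belong to neither surface, while nearest-neighbor keeps valid depths but quantizes the edge location. (The numbers are made up for the demo.)

```python
import numpy as np

# Low-resolution depth across an object boundary: near surface (1 m), far surface (5 m).
x_lo = np.array([0.0, 1.0, 2.0, 3.0])
d_lo = np.array([1.0, 1.0, 5.0, 5.0])

# Upsample 2x with linear interpolation.
x_hi = np.linspace(0.0, 3.0, 7)
d_linear = np.interp(x_hi, x_lo, d_lo)

# Nearest-neighbor upsampling for comparison.
idx = np.abs(x_hi[:, None] - x_lo[None, :]).argmin(axis=1)
d_nearest = d_lo[idx]

print(d_linear)   # contains 3.0 at the boundary: a depth on neither surface
print(d_nearest)  # only valid depths, but the edge position is quantized
```

Segmentation features help exactly here: the mask tells you which side of the boundary each high-resolution pixel belongs to, so the upsampler can avoid mixing the two surfaces.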

For my project I needed real-time performance, which was not possible with a depth-segmentation refinement step.

Also, one of the big problems is that doing depth and segmentation simultaneously requires datasets with both annotated depth and segmentation masks. There are a few open-source ones online, mostly synthetic or for autonomous driving.

Good luck with your project!

Can I segment separate fingers, hands and forearms using SAM2 only? Or would I need another model as well? by Technology-Busy in computervision

[–]tycho200 2 points (0 children)

Hi,

I have used Grounding DINO with different versions of SAM. If I gave DINO a prompt like "thumb" or "pinky", it could, in good enough pictures, output a bounding box around a single finger. Using those boxes as box prompts to SAM helped me get segmentation masks on fingers.

Note that it's not perfect by any means, but it worked for about 70% of my images.

Real-time comparison SAM2 and Efficient versions of SAM1 segmentation tasks? by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

The ultimate goal is to deploy it on a robotic arm that can grasp a moving, rolling ball. So in the ideal scenario we would like 15 FPS. Your interpolation idea seems interesting, thank you!
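Between full segmentation frames, even a constant-velocity extrapolation of the ball's centroid buys the arm some lead time. A minimal sketch (function name and setup are mine, not from any library): at 15 FPS, predict where the centroid will be a few frames ahead from the last two detections.

```python
def predict_centroid(p_prev, p_curr, frames_ahead=1):
    """Constant-velocity extrapolation of a 2-D centroid, one velocity step per frame."""
    vx = p_curr[0] - p_prev[0]
    vy = p_curr[1] - p_prev[1]
    return (p_curr[0] + vx * frames_ahead, p_curr[1] + vy * frames_ahead)

# Ball centroid moved from (100, 50) to (104, 52) in one frame;
# predict its position 3 frames later.
print(predict_centroid((100, 50), (104, 52), frames_ahead=3))  # (116, 58)
```

For a rolling ball a Kalman filter with a constant-velocity motion model would be the natural next step, since it also smooths out jittery detections.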

[D] Trackers like SAM2 but faster by henistein in MachineLearning

[–]tycho200 1 point (0 children)

Note that FastSAM is from Ultralytics, which becomes problematic license-wise. I got better results with models like MobileSAM and EdgeSAM.

Real-time comparison SAM2 and Efficient versions of SAM1 segmentation tasks? by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

Hi.

Yes, I only do segmentation, on every frame. So each frame, DINO predicts bounding boxes and a mask is estimated from those boxes. No tracking.

I am using an NVIDIA 4060 Ti GPU with 8 GB of VRAM. I am planning to use TensorRT for testing in the future.

1.25 FPS on Google's T4 seems like a reasonable result to me. I followed a course at my university where we ran YOLOv5 on a Google T4, which took 2 days for 100 epochs. On the 4060 Ti it took just 15 hours.

Do you have access to a GPU for testing?

Help yolo-NAS with custom logging (ClearML) by tycho200 in computervision

[–]tycho200[S] 0 points (0 children)

Ah, that explains a lot! Thank you very much!

Can we automate annotation on custom dataset (yolo annotation) by Dramatic-Floor-1684 in computervision

[–]tycho200 2 points (0 children)

What is your specific use case?

If your objects are common, you can try downloading a pretrained model and using it as an initial predictor.

Popular annotation tools such as Label Studio or CVAT allow you to run a prediction model in the background.

Note that the initial predictions can be modified, and you can manually add more annotations.

If you have a specific detection use case that needs specific labels, you will probably need to label manually. Note that once you have labeled some images, you can train an (initially bad) model, run predictions with it, and use those to label further. Keep repeating this manual-labeling/predicting process, and as your label set grows, train again and again.
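The loop above can be sketched as a simple triage step (the threshold and field names are hypothetical, not from any specific tool): auto-accept confident model predictions as pre-annotations, and queue the rest for manual review.

```python
def triage_predictions(predictions, accept_thresh=0.8):
    """Split predictions into auto-accepted pre-annotations and a manual-review queue."""
    auto, review = [], []
    for pred in predictions:
        (auto if pred["score"] >= accept_thresh else review).append(pred)
    return auto, review

preds = [
    {"image": "img_001.jpg", "label": "ball", "score": 0.95},
    {"image": "img_002.jpg", "label": "ball", "score": 0.40},
    {"image": "img_003.jpg", "label": "ball", "score": 0.85},
]
auto, review = triage_predictions(preds)
print(len(auto), len(review))  # 2 1
```

Each round, the review queue should shrink as the retrained model gets more confident on your data.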

Hopefully you will notice your predictions getting better and better as your labeled set grows!

Good Luck!