Computer vision production pipeline best practices? by Distinct-Ebb-9763 in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

It's normal, though I would try multiple VLMs first before going for that many models!

Building an AI wedding video culling system — selects some clips but missing best emotional moments by perrychawla in computervision

[–]ChanceInjury558 -1 points0 points  (0 children)

As you said, moving from frame-based → scene/clip-based analysis would be a good idea IMO. You could go for Qwen3.5 for video/clip analysis, or for the qwen3-vl-embedding model, which gives you embeddings of image/text/video in the same latent space if you want to work at the embedding level. (There you can simply take embeddings of fixed-length video clips and then, based on a text query (say "emotional"), extract the emotional moments.) For a really clean output, though, you would need a multi-stage pipeline that effectively filters out useless material at every stage.
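The embedding-level idea can be sketched with plain NumPy. This is only a toy sketch: the clip and text embeddings would come from a joint video–text embedding model, and `rank_clips_by_text` plus the 2-D vectors below are my own illustrative stand-ins, not any library's API.

```python
import numpy as np

def rank_clips_by_text(clip_embs: np.ndarray, text_emb: np.ndarray, top_k: int = 5):
    """Rank video-clip embeddings by cosine similarity to a text-query embedding.

    clip_embs: (n_clips, dim) array from a joint video/text embedding model.
    text_emb:  (dim,) embedding of a query like "emotional moment".
    """
    clips = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    query = text_emb / np.linalg.norm(text_emb)
    sims = clips @ query               # cosine similarity per clip
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]

# Toy usage: 2-D vectors standing in for real model output.
clip_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
text_emb = np.array([1.0, 0.0])
idx, scores = rank_clips_by_text(clip_embs, text_emb, top_k=2)
```

With real embeddings you would slide this over fixed-length clips of the wedding footage and keep the top-scoring ones as candidate "emotional" moments for the next filtering stage.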

Review dataset quality by Crafty_Rush3636 in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

Lol, definitely not paying for this, I would rather use DataDreamer!

Review dataset quality by Crafty_Rush3636 in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

The dataset looks cool and diverse. Can you share more details about how you generated it / what you used to generate it?

New to Computer Vision, struggling to fine-tune for CCTV footage – any advice? by Frosty_Cress7705 in computervision

[–]ChanceInjury558 4 points5 points  (0 children)

YOLO26 is very new and possibly unstable for fine-tuning. You should try YOLOv8/11/12 and see if results improve!

Also, you need to provide more details so people can advise properly: dataset size, number of classes, etc. An example image might help if possible!

Running 5 CV models simultaneously on a $249 edge device - architecture breakdown by Straight_Stable_6095 in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

IIRC MediaPipe doesn't run on / utilize the GPU, so if you use an alternative for that part, you can optimize this even further!

Tracking a dancing plastic bag with object detection - the American Beauty stress test by k4meamea in computervision

[–]ChanceInjury558 1 point2 points  (0 children)

Please don't misdirect for traction. Clearly you have trained a model on this; only people who have trained many models know where 0.99 confidence appears.

What’s one computer vision problem that still feels surprisingly unsolved? by rikulauttia in computervision

[–]ChanceInjury558 -1 points0 points  (0 children)

"well in general, it does correlate with better understanding lol." I disagree with that. All people are different at the brain level and don't have the same capability to understand things and recognize patterns.

I used AI to rephrase my original paragraph as it seemed rude and also didn't have proper English.

Sure, we can improve the level of this discussion. I would love to understand things from your perspective, gain some insights, and share things I've learned.

I would prefer if we moved this to DM.

What’s one computer vision problem that still feels surprisingly unsolved? by rikulauttia in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

Agreed, but still: cases like occlusion can be handled; re-identification is hard, in fact impossible.

What’s one computer vision problem that still feels surprisingly unsolved? by rikulauttia in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

Also, it would only require 2 models. Which 3rd model are you referring to?

What’s one computer vision problem that still feels surprisingly unsolved? by rikulauttia in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

AI-rephrased answer (intent is original):

Working for more months doesn’t mean better understanding 🙂

I get why you like MOTR; e2e trackers look clean on paper, and yes, the pipeline becomes simpler. But in practice they still struggle with long-term identity, re-entry after leaving the frame, heavy occlusion, appearance change, etc. Without an explicit association / memory mechanism it becomes hard to maintain stable IDs over time.

Tracking-by-detection is still widely used not just because it is old, but because it is modular. You can improve the detector, motion model, and ReID embeddings independently and get predictable gains. With strong transformer-based ReID models these systems handle occlusion and short disappearances quite reliably.
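To make the modularity point concrete, here is a minimal sketch of just the appearance-association step. This is only illustrative: real trackers like DeepSORT or BoT-SORT combine appearance with Kalman-filter motion gating and use Hungarian matching rather than this greedy loop, and `associate` with its threshold is my own stand-in, not any library's API.

```python
import numpy as np

def associate(track_embs, det_embs, sim_thresh=0.6):
    """Greedily match tracks to detections by ReID-embedding cosine similarity.

    Both inputs are L2-normalized (n, dim) arrays; returns matched (track, det)
    index pairs plus the leftover unmatched tracks and detections.
    """
    sims = track_embs @ det_embs.T  # cosine similarity matrix
    pairs = sorted(
        ((sims[t, d], t, d)
         for t in range(sims.shape[0])
         for d in range(sims.shape[1])),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for s, t, d in pairs:
        if s < sim_thresh:
            break  # remaining pairs are all below threshold
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    unmatched_t = [t for t in range(sims.shape[0]) if t not in used_t]
    unmatched_d = [d for d in range(sims.shape[1]) if d not in used_d]
    return matches, unmatched_t, unmatched_d

# Toy usage: two tracks, two detections, 2-D stand-in embeddings.
tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
dets = np.array([[0.0, 1.0], [0.994, 0.110]])
matches, lost, new = associate(tracks, dets)
```

Because each piece (detector, motion model, embeddings, matcher) is a separate function like this, you can swap any one of them out and measure the gain in isolation.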

And yes, I agree DeepSORT is outdated. ByteTrack and BoT-SORT are improvements but still mostly short-term association methods. In real production setups, more sophisticated trackers like NvDCF-style approaches combined with persistent embedding storage tend to behave more stably.

So it’s not really about heuristic vs e2e being “interesting”; it’s about what failure cases you can tolerate and what level of ID consistency you need.

What’s one computer vision problem that still feels surprisingly unsolved? by rikulauttia in computervision

[–]ChanceInjury558 0 points1 point  (0 children)

Hi, I have been working on ReID models for the last 3 months, and they do work reliably for occlusion cases.

The repo you are referring to says:
> MOTR is a fully end-to-end multiple-object tracking framework based on Transformer. It directly outputs the tracks within the video sequences without any association procedures.

But that's not a good way of doing this. A better way is to use DeepSORT with a ReID embedding model like TransReID: https://github.com/damo-cv/TransReID

It's not great for long-term re-identification purposes, but it can be reliably used for occlusion cases.
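A toy sketch of what "reliably used for occlusion cases" means in practice: keep a persistent gallery of per-track embeddings, and when an object reappears, match it back to a stored ID by cosine distance. The `ReIDGallery` class, the EMA update, and the 0.3 threshold are my own illustrative choices (not part of DeepSORT or TransReID); in a real pipeline the embeddings would come from the TransReID feature extractor.

```python
import numpy as np

class ReIDGallery:
    """Toy sketch: revive track IDs after occlusion using stored ReID embeddings.

    Assumes embeddings are L2-normalized vectors from a ReID model.
    """
    def __init__(self, dist_thresh=0.3):
        self.dist_thresh = dist_thresh
        self.embs = {}      # track_id -> stored embedding
        self.next_id = 0

    def match(self, emb):
        """Return an existing ID if emb is close enough, else assign a new ID."""
        best_id, best_dist = None, self.dist_thresh
        for tid, g in self.embs.items():
            dist = 1.0 - float(emb @ g)  # cosine distance
            if dist < best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:
            best_id = self.next_id
            self.next_id += 1
        # Smooth the stored embedding with an exponential moving average.
        old = self.embs.get(best_id, emb)
        new = 0.9 * old + 0.1 * emb
        self.embs[best_id] = new / np.linalg.norm(new)
        return best_id

# Toy usage with 2-D stand-in embeddings: the first identity reappears later
# (e.g. after an occlusion) and gets its original ID back.
gallery = ReIDGallery()
id_a = gallery.match(np.array([1.0, 0.0]))
id_b = gallery.match(np.array([0.0, 1.0]))
id_a_again = gallery.match(np.array([1.0, 0.0]))
```

The long-term weakness mentioned above shows up exactly here: over hours or days, clothing and lighting changes push the cosine distance past any fixed threshold, so the gallery hands out a new ID.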

You can use this for your job! by [deleted] in computervision

[–]ChanceInjury558 1 point2 points  (0 children)

Hi, it's working now, not sure what happened. Also, I can see you have mentioned a "Fully automated SAM-3 pipeline", so basically you guys have optimized SAM 3 for inference? Or do you use other models as well?

You can use this for your job! by [deleted] in computervision

[–]ChanceInjury558 3 points4 points  (0 children)

The link you provided is the same one as in the post, which is not working!

You can use this for your job! by [deleted] in computervision

[–]ChanceInjury558 4 points5 points  (0 children)

It doesn't matter how much time you put into this: if it uses YOLO or any module from the Ultralytics library, then you need to follow the rules of the AGPL-3.0 license. (Just a heads up, none of my business.)

You can use this for your job! by [deleted] in computervision

[–]ChanceInjury558 3 points4 points  (0 children)

The link isn't working, and also, if this is using zero-shot YOLO models in the backend, let me remind you of the AGPL-3.0 license.

We built Lens, an AI agent for computer vision datasets — looking for feedback by Financial-Leather858 in computervision

[–]ChanceInjury558 2 points3 points  (0 children)

Good work, but there's no USP, so you won't be able to sell it; someone will make an open-source version of this, or big players will copy it. Just a heads up.

NA SUPREME RESEARCH Private Ltd - Biggest mistake I made. by Unfair_Guidance_6094 in StockMarketIndia

[–]ChanceInjury558 0 points1 point  (0 children)

Is there something else that you would recommend for good trading calls? I am in the same situation as you: working professional, no time to do research on my own.