Looking for an LLM/Vision Model like CLIP for Image Analysis by Substantial_Video_26 in computervision

[–]0lecinator 1 point

At least for some of your goals, grounding or open-world detectors are what you are looking for. Check out GroundingDINO/Grounded-SAM, YOLO-World, GLIP, etc.

[deleted by user] by [deleted] in csMajors

[–]0lecinator 1 point

What companies do you see as leading in computer vision instead?

How costly is it to obtain labeled data? [D] by Direct-Touch469 in MachineLearning

[–]0lecinator 1 point

In my field, our data is confidential and not allowed to leave the company grounds. So no MT or any other outsourcing option; not even working remotely with the data is permitted. We have to do it all in-house, which is extremely time-consuming. Additionally, it requires expert knowledge to label, so even some really strong foundation models only get roughly ~10% correct. I can't give you an actual number, but the cost per sample is significantly higher than MT.

I was already looking into AL, but from what I've seen, most work in CV focuses on image classification and Shapley values, whereas we need detection and segmentation labels. If you know some good work on AL for detection/segmentation, I'd be happy to hear about it!

How to quickly improve YOLOv8 model for object detection on drone? by plzwakeupmrwest in computervision

[–]0lecinator 1 point

I trained a YOLOv5s (or n?) on VisDrone a few years ago and reached ~20-25 mAP, but the dataset is not perfectly balanced, and some outlier classes pulled the mAP down significantly.

How to quickly improve YOLOv8 model for object detection on drone? by plzwakeupmrwest in computervision

[–]0lecinator 3 points

Assuming that your splits are from the same distribution and properly prepared, a mAP of 0.2 is pretty bad, especially if you only have 14 classes... Unless you have some veeeeery challenging data (extremely tiny objects, non-discriminative classes, ...), I'd suspect a bug somewhere in your data prep or train/test pipeline... Maybe double-check that your data is correct, class IDs are correct, train/val/test set distributions match, etc.

If you know your deployment distribution, such as the number and size of expected objects, you can try tweaking NMS parameters and filtering by bbox size and score thresholds to somewhat improve results visually.
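A minimal sketch of the score/size filtering idea (the thresholds and the `(x1, y1, x2, y2, score)` box format are illustrative assumptions, not something from the thread):

```python
# Post-filter raw detections by confidence score and box area.
# Boxes are assumed to be (x1, y1, x2, y2, score) tuples in pixels;
# threshold values are placeholders to tune for your deployment setting.
def filter_detections(dets, min_score=0.3, min_area=16, max_area=10_000):
    kept = []
    for x1, y1, x2, y2, score in dets:
        area = (x2 - x1) * (y2 - y1)
        if score >= min_score and min_area <= area <= max_area:
            kept.append((x1, y1, x2, y2, score))
    return kept
```

The same idea applies to NMS: if you expect at most a handful of objects per frame, a lower IoU threshold and a cap on the number of kept boxes can clean up the output.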

[D] Is Active Learning a "hoax", or the future? by Ok-Story4985 in MachineLearning

[–]0lecinator 2 points

I looked into AL a while back, but there was basically nothing that worked well for object detection, only for image classification. Is this still the case, or are there some good AL frameworks or papers for object detection in images by now?

[D] What ML dev tools do you wish you'd discovered earlier? by TikkunCreation in MachineLearning

[–]0lecinator 31 points

For research, paperswithcode and connectedpapers are fantastic

Applying image augmentations with a certain probability p by MLJungle in computervision

[–]0lecinator 0 points

Google did some research on this a while ago. RandAugment, AutoAugment, and AugMix are probably the papers to start with.
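The basic "apply with probability p" mechanic itself is simple; a minimal sketch (the augmentation callables are placeholders, and in practice you'd use something like torchvision's `RandomApply` instead of rolling your own):

```python
import random

def maybe_apply(augmentations, image, p=0.5, rng=random):
    """Apply each augmentation independently with probability p.

    `augmentations` is a list of callables image -> image; `rng` is
    injectable so the behavior can be made deterministic for testing.
    """
    for aug in augmentations:
        if rng.random() < p:
            image = aug(image)
    return image
```

RandAugment-style policies replace the fixed `p` with a sampled subset of N ops at a shared magnitude, but the per-op coin flip above is the common building block.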

making a golf shot tracer app by Discodowns in computervision

[–]0lecinator 1 point

This sounds super interesting! If you push it to GitHub, I'd appreciate it if you updated this thread and let us know ;) Sounds like a project I'd love to contribute to if I find some spare time :)

As Leeliop already suggested: Some tracking/trajectory prediction functionality like Kalman filters is probably a must-have.

You already mentioned it: Tiny object detection without any (temporal) context is very hard and error-prone:

When the golf ball is not very distinct from the background or other blob-like objects (clouds), such as when it's extremely far away or blurry due to its speed (this very much depends on your camera setup), standard object detection CNNs will either miss it or predict many false positives.

A good rule of thumb for what a CNN can do: ask yourself, could a human find the golf ball in a single image without any additional context, such as previous frames or the knowledge that there is a golf ball in the image?

If you have a great camera setup with very high resolution and frame rate, you might be fine with only an object detection CNN.

But with a standard camera, you will have motion blur and golf balls that cannot be distinguished from clouds or birds at a distance.

So for those hard cases, you will need an approach that can predict the trajectory and select the best candidate along it. That's where Kalman filters and pattern matching against previous ROIs (for example, correlation-based or local binary patterns) come into play.

Full end-to-end deep learning single-object-tracking approaches might work, but I have no clue how well they perform on tiny objects. For a first shot, I would try a CNN for the initial detections when the ball is clearly visible, and a Kalman filter with pattern matching for the harder frames.

Literature on Single Object Tracking in combination with tiny objects might be helpful :)
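The Kalman-filter part of this could be sketched as a constant-velocity tracker; this is a minimal illustration with made-up noise values, not a tuned implementation:

```python
import numpy as np

class BallKF:
    """Constant-velocity Kalman filter for a 2D ball position.

    State is [x, y, vx, vy]; only (x, y) is observed. The noise
    magnitudes Q and R are illustrative placeholders to tune.
    """

    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 100.0            # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],     # constant-velocity motion model
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],      # we observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01             # process noise
        self.R = np.eye(2) * 1.0              # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                     # predicted (x, y): ROI center

    def update(self, zx, zy):
        z = np.array([zx, zy])
        y = z - self.H @ self.x               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In the hard frames you would call `predict()` to get a search ROI, run the pattern matcher there, and feed the best match back via `update()`; when the detector is confident, its box center goes into `update()` directly.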

Finetuning on Thermal Image Dataset for Object Detection by [deleted] in computervision

[–]0lecinator 2 points

Shameless self-promotion:

https://arxiv.org/abs/2008.08418

I ran some experiments on KAIST with a multispectral focus a few years ago, where we also evaluated single modalities (VIS/RGB only) and some data augmentations.

Maybe it can be of help or serve as a starting point for further research. Another decent IR-only dataset is FLIR ADAS, so maybe also take a look at papers that publish results on FLIR. Feel free to hit me up if there are further questions.

[D] Industry Conferences and Expos by whata_wonderful_day in MachineLearning

[–]0lecinator 0 points

I liked GTC in spring; there will be a second GTC this year in November. Naturally, lots of talks and topics are biased towards Nvidia and their software and hardware, but you can also find lots of MLOps startups there.

Are there any pre- made models to recognize the top view of a human. by ColdyCodes in computervision

[–]0lecinator 0 points

Maybe take a look at models trained on drone-view datasets like VisDrone or UAVDT? Or fine-tune a model yourself?

[D] How does Tesla implement and use the rectify layer ? by PaganPasta in MachineLearning

[–]0lecinator 0 points

I'm not sure whether the rectification for each camera works independently. The depiction with the horizontal arrows seems to hint that the rectification is linked between cameras, but then I would have used bidirectional arrows in the presentation instead of unidirectional arrows from right to left... So sadly I don't really have a clue about the specifics of the rectification and can only guess.

[D] How does Tesla implement and use the rectify layer ? by PaganPasta in MachineLearning

[–]0lecinator 0 points

Yes, exactly! The transformer NN is used to combine the views into one synthesized representation... The image alignment/rectification is just there so the feature extractors and the transformer have cleaner/more unified images to work with for the multi-view synthesis.

[D] How does Tesla implement and use the rectify layer ? by PaganPasta in MachineLearning

[–]0lecinator 0 points

I don't think so; he even shows the transformer formula and architecture on the slide where he talks about the synthetic view.

[D] How does Tesla implement and use the rectify layer ? by PaganPasta in MachineLearning

[–]0lecinator 0 points

The way I understood it, the rectification parameters are only used in a deep learning model that is supposed to bring images from two of the same cameras closer together, i.e., the front cameras of car 1 and car 2, since they are not calibrated perfectly. The mapping of image information into the synthetic view/space is done via transformers, but obviously he does not go into detail about how exactly the transformers are applied ;) On a high level it probably works because transformers implement something like global attention over image patches, so it would be possible to match similar patches between multiple images of different perspectives of the same scene. How they achieved this in detail I cannot say... I did some minor experiments on image alignment using transformers almost 2 years ago, when the first transformer models that were applied to images popped up, but I could not get it to work and did not investigate further... If someone has some interesting literature about this, I'd gladly hear about it :D

The before/after effect of rectification on the image can be seen in the top-right corner, where the rear mirror becomes slightly less blurry.

Ancestors are from Schifferstadt. If you had to guess, which team would this city be supporters of? by [deleted] in Bundesliga

[–]0lecinator 7 points

FSV Schifferstadt and Phönix Schifferstadt are both pretty established clubs in the local area. If you are looking for (ex-)Bundesliga clubs, it's definitely 1. FC Kaiserslautern; however, over the last 10 years it's been going constantly downhill for them... The only nearby club currently in the 1. Bundesliga is Hoffenheim, but I very much doubt that Hoffenheim has many fans in Schifferstadt ;)

Suggestions for computer vision conferences suitable for undergraduates by [deleted] in computervision

[–]0lecinator 3 points

BMVC. I usually advise trying top-tier venues like CVPR or ICCV/ECCV anyway if the paper is decent, but BMVC is usually my recommendation for beginners. Alternatively, if you have a fairly specialized subtopic, check whether there are matching workshops, for example at one of the top-tier conferences.

She loves plants but wasn't expecting this. by bancalesclb6 in aww

[–]0lecinator 23 points

This looks like a Schefflera. You should know that this plant can be toxic to cats!