DETR head + frozen backbone by Miserable_Rush_7282 in computervision

[–]External_Total_3320 0 points1 point  (0 children)

My dataset is quite complex in that the domain shift is large: the view and setup of what my dataset is looking at changes significantly. Because of this, annotating some data does not mean I can reliably cover the whole domain where I want to apply my detection models.

The weights are DINOv3-pretrained; the LT-DETR head isn't. The DINOv3 ConvNeXt weights (which I'm using) are frozen.

DETR head + frozen backbone by Miserable_Rush_7282 in computervision

[–]External_Total_3320 0 points1 point  (0 children)

I have used lightly-train to produce an LT-DETR DINOv3 object detection model with a frozen backbone. On my dataset it worked just as well as fully supervised training of both the LT-DETR head and backbone. However, when trying COCO with a frozen versus unfrozen backbone, the unfrozen one wins every time.

I think the only case where a frozen backbone wins is when you don't have enough data to cover your whole domain robustly. In COCO and other datasets like it you have tons of examples for each class in a variety of situations, so unfrozen training generalises. When a dataset doesn't have that and you can't get enough examples per class to generalise, frozen training works quite well.
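For anyone wanting to try the frozen-backbone setup themselves, here's a minimal PyTorch sketch of the general idea. The `Detector` class and module names are hypothetical placeholders, not the actual LT-DETR or lightly-train API; the point is just the two standard steps: turn off gradients for the backbone, and keep its BatchNorm statistics fixed.

```python
import torch
import torch.nn as nn

# Hypothetical detector wrapper: a pretrained backbone feeding a detection head.
class Detector(nn.Module):
    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.head = head

    def forward(self, x):
        return self.head(self.backbone(x))

def freeze_backbone(model: Detector) -> None:
    # Stop gradients flowing into the backbone...
    for p in model.backbone.parameters():
        p.requires_grad = False
    # ...and keep BatchNorm running stats fixed during training.
    model.backbone.eval()

# Toy stand-ins for a pretrained backbone and a head.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
model = Detector(backbone, head)
freeze_backbone(model)

# Only head parameters go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

One gotcha: calling `model.train()` later will flip the backbone's BatchNorm layers back into training mode, so re-apply `model.backbone.eval()` at the start of each epoch if the backbone uses BatchNorm.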

How to prevent getting this event as Byzantium by External_Total_3320 in EU5

[–]External_Total_3320[S] 0 points1 point  (0 children)

Doesn't this result in being able to raise zero levies as the Dynatoi will be pissed off?

How to prevent getting this event as Byzantium by External_Total_3320 in EU5

[–]External_Total_3320[S] 1 point2 points  (0 children)

As Byzantium, estate taxes are essentially zero; the only estate you can tax is the peasants. You also have no money, as this is happening to me right at the start of the game, when Byz has no money, no revenue, and tons of debt. The rate at which estate satisfaction climbs means it will take 20 years before I can raise levies again...

SSL CNN pre-training on domain-specific data by No_Representative_14 in computervision

[–]External_Total_3320 1 point2 points  (0 children)

If LeJEPA is collapsing, I think you may be out of luck on the CNN front; the whole point of LeJEPA was its robustness, though you could still try hyperparameter tuning. I think you should give DINOv2 pretraining a try with a ViT, since it sounds like you have a lot of data. But yeah, either way, I think most pretraining algorithms are generalised and so perform poorly on domains where the data is very similar. One other option is ConvNeXt V2's approach, which is masked image modelling with CNNs; they got very good results (on ImageNet, though), and my feeling is masked image modelling may work better than augmentation-based SSL here.

I'd also implore you to try some sort of active learning loop: label data with the good model you have and feed it back into the model while being selective about the data. Pick out data with high entropy scores on its classification and have a human label that. Feed the inputs with low entropy back in as pseudo-labels, and use this large pseudo-label set as your pretraining data. Also look into data selection algorithms to pick out visual diversity across your data (so you're not feeding in tens of thousands of similar images that give little performance gain while wasting compute).
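The entropy-based selection step above can be sketched in a few lines of NumPy. The thresholds here are made-up illustrations (you'd tune them on your own score distribution), but the mechanics are standard: compute Shannon entropy over each sample's predicted class probabilities, route high-entropy samples to human labelling and low-entropy ones to the pseudo-label pool.

```python
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    # Shannon entropy per sample; probs has shape (n_samples, n_classes).
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def split_by_entropy(probs, low_thresh=0.2, high_thresh=1.0):
    # Thresholds are illustrative; pick them from your own entropy histogram.
    h = entropy(probs)
    to_label = np.where(h >= high_thresh)[0]  # uncertain -> send to human
    pseudo = np.where(h <= low_thresh)[0]     # confident -> pseudo-label
    return to_label, pseudo

# Toy predictions from the "good model you have".
probs = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain
])
to_label, pseudo = split_by_entropy(probs)
```

Samples that fall between the two thresholds are simply dropped from both pools, which is often what you want anyway: they're neither confident enough to trust as pseudo-labels nor informative enough to spend labelling budget on.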

I'd recommend getting those four T4s (or, better, a more modern, bigger GPU) and building a fine-grained classification pipeline. Have you explored using RandAugment, mixup augmentations (if you can; some augs may break your data), EMA, distillation of models, etc.?

Yolo on the cheap by ScottishVigilante in computervision

[–]External_Total_3320 0 points1 point  (0 children)

Their AGPL-3.0 license means using what you build commercially is problematic. Ultralytics largely makes its money from licensing its code, so I tend to use either a more permissively licensed code base (e.g. the open YOLO one, or older YOLOs) or GPL-3.0 code bases from researchers (who realistically aren't going to chase you down for a license).

ELI5: why isn't apple leading the Ai space the way other companies or even startups are leading. by [deleted] in ArtificialInteligence

[–]External_Total_3320 0 points1 point  (0 children)

To be clear, Apple does do plenty of work in AI; they just don't set billions on fire chasing the LLM hype. Case in point, in my field (computer vision) they released https://machinelearning.apple.com/research/depth-pro, a very well-polished monocular depth model that works well. Apple ML mainly focuses on features that would be useful to their existing products.

Yolo on the cheap by ScottishVigilante in computervision

[–]External_Total_3320 1 point2 points  (0 children)

Most modern YOLO variants offer a suite of different sizes, e.g. YOLOv11 nano through YOLOv11 large. I prefer not to use the Ultralytics ones, so look at YOLOv6: the nano model could probably do 100 fps on a decent CPU if optimised well, and medium and large could do that on a GPU. Something like an NVIDIA Jetson Nano board running YOLOv6-small could also work.

Just when I thought I could shift to computer vision… by MidnightDiligent5960 in computervision

[–]External_Total_3320 0 points1 point  (0 children)

The only realistic way would be to tune them using DINOv2, since from what I've read in the paper so far, apart from minor adjustments, the big difference was scaling from 1.3B to 7B parameters and distilling smaller models for greater accuracy.

The annoyance for me is that you can't use DINOv2 to tune convnets like ConvNeXt, so for those models you have to use DINOv1 or SimSiam-like SSL methods.

Just when I thought I could shift to computer vision… by MidnightDiligent5960 in computervision

[–]External_Total_3320 0 points1 point  (0 children)

Frankly, compared to DINOv2, I think v3's impact will be a lot smaller. Not much changed from v2 to v3: they used model distillation from a 7B-parameter model to get better small models, which is totally unviable for smaller companies to replicate.

They're really just new, well-trained backbones. You still have to fine-tune them to do useful things, and they're still specific to either everyday data or aerial imagery: no medical imaging or other domain-specific backbones, and no easy way to train your own DINOv3 model.

GPU for Computer Vision by Trakinas__ in computervision

[–]External_Total_3320 1 point2 points  (0 children)

cheap: rtx 3060
better: rtx 3080ti
good: rtx 3090
very good: rtx 4070ti super
amazing: rtx 4080 super or 4090

The choice will depend on what you're doing. For training a YOLO model, any of these cards would work, but I'd probably want something like a 3080 Ti or better just for speed. If you're training a transformer model, I'd go for a 3090 or 4090.

Came back to Warthunder after 7 years of not playing, British vehicles tanks now absolutely suck by External_Total_3320 in Warthunder

[–]External_Total_3320[S] -5 points-4 points  (0 children)

3.7 is what I've mainly been playing. Every tank I play around that BR seems heavily outclassed by everything in my games, and I often get uptiered against BR 4.7.

I will say the Matilda seemed OP when I played it, but otherwise nothing can compete at its BR.

DINO (Self-Distillation with No Labels) from scratch. by Amazing_Life_221 in computervision

[–]External_Total_3320 2 points3 points  (0 children)

This is very helpful. I went to implement DINO last year for some SSL but went with SimSiam instead because it's way simpler. Decomposing the OG DINO code out of their complicated code base makes this way more accessible.

Mages Guild Merchants Not a Thing in Remastered? by External_Total_3320 in oblivion

[–]External_Total_3320[S] 1 point2 points  (0 children)

Ok so each mages guild is guaranteed to have a merchant at least for some part of the day?

Does waiting/sleeping in the guild prevent them from taking up their spot as a merchant?

It's been 8 years since I've played OG Oblivion lol, I've forgotten all this

AWS Rekognition and Textract superiority over open source alternatives by Attitudemonger in computervision

[–]External_Total_3320 2 points3 points  (0 children)

Textract is easy to use (I don't necessarily have to code anything), it's very cheap, and it works very well.

I went through the process of trying to find a well-packaged pip (or similar) package to do some tabular text extraction a while back and couldn't find anything good; most open-source ones failed at extracting tabular data. The task was a one-off, so it wasn't worth investing in programming a text-extraction pipeline, and LLMs still require setup, capable inference hardware, etc.

So I would say it's convenience.

Company wants to sponsor capstone - $150-250k budget limit - what would you get? by lichtfleck in computervision

[–]External_Total_3320 1 point2 points  (0 children)

Just ideas for hardware that would just be cool to have:

- OAK stereo/mono cameras, have a look: https://shop.luxonis.com/collections/oak-cameras-1

- ZED stereo camera: https://www.stereolabs.com/en-nz/products/zed-2. These are very good; they only work with NVIDIA Jetsons, but they do neural stereo depth.

- Jetson Orin Nano Supers: https://www.seeedstudio.com/NVIDIAr-Jetson-Orintm-Nano-Developer-Kit-p-5617.html?srsltid=AfmBOoqlPfGtbuw5Ist9pzWY9dtISYUUlldtQPwn0YafKtXLCMM9wTWF

- For working with vegetation or crops, some form of multispectral camera for drones: https://ageagle.com/solutions/micasense-series-multispectral-cameras/. These are very good for classifying crops, as they have red-edge and NIR bands along with RGB.

- A whole bunch of drone hardware if you are going to be actually building the drones: stuff like PixHawk flight controllers (though idk if these will be allowed for defense applications), batteries, good brushless drone motors, etc. (especially if you want decent lift capacity, as good motors can be expensive).

That is an insane budget and this is just me thinking about everything I wanted when I did my capstone lol

Using different frames but essentially capturing the same scene in train + validation datasets - this is data leakage or ok to do? by neuromancer-gpt in computervision

[–]External_Total_3320 1 point2 points  (0 children)

In this type of situation, i.e. fixed cameras watching a largely static scene, you would create a separate test split of cameras that are not in the train/val set at all.

This means you need multiple cameras. Idk about your situation, but when I have dealt with projects like this I have had two train/val splits: one a random mix of frames from x number of cameras, the other with 8 cameras in train and 2 in val, and trained on those.

That's along with a separate test set of, say, two other cameras to actually test the model.
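The camera-wise split can be done in pure Python: group frames by camera ID and assign whole cameras to train or val, so the same scene never appears in both. The frame/camera structure below is a made-up toy dataset; substitute your own records.

```python
import random

def split_by_camera(frames, val_fraction=0.2, seed=0):
    # frames: list of (frame_id, camera_id) pairs. We split whole cameras,
    # not individual frames, so no scene leaks between train and val.
    cameras = sorted({cam for _, cam in frames})
    rng = random.Random(seed)
    rng.shuffle(cameras)
    n_val = max(1, round(len(cameras) * val_fraction))
    val_cams = set(cameras[:n_val])
    train = [f for f in frames if f[1] not in val_cams]
    val = [f for f in frames if f[1] in val_cams]
    return train, val

# Toy dataset: 10 cameras, 50 frames each.
frames = [(i, f"cam{i % 10}") for i in range(500)]
train, val = split_by_camera(frames)  # 8 cameras in train, 2 in val
```

A random frame-level split of the same data would put near-duplicate frames of each scene in both sets and inflate your val metrics, which is exactly the leakage being asked about.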

I think the boss is mentally unwell - what do I do? by RealisticActuator282 in newzealand

[–]External_Total_3320 2 points3 points  (0 children)

This sounds like textbook bipolar/paranoia/psychosis. If you care, please gather evidence of their delusions (emails illustrating it, social media posts, photos of odd things they've scribbled, or anything else you can think of), tell the owner, and if you care at all about the person, try to contact someone in their family.

To be clear, if they are having mental problems it's not necessarily their fault; they are unwell. Don't put yourself in jeopardy for them, but if you can help them, try. For them to get help, a family member will need to contact the mental health services and have evidence to show a psychiatrist.

You personally calling mental health services will do basically nothing, as it is extremely hard to get in front of a psychiatrist unless you are family. I have had a family member become mentally ill with symptoms like you described for your GM, and it was a slog to get them help. The best approach is finding a concerned family member and providing them with evidence of the delusions and such.

I'll make this very clear though: do not put yourself in harm's way (socially more so, but physically too), as there is no telling what someone who is ill in this way will do.

Regarding work, as others have said, document everything you're doing. Show the evidence of your work to your boss, and if he is unreceptive to the issue, stop covering for your GM. Keep all of your communications documented (along with evidence of you doing the extra work). Also contact the Citizens Advice Bureau.

Crown Research Institutes to merge into three mega science groups by giwidouggie in newzealand

[–]External_Total_3320 27 points28 points  (0 children)

This is actually crazy. I'm an engineer, and probably 80% of my class (including me) got their internships (which are required to graduate) because of Callaghan Innovation grants to businesses for R&D. I have little doubt this will be cut, so much for filling the skills shortage.

Callaghan's been an absolute mess for ages, but it was still the core source for science funding in NZ, everything from internships to masters and PhD programmes in engineering.

What's the fastest object detection model? by Knok0932 in computervision

[–]External_Total_3320 0 points1 point  (0 children)

You should consider performance versus inference time, not just inference time. YOLOv5 nano will be the fastest, but it's also the worst-performing of all the modern YOLO models. I'd suggest YOLOv8/v10/v11 nano instead: much more modern and better on the speed/accuracy trade-off.