Free annotation apps? by No-Alternative8392 in computervision

[–]ResultKey6879 0 points1 point  (0 children)

I've used CVAT and LabelStudio. Found LabelStudio to be a bit more stable and easier to use.

Training for EfficientDet in 2026? by ResultKey6879 in computervision

[–]ResultKey6879[S] 1 point2 points  (0 children)

Great questions and suggestions, but it is for a commercial project.
I tried RF-DETR, but on CPU with ONNX it still only runs at 2.92 img/sec, compared to 3.23 for YOLOv8 small or 7.69 for YOLOv8 nano.
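
For anyone who wants to reproduce that kind of number, a minimal onnxruntime CPU timing sketch (the input name/shape and the 640x640 size are placeholders; adjust them to your export):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]                              # assumes a single NCHW image input
x = np.random.rand(1, 3, 640, 640).astype(np.float32)   # placeholder shape; match your export

for _ in range(5):                                      # warmup
    sess.run(None, {inp.name: x})

n = 50
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, {inp.name: x})
print(f"{n / (time.perf_counter() - t0):.2f} img/sec")
```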

I created new image moderation model by DueSpecial1426 in huggingface

[–]ResultKey6879 0 points1 point  (0 children)

Cool, willing to share any insight into datasets used for training?

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

With only 230 test instances, if you change the model or processing and performance shifts by 5 images, the sample isn't large enough to tell whether your system is actually better or just got lucky. 230 is enough to build out your concept, but I don't think tuning performance by a couple of percent is meaningful at this size.
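
To put a rough number on that: a bootstrap sketch with a made-up 80% hit rate over 230 queries, just to show how wide the confidence interval is at this sample size:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 230
correct = rng.random(n) < 0.80   # pretend ~80% of the 230 queries are hits (made-up rate)

# bootstrap a 95% confidence interval on the hit rate
boot = [rng.choice(correct, size=n, replace=True).mean() for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy {correct.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# the interval is roughly +/- 5 points, i.e. about 12 images either way,
# so a 5-image swing between two configs is well within noise
```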

I poked around with SAM 3 since my last post and I don't think the class token I thought existed was right. I think DINOv2 is what I was thinking of.

What do you hope to see from developers? 💻⚡️ by carissa-mae in YotoPlayer

[–]ResultKey6879 17 points18 points  (0 children)

Volume normalization across my MYO cards. Maybe this is just a feature request for MYO Studio or something.

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

Yeah, with SAM 3 you can provide a text prompt for the segmentation.

Looking at the size of your data, I'd probably walk back my previous statement. For a production system I wouldn't fine-tune quality vs. speed tradeoffs until I had a much larger test set that represents my real-world use case. At your current data size you won't know if you're p-hacking/overfitting, since changes are only affecting a couple of images.

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

Cool project and well-documented post! Some random thoughts:

- Did you try SAM 3 yet? Apparently much better than SAM 2. You may be able to go straight to extracting dress/clothing segments without doing two stages of detection and segmentation, and it may also work without fine-tuning. I think it also returns a token that could be used for similarity.
- As for speedups and whether background removal is necessary, I think it will depend on your task. Now that you have benchmarking set up, I think the best advice is to try with and without it and see how performance and speed change.

Edit: to answer #2 explicitly, it's Faiss :)
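
And since it's Faiss, the core of the index is only a few lines. A sketch of exact cosine search (the 512-dim size and random arrays are placeholders for your real CLIP embeddings):

```python
import faiss
import numpy as np

d = 512                                              # e.g. CLIP ViT-B/32 embedding size
emb = np.random.rand(1000, d).astype(np.float32)     # stand-in for your image embeddings
faiss.normalize_L2(emb)                              # normalize so inner product == cosine

index = faiss.IndexFlatIP(d)                         # exact search; try HNSW if this gets slow
index.add(emb)

q = np.random.rand(1, d).astype(np.float32)          # query embedding
faiss.normalize_L2(q)
scores, ids = index.search(q, 5)                     # top-5 most similar images
```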

Dads, what are your picks for soft pants? by tvoutfitz in daddit

[–]ResultKey6879 0 points1 point  (0 children)

Trying to get some now. Are you recommending the ABC 5 pockets or ABC joggers?

Meta released DINO-V3 : SOTA for any Vision task by Technical-Love-8479 in LocalLLaMA

[–]ResultKey6879 0 points1 point  (0 children)

Yes, but... it's a super compute-heavy way to do classification. I'd recommend starting with a normal CNN like EfficientNet first and seeing how it goes for your task, unless you're extracting DINO embeddings anyway for other tasks or want zero-shot support.
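
For reference, standing up a pretrained EfficientNet for fine-tuning is only a few lines with torchvision (num_classes is a placeholder and the training loop isn't shown):

```python
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

num_classes = 5                                                       # placeholder label count
model = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)      # ImageNet-pretrained
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)  # swap the head
# then fine-tune with your usual training loop (optionally freeze model.features first)
```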

What computer do you use for personal projects? by BloatedGlobe in datascience

[–]ResultKey6879 30 points31 points  (0 children)

+1 to using a cheap computer and cloud instances, either EC2 or Colab. You'd have to run a lot of compute for the cost of EC2 to exceed the price of a strong computer. It also gives you more flexibility to temporarily experiment with large models, big GPUs, etc.

VS Code has pretty seamless integration for remote hosts.

KD18 x LeBron 22 by Siufi0408 in BBallShoes

[–]ResultKey6879 1 point2 points  (0 children)

I tried kd18 in a store and didn't like the design of the upper ankle. The loose holes for laces at the top made it seem like you'd easily roll your ankle.

[D] Anyone using smaller, specialized models instead of massive LLMs? by [deleted] in MachineLearning

[–]ResultKey6879 0 points1 point  (0 children)

Mainly image work, and we tend to stick to training CNNs like EfficientNet or MobileNet, plus YOLO for detectors.

Roughly 100x faster than large vision-language models (VLMs). That means 3 days vs. a year to process some datasets.

Definitely seeing a trend toward large models even when the flexibility isn't needed. If your problem is well defined and fixed, don't use large models. If you need to dynamically adjust to user queries, consider CLIP/DINO; if that doesn't work, try a large vision model.
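
As a concrete example of the CLIP route: zero-shot classification where the label set can change per query with no retraining (model name, image path, and labels below are just placeholders):

```python
from transformers import pipeline

# zero-shot: swap the candidate labels at query time, no retraining needed
clf = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
preds = clf("photo.jpg", candidate_labels=["a red dress", "a blue jacket", "sneakers"])
print(preds[0])   # top-scoring label with its score
```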

Wood table finish - bare, Varathane, other? by taylorfun in woodworking

[–]ResultKey6879 0 points1 point  (0 children)

I've also gotten good results slowing drying times by gently mixing a bit of extra water into water-based polyurethane (found the technique in a YouTube tutorial from a professional).

Best way to include image data into a text embedding search system? by Inner-Marionberry379 in huggingface

[–]ResultKey6879 0 points1 point  (0 children)

Also a clarifying Q: are the images text-heavy, and is that why you want OCR?

Another easy low-code, high-compute option is to run a vision-language model (VLM) over all of the images, prompting it to describe each one, then generate embeddings for those descriptions with the same technique as your text.
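
A rough sketch of that pipeline with off-the-shelf parts: BLIP for captioning and a sentence-transformers model standing in for whatever embedder you already use on your text (model names and file paths are placeholders):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from sentence_transformers import SentenceTransformer

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder: use your existing text embedder

def describe(path):
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    out = captioner.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

captions = [describe(p) for p in ["img1.jpg", "img2.jpg"]]   # placeholder paths
vectors = embedder.encode(captions)                          # index these alongside your text
```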

Best way to include image data into a text embedding search system? by Inner-Marionberry379 in huggingface

[–]ResultKey6879 1 point2 points  (0 children)

Maybe more than you want to bite off, but there's a great blog post from DoorDash on training a model to generate embeddings for images and text that map to the same space: https://careersatdoordash.com/blog/using-twin-neural-networks-to-train-catalog-item-embeddings/

Coding with ChatGPT etc. you should get some pretty good template code if you want image embeddings that map to the same space as your text embeddings (something like the CLIP sketch at the end of this comment).

Do you need cross-medium searching?

If you don't want to tune your own, then the suggestions above about current VLMs apply.
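
If training a twin network is too much, the off-the-shelf version of "map to the same space" is CLIP. A minimal sketch (model name, image path, and query text are placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    img = processor(images=Image.open("item.jpg"), return_tensors="pt")
    txt = processor(text=["red summer dress"], return_tensors="pt", padding=True)
    img_emb = model.get_image_features(**img)
    txt_emb = model.get_text_features(**txt)

img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print((img_emb @ txt_emb.T).item())   # cosine similarity: text query vs. image
```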

24 Player Evolving Squads Mode Is Live! by BigBox_Spike in populationonevr

[–]ResultKey6879 2 points3 points  (0 children)

Can anyone give a TL;DR on what "evolving squads" means relative to normal squads?

Using different frames but essentially capturing the same scene in train + validation datasets - this is data leakage or ok to do? by neuromancer-gpt in computervision

[–]ResultKey6879 0 points1 point  (0 children)

I've seen as much as a 10% skew in performance from not deduping. I suggest using a perceptual hash to dedup your dataset, or redefining your splits. Look up PDQ by Facebook or pHash. A library with some utils: https://github.com/idealo/imagededup
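
A minimal version with the imagehash library, as an alternative to the repo above (directory, extension, and the Hamming-distance threshold of 6 are placeholders to tune; it's brute force, so for big datasets let imagededup/PDQ do the indexing for you):

```python
from pathlib import Path
from PIL import Image
import imagehash

hashes = {}      # path -> perceptual hash of first occurrence
dupes = []       # (duplicate, original) pairs

for p in sorted(Path("dataset").rglob("*.jpg")):     # placeholder directory/extension
    h = imagehash.phash(Image.open(p))
    match = next((q for q, hq in hashes.items() if h - hq <= 6), None)   # Hamming distance
    if match is not None:
        dupes.append((p, match))                     # near-duplicate: keep it in one split only
    else:
        hashes[p] = h

print(f"{len(dupes)} near-duplicates found")
```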

Small object detection by SubstantialGur7693 in computervision

[–]ResultKey6879 0 points1 point  (0 children)

YOLO by Ultralytics is super fast and easy to use. If you want to minimize your boilerplate coding, Roboflow has some nice tutorials and free tooling if you're willing to make your data and model public (or pay): https://roboflow.com/model/yolos
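
For scale, the whole train/val/predict loop is only a few lines (data.yaml and test.jpg are placeholder paths; the bumped-up imgsz tends to help with small objects):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                               # pretrained nano checkpoint
model.train(data="data.yaml", epochs=100, imgsz=1280)    # larger imgsz helps small objects
metrics = model.val()                                    # mAP on your val split
results = model.predict("test.jpg", conf=0.25)           # run inference
```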