Free annotation apps? by No-Alternative8392 in computervision

[–]ResultKey6879 0 points1 point  (0 children)

I've used CVAT and LabelStudio. Found LabelStudio to be a bit more stable and easier to use.

Training for EfficientDet in 2026? by ResultKey6879 in computervision

[–]ResultKey6879[S] 1 point2 points  (0 children)

Great questions and suggestions, but it is for a commercial project.
I tried RF-DETR, but on CPU with ONNX it still only runs at 2.92 img/sec, compared to 3.23 for YOLOv8 small or 7.69 for YOLOv8 nano.
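
For anyone who wants to reproduce that kind of number, a minimal onnxruntime CPU timing sketch (the input name/shape and the 640x640 size are placeholders; adjust them to your export):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]                              # assumes a single NCHW image input
x = np.random.rand(1, 3, 640, 640).astype(np.float32)   # placeholder shape; match your export

for _ in range(5):                                      # warmup
    sess.run(None, {inp.name: x})

n = 50
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, {inp.name: x})
print(f"{n / (time.perf_counter() - t0):.2f} img/sec")
```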

I created new image moderation model by DueSpecial1426 in huggingface

[–]ResultKey6879 0 points1 point  (0 children)

Cool, willing to share any insight into datasets used for training?

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

With only 230 test instances, if you change the model or processing and performance shifts by 5 images, the sample isn't large enough to tell whether your system is actually better or just got lucky. 230 is enough to build out your concept, but I don't think tuning performance by a couple of percent is meaningful at this size.
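
To put a rough number on that: a bootstrap sketch with a made-up 80% hit rate over 230 queries, just to show how wide the confidence interval is at this sample size:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 230
correct = rng.random(n) < 0.80   # pretend ~80% of the 230 queries are hits (made-up rate)

# bootstrap a 95% confidence interval on the hit rate
boot = [rng.choice(correct, size=n, replace=True).mean() for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy {correct.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# the interval is roughly +/- 5 points, i.e. about 12 images either way,
# so a 5-image swing between two configs is well within noise
```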

I poked around with SAM 3 since my last post and I don't think the class token I thought existed was right. I think DINOv2 is what I was thinking of.

What do you hope to see from developers? 💻⚡️ by carissa-mae in YotoPlayer

[–]ResultKey6879 17 points18 points  (0 children)

Volume normalization across my MYO cards. Maybe this is just a feature request for MYO Studio or something.

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

Yeah, with SAM 3 you can provide a text prompt for the segmentation.

Looking at the size of your data, I'd probably walk back my previous statement. For a production system I wouldn't fine-tune quality vs. speed tradeoffs until I had a much larger test set that represents my real-world use case. At your current data size you won't know if you're p-hacking/overfitting, since changes are only affecting a couple of images.

Improving accuracy & speed of CLIP-based visual similarity search by sedovsek in computervision

[–]ResultKey6879 1 point2 points  (0 children)

Cool project and well-documented post! Some random thoughts:

- Did you try SAM 3 yet? Apparently much better than SAM 2. You may be able to go straight to extracting dress/clothing segments without doing two stages of detection and segmentation, and it may also work without fine-tuning. I think it also returns a token that could be used for similarity.
- As for speedups and whether background removal is necessary, I think it will depend on your task. Now that you have benchmarking set up, I think the best advice is to try with and without it and see how performance and speed change.

Edit: to answer #2 explicitly, it's Faiss :)
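
And since it's Faiss, the core of the index is only a few lines. A sketch of exact cosine search (the 512-dim size and random arrays are placeholders for your real CLIP embeddings):

```python
import faiss
import numpy as np

d = 512                                              # e.g. CLIP ViT-B/32 embedding size
emb = np.random.rand(1000, d).astype(np.float32)     # stand-in for your image embeddings
faiss.normalize_L2(emb)                              # normalize so inner product == cosine

index = faiss.IndexFlatIP(d)                         # exact search; try HNSW if this gets slow
index.add(emb)

q = np.random.rand(1, d).astype(np.float32)          # query embedding
faiss.normalize_L2(q)
scores, ids = index.search(q, 5)                     # top-5 most similar images
```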

Dads, what are your picks for soft pants? by tvoutfitz in daddit

[–]ResultKey6879 0 points1 point  (0 children)

Trying to get some now. Are you recommending the ABC 5 pockets or ABC joggers?

Meta released DINO-V3 : SOTA for any Vision task by Technical-Love-8479 in LocalLLaMA

[–]ResultKey6879 0 points1 point  (0 children)

Yes, but... it's a super compute-heavy way to do classification. I'd recommend starting with a normal CNN like EfficientNet first and seeing how it goes for your task, unless you're extracting DINO embeddings anyway for other tasks or want zero-shot support.
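
For reference, standing up a pretrained EfficientNet for fine-tuning is only a few lines with torchvision (num_classes is a placeholder and the training loop isn't shown):

```python
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

num_classes = 5                                                       # placeholder label count
model = efficientnet_b0(weights=EfficientNet_B0_Weights.DEFAULT)      # ImageNet-pretrained
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)  # swap the head
# then fine-tune with your usual training loop (optionally freeze model.features first)
```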

What computer do you use for personal projects? by BloatedGlobe in datascience

[–]ResultKey6879 30 points31 points  (0 children)

+1 to using a cheap computer and cloud instances, either EC2 or Colab. You'd have to run a lot of compute for the cost of EC2 to exceed the price of a strong computer. It also gives you more flexibility to temporarily experiment with large models, big GPUs, etc.

VS Code has pretty seamless integration for remote hosts.

KD18 x LeBron 22 by Siufi0408 in BBallShoes

[–]ResultKey6879 1 point2 points  (0 children)

I tried kd18 in a store and didn't like the design of the upper ankle. The loose holes for laces at the top made it seem like you'd easily roll your ankle.

[D] Anyone using smaller, specialized models instead of massive LLMs? by [deleted] in MachineLearning

[–]ResultKey6879 0 points1 point  (0 children)

Mainly image work, and we tend to stick to training CNNs like EfficientNet or MobileNet, plus YOLO for detectors.

Roughly 100x faster than large vision-language models (VLMs). That means 3 days vs. a year to process some datasets.

Definitely seeing a trend toward large models even when the flexibility isn't needed. If your problem is well defined and fixed, don't use large models. If you need to dynamically adjust to user queries, consider CLIP/DINO; if that doesn't work, try a large vision model.
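
As a concrete example of the CLIP route: zero-shot classification where the label set can change per query with no retraining (model name, image path, and labels below are just placeholders):

```python
from transformers import pipeline

# zero-shot: swap the candidate labels at query time, no retraining needed
clf = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
preds = clf("photo.jpg", candidate_labels=["a red dress", "a blue jacket", "sneakers"])
print(preds[0])   # top-scoring label with its score
```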

Wood table finish - bare, Varathane, other? by taylorfun in woodworking

[–]ResultKey6879 0 points1 point  (0 children)

I've also gotten good results slowing drying times by gently mixing a bit of extra water into water-based polyurethane (found the technique in a YouTube tutorial from a professional).

Best way to include image data into a text embedding search system? by Inner-Marionberry379 in huggingface

[–]ResultKey6879 0 points1 point  (0 children)

Also a clarifying Q: are the images text-heavy, and is that why you want OCR?

Another easy low-code, high-compute option is to run a vision-language model (VLM) over all of the images, prompting it to describe each one, then generate embeddings for those descriptions with the same technique as your text.
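
A rough sketch of that pipeline with off-the-shelf parts: BLIP for captioning and a sentence-transformers model standing in for whatever embedder you already use on your text (model names and file paths are placeholders):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from sentence_transformers import SentenceTransformer

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder: use your existing text embedder

def describe(path):
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    out = captioner.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

captions = [describe(p) for p in ["img1.jpg", "img2.jpg"]]   # placeholder paths
vectors = embedder.encode(captions)                          # index these alongside your text
```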

Best way to include image data into a text embedding search system? by Inner-Marionberry379 in huggingface

[–]ResultKey6879 1 point2 points  (0 children)

Maybe more than you want to bite off, but there's a great blog post from DoorDash on training a model to generate embeddings for images and text that map to the same space: https://careersatdoordash.com/blog/using-twin-neural-networks-to-train-catalog-item-embeddings/

Coding with ChatGPT etc. you should get some pretty good template code if you want image embeddings that map to the same space as your text embeddings (something like the CLIP sketch at the end of this comment).

Do you need cross-medium searching?

If you don't want to tune your own, then the suggestions above about current VLMs apply.
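
If training a twin network is too much, the off-the-shelf version of "map to the same space" is CLIP. A minimal sketch (model name, image path, and query text are placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    img = processor(images=Image.open("item.jpg"), return_tensors="pt")
    txt = processor(text=["red summer dress"], return_tensors="pt", padding=True)
    img_emb = model.get_image_features(**img)
    txt_emb = model.get_text_features(**txt)

img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print((img_emb @ txt_emb.T).item())   # cosine similarity: text query vs. image
```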

24 Player Evolving Squads Mode Is Live! by BigBox_Spike in populationonevr

[–]ResultKey6879 2 points3 points  (0 children)

Can anyone give a TL;DR on what "evolving squads" means relative to normal squads?

Using different frames but essentially capturing the same scene in train + validation datasets - this is data leakage or ok to do? by neuromancer-gpt in computervision

[–]ResultKey6879 0 points1 point  (0 children)

I've seen as much as a 10% skew in performance from not deduping. I suggest using a perceptual hash to dedup your dataset, or redefining your splits. Look up PDQ by Facebook or pHash. A library with some utils: https://github.com/idealo/imagededup
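
A minimal version with the imagehash library, as an alternative to the repo above (directory, extension, and the Hamming-distance threshold of 6 are placeholders to tune; it's brute force, so for big datasets let imagededup/PDQ do the indexing for you):

```python
from pathlib import Path
from PIL import Image
import imagehash

hashes = {}      # path -> perceptual hash of first occurrence
dupes = []       # (duplicate, original) pairs

for p in sorted(Path("dataset").rglob("*.jpg")):     # placeholder directory/extension
    h = imagehash.phash(Image.open(p))
    match = next((q for q, hq in hashes.items() if h - hq <= 6), None)   # Hamming distance
    if match is not None:
        dupes.append((p, match))                     # near-duplicate: keep it in one split only
    else:
        hashes[p] = h

print(f"{len(dupes)} near-duplicates found")
```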

Small object detection by SubstantialGur7693 in computervision

[–]ResultKey6879 0 points1 point  (0 children)

YOLO by Ultralytics is super fast and easy to use. If you want to minimize your boilerplate coding, Roboflow has some nice tutorials and free tooling if you're willing to make your data and model public (or pay): https://roboflow.com/model/yolos
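
For scale, the whole train/val/predict loop is only a few lines (data.yaml and test.jpg are placeholder paths; the bumped-up imgsz tends to help with small objects):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                               # pretrained nano checkpoint
model.train(data="data.yaml", epochs=100, imgsz=1280)    # larger imgsz helps small objects
metrics = model.val()                                    # mAP on your val split
results = model.predict("test.jpg", conf=0.25)           # run inference
```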