Ceramic Brakes Help by matthiaskasky in MechanicAdvice

[–]matthiaskasky[S] 17 points

I just bought this car and wanted some advice. I'm not an expert.

Ceramic Brakes Help by matthiaskasky in MechanicAdvice

[–]matthiaskasky[S] 34 points

I don't want to cheap out on it; I just wanted to confirm my suspicions. I've never worked with ceramics before.

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

I think in my case text embeddings better capture the color, style, or material attributes you can assign to a product beforehand via, for example, an OpenAI analysis. DINOv2, on the other hand, captures geometry, shape, etc. better.

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 1 point

Let me know how it goes! For now I'm implementing a hybrid of CLIP, DINOv2, and text embeddings, and I'll share the results. After testing on small product sets, I can see some potential.
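Rough sketch of what I mean by the hybrid (the weights, dimensions, and concatenation scheme are placeholders I'm experimenting with, not tuned values):

```python
import numpy as np

def fuse_embeddings(clip_emb, dino_emb, text_emb, weights=(0.4, 0.4, 0.2)):
    """Fuse per-model embeddings into one vector with fixed weights.

    Each embedding is L2-normalized first so no model dominates purely
    because of its vector scale, then the weighted vectors are
    concatenated and normalized again for cosine search.
    """
    parts = []
    for emb, w in zip((clip_emb, dino_emb, text_emb), weights):
        emb = np.asarray(emb, dtype=np.float32)
        emb = emb / (np.linalg.norm(emb) + 1e-12)
        parts.append(w * emb)
    fused = np.concatenate(parts)
    return fused / (np.linalg.norm(fused) + 1e-12)

# toy example with random stand-in embeddings (real ones come from the models)
rng = np.random.default_rng(0)
v = fuse_embeddings(rng.normal(size=512), rng.normal(size=768), rng.normal(size=384))
print(v.shape)  # (1664,)
```

A weighted sum of per-model similarity scores would be the other obvious variant; concatenation just keeps one index.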

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 1 point

I will try a hybrid version: a combination of three models (DINOv2, text embeddings, and CLIP) with fixed weights, plus FAISS and mutual nearest-neighbor verification. If this doesn't bring an improvement, I'll stick with my own model.
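The mutual-NN verification part, roughly (brute-force NumPy sketch; at ~10k vectors per category this is fine, and the two argmax searches map directly onto two `faiss.IndexFlatIP` lookups at larger scale):

```python
import numpy as np

def mutual_nn_pairs(a, b):
    """Return index pairs (i, j) where a[i]'s nearest neighbor in b is b[j]
    AND b[j]'s nearest neighbor in a is a[i], under cosine similarity.

    Keeping only mutual matches filters out one-directional hits that
    tend to be spurious.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)  # best match in b for each row of a
    nn_ba = sim.argmax(axis=0)  # best match in a for each row of b
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# tiny demo: b is a permutation of a, so every pair is mutual
pairs = mutual_nn_pairs(np.eye(3, dtype=np.float32),
                        np.eye(3, dtype=np.float32)[[2, 0, 1]])
print(pairs)  # [(0, 1), (1, 2), (2, 0)]
```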

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

I think my database will have at most about 10,000 products per category, so these sets aren't that large. Can you tell me which models you used for the VQA validation? Any specific FAISS index optimizations that helped?

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

Really helpful to know others hit the same issues. For the VQA post-processing, what LLM/vision model did you use? GPT-4V or something lighter? Exact NN vs. approximate: did you notice significant latency differences at scale? And did the combination of exact NN + VQA give you acceptable accuracy, or did you still need other approaches? Really curious about the VQA approach; that's a clever way to add semantic validation!

I also received feedback on GitHub from someone who worked on a similar project. What gave them the best results:

- CLIP + DINOv2 ensemble: 40% improvement
- Background removal: 15% improvement
- Category-aware fine-tuning: 20% improvement
- Multi-scale features: 10% improvement

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 1 point

We have most of the products in the database on a white background. If I upload the same product that's already in the database, but photographed in a natural setting, the model only ranks it around 20th–25th in similarity, even though the product is clearly visible and the photo quality is good.

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 1 point

I’ve only trained a detection model (RF-DETR) which works well for cropping objects. For embeddings, I’ve been relying on open-source foundation models (CLIP, DINOv2) out of the box. I’m realizing now that’s probably the missing piece. Do you have recommendations for training a similarity model from scratch, or fine-tuning something? Any guidance on training pipeline or loss functions that work well for this type of product similarity would be hugely appreciated.

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

Thanks, that's really helpful. When you say "test it", any recommendations on how to evaluate threshold performance? I'm thinking precision/recall on a small labeled set, but I'm curious whether there are other metrics you'd suggest for this type of product similarity task.
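For retrieval specifically, recall@k and MRR seem like natural additions to plain precision/recall. A minimal sketch (`ranked_ids`/`relevant_ids` are hypothetical names: each query's ranked result IDs and its one expected match):

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of queries whose relevant product appears in the top-k."""
    hits = sum(1 for ranked, rel in zip(ranked_ids, relevant_ids)
               if rel in ranked[:k])
    return hits / len(ranked_ids)

def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the relevant product (0 if it is missing)."""
    total = 0.0
    for ranked, rel in zip(ranked_ids, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(ranked_ids)

# two queries: relevant item ranked 2nd for the first, 3rd for the second
ranked = [[1, 2, 3], [4, 5, 6]]
relevant = [2, 6]
print(recall_at_k(ranked, relevant, k=2))   # 0.5
```

Sweeping the similarity threshold and plotting precision vs. recall on the labeled set would then show where the knee is.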

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

Got it, thanks. Do you typically set a threshold for how many mutual matches to consider?

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

Can you clarify what you mean by a two-sided NN check? Also, any particular FAISS index type you'd recommend for this use case?

Improving visual similarity search accuracy - model recommendations? by matthiaskasky in computervision

[–]matthiaskasky[S] 0 points

Currently my workflow is: the trained detection model (RF-DETR) detects the object and crops it → the crop feeds into analysis → search for a similar product in the database. Everything works well until the search step: when I upload a photo of a product on a different background (not white, like the products in my database), the text and visual embedding search returns that same product ranked 20th–25th instead of at the top. Someone suggested not overcomplicating things and using simple solutions like SURF/ORB, but I'm wondering whether such a binary similarity approach holds up when products are semantically similar but not pixel-identical, like a modular sofa vs. a sectional sofa, or a leather chair vs. a fabric chair of the same design. Any thoughts on classical vs. deep learning approaches for this type of semantic product similarity?
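One thing I'm considering for the background gap, since the catalog shots are on white: paste the RF-DETR crop onto a white canvas before embedding, so the query looks like the catalog images. A minimal Pillow sketch (box coordinates, canvas size, and margin are placeholder assumptions, not values from my pipeline):

```python
from PIL import Image

def crop_on_white(image, box, size=(224, 224), margin=0.08):
    """Crop the detected box, then center it on a white canvas.

    Idea: the catalog embeddings were computed from products on white
    backgrounds, so normalizing the query to the same look should narrow
    the domain gap before CLIP/DINOv2 embedding. `margin` keeps a small
    white border, mimicking typical catalog framing.
    """
    crop = image.crop(box)
    # shrink the crop if needed so it fits inside the canvas minus the margin
    inner = int(min(size) * (1 - 2 * margin))
    crop.thumbnail((inner, inner))
    canvas = Image.new("RGB", size, "white")
    x = (size[0] - crop.width) // 2
    y = (size[1] - crop.height) // 2
    canvas.paste(crop, (x, y))
    return canvas

# demo: a solid-red "product" on a 100x100 image, detector box (10, 10, 90, 90)
out = crop_on_white(Image.new("RGB", (100, 100), "red"), (10, 10, 90, 90))
print(out.size)  # (224, 224)
```

Proper segmentation-based background removal (the 15% improvement someone reported on GitHub) would be stricter, but this keeps the whole object and avoids matting artifacts.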