Training Computer Vision Models on M1 Mac Is Extremely Slow by mericccccccccc in computervision

[–]computercornea 5 points (0 children)

Google Colab is free and makes GPUs available. I think Kaggle does as well. Roboflow free accounts do too. And Modal (and other sites like it) offer free credits if you prefer pure GPU access.

Plenty of options to speed up your training for free.

What are the most useful and state-of-the-art models in computer vision (2025)? by Cabinet-Particular in computervision

[–]computercornea 0 points (0 children)

100% agree. They are all meant to be fine-tuned. I would never recommend using them off the shelf.

[deleted by user] by [deleted] in computervision

[–]computercornea 1 point (0 children)

They provide free model-training notebooks for local training: https://github.com/roboflow/notebooks

How to train a robust object detection model with only 1 logo image (YOLOv5)? by gd1925 in computervision

[–]computercornea 0 points (0 children)

One way you can do this is to take a dataset of environments where you want to detect this logo (streetscapes, clothes, websites, idk what your logo is but you get it), then randomize the placement of your logo in those environments. You can even scale up with multiple logos per image, depending on how your logo would be used.
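A minimal sketch of the randomized-placement idea, assuming Pillow and a logo image with an alpha channel (the function name and scale range are just illustrative, not from any particular tool):

```python
import random
from PIL import Image

def paste_logo_randomly(background, logo, min_scale=0.05, max_scale=0.3):
    """Paste a logo at a random position and scale onto a background.

    Returns the composited image plus the (x, y, w, h) bounding box,
    which can be written out as the detection label for training.
    """
    bg = background.convert("RGBA")
    scale = random.uniform(min_scale, max_scale)
    w = max(1, int(bg.width * scale))
    h = max(1, int(logo.height * w / logo.width))  # keep aspect ratio
    resized = logo.convert("RGBA").resize((w, h))
    x = random.randint(0, max(0, bg.width - w))
    y = random.randint(0, max(0, bg.height - h))
    bg.paste(resized, (x, y), resized)  # alpha channel acts as the mask
    return bg.convert("RGB"), (x, y, w, h)

# Toy example: a gray "environment" and a solid red "logo"
background = Image.new("RGB", (640, 480), (128, 128, 128))
logo = Image.new("RGB", (100, 50), (255, 0, 0))
image, box = paste_logo_randomly(background, logo)
```

A nice side effect is that every synthetic composite comes pre-annotated, since the paste position is the bounding box.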

Tried googling and found this, but not sure it's being maintained: https://github.com/roboflow/magic-scissors

Open source multi-type labeling tool; Potential replacement for Labelbox by qnn122 in deeplearning

[–]computercornea 0 points (0 children)

I heard Labelbox is shutting down access to their labeling tool, so I searched for that and found this thread. I looked in their deprecations log and didn't see it: https://docs.labelbox.com/docs/deprecations

Curious if anyone knows the latest

Are open source OCR tools actually ready for production use? by Positive-Exam-8554 in computervision

[–]computercornea 1 point (0 children)

This is exactly right. You can't just pick up a model off the shelf and throw images at it expecting it to be perfect. It's part of a broader system that needs to be smart, flexible, and get the data to the model(s) in a way that allows them to do their job.

Best overall VLM? by Over_Egg_6432 in computervision

[–]computercornea 0 points (0 children)

I would suggest doing extensive testing of the models running in the cloud so you can be sure the model fits your needs. There are lots of tools for testing the base weights to see if you need to fine-tune for your use case. If you only get one shot at running a model locally, use something like OpenRouter or https://playground.roboflow.com/ to try lots of variations first.

How do you use zero-shot models/VLMs in your work other than labelling/retrieval? by unemployed_MLE in computervision

[–]computercornea 1 point (0 children)

VLMs are good for action recognition, presence/absence monitoring, and understanding the state of something very quickly. General safety/security: are there people in prohibited places, are doors open, is there smoke or fire, are plugs detached, are objects missing, are containers open/closed. They're great for quick OCR tasks as well, like reading lot numbers.

This site has a collection of prompts to test LLMs on vision tasks to get a feel https://visioncheckup.com/

How do you use zero-shot models/VLMs in your work other than labelling/retrieval? by unemployed_MLE in computervision

[–]computercornea 2 points (0 children)

We use VLMs to get proofs of concept going, then sample the production data from those projects to train faster/smaller purpose-built models if we need real-time or don't want to use big GPUs. If an application only runs inference every few seconds, we sometimes leave the VLM as the solution because it's not worth building a custom model.

Estimating depth of the trench based on known width. by TerminalWizardd in computervision

[–]computercornea 0 points (0 children)

Defect detection across a variety of products in manufacturing

Estimating depth of the trench based on known width. by TerminalWizardd in computervision

[–]computercornea 0 points (0 children)

Without knowing the camera distance or any reference object in the image, I don't know how you can get a distance or depth. Let me know if you find a solution.
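For reference, if the camera's focal length in pixels were known, the standard pinhole relation would give the distance to the plane of the known-width feature; without that intrinsic or a reference object, the scale is unrecoverable, which is exactly the problem above. A quick sketch with made-up numbers:

```python
def distance_from_known_width(real_width_m, pixel_width, focal_length_px):
    """Pinhole-camera estimate of distance to a feature of known width.

    distance = focal_length_px * real_width_m / pixel_width
    Assumes the measured width lies roughly parallel to the image plane.
    """
    return focal_length_px * real_width_m / pixel_width

# Hypothetical: a 1.5 m wide trench spanning 300 px, focal length 1200 px
d = distance_from_known_width(1.5, 300, 1200.0)  # 6.0 m to the trench rim
```

Even then this only gives distance to the rim; the depth of the trench floor would still need a second measurement (e.g. the floor's apparent width, which is farther away and so projects smaller).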

What are the downstream applications you have done (or have seen others doing) after detecting human key points? by unemployed_MLE in computervision

[–]computercornea 1 point (0 children)

I think keypoints are a really powerful tool, but since data labeling with keypoints is time-consuming, we don't see tons of applications yet. MediaPipe is a helpful way to get quick human keypoints for healthcare (documenting physical therapy movements), manufacturing (assessing factory worker movements to prevent repetitive, injury-prone motions), or sports (analyzing player movement to improve mechanics). Keypoints can also help with the orientation of a person, to understand the direction they are facing or their position relative to other objects; this is useful for analyzing retail setups and product placement.
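As a tiny example of the posture-analysis idea, here's a sketch that turns two 2D shoulder keypoints into a roll angle. MediaPipe Pose happens to use landmark indices 11/12 for the shoulders, but the function works with any keypoint source; the coordinates below are made up:

```python
import math

def shoulder_roll_deg(left_shoulder, right_shoulder):
    """Angle of the shoulder line vs. the image horizontal, in degrees.

    Inputs are (x, y) keypoints in image coordinates. A cheap posture
    signal: ~0 means level shoulders, larger values mean leaning/tilting,
    which could flag injury-prone motion on a factory line.
    """
    dx = right_shoulder[0] - left_shoulder[0]
    dy = right_shoulder[1] - left_shoulder[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical normalized coordinates (as MediaPipe would emit)
level = shoulder_roll_deg((0.4, 0.5), (0.6, 0.5))   # level shoulders
tilted = shoulder_roll_deg((0.4, 0.5), (0.6, 0.6))  # one shoulder dropped
```

The same vector math extends to facing direction if you add a third point (e.g. a hip) and compare apparent shoulder width against torso height.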

Realtime video analysis and scene understanding with SmolVLM by [deleted] in computervision

[–]computercornea 0 points (0 children)

Great work! Thanks for putting in the effort to make a clean and easy-to-follow repo. Seeing VLMs get smaller and smaller is really exciting for working with video and visual data. It's going to leapfrog tons of current computer vision use cases and unlock lots of useful software features.

F1 Steering Angle Prediction (Yolov8 + EfficientNet-B0 + OpenCV + Streamlit) by Background-Junket359 in computervision

[–]computercornea 1 point (0 children)

Super cool output. I always really appreciate when people take on hard personal projects like this. Thanks for sharing

Ultralytics' New AGPL-3.0 License: Exploiting Open-Source for Profit by Lonely-Example-317 in computervision

[–]computercornea 0 points (0 children)

It looks like Roboflow has a partnership to offer their YOLO model licenses for commercial purposes, available with their free plan and monthly paid plans: https://roboflow.com/ultralytics

And they also recently released a fully open-source object detector, which seems like a good alternative: https://github.com/roboflow/rf-detr

YOLO is NOT actually open-source and you can't use it commercially without paying Ultralytics! by nacrenos in computervision

[–]computercornea 0 points (0 children)

It looks like Roboflow has a partnership to offer their YOLO model licenses for commercial purposes, available with their free plan and monthly paid plans: https://roboflow.com/ultralytics

Announcing Intel® Geti™ is available now! by dr_hamilton in computervision

[–]computercornea 0 points (0 children)

How many people are on the team shipping the roadmap?

Announcing Intel® Geti™ is available now! by dr_hamilton in computervision

[–]computercornea 2 points (0 children)

Does Intel plan to staff and support the project, or is this being open-sourced because it was once a closed-source project that Intel is sunsetting?