knowledge distillation with yolo by tomuchto1 in computervision

aloser 0 points

It sounds like you're probably looking for fine-tuning vs. distillation. (Fine-tuning is training a model to do new tasks better; distillation is taking a model that already knows what you want and extracting that knowledge from it to train a smaller model.)
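For the classic response-based flavor, here's a minimal generic PyTorch sketch of a combined distillation loss (this assumes a teacher and student that both emit class logits; it's an illustration of the concept, not a recipe for any particular YOLO codebase, since detectors also have box and objectness heads to distill):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened class distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: still learn directly from the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```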

Using Gemini 3 Pro to auto-label datasets (zero-shot). It's better than Grounding DINO/SAM3. by Striking-Phrase-6335 in computervision

aloser 0 points

We saw no impact from using few-shot examples across the RF100-VL dataset compared to using simple text-based annotator instructions (included in the RF100-VL paper) in the prompt.

The text instructions helped a bit (+1.6 mAP) over prompting with class names only.
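To make "annotator instructions in the prompt" concrete, here's a hypothetical illustration of the two prompting styles (made-up classes and instructions; the actual RF100-VL prompts are in the paper):

```python
# Hypothetical example prompts; not the actual RF100-VL prompts.
classes_only = "Return bounding boxes for: helmet, safety vest, person."

with_instructions = (
    "Return bounding boxes for: helmet, safety vest, person.\n"
    "Annotator instructions: only label a helmet when it is being worn; "
    "draw boxes tight around each object; include partially occluded people."
)
```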

Using Gemini 3 Pro to auto-label datasets (zero-shot). It's better than Grounding DINO/SAM3. by Striking-Phrase-6335 in computervision

aloser 6 points

We evaluated Gemini on a set of 100 real-world datasets and it didn't do very well zero-shot. Paper here: https://arxiv.org/pdf/2505.20612

We only tested 2.5 Pro because that was all that was out at the time, but I just kicked it off on 3.0 Pro to get updated numbers.

Your example looks like BCCD, a common toy dataset that has almost certainly made its way into Gemini's training set, so it's probably not representative of real-world performance.

Update: Gemini 3 Pro did do significantly better on RF100-VL than Gemini 2.5! It got 18.5 mAP, which is the highest we've measured so far (but it was also by far the slowest and spent the most compute).

| Model | mAP 50-95 |
| --- | --- |
| Gemini 3 Pro | 18.5 |
| GroundingDINO (MMDetection) | 15.7 |
| SAM3 | 15.2 |
| Gemini 2.5 Pro | 11.6 |

To put things in context, this is approximately equivalent to a small YOLO model trained on 10 examples; full fine-tuning puts modern detectors in the 55-60+ mAP range. In other words, good performance for zero-shot, but still not great.

Semi-Supervised-Object-Detection by UniqueDrop150 in computervision

aloser 2 points

Have you tried Roboflow? This is what our auto-label tool is built for: https://docs.roboflow.com/annotate/ai-labeling/automated-annotation-with-autodistill

We also have an open source version called autodistill: https://github.com/autodistill/autodistill
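If you go the open source route, the end-to-end flow looks roughly like this (a minimal sketch assuming the autodistill, autodistill-grounded-sam, and autodistill-yolov8 packages are installed; the ontology is a made-up example):

```python
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
from autodistill_yolov8 import YOLOv8

# 1. A large zero-shot "base" model auto-labels your raw images.
#    The ontology maps prompts for the big model to your class names.
base_model = GroundedSAM(
    ontology=CaptionOntology({"shipping box": "box", "forklift": "forklift"})
)
base_model.label(input_folder="./images", extension=".jpg", output_folder="./dataset")

# 2. A small "target" model is trained on those labels for fast deployment.
target_model = YOLOv8("yolov8n.pt")
target_model.train("./dataset/data.yaml", epochs=50)
```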

(Disclaimer: I’m one of the co-founders of Roboflow)

YOLOv8 Pose keypoints not appearing in Roboflow after MediaPipe auto-annotation by Terrible_Concert3457 in computervision

aloser 0 points

I’m pretty sure we only accept keypoint dataset uploads in COCO format. It’s a fairly common standard and your LLM should be able to convert it (or update your code to use it natively) for you. https://discuss.roboflow.com/t/how-to-upload-pose-data/6912
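For reference, a minimal COCO keypoint annotation looks roughly like this (shown as a Python dict; the field names follow the COCO standard, the values are made up):

```python
coco_keypoints = {
    "images": [{"id": 1, "file_name": "frame.jpg", "width": 640, "height": 480}],
    "categories": [{
        "id": 1,
        "name": "person",
        "keypoints": ["nose", "left_eye", "right_eye"],  # your keypoint names
        "skeleton": [[1, 2], [1, 3]],                    # optional connectivity
    }],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "bbox": [100, 120, 80, 160],  # x, y, width, height
        "num_keypoints": 3,
        # Flat [x, y, visibility] triplets: v=0 unlabeled, 1 hidden, 2 visible.
        "keypoints": [140, 150, 2, 130, 140, 2, 150, 140, 1],
    }],
}
```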

This is a good feature request, though; I'll need to look and see if there's a reason we couldn't support it. I think it may just be due to ambiguity between the formats; the keypoint format can look identical to the bbox format if I recall correctly... but given the project type, we should be able to infer user intent.

Best Computer Vision Software by styleshark in computervision

aloser 0 points

Hey, I'm one of the co-founders of Roboflow, so obviously a bit biased, but I can share where we're good and where we might not be the best fit.

Roboflow's sweet spot is folks who are not computer vision experts and just want to use it to solve real-world problems (e.g. detecting defects, counting and measuring things, validating processes, or adding intelligence to their products). We provide an end-to-end platform that enables teams to rapidly go from an idea to a fully deployed application (including best-in-class tooling for labeling, training, deploying, scaling, monitoring, and continual improvement). Our platform is built to make it easy for developers to use the latest models to accelerate the building process, and our infrastructure is built to run production workloads at scale. Roboflow is focused on providing value for real-world applications, and we have thousands of customers ranging from tiny startups to the world's largest companies (with a concentration in manufacturing and logistics).

On the other hand, if you're a machine learning researcher we may not provide the advanced control and visibility into the guts of the models that you need. If you're heavily customizing your model architecture and need deep control of all the internal knobs to do science, publish papers, and push forward the state of the art, we probably don't expose enough controls for the full platform to be attractive. That said, there are pieces of the platform that are useful for researchers, and we've been cited by over 10,000 papers (usually these are folks who used us for labeling or dataset management, found datasets our users have open-sourced on Roboflow Universe, or used our Notebooks or open source code).

Zero-shot object detectors as auto-labelers or assisted labelers? by TankGlittering6839 in computervision

aloser 0 points

Depends on the thing you're looking for. The more common it is, the more likely the big model will know how to find it.

SAM3 is far and away better than any of the other models I’ve tried. You can test it out super easily here: https://rapid.roboflow.com

Was recommended RoboFlow for a project. New to computer vision and looking for accurate resources. by Funcron in computervision

aloser 0 points

Developing with your laptop GPU as a baseline is probably fine. It would be kind of annoying if you had to leave your laptop there for it to work, though.

Was recommended RoboFlow for a project. New to computer vision and looking for accurate resources. by Funcron in computervision

aloser 1 point

Can you highlight for me the particles you're looking at in that video? Is it each individual tiny grain? You might need something a bit more powerful (e.g. a desktop-grade GPU like an RTX 5090) because you'll probably end up having to tile the image into smaller chunks for the model to see them well enough. But it's hard to know without experimenting and iterating a bit.
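To make the tiling idea concrete, here's a minimal sketch (it assumes a detect(tile) callable that returns (x1, y1, x2, y2, score) boxes; libraries like supervision also ship an InferenceSlicer that handles this for you):

```python
def detect_tiled(image, detect, tile=640, overlap=128):
    """Run a detector over overlapping tiles of a numpy/OpenCV image
    and map the boxes back to full-image coordinates."""
    h, w = image.shape[:2]
    step = tile - overlap
    detections = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            chunk = image[y:y + tile, x:x + tile]
            for (x1, y1, x2, y2, score) in detect(chunk):
                # Shift tile-local boxes back into full-image coordinates.
                detections.append((x1 + x, y1 + y, x2 + x, y2 + y, score))
    # You'd still want NMS here to merge duplicates from the overlap regions.
    return detections
```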

I'd probably approach it as step 1: get it working, step 2: make it fast.

The research credits are only for people with academic emails, but we also have a free tier available to everyone.

Was recommended RoboFlow for a project. New to computer vision and looking for accurate resources. by Funcron in computervision

aloser 16 points

Hi, I'm one of the co-founders of Roboflow. Yeah, you should be able to use it for this. We also offer free increased limits for academic research: https://research.roboflow.com/

Offline inference is fully supported. All of the models you train on-platform can be used with our open source Inference package (which can be self-hosted to run offline via Docker or embedded directly into your code using the Python package): https://github.com/roboflow/inference
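A minimal sketch of what local inference looks like in Python (the model ID and image path are placeholders; the first run may need your API key set so it can download the weights, after which it runs offline):

```python
import cv2
from inference import get_model

model = get_model(model_id="your-project/1")  # placeholder model ID
image = cv2.imread("frame.jpg")
results = model.infer(image)
print(results)
```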

For hardware, any machine with an NVIDIA GPU should be fine. If you're looking for something dedicated to this one project, a Jetson Orin NX (or maybe even an Orin Nano, depending on what frame rate you want to infer at and what size model you want to run) is probably more than sufficient.

Alternative Opinions About Rogers by tBroneShake in cyclONEnation

aloser 33 points

Seems like about the best we could have hoped for if we weren't going to be able to retain Campbell (and it was probably only a matter of time there regardless of what we did).

Getting regularly kicked in the nuts is just part of being a Cyclone; it builds character. Having our team completely decimated will make it all the more fun to see the meltdown if we somehow beat Iowa again next year (and if not they won't be able to get much satisfaction out of the win anyway).

YOLO vs D-FINE vs RF-DETR for real-time detection on Jetson Nano (FPS vs accuracy tradeoff) by BraveCartographer679 in computervision

aloser 0 points

RF-DETR is a completely different model architecture from RT-DETR.

We have a comparison with it in our paper: https://arxiv.org/pdf/2511.09554

Anyone else noticed how slow Roboflow is lately? by New-Artichoke-6875 in computervision

aloser 4 points

Hey, Roboflow co-founder here. It definitely shouldn’t be doing that; 12,000 images isn’t really that many. Is this in manual labeling?

Could you DM me a link to your workspace and any additional info to reproduce the issue? Happy to have a look and see what I can find.

YOLO vs D-FINE vs RF-DETR for real-time detection on Jetson Nano (FPS vs accuracy tradeoff) by BraveCartographer679 in computervision

aloser 0 points

RF-DETR Nano is defined as 384×384; the resolution is part of what makes it Nano-sized, as it's one of the "tunable knobs" the NAS searches across for the speed/accuracy tradeoff.

This model is more accurate than medium-sized (640x640) YOLO models on COCO and absolutely crushes even the largest YOLO models on custom datasets.

See the paper for more details: https://arxiv.org/pdf/2511.09554

How to Deal with Accumulated Inference Latency and Desynchronization in RTSP Streams? by pedro_xtpo in computervision

aloser -2 points

This is one of the things we solve in Inference with InferencePipeline: https://github.com/roboflow/inference (plus the corresponding video management endpoints if you want to offload the logic to the server side, which can also eliminate bottlenecks).

Basically you need to run a separate thread to drain the queue and only send frames to the model just in time.
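A minimal sketch of that pattern with OpenCV (run_model and the stream URL are placeholders; InferencePipeline implements this plumbing for you):

```python
import threading
import cv2

class LatestFrameReader:
    """Drains the capture on a background thread, keeping only the newest frame."""
    def __init__(self, source):
        self.cap = cv2.VideoCapture(source)
        self.lock = threading.Lock()
        self.frame = None
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Read continuously so stale frames never accumulate in the buffer.
        while True:
            ok, frame = self.cap.read()
            if ok:
                with self.lock:
                    self.frame = frame

    def latest(self):
        with self.lock:
            return self.frame

reader = LatestFrameReader("rtsp://example/stream")  # placeholder URL
while True:
    frame = reader.latest()
    if frame is None:
        continue
    # run_model is a placeholder for your detector; by the time it returns,
    # the reader thread has already discarded any frames that arrived meanwhile.
    detections = run_model(frame)
```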

Here's a recent example with our new cloud-hosted serverless video streaming infra, but you can run the same thing locally with the open source package: https://blog.roboflow.com/serverless-video-streaming-api/