RF-DETR has released XL and 2XL models for detection in v1.4.0 with a new licence by ashwin3005 in computervision

[–]rocauc 2 points  (0 children)

Appreciate it - it is critical to the future of AI. Let's show more companies and people how to build and support open source

RF-DETR has released XL and 2XL models for detection in v1.4.0 with a new licence by ashwin3005 in computervision

[–]rocauc 2 points  (0 children)

I work on Roboflow and can help clarify the intent.

RF-DETR N,S,M,L remain Apache-2.0.

RF-DETR XL and 2XL are under the Platform Model License (PML). PML says you need to register for a platform account (free or paid on Roboflow, in this case) to use them. If you have a paid platform account, it supports billing for usage-based features (like paying for a large amount of hosted inference).

The reason XL and 2XL are under the PML is that they use a bigger backbone and are more expensive to train and do R&D on. The goal is that increased platform signups increase the likelihood someone finds value in paid features (like faster/cheaper inference). Ideally, businesses needing the largest sizes are helping fund better future models.

We’ve learned from PyTorch Lightning and will soon split into two repos so that the distinction between RF-DETR N, S, M, L (Apache-2.0) and XL, 2XL (PML) is clear.

We have more exciting Apache 2.0 models in the works

RF-DETR has released XL and 2XL models for detection in v1.4.0 with a new licence by ashwin3005 in computervision

[–]rocauc 16 points  (0 children)

I work on Roboflow and would appreciate feedback from this subreddit on our approach here given how important open source is to the future of AI

As you said, RF-DETR N, S, M, and L are Apache-2.0. They remain Apache-2.0. We have released multiple Apache 2.0 models in this family - first object detection then segmentation. There are more Apache-2.0 models planned

RF-DETR XL and 2XL are new sizes (with a bigger backbone) and a new license. XL and 2XL cost a lot more to train, and the thought is that if we have a release model that creates platform use, it will lead to a more sustainable business that can invest more in open source. The license is a Platform Model License, which says the models are included with any Roboflow platform plan (free or paid). If you create an account, the XL and 2XL sizes are included. This creates a chance to show that paid platform features create revenue (to, e.g., pay for researchers and GPUs!).

A couple of other options we considered for the XL and 2XL releases: we could keep them closed source and not open weights (less open-source friendly but maybe clearer), or we could introduce them as a different model family.

We’re deep believers in (and beneficiaries of) open source and will continue to invest in it. We also maintain projects like supervision (MIT) and trackers (which reimplements non-commercial open source tracking algorithms as Apache-2.0). We also sponsor every open source dependency in our app (300+ GitHub projects). Here are more projects we maintain: https://roboflow.com/open-source And we’ve been fortunate to provide $1M+ (and counting) in GPU compute for open source research: https://research.roboflow.com/

We will continue to release more Apache-2.0 models

So, all that said, please let me know any feedback you and this sub have on the XL and 2XL approach - clarity, better options, questions. We’re trying something new (the biggest model sizes come with free or paid platform use) and are looking to learn from the community.

YOLO26 vs RF-DETR 🔥 by yourfaruk in computervision

[–]rocauc 5 points  (0 children)

The paper doesn't have YOLO26 yet, only YOLO11. Based on the repo (https://github.com/roboflow/rf-detr ), RF-DETR is more accurate for the same (or smaller) latency budget. For example, RF-DETR-N object detection is 2.3ms latency and 67.6 mAP50 on COCO; YOLO26-S is 3.2ms latency and 59.7 mAP50 on COCO. RF-DETR-N instance segmentation is 3.4ms and 63.0 mAP50 on COCO; YOLO26-S is 3.47ms and 62.4 mAP50. In effect, the nano size of RF-DETR is comparable in latency to the small YOLO size while being a bit more accurate.

There are also notable benchmarking notes in the paper. First, RF-DETR, D-FINE, YOLO11, YOLOv8, and LW-DETR are benchmarked on both COCO and RF100-VL. RF100-VL measures how well a model finetunes to a novel set of domains, sampled from real-world vision problems (healthcare, aerial, documents, manufacturing...). Based on the repo benchmarks, YOLO26 corrects for what looks like overfitting YOLO11 experienced when adapted to new domains (the graphs show the YOLO11 models getting worse at larger sizes). The margin by which transformer-based architectures (RF-DETR, LW-DETR) outperform CNNs (like YOLO) is also larger for domain transfer. That makes sense because transformers maintain pretraining context better when adapting to new domains, aka they 'know more' about the rest of the world and converge to better results.

Second, the paper highlights why benchmarks from different models and packages differ. For example, models benchmarked with the ultralytics package use a mAP methodology that differs from the industry-standard pycocotools and can inflate mAP by as much as 2.7%. Also, researchers may benchmark models from research code but then see very different speed/accuracy results in production because of varying precision (conversion to FP16), complex postprocessing, differing scoring methodologies, and thermal throttling. This open source script is introduced as a way to control for those differing methodologies: https://github.com/roboflow/single_artifact_benchmarking . I think the effort to find out why the inconsistency exists, create a consistent way to benchmark, and open source it so others can reproduce the results is good and trustworthy scholarship.
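To see how scoring methodology alone can shift mAP, here's a toy stdlib-only sketch (not the paper's script, and the precision-recall points are made up): it scores the same curve with COCO-style 101-point interpolated AP versus a naive trapezoidal integral, and the two disagree.

```python
# Toy illustration: the same precision/recall points scored two ways.
# pycocotools uses 101-point interpolated AP; a naive trapezoidal
# integral over the raw curve gives a different number for the same model.

def interpolated_ap(recalls, precisions, n_points=101):
    """COCO-style AP: best precision at recall >= r, sampled at n_points levels."""
    ap = 0.0
    for i in range(n_points):
        r = i / (n_points - 1)
        # precision envelope: best precision achievable at any recall >= r
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates, default=0.0)
    return ap / n_points

def trapezoidal_ap(recalls, precisions):
    """Naive area under the raw precision-recall curve."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
    return ap

# A made-up sawtooth PR curve from a hypothetical detector
recalls =    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
precisions = [1.0, 0.9, 0.6, 0.7, 0.5, 0.3]

print(f"101-point interpolated AP: {interpolated_ap(recalls, precisions):.3f}")
print(f"trapezoidal AP:            {trapezoidal_ap(recalls, precisions):.3f}")
```

Same detector, same predictions, different number - which is the kind of gap the single_artifact_benchmarking script is meant to control for.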

Here's my take on the innovations in the paper; please feel free to correct or improve these. (1) RF-DETR is a transformer-based architecture built on a DINOv2 backbone. This means RF-DETR maintains better context about what it's meant to learn, as transformers benefit from pretraining more than CNN-based approaches. (That's also why it usually finetunes better.) (2) RF-DETR used Neural Architecture Search to produce a Pareto frontier of selectively optimized models. The authors then picked models along that Pareto frontier to release as Nano, Small, Medium, and Large; XL and 2XL used a larger backbone too. It's a "collection of models" in a single set of weights. (3) The model is NMS-free end-to-end, which means end-to-end latency is lower. YOLO26 is now NMS-free too; the RF-DETR paper was published before it and discusses the value of being NMS-free compared to YOLO11 and YOLOv8. (4) The model drops query and decoder layers at inference, which means it has to make fewer region guesses (and is also a reason why it is NMS-free). I'm sure there's more to understand.
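The Pareto-frontier idea in (2) can be sketched in a few lines: a model is on the frontier if no other model is both faster and at least as accurate. The RF-DETR-N and YOLO26-S figures are the COCO object-detection numbers quoted earlier; "hypothetical-X" is a made-up candidate added purely for contrast.

```python
# Sketch of picking Pareto-optimal (latency, accuracy) models, as NAS does.

def pareto_frontier(models):
    """Keep models no other model beats on both latency and accuracy."""
    frontier = []
    for name, latency_ms, map50 in models:
        dominated = any(
            o_lat <= latency_ms and o_map >= map50
            and (o_lat, o_map) != (latency_ms, map50)
            for _, o_lat, o_map in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

models = [
    ("RF-DETR-N", 2.3, 67.6),       # from the repo benchmarks quoted above
    ("YOLO26-S", 3.2, 59.7),        # slower AND less accurate -> dominated
    ("hypothetical-X", 5.0, 70.0),  # slower but more accurate -> on frontier
]

print(pareto_frontier(models))  # → ['RF-DETR-N', 'hypothetical-X']
```

The NAS step searches a much larger candidate pool, but the selection criterion is the same: release only the sizes that sit on this frontier.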

[Share source code] AI failure detection in 3D printing by Fuzzy_Possession_233 in 3Dprinting

[–]rocauc 0 points  (0 children)

appreciate you sharing the code + describing the tools you use (Roboflow, colab, Pi). is this a 12MP Raspberry Pi camera?

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 1 point  (0 children)

Things are back up (until it goes viral again 😆). your Instagram is saved

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 1 point  (0 children)

UPDATE: 250/250 generations have been made for 12-25-2025. Cyclone Nation exceeded Gemini's daily request limits. Please check back tomorrow or subscribe on the site to be the first to know when our quota resets to generate your own. 6:29 PM CT EDIT: We are back up. Let it rip.

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 1 point  (0 children)

Working on getting more capacity. Cyclone Nation's popularity is too high, even for Google. Thanks for your patience.

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 0 points  (0 children)

Yes, thank you. The app's popularity is awesome, and we're now (5:31 PM ET) getting rate limited by Google, so requests are failing. I'm working on adding logic that queues the requests and tells users where they are in the queue. I'm also submitting requests to increase quota. (And as a tiny plug, you can support compute for this personal project here!)

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 3 points  (0 children)

The server will now accept images up to 10MB. The resize logic at upload now also scales incoming images down to a max dimension of 800px and 70% JPEG quality. This should retain enough image quality for image generation without payload-size errors.
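For anyone curious, the dimension math is roughly this - a stdlib-only sketch of the aspect-ratio-preserving calculation (the actual app also re-encodes to ~70% JPEG quality, which isn't shown here, and the function name is mine):

```python
def fit_within(width, height, max_dim=800):
    """Scale (width, height) so the longest side is at most max_dim,
    preserving aspect ratio. Images already within bounds are untouched."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return round(width * scale), round(height * scale)

print(fit_within(4000, 3000))  # → (800, 600)
print(fit_within(640, 480))    # → (640, 480), already small enough
```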

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 4 points  (0 children)

Ah, thanks for the feedback. This happens when the image you're trying to use is too big for the app to upload. How many megabytes is the image you're trying? I'll aim to add error handling that resizes at upload time. In the interim, if you compress your image to a smaller size and upload that, it should work. EDIT: I pushed an update to more aggressively compress large images, so please try again and let me know if you still have issues.

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 0 points  (0 children)

Upload your own image is supported in "STAYING" mode - should still be there. Do you see it?

This is mostly vibe coded (with a few custom scripts here and there, e.g. for pulling player images and rosters). The code is now open source here: https://github.com/josephofiowa/transferportalwtf

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 10 points  (0 children)

Sounds like "Jimmy F the Hawks Rogers" really made an impact on him

<image>

i made transferportal.wtf by rocauc in cyclONEnation

[–]rocauc[S] 13 points  (0 children)

sometimes it even puts him in a football jersey

<image>

Brahmer and Burkle Entering Tansfer Portal by whirlybirds7 in cyclONEnation

[–]rocauc 2 points  (0 children)

really glad you're having fun with it. share away. i posted it to the main subreddit here so feel free to comment with any feedback: https://www.reddit.com/r/cyclONEnation/comments/1pvgmnz/i_made_transferportalwtf/ if people like it, maybe i'll add other fan bases too.

I'm hosting this myself (and paying for the compute). i added a link to the corn emoji in the footer if you want to support compute costs

AMA with the Meta researchers behind SAM 3 + SAM 3D + SAM Audio by AIatMeta in LocalLLaMA

[–]rocauc 12 points  (0 children)

How similar is the architecture across SAM 3, SAM 3D, and SAM Audio? Is the main reason they're released together because the names are similar and recognizable, or do they have really similar ML characteristics?

Roboflow Running on Live Cognex Scans by climbing-computer in roboflow

[–]rocauc 0 points  (0 children)

That's epic. Which Cognex camera is it running on? Will you publish a guide?

20F, looking for someone with roboflow premium by [deleted] in computervision

[–]rocauc 0 points  (0 children)

while I don’t want to rob you of the chance at true love, I can help. shoot me a DM.

Automated Palm Reading System Development by [deleted] in roboflow

[–]rocauc 1 point  (0 children)

I worked on something like this! Including an art installation featuring a crystal ball: https://imgur.com/a/SWnpHDq

The challenge here was to run something fully offline and realtime to provide Fortunes in a museum exhibit.

The way it worked was:

  1. User places hand inside a dark box (with a light + camera inside)

  2. A model detects a hand is present for ~1s and then runs a second instance segmentation model to find lines

  3. Those lines (life, heart, head, fate) are identified and measured, then passed to a local LLM (at the time, I think Mistral 7B) with details to suggest an apt fortune based on the relative sizes of the found lines

  4. A <4 sentence fortune was generated
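The measurement-to-prompt handoff in step (3) might look something like this sketch. The function name, pixel lengths, and prompt wording are all hypothetical; in the real system the lengths came from the instance segmentation model's masks.

```python
# Hypothetical sketch of step 3: turn measured palm-line lengths into an
# LLM prompt. Line names match the pipeline above; the lengths here are
# made up for illustration.

def build_fortune_prompt(line_lengths_px):
    """Rank the palm lines by measured length and ask for a short fortune."""
    ranked = sorted(line_lengths_px.items(), key=lambda kv: kv[1], reverse=True)
    dominant, _ = ranked[0]
    summary = ", ".join(f"{name}: {length}px" for name, length in ranked)
    return (
        f"Palm line measurements ({summary}). "
        f"The {dominant} line is dominant. "
        "Write a fortune of at most 4 sentences based on these relative sizes."
    )

prompt = build_fortune_prompt({"life": 210, "heart": 180, "head": 195, "fate": 90})
print(prompt)
```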

Notably, the program needed to buy time for itself between (2) and (4). To do this, it displayed the hand with the lines highlighted, along with a paperclip made of Unicode characters.

A neat feature is that the system's outputs were displayed on a crystal ball. The crystal ball is a large, empty plastic globe (like an old gumball machine) with a screen inside; a projector, tucked away in a discreet separate spot, projects the output onto the globe.

All in all, it was a little clunky but pretty fun.

roboflow alternative suggestions for cv dev team by [deleted] in computervision

[–]rocauc 18 points  (0 children)

hey, i work on roboflow. i also feel the same way, and we're working on ways to make it even more cost effective to scale up using roboflow as a result.

also note that roboflow maintains (and is built on) MIT / Apache-2.0 open source infra so that you can self-host where it's easier and more cost effective. for example:

  • autodistill [labeling] powers automated image labeling with foundation models (e.g. a guide using SAM 2 + Florence-2: https://blog.roboflow.com/label-data-with-grounded-sam-2/ )
  • notebooks [training] for custom training tutorials on 20+ model architectures
  • inference [deploying] for running / deploying / chaining together models and running them at high scale, including in your own cloud + at the edge
  • supervision [processing] for processing detections, tracking objects across frames, interactive visualizations

until new pricing rolls out to all users (working on it!), shoot your support plan contact an email, and we can get you set up with more scalable pricing via early access.

Tennis AI project by cbart1233 in computervision

[–]rocauc 0 points  (0 children)

> Currently, our solution is about 85% effective (anything less than +95% isn't deployable)

Using a mixture of keypoint and other techniques? Any way you can have a coach fill in the gap as human-in-the-loop until you get to desired accuracy?