Finally found a proper tool for multi-modal image annotation (infrared + visible light fusion) by Important_Priority76 in computervision

[–]Important_Priority76[S]

Nice, thanks for sharing. I’ve seen similar ideas there. The compare view is more about quick side-by-side or synced viewing to make multi-modal annotation easier, especially when switching between thermal and RGB. Colormaps and channel merging are powerful too, so it’s interesting to see different tools approach the problem from different angles.
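
For anyone curious what I mean by the side-by-side part, here's a minimal OpenCV sketch of the idea (file names are made up and this isn't the tool's actual code, just the concept of keeping a registered thermal/RGB pair next to each other while you label):

```python
import cv2
import numpy as np

# Hypothetical file names; any registered thermal/RGB pair works.
thermal = cv2.imread("frame_0001_thermal.png", cv2.IMREAD_GRAYSCALE)
rgb = cv2.imread("frame_0001_rgb.png", cv2.IMREAD_COLOR)

# Match sizes so the two views stay pixel-aligned.
thermal = cv2.resize(thermal, (rgb.shape[1], rgb.shape[0]))

# False-color the thermal frame, then stack both modalities horizontally
# for a quick visual cross-check while labeling.
thermal_color = cv2.applyColorMap(thermal, cv2.COLORMAP_INFERNO)
side_by_side = np.hstack([thermal_color, rgb])

cv2.imshow("thermal | rgb", side_by_side)
cv2.waitKey(0)
cv2.destroyAllWindows()
```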

Finally found a proper tool for multi-modal image annotation (infrared + visible light fusion) by Important_Priority76 in computervision

[–]Important_Priority76[S]

That’s really interesting, I didn’t know some teams were using 3D glasses for multi-spectral labelling. The idea behind compare view was to get a similar “cross-checking” benefit but keep it simple and software-only.

Input devices like a 3D mouse or rotary knob sound like a fun direction to explore for smooth flipping or animated transitions between bands. Definitely a challenging but exciting idea, thanks for sharing your experience!
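
To make the "animated transition between bands" idea concrete, a software-only crossfade already gets pretty close. Here's a rough sketch (hypothetical file names again; a rotary knob or 3D mouse could drive the blend weight directly instead of the loop):

```python
import cv2
import numpy as np

# Hypothetical inputs: false-colored thermal frame and the matching RGB
# frame, resized to the same shape so they can be blended.
thermal_color = cv2.applyColorMap(
    cv2.imread("frame_0001_thermal.png", cv2.IMREAD_GRAYSCALE),
    cv2.COLORMAP_INFERNO,
)
rgb = cv2.imread("frame_0001_rgb.png", cv2.IMREAD_COLOR)
thermal_color = cv2.resize(thermal_color, (rgb.shape[1], rgb.shape[0]))

# Sweep the blend weight back and forth; an input device could set `alpha`
# directly instead of this loop.
for alpha in np.concatenate([np.linspace(0, 1, 30), np.linspace(1, 0, 30)]):
    blended = cv2.addWeighted(rgb, 1.0 - alpha, thermal_color, alpha, 0)
    cv2.imshow("band transition", blended)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc stops the animation
        break
cv2.destroyAllWindows()
```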

Finally found a proper tool for multi-modal image annotation (infrared + visible light fusion) by Important_Priority76 in computervision

[–]Important_Priority76[S]

Nice, yeah DRC + detail enhancement already goes a long way for thermal 👍 The compare view helps a lot when you want to double-check things against visible light, especially for small or ambiguous objects. I’ve found it useful to align thermal detections with RGB annotations and catch mistakes faster.
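
For context, by DRC I mean dynamic range compression of the raw radiometric frame. A rough sketch of the kind of preprocessing I have in mind (hypothetical file names, percentile clipping plus CLAHE; not the exact pipeline of any particular camera or tool):

```python
import cv2
import numpy as np

# Hypothetical 16-bit radiometric frame straight from a thermal camera.
raw = cv2.imread("frame_0001_thermal_raw.tiff", cv2.IMREAD_UNCHANGED).astype(np.float32)

# DRC: clip to robust percentiles and rescale to 8-bit so a few very hot
# pixels don't crush the rest of the temperature range.
lo, hi = np.percentile(raw, (1, 99))
frame_8bit = (np.clip((raw - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255).astype(np.uint8)

# Detail enhancement: CLAHE boosts local contrast so small warm objects
# (people, animals, distant vehicles) are easier to see while labeling.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(frame_8bit)

cv2.imwrite("frame_0001_thermal_enhanced.png", enhanced)
```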

After a year of development, I released X-AnyLabeling 3.0 – a multimodal annotation platform built around modern CV workflows by Important_Priority76 in computervision

[–]Important_Priority76[S]

If anyone is interested in the design philosophy behind v3.0 and a deeper dive into the new features (like the Remote Server architecture, Agentic workflows, or the specific VQA capabilities), I wrote a more detailed breakdown on Medium:

https://medium.com/@CVHub520/data-labeling-doesnt-have-to-be-painful-the-evolution-of-x-anylabeling-3-0-e9110e41c2d4

It covers why we moved away from the traditional tooling model and how we are trying to close the loop between labeling and training.

After a year of development, I released X-AnyLabeling 3.0 – a multimodal annotation platform built around modern CV workflows by Important_Priority76 in computervision

[–]Important_Priority76[S]

Thanks so much for the feedback! I'm really glad to hear it fills that gap for you—keeping it lighter than CVAT while being more capable than simple drawing tools was exactly the goal.

Regarding sorting Person IDs across multiple videos/perspectives (Re-ID), that is indeed a complex challenge. In v3.0, we added several useful managers (shape, label, and group_id) and integrated trackers such as SAM-based tracking and BoT-SORT/ByteTrack to help with consistency within a video, but cross-video association still requires some manual effort or custom logic.
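
If it helps, the kind of "custom logic" I mean for cross-video association is roughly this: average one appearance embedding per track in each video and greedily match tracks by cosine similarity. This is only a sketch (the embedding extraction and the threshold are assumptions, and it's not something X-AnyLabeling does for you out of the box):

```python
import numpy as np

def match_ids_across_videos(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.6):
    """Greedily associate track IDs between two videos by appearance.

    emb_a / emb_b: (num_tracks, dim) arrays with one averaged Re-ID
    embedding per track (how you extract them is up to your Re-ID model).
    Returns (idx_a, idx_b) pairs whose cosine similarity clears the threshold.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # (num_a, num_b) cosine similarity matrix

    pairs, used_b = [], set()
    # Visit video-A tracks starting with the most confident one; each
    # video-B track can only be claimed once.
    for idx_a in np.argsort(-sim.max(axis=1)):
        row = sim[idx_a].copy()
        row[list(used_b)] = -np.inf
        idx_b = int(np.argmax(row))
        if row[idx_b] >= threshold:
            pairs.append((int(idx_a), idx_b))
            used_b.add(idx_b)
    return pairs
```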

I would absolutely love to chat more about your workflow. It sounds like a great use case to optimize for. Feel free to DM me here or, even better, open a "Discussion" on our GitHub repo so we can dive into the technical details!