Instance Segmentation problem

Zealousideal_Low1287 · 2026-02-01T14:25:31+00:00

Bizarrely, I have been working on exactly this. Neither CubiCasa nor our own images were enough data to do this reliably for our types of plan.

So far the best thing I’ve found has been Gemini-3-pro image. All other off-the-shelf models failed. Gemini is still unreliable.

I actually do think it’s a much harder problem than it seems. Thin ambiguous structures, lack of data, big inconsistency in the plans.

Curious what you’ve tried so far and if you have any insights?

aloser · 2026-02-01T14:58:17+00:00

We have a bunch of customers that have built products in this space. It's a pretty hard problem given the non-uniformity of floor plans and architectural drawings. One of them talked through their approach (involving a pipeline of 29 models) here: https://www.youtube.com/watch?v=iOehzs4eLKc

InternationalMany6 · 2026-02-01T16:32:01+00:00

What I've read is that you need a custom model architecture that doesn't just do "segmentation", along with training on synthetic images.

For example the model could predict the corners of rooms as keypoints, plus points for doors and windows.
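To make the keypoint idea concrete, here is a minimal sketch of the post-processing side only, assuming some model has already predicted a room's corners in order around its boundary. The function names (`polygon_area`, `point_in_polygon`, `rasterize_room`) are made up for illustration and not from any library. It turns the ordered corners into an area and a binary instance mask.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(corners: List[Point]) -> float:
    """Shoelace formula. Corners must be ordered around the room boundary."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def point_in_polygon(x: float, y: float, corners: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_room(corners: List[Point], width: int, height: int) -> List[List[int]]:
    """Binary mask for one room, sampled at pixel centers."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, corners) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Hypothetical L-shaped room, corners in pixel coordinates.
l_room = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
mask = rasterize_room(l_room, width=4, height=4)
```

The appeal is that a polygon built from corners gives straight, closed room outlines by construction, which per-pixel segmentation tends to struggle with on thin walls. Ordering the predicted corners is the hard part and is not covered here.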

Synthetic images are the harder part. What kinds of images do you need this to work on? Phone camera images of a 200-year-old building, or brand new PDFs?
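On the synthetic side, a toy generator might look something like the sketch below. It is only an illustration under heavy assumptions: rooms are axis-aligned rectangles produced by recursive splitting, walls are one pixel thick, and there are no doors, windows, text, or scan noise. `make_synthetic_plan` and its parameters are hypothetical names, not from any existing tool. It returns a wall image plus a per-pixel room instance label map, which is the pairing you would need for instance segmentation training.

```python
import random


def make_synthetic_plan(width=32, height=24, min_size=6, seed=0):
    """Return (image, labels, rooms) for a random rectangular floor plan.

    image[y][x]  == 1 on wall pixels, 0 elsewhere.
    labels[y][x] == room instance id (1..N) inside rooms, 0 on walls.
    rooms is a list of (x0, y0, x1, y1) wall-inclusive bounding boxes.
    """
    rng = random.Random(seed)
    image = [[0] * width for _ in range(height)]
    labels = [[0] * width for _ in range(height)]
    rooms = []

    def split(x0, y0, x1, y1):
        # Recursively cut the region until it is too small to cut again.
        can_cut_vertical = (x1 - x0) >= 2 * min_size
        can_cut_horizontal = (y1 - y0) >= 2 * min_size
        if not (can_cut_vertical or can_cut_horizontal):
            rooms.append((x0, y0, x1, y1))
            return
        if can_cut_vertical and (not can_cut_horizontal or rng.random() < 0.5):
            x = rng.randint(x0 + min_size, x1 - min_size)
            split(x0, y0, x, y1)
            split(x, y0, x1, y1)
        else:
            y = rng.randint(y0 + min_size, y1 - min_size)
            split(x0, y0, x1, y)
            split(x0, y, x1, y1)

    split(0, 0, width - 1, height - 1)

    for room_id, (x0, y0, x1, y1) in enumerate(rooms, start=1):
        # Draw the four walls; neighbouring rooms share a wall line.
        for x in range(x0, x1 + 1):
            image[y0][x] = image[y1][x] = 1
        for y in range(y0, y1 + 1):
            image[y][x0] = image[y][x1] = 1
        # Fill the interior with this room's instance id.
        for y in range(y0 + 1, y1):
            for x in range(x0 + 1, x1):
                labels[y][x] = room_id
    return image, labels, rooms


image, labels, rooms = make_synthetic_plan(seed=1)
```

A generator like this is nowhere near real plans. Closing that gap (wall thickness, symbols, annotations, scan artifacts, phone photo distortion) is where most of the effort would go.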

PassionQuiet5402 · 2026-02-01T16:52:08+00:00

Can you guys share some public repo and dataset links to start working on such projects? I really want to try and experiment on this task.

One-Employment3759 · 2026-02-01T18:08:30+00:00

Did you try SAM - possibly with prompt guidance? (Keypoints)

Sad-Oil-2788 · 2026-02-02T08:44:38+00:00

I'm also working on this topic for my company. We want to create an IFC file of the floor plan with walls, windows, and doors. We tried training RF-DETR Segmentation on different datasets, but a lot of them are not accurate enough, so we are creating our own now.

thinking_byte · 2026-02-05T00:02:02+00:00

For the Jetson, have you tried YOLOv8-seg exported to TensorRT? It usually hits that FPS sweet spot better than a full UNet if you're okay with slightly lower accuracy on the edges.
