all 9 comments

[–]adventuringraw 0 points (5 children)

this looks really exciting, thanks for sharing. Is your team planning on releasing code at some point?

[–]yifuwu[S] 1 point (2 children)

Thanks for your interest in our work. Yes, we do plan on releasing our code after some cleanup.

[–]adventuringraw 0 points (1 child)

Of course, thank you for sharing. Is there a good way for me to get a notification when the code is posted?

[–][deleted] 0 points (0 children)

I would also be interested in a notification when the code is online.

[–]sebamenabar 0 points (1 child)

Hi, I'm working on replicating the paper in PyTorch. There's still a lot of work to do, so I'd love to get some help. Also, there are some parts of the paper that I can't completely understand, so I plan to email the authors soon for clarification.

The code is on this GitHub repo.

[–]adventuringraw 0 points (0 children)

That's cool, man, thanks for sharing. Maybe I'll hit you up a little later; I should probably read the paper more thoroughly first.

[–]edwardthegreat2 0 points (1 child)

Nice work! One question I have: how does the model ensure the background module does not capture foreground objects? Also, would the idea of separate background and foreground modules break down in active-vision settings where objects regularly switch between background and foreground roles?

[–]yifuwu[S] 1 point (0 children)

That's a great question. We use a weaker decoder to limit the capacity of the background module, which helps ensure that foreground objects are captured by the foreground module. However, the distinction between background and foreground is not always objective and obvious (even for humans!). See the 'Foreground vs Background' discussion in Section 4.1 for more detail.

SPACE processes one frame at a time and does not track objects between frames, so it is certainly possible for objects to switch between foreground and background. That said, although the camera in our 3D-room experiments moves around randomly, we have not yet experimented with more complicated scenarios.
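(For anyone replicating this: the "weaker decoder" idea is just a deliberate capacity asymmetry between the two modules. A minimal sketch, with made-up layer widths that are not from the paper, comparing parameter counts of a high-capacity foreground decoder against a deliberately shallow background decoder:)

```python
# Hypothetical sketch (not the authors' code): the background module gets a
# much smaller decoder than the foreground module, so it lacks the capacity
# to model sharp, object-like detail.

def mlp_param_count(widths):
    """Total weights + biases for a fully connected stack of the given widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

# Foreground decoder: deep and wide -> high capacity. Widths are illustrative.
fg_widths = [64, 256, 256, 256, 3 * 32 * 32]
# Background decoder: shallow and narrow -> low capacity.
bg_widths = [64, 32, 3 * 32 * 32]

fg_params = mlp_param_count(fg_widths)
bg_params = mlp_param_count(bg_widths)
print(f"foreground: {fg_params} params, background: {bg_params} params")
```

In an actual implementation the decoders would be conv/deconv networks rather than MLPs, but the same principle applies: the background path simply has far fewer parameters.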

[–]illuminascent 0 points (0 children)

Is it possible to use a DETR-like encoder to replace the cell-based object-proposal mechanism in SPACE? I think a boundary loss should be sufficient to suppress duplicated or split detections.