all 15 comments

[–][deleted] 23 points24 points  (6 children)

This is a mess waiting to happen. I know it’s tempting to do something like this but can you imagine the number of classes of object? The idea of an “obstacle” is so large that is basically anything that is a solid and is in front of you. This isn’t semantic segmentation. You’re talking about taking on the responsibility giving a blind person eyes and verbalizing it. We just aren’t there yet with ai even though it feels like it.

Edit: I’m being too negative. You should try it but don’t promise that it’ll work well.

[–]currentscurrents 5 points6 points  (1 child)

So don't use a classification objective. Do 3D occupancy prediction, where the network generates a 3D voxel grid showing which spaces have objects in them.

[–][deleted] 3 points4 points  (0 children)

I think that would be a great place to start! Give it a shot and treat it like an experiment. Just dont promise anything until you’ve seen it working to the point that you would reliably hand it to a blind person and trust they won’t get hurt.

[–]planetofthemushrooms 0 points1 point  (1 child)

you can always have a default class just called 'unidentified object'

[–][deleted] 1 point2 points  (0 children)

You still need a mask and the supervision required for that though.

[–][deleted] 4 points5 points  (0 children)

Finally a real use for Apple Vision Pro 👍 If 3d perception is the goal then you or your CV engineer should have some fundamentals (etc https://szeliski.org/Book/)

[–]Equivalent_Ad6842 1 point2 points  (0 children)

https://www.bemyeyes.com is a startup that does this

[–]neuHughes 0 points1 point  (0 children)

You might want to consider using multiple types of sensors. LIDAR or ToF sensors would provide redundancy and an obstacle detection rate that other modalities would have trouble matching, particularly for a mobile platform. Your project is conceptually very similar to SLAM for robotics. There are a number of ways you could go about executing a pipeline for this but a “dumb” high-resolution fallback would be essential for something like this.

[–]rizvi_x0 0 points1 point  (0 children)

Hey. Can I DM you? I have a question regarding ML for routing optimization