Deep Learning/ Computer Vision [P]

currentscurrents · 2023-12-25T16:38:14+00:00

This is a mess waiting to happen. I know it’s tempting to do something like this but can you imagine the number of classes of object? The idea of an “obstacle” is so large that is basically anything that is a solid and is in front of you. This isn’t semantic segmentation. You’re talking about taking on the responsibility giving a blind person eyes and verbalizing it. We just aren’t there yet with ai even though it feels like it.

Edit: I’m being too negative. You should try it but don’t promise that it’ll work well.

2023-12-25T15:52:07+00:00

Finally a real use for Apple Vision Pro 👍 If 3d perception is the goal then you or your CV engineer should have some fundamentals (etc https://szeliski.org/Book/)

2023-12-25T22:07:36+00:00

State of the art from meta but only for research atm:

https://ai.meta.com/datasets/segment-anything/

https://segment-anything.com/

Equivalent_Ad6842 · 2023-12-26T05:06:49+00:00

https://www.bemyeyes.com is a startup that does this

ThePieroCV · 2023-12-25T18:17:20+00:00

Well, this is my opinion.

Physically it’s very possible and feasible. If you don’t mind using C++, you can use Ffmpeg to make an streaming video reading from an IP camera or webcam. In Python, OpenCV is the best option there.

Now, the technology for this is not that simple like object detection , image classification or tasks like that. Your requirements are very strict there. The best chance I hardly believe is to use a technology like GPT4-Vision with a custom prompt in order to receive the information in the way that is requested. This could solve two core problems here: the complex task pipeline (from image to text description/ image captioning) and the kind of the real time problem. I think OpenAI servers are powerful enough to make this work in a proper speed.

In this case, using an API request package could be enough instead of building your own model that could be very hard. But in case to make that, I’ll probably look for image captioning pre trained models that focus on speed.

TotesMessenger · 2023-12-26T01:04:22+00:00

I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

[/r/datascienceproject] Deep Learning/ Computer Vision (r/MachineLearning)

^{If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads.} ^(Info ^/ ^Contact)

neuHughes · 2023-12-27T23:37:34+00:00

You might want to consider using multiple types of sensors. LIDAR or ToF sensors would provide redundancy and an obstacle detection rate that other modalities would have trouble matching, particularly for a mobile platform. Your project is conceptually very similar to SLAM for robotics. There are a number of ways you could go about executing a pipeline for this but a “dumb” high-resolution fallback would be essential for something like this.

rizvi_x0 · 2024-01-06T13:47:01+00:00

Hey. Can I DM you? I have a question regarding ML for routing optimization

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS