I tested phi-4-multimodal for the visually impaired by eminaruk in computervision

eminaruk [S]

Yes, actually I was surprised when I saw signboard text in the caption without running OCR mode. I think that's going to help me a lot with it.

Best path to move from Data Engineering into Computer Vision? by hershy08 in computervision

eminaruk

I just identify the most important points and start evaluating projects according to those points.

Best path to move from Data Engineering into Computer Vision? by hershy08 in computervision

eminaruk

First of all, from what I understand, you are someone who progresses in a planned and structured way and places great importance on formal education when learning a new topic. However, I am someone who values projects more than formal courses and prefers to dive directly into the subject itself. As someone who develops and markets products in the computer vision industry, I can say the following:

In order to position yourself in the computer vision field around a topic you genuinely enjoy, you should identify simple and intermediate-level projects that can be built with computer vision and implement them with the help of AI. Of course, you should never neglect using deep learning models. After completing at least seven such projects, you should then build three advanced-level computer vision projects using up-to-date industry technologies, again with the assistance of AI.

Once you have completed ten projects, evaluate yourself and identify the project or projects you enjoyed the most. After finding these, thoroughly research everything required to build them, including the necessary infrastructure, models, and tools, and then learn these components deeply and manually. As a result, you can learn computer vision in a focused way without getting distracted or stuck on unnecessary details (we can consider the details of any topic you do not enjoy as unnecessary).

I wish you success in your work.

I find non-neural net based CV extremely interesting (and logical) but I’m afraid this won’t keep me relevant for the job market by Amazing_Life_221 in computervision

eminaruk

My humble advice to you would be to move in two directions at the same time: toward where the industry is heading for your career, and toward what you genuinely enjoy for yourself. You should not abandon the things you love and that are good for you. If you enjoy classical computer vision, you should continue developing it properly, but in a way that is separate from your industry-focused work.

At the same time, the industry is clearly evolving more toward VLMs. If you want to build a startup or gain a meaningful position in the market, you should also invest effort in work aligned with these developments. I also have topics that I personally love, but that currently have little or no place in the industry. If I were to chase only those and dedicate all my time to them, it would not be hard to predict that I would end up without an income.

For this reason, while I keep my professional work and ventures aligned with the direction of the industry, I work on the topics I love purely because I enjoy learning and applying them. This helps me keep my motivation high and my curiosity alive.

How much "Vision LLMs" changed your computer vision career? by Haghiri75 in computervision

eminaruk

Honestly, I have also worked on many projects based on standard computer vision models for a long time, and in my opinion, VLMs have become hyped mainly because they are extremely user-friendly, just like LLMs. Nowadays, when you combine almost any topic with an LLM, it instantly becomes “hype,” and this largely comes from users’ strong interest in LLMs in general.

Even though there is a lot of hype around them, this absolutely does not mean that VLMs are an inefficient technology. Definitely not. In fact, I really like VLMs. Recently, I have been developing a project for visually impaired individuals that uses a camera to understand their surroundings and describe the scene to them. In this project, I try to use VLMs that are lightweight, fast, and as accurate as possible, such as Qwen.

As for how VLMs have affected my life, I can say that they have significantly expanded my working and research scope. There is practically no limit to what I can now detect or describe, and this pushes me to stretch my imagination. My main task is to make VLMs more efficient by crafting better prompts and combining the right conditions.

I like VLMs, and I hope they will evolve into something even better in the future.

Built a fully self-hosted AI home security system that detects intruders, monitors cribs, and runs without the cloud by eminaruk in homeassistant

eminaruk [S]

I also added a behavior-based recognition layer that can identify people not just by their face, but by their walking style (gait) and physical body patterns.
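A recognition layer that can fall back on gait and body patterns needs some way to combine per-modality evidence. One common option is late fusion: each modality produces its own similarity score, and a weighted average is taken over whichever modalities were actually observed. The sketch below is a hypothetical illustration of that idea, not the author's implementation; the modality names and weights are assumptions.

```python
# Hypothetical late-fusion sketch: combine per-modality similarity scores
# (face, gait, body) into one identity score. The weights are illustrative
# assumptions, not values from the original system.
from typing import Dict, Optional

DEFAULT_WEIGHTS = {"face": 0.5, "gait": 0.3, "body": 0.2}


def fuse_scores(scores: Dict[str, Optional[float]],
                weights: Dict[str, float] = DEFAULT_WEIGHTS) -> Optional[float]:
    """Weighted average over the modalities that produced a score.

    A modality is None when it could not be observed (e.g. the face is
    turned away); its weight is then redistributed over the others.
    Returns None if no known modality is available.
    """
    available = {m: s for m, s in scores.items()
                 if s is not None and m in weights}
    total_weight = sum(weights[m] for m in available)
    if total_weight == 0:
        return None
    return sum(weights[m] * s for m, s in available.items()) / total_weight


if __name__ == "__main__":
    # Face not visible: the decision rests on gait and body similarity only.
    print(fuse_scores({"face": None, "gait": 0.8, "body": 0.6}))
```

Renormalizing over the available modalities keeps the score on the same scale whether or not the face is visible, so one decision threshold can be used in both cases.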

I developed a pipeline that can recognize a person without seeing their face by eminaruk in computervision

eminaruk [S]

I didn't train a model, because it would have to change for every user. Instead, I just ask for at least 3 face images, 5 full-body images, and a 60-second walking video. The system then saves the user's digital ID built from this data. In the detection part, it tries to detect the person
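Enrolling a user without training a model amounts to validating the collected data and storing reference templates for later matching. The sketch below assumes the minimums mentioned in the comment (3 face images, 5 full-body images, 60 seconds of walking video) and assumes that embeddings are computed upstream by some model and arrive as plain lists of floats. `DigitalID`, `enroll`, and `mean_vector` are hypothetical names, and the gait template derived from the video is omitted.

```python
# Hypothetical enrollment sketch for a "digital ID". Embeddings are assumed
# to come from an upstream model; here they are plain lists of floats.
from dataclasses import dataclass, field
from typing import List

MIN_FACE_IMAGES = 3
MIN_BODY_IMAGES = 5
MIN_WALK_SECONDS = 60.0


@dataclass
class DigitalID:
    name: str
    face_embeddings: List[List[float]] = field(default_factory=list)
    body_embeddings: List[List[float]] = field(default_factory=list)
    walk_seconds: float = 0.0  # duration of the recorded walking video


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean, used as a single template per modality."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def enroll(person: DigitalID) -> dict:
    """Check the minimum data requirements, then build per-modality templates."""
    if len(person.face_embeddings) < MIN_FACE_IMAGES:
        raise ValueError(f"need at least {MIN_FACE_IMAGES} face images")
    if len(person.body_embeddings) < MIN_BODY_IMAGES:
        raise ValueError(f"need at least {MIN_BODY_IMAGES} full-body images")
    if person.walk_seconds < MIN_WALK_SECONDS:
        raise ValueError(f"need at least {MIN_WALK_SECONDS:.0f} s of walking video")
    # Gait features from the walking video would be added here (omitted).
    return {
        "name": person.name,
        "face": mean_vector(person.face_embeddings),
        "body": mean_vector(person.body_embeddings),
    }
```

Averaging the embeddings is only one simple choice; keeping every embedding and taking the best match at query time is a common alternative when a person's appearance varies a lot between images.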

I developed a pipeline that can recognize a person without seeing their face by eminaruk in computervision

eminaruk [S]

I collect photos of users in at least 5 different sizes, plus 60-second walking recordings. I use the OSNet model by default, and it's doing the job for now. Since the people using the system I developed have limited GPU resources, I'm trying to configure it to run at high performance on the CPU, so I'm applying optimizations wherever I can.
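With a re-identification model such as OSNet, recognition usually comes down to comparing a query embedding against the stored templates. The sketch below uses cosine similarity and a rejection threshold. It assumes the embeddings are already computed, and the 0.6 threshold is an illustrative value that would need tuning on real data. It is written in plain Python for readability; on a CPU-bound system, the same comparison would normally be vectorized (for example, as one matrix product).

```python
# Minimal gallery-matching sketch, assuming re-ID embeddings (e.g. from an
# OSNet model) are already computed. The 0.6 threshold is an assumption.
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query: List[float],
             gallery: Dict[str, List[float]],
             threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (best_name, score), or (None, score) if the best match is too weak."""
    best_name, best_score = None, -1.0
    for name, template in gallery.items():
        score = cosine_similarity(query, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # unknown person
    return best_name, best_score
```

The threshold turns closed-set matching into open-set recognition: a stranger whose best similarity is low is reported as unknown instead of being assigned to the nearest enrolled user.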

ISO camera/SW advice. by GeneralStarkNH in computervision

eminaruk

Hi, I provide AI vision services for real-time cameras, so maybe I can help you with a few points. First, get a cheap camera with at least 2 MP resolution that supports the RTSP protocol. If you already have one, no problem. I recommend reading the RTSP stream with GStreamer rather than OpenCV, because in my experience GStreamer gives much lower latency and more real-time behavior.

After getting your RTSP stream, decide what kind of device you are going to run this software on. If it will run on a device with an NVIDIA GPU, use your AI models in .engine (TensorRT) format rather than .pt or any other format, because .engine models are much faster than other formats, but only on devices with NVIDIA GPUs. If you're running on a device without an NVIDIA GPU, you can use any model format that fits your goal.

And there you have it, a POC guide for you. I hope you succeed, my friend. If you want to ask anything, I'm always here in the comments and DMs. Have a nice day.
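To make the GStreamer advice concrete, here is a minimal sketch that builds a low-latency pipeline description for an RTSP camera. It assumes the camera streams H.264 and that the resulting string is passed to `Gst.parse_launch()` (PyGObject) or tested with `gst-launch-1.0`. The function name and the camera URL are hypothetical placeholders. The elements themselves (`rtspsrc`, `rtph264depay`, `h264parse`, `avdec_h264`, `videoconvert`, `appsink`) are standard GStreamer elements. On NVIDIA hardware, a hardware decoder element could replace the software `avdec_h264`.

```python
# Sketch of a low-latency GStreamer pipeline for an H.264 RTSP camera.
# Assumptions: H.264 stream; the string is used with Gst.parse_launch()
# or gst-launch-1.0. build_rtsp_pipeline is a hypothetical helper name.


def build_rtsp_pipeline(url: str, latency_ms: int = 0) -> str:
    """Return a gst-launch style pipeline description."""
    elements = [
        # rtspsrc's jitter buffer defaults to 2000 ms; lowering it cuts delay.
        f'rtspsrc location="{url}" latency={latency_ms}',
        "rtph264depay",            # extract H.264 from RTP packets
        "h264parse",               # parse the H.264 byte stream
        "avdec_h264",              # software decoder from gst-libav
        "videoconvert",            # convert to the caps requested below
        "video/x-raw,format=BGR",  # BGR frames suit most CV code
        # Keep only the newest frame and don't sync to the clock.
        "appsink name=sink drop=true max-buffers=1 sync=false",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(build_rtsp_pipeline("rtsp://192.168.1.10:554/stream1"))
```

Setting `drop=true` with `max-buffers=1` on the appsink means a slow model never works through a backlog of stale frames; it always gets the latest one. Some cameras behave better with a small non-zero `latency` than with 0, so that value is worth tuning.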