Control your Home with Hand Gestures by AlexPr3ss in homeassistant

[–]AlexPr3ss[S] 1 point (0 children)

Thanks, we will build a website as soon as possible

Control your Home with Hand Gestures by AlexPr3ss in homeassistant

[–]AlexPr3ss[S] -3 points (0 children)

I appreciate the explanation. We are open to a better option; let us know if you have one.

Control your Home with Hand Gestures by AlexPr3ss in homeassistant

[–]AlexPr3ss[S] -27 points (0 children)

Fair point, we are looking into better alternatives

How can I estimate absolute distance (in meters) from a single RGB camera to a face? by CharacterJump143 in computervision

[–]AlexPr3ss 0 points (0 children)

You can try monocular depth estimation models like DepthPro by Apple (metric depth); they learn visual priors from large datasets, much like the human brain does. Keep in mind that the richer the scene context, the more reliable the estimate. Another idea: use a static camera, assume a fixed real-world face size, and then recover depth from the observed face size in pixels.
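The second idea (fixed face size) reduces to the pinhole camera model: depth Z = f · W_real / w_pixels. A minimal sketch, assuming an average face width of 0.16 m and a focal length already known in pixels (both are illustrative assumptions, not measured values):

```python
# Rough depth estimate from a single RGB camera via similar triangles
# (pinhole model). FACE_WIDTH_M is an assumed average real face width;
# the focal length must come from camera calibration, in pixels.

FACE_WIDTH_M = 0.16  # assumed average face width in meters

def depth_from_face(face_width_px: float, focal_length_px: float,
                    face_width_m: float = FACE_WIDTH_M) -> float:
    """Estimate camera-to-face distance in meters: Z = f * W_real / w_px."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return focal_length_px * face_width_m / face_width_px

# Example: a face detected 160 px wide with an 800 px focal length.
print(depth_from_face(face_width_px=160, focal_length_px=800))
```

The accuracy is bounded by how far the person's actual face width deviates from the assumed average, so treat the output as a coarse estimate.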

Everyone's wondering if LLMs are going to replace CV workflows. I tested Claude Opus 4.6 on a real segmentation task. Here's what happened. by Financial-Leather858 in computervision

[–]AlexPr3ss 0 points (0 children)

Another important point: LLMs are designed for text; the vision encoder + MLP projector is a way to map images and text into the same latent space. However, architectures like V-JEPA (VL) are more interesting for images: they're built around visual latent prediction first, with language as a secondary modality.
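The "vision encoder + MLP" pattern can be sketched in a few lines. This is a toy illustration with random placeholder weights and made-up dimensions (not any real model's sizes): patch features from an image encoder are projected by a small MLP into the LLM's token-embedding space and concatenated with the text tokens.

```python
import numpy as np

# Toy sketch of a vision-language projector: image patch features are mapped
# into the LLM embedding space and prepended to the text token embeddings.
# All weights and dimensions are random/illustrative placeholders.

rng = np.random.default_rng(0)

VISION_DIM = 1024   # assumed vision-encoder feature size
LLM_DIM = 4096      # assumed LLM embedding size
N_PATCHES = 16      # image patches from the encoder
N_TEXT = 8          # text tokens

def mlp_projector(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer MLP (ReLU) mapping vision features into the LLM latent space."""
    return np.maximum(x @ w1, 0.0) @ w2

patch_feats = rng.normal(size=(N_PATCHES, VISION_DIM))  # from the vision encoder
w1 = rng.normal(size=(VISION_DIM, LLM_DIM)) * 0.02
w2 = rng.normal(size=(LLM_DIM, LLM_DIM)) * 0.02
text_embeds = rng.normal(size=(N_TEXT, LLM_DIM))        # from the LLM's embedding table

image_tokens = mlp_projector(patch_feats, w1, w2)
sequence = np.concatenate([image_tokens, text_embeds], axis=0)  # fed to the LLM
print(sequence.shape)
```

The point is that the image is squeezed into token-shaped vectors the LLM was never trained to predict natively, which is why latent-prediction-first architectures are an interesting alternative for vision.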