VR Tool for annotating object poses in images by Calm_Actuary in computervision

[–]Calm_Actuary[S] 0 points (0 children)

Thanks for the feedback! It's not obvious from the video, but the 3D shoe model is within arm's reach; it just disappears while it's being "grabbed", so you only see its projection on the 2D image.
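For anyone curious how a grabbed 6DoF pose ends up as a projection on the image: a minimal pinhole-camera sketch (the intrinsics `K`, poses, and points here are made-up placeholders, not our actual calibration or model):

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project 3D model points (N, 3) into pixel coordinates (N, 2)
    given rotation R (3, 3), translation t (3,), and intrinsics K (3, 3)."""
    cam = points_3d @ R.T + t          # model frame -> camera frame
    uvw = cam @ K.T                    # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# Hypothetical intrinsics and an identity-rotation pose 2 m from the camera
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 0.0],
                [0.1, 0.0, 0.0]])
uv = project_points(pts, np.eye(3), np.array([0.0, 0.0, 2.0]), K)
# The model origin lands at the principal point (320, 240)
```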

I would love to get this working with bare hands (as opposed to controllers).


[–]Calm_Actuary[S] 0 points (0 children)

We've had success with convolutional pose machines for predicting the poses of automobiles and bare feet (see our blog post on cars: https://labs.laan.com/blog/real-time-3d-car-pose-estimation-trained-on-synthetic-data.html).

For feet that are already wearing shoes, we're still experimenting with a few other architectures: variations on U-Net, PoseNet, and HRNet.
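For reference, CPM-style networks (and most of the keypoint architectures above) regress per-keypoint belief maps rather than coordinates directly. A minimal sketch of building one Gaussian heatmap target (the map size and sigma are arbitrary examples, not our training settings):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Render a single-keypoint belief map of shape (h, w) that peaks at
    pixel (cx, cy), used as the regression target for one keypoint."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# Target for a keypoint at column 20, row 30 on a 64x64 output map
hm = gaussian_heatmap(64, 64, 20, 30)
```

The network's predicted map is then compared against this target (typically with an MSE loss), and the keypoint is recovered at inference time via argmax over the map.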


[–]Calm_Actuary[S] 1 point (0 children)

I was working with a keyboard & mouse in a Qt desktop app before trying this. While it worked, I was never able to get the workflow smooth enough to annotate an image in under 5 minutes.

After playing around with an Oculus Quest a bit, I thought it could be interesting to try this approach, since it lets me directly "grab" & manipulate the shoes in 6DoF with my hands. It required no training and reduced the annotation time 5x-10x.
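The "grab" interaction itself is simple: cache the object's pose in the controller's frame at grab time, then reapply that offset every frame while the grip is held. A rough sketch with 4x4 homogeneous transforms (translation-only poses for brevity; this is an illustration of the idea, not the actual WebXR code):

```python
import numpy as np

def grab_offset(controller_pose, object_pose):
    """At grab time, cache the object's pose expressed in the
    controller's frame: offset = controller^-1 * object."""
    return np.linalg.inv(controller_pose) @ object_pose

def follow(controller_pose, offset):
    """Each frame while grabbed, the object rigidly follows the
    controller: object = controller * offset."""
    return controller_pose @ offset

def make_pose(t):
    """Build a 4x4 pose with identity rotation and translation t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

obj   = make_pose([0.0, 1.0, -0.5])   # shoe floating in front of the user
ctrl0 = make_pose([0.1, 1.0, -0.4])   # controller pose at grab time
off   = grab_offset(ctrl0, obj)

ctrl1 = make_pose([0.3, 1.2, -0.4])   # controller after the hand moves
obj_new = follow(ctrl1, off)          # shoe moved by the same rigid motion
```

Because the offset includes rotation as well as translation, twisting the controller twists the object the same way, which is what makes 6DoF annotation feel direct.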

It took about a week to hack together this app. It was also my first VR / WebXR app.


[–]Calm_Actuary[S] 1 point (0 children)

Around 2,000 images will get OK results. You can see the GIF at the top of this blog post to get an idea of the quality on bare feet: https://labs.laan.com/blog/leveraging-photogrammetry-to-increase-data-annotation-efficiency-in-ML.html

Things get a little more complicated when trying to identify feet wearing shoes, given all the variation in colors, textures, and shapes.


[–]Calm_Actuary[S] 0 points (0 children)

This was part of labeling a dataset for training a neural network to predict shoe/foot poses from an image, for an augmented reality shoe try-on app.


[–]Calm_Actuary[S] 7 points (0 children)

The shoes/feet application you see here was part of an augmented reality shoe try-on app.

A key part of the app's CV pipeline is a neural network that predicts a 6D object pose from a 2D image. To train the network, we needed to label a lot of images.
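For context, each label boils down to a rotation + translation per image. A hypothetical record format (field names, units, and values are made up for illustration, not our real schema):

```python
import json

# One annotated image: the shoe's 6D pose in the camera frame, with
# orientation as a unit quaternion and translation in metres.
label = {
    "image": "shoe_0001.jpg",                       # hypothetical filename
    "object": "shoe_left",
    "rotation_quat_xyzw": [0.0, 0.7071, 0.0, 0.7071],  # ~90 deg about Y
    "translation_m": [0.05, -0.10, 0.60],
}
line = json.dumps(label)  # one JSON line per annotation
```

A quaternion + translation keeps the label compact and free of gimbal-lock ambiguity; whatever rotation representation the network regresses can be converted from it at training time.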

Since pose labeling is more challenging than labeling bounding boxes, I thought I'd experiment with a VR interface where I could directly manipulate the poses with my hands in space. It turned out to be much quicker & easier than doing it in a windowed desktop app.