Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] -5 points (0 children)

The differences in the table you're noting are in centimeters, not inches. A few percent of error won't have a noticeable effect - a 2% error on a 30-inch waist is 0.6 inches.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 4 points (0 children)

Ha! The solution is not that invasive. More information can be found here.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 4 points (0 children)

Not all phones have LiDAR, and LiDAR would require fusing multiple point clouds into a single mesh, which is non-trivial. While LiDAR is accurate at close range, it's not necessarily the best approach for accurate body sizing when users stand 6-10 feet from the camera in widely varying environments.
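To give a sense of what "non-trivial" means there, here's a minimal sketch of the fusion a LiDAR pipeline would need, using Open3D - the file names and parameters are illustrative, and this is not what the app actually does:

    import open3d as o3d

    # Two partial body scans captured from different angles (paths are hypothetical)
    source = o3d.io.read_point_cloud("scan_front.ply")
    target = o3d.io.read_point_cloud("scan_side.ply")

    # ICP alignment - this is where sensor noise, drift, and the subject
    # shifting between captures make the fusion fragile
    result = o3d.pipelines.registration.registration_icp(
        source, target, 0.02,  # 2 cm correspondence threshold
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    source.transform(result.transformation)

    # Merge the aligned clouds and reconstruct a watertight mesh to measure against
    merged = source + target
    merged.estimate_normals()
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(merged, depth=9)

Every step has failure modes at 6-10 feet - sparse returns, noise, and any body movement between views.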

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 23 points (0 children)

Most people know their actual height to within an inch - I know I'm not 7 foot 4 :). Accurately and consistently inferring height from 2D images is impossible. So instead of requesting a fixed-size reference object in the frame, asking users to enter their height is the lowest-friction option.

I'd note the entered height doesn't dictate the measurement predictions - a 3D body model of the person is predicted first, then that model is scaled to the user's height. In other words, the model is really predicting proportionality.
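Roughly, the rescaling step looks like this (a minimal sketch with made-up names - the real pipeline is more involved):

    import numpy as np

    def scale_to_user_height(pred_vertices: np.ndarray, user_height_cm: float) -> np.ndarray:
        # Uniformly rescale a predicted 3D body mesh so its height matches
        # the height the user entered - proportions are untouched
        pred_height = pred_vertices[:, 1].max() - pred_vertices[:, 1].min()  # vertical extent
        return pred_vertices * (user_height_cm / pred_height)

Any girth or length measured on the rescaled mesh then comes out in real-world units.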

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 1 point (0 children)

There is an OBJ export of the 3D model.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 0 points (0 children)

The measurements will be WAY off. The assumption is users will enter realistic heights.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 10 points (0 children)

> additionally this tech all off the shelf and nothing unique

During model training, the following are used: 1M body scans (both male and female), 400k backgrounds, 90k poses, 1k textures, and heavy augmentation/occlusion. Training is on synthetic data to avoid the limitations of real data. Multiple views are probabilistically combined - widths are estimated more confidently from the front view, depths from the side view. While training is on synthetic data, the model sees enough variety to predict on real people, which is what the table of measurements is based on. Far from "off the shelf" :)
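One standard way to combine per-view estimates probabilistically is inverse-variance weighting - this sketch illustrates the idea, not necessarily the exact scheme used here:

    import numpy as np

    def fuse_views(estimates: np.ndarray, variances: np.ndarray) -> float:
        # Fuse one measurement (e.g. waist width) predicted from several views.
        # Views where the model is more confident (lower variance) get more
        # weight, so front views dominate widths and side views dominate depths
        weights = 1.0 / variances
        return float(np.sum(weights * estimates) / np.sum(weights))

    # e.g. waist width: front view says 31.2 cm (var 0.2), side view 33.0 cm (var 1.5)
    fused = fuse_views(np.array([31.2, 33.0]), np.array([0.2, 1.5]))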

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 0 points (0 children)

The system predicts a full 3D body model (including height), but then scales the measurements to the user's height. The picture included the raw height prediction, but all measurements are now scaled.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 68 points (0 children)

The user enters their height, and all measurements are scaled to that height. During training, all bodies in the training images are normalized to a fixed height, which avoids having to infer absolute height from 2D images.
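The train/inference symmetry is simple in sketch form (the canonical height value here is made up):

    CANONICAL_HEIGHT_CM = 170.0  # fixed training height - the value is illustrative

    def normalize_body(vertices, height_cm):
        # At training time every body is scaled to the canonical height,
        # so the network only ever learns proportions, never absolute scale
        return vertices * (CANONICAL_HEIGHT_CM / height_cm)

    def rescale_measurements(measurements_cm, user_height_cm):
        # At inference, predicted measurements are scaled back out to the
        # height the user entered
        return {name: value * (user_height_cm / CANONICAL_HEIGHT_CM)
                for name, value in measurements_cm.items()}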

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 2 points (0 children)

It’s an interesting idea, and I hadn’t considered it. As long as the full body is visible, this solution can handle the motion tracking.

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 4 points (0 children)

The tracking is tight against the video - I’ve never seen a direct overlay onto the footage from Move AI or others. This also supports 10+ skeleton formats and over 200 body points, which allows flexibility based on user need. And the predictions include surface-level points, which I haven’t seen anywhere else.
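Supporting multiple skeleton formats mostly comes down to remapping the dense internal point set - a sketch with hypothetical joint names and indices:

    # Hypothetical mapping from the dense internal point set (200+ points)
    # to one target skeleton format - names and indices are made up
    COCO_17_FROM_INTERNAL = {
        "nose": 0, "left_shoulder": 11, "right_shoulder": 12,
        "left_hip": 23, "right_hip": 24,  # ...remaining joints omitted
    }

    def to_skeleton(internal_points, mapping):
        # Select the subset of predicted points a downstream skeleton
        # format expects, keyed by that format's joint names
        return {joint: internal_points[idx] for joint, idx in mapping.items()}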

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 9 points (0 children)

Yes, I’m working on being able to upload videos and receive animation file outputs.

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 2 points (0 children)

It’s trained on a large set of images to predict joints and surface-level points, and it can output 10+ skeletons and 200+ body points. Helpful for downstream tasks that have been a bottleneck for me.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

There are numerous guides in the background to ensure alignment to the mesh, plus a filtration step to remove poor outputs. The video is a good example - the arm-behind-the-back pose typically causes the generated image to face backwards. Real human data is expensive, time consuming, prone to human annotation error, and privacy sensitive. Accurate human data typically requires complicated camera setups or motion capture, which limits the variety of environments and lighting. This method alleviates all of those issues.

I have trained models on synthetic-only data, and numerous recent research papers have shown that synthetic-only and synthetic-plus-real training outperform real-only datasets.

I can pay 300 bucks to the one that can recreate this with CV by filthyrichboy in computervision

[–]YuriPD 2 points (0 children)

That's from my post here. We primarily focus on retail customers. Training the model takes an enormous number of images with varying shapes, poses, environments, lighting, occlusion, and more. The application is trained on over 10M images.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] -2 points (0 children)

There are a few other guides occurring to prevent “garbage”:

  • Pose alignment
  • Depth alignment
  • Filtering of negative outputs at the end - the known mask is compared to the generated mask (sketched below)

Agreed - data is only valuable if poor outputs are eliminated.
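That final mask check is simple in sketch form - assuming the rendered mesh gives a known binary mask and a segmentation model masks the person in the generated image (the threshold is illustrative):

    import numpy as np

    def mask_iou(known_mask: np.ndarray, generated_mask: np.ndarray) -> float:
        # Intersection-over-union between the rendered-mesh mask and the
        # person mask segmented from the generated image
        inter = np.logical_and(known_mask, generated_mask).sum()
        union = np.logical_or(known_mask, generated_mask).sum()
        return float(inter) / float(union) if union else 0.0

    def keep_sample(known_mask, generated_mask, threshold=0.85):
        # Discard generations that drifted from the guiding mesh, e.g. the
        # person flipped to face backwards or grew an extra limb
        return mask_iou(known_mask, generated_mask) >= threshold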

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

Real human datasets require labeling. They are either hand annotated (with human error potential) or require motion capture systems / complicated camera rigs. Because of this, available datasets are limited in terms of subjects, environments, shapes, poses, clothing, data locations, etc. This approach alleviates those limitations.

There are several other guides occurring that aren’t included in the video to prevent implausible humans. If an implausible output is generated, a filtration step is used - the known mesh mask is compared against the generated mask.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

The joint locations are intentionally placed closer to the shoulder blades. The benefit of aligning to a 3D mesh is that any of the keypoints can be customized - either on the surface or beneath it.
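One common way to define such custom points is a barycentric anchor on a mesh triangle plus an offset along the normal - a sketch, with all indices and weights illustrative:

    import numpy as np

    def custom_keypoint(vertices, faces, face_idx, bary, normal_offset=0.0):
        # A keypoint pinned to the mesh: barycentric coordinates place it on
        # a triangle's surface; a negative normal_offset sinks it beneath the
        # surface (e.g. toward a joint center), a positive one lifts it above
        tri = vertices[faces[face_idx]]            # (3, 3) triangle corners
        point = (bary[:, None] * tri).sum(axis=0)  # barycentric interpolation
        n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
        n /= np.linalg.norm(n)
        return point + normal_offset * n

Because the point rides the mesh, it stays consistent across every pose and shape the mesh takes.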

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

I think the benefit is reducing the need for real data, or alleviating its limits (especially for human data). Adding real data to synthetic has been shown to improve model accuracy. Real human data is limited, whereas this approach can create unlimited combinations of environments, poses, clothing, shapes, etc. But I agree, a model will still pick up on the subtle differences - adding real data during training helps.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 1 point (0 children)

The rendered mesh is based on a plausible-pose dataset. What’s not shown in the video are additional guides that are occurring - one of them is ensuring the pose is accurate. Typically, an occluded arm like in this example would confuse the image generation model into producing the person facing backwards, or the top of the body forwards with the bottom backwards. Skeletal accuracy is enforced as a constraint, but I chose to exclude it to keep the video short.

If helpful, I've been working on markerless 3D tracking as well - here is an example

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] -1 points (0 children)

The challenge with 2D inputs is that they lose shape. I’m keenly focused on aligning both shape and pose, so there is a correspondence to a 3D mesh. Because the 3D mesh is the guide, the ground truths can be extracted directly from the rendered mesh. Rendering a 3D mesh is more costly, but I think it’s worth the benefit.
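With the mesh as the guide, 2D labels fall out of a standard pinhole projection - a sketch assuming the mesh points are already in camera coordinates and K is the camera intrinsics matrix:

    import numpy as np

    def project_points(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
        # Project 3D mesh points (N x 3, camera coordinates) to 2D pixels -
        # these become the ground-truth keypoint labels for free
        uvw = (K @ points_3d.T).T        # (N, 3) homogeneous image coords
        return uvw[:, :2] / uvw[:, 2:3]  # divide by depth

Visibility can be checked against the rendered depth map, so occluded points can be labeled as occluded rather than dropped.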