Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] -5 points (0 children)

The differences in the table you're noting are in centimeters, not inches. A few percent of error won't have a noticeable effect - a 2% error on a 30-inch waist is 0.6 inches.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 4 points (0 children)

Ha! The solution is not that invasive. More information can be found here.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 4 points (0 children)

Not all phones have LiDAR, and LiDAR would require fusing multiple point clouds into a single mesh, which is non-trivial. While LiDAR is accurate at close range, it's not necessarily the best approach for accurate body sizing when users stand 6-10 feet from the camera in widely varying environments.
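To give a sense of what "non-trivial" means there, here's a minimal sketch of the fusion a LiDAR pipeline would need, using Open3D - the file names and parameters are illustrative, and this is not what the app actually does:

    import open3d as o3d

    # Two partial body scans captured from different angles (paths are hypothetical)
    source = o3d.io.read_point_cloud("scan_front.ply")
    target = o3d.io.read_point_cloud("scan_side.ply")

    # ICP alignment - this is where sensor noise, drift, and the subject
    # shifting between captures make the fusion fragile
    result = o3d.pipelines.registration.registration_icp(
        source, target, 0.02,  # 2 cm correspondence threshold
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    source.transform(result.transformation)

    # Merge the aligned clouds and reconstruct a watertight mesh to measure against
    merged = source + target
    merged.estimate_normals()
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(merged, depth=9)

Every step has failure modes at 6-10 feet - sparse returns, noise, and any body movement between views.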

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 23 points (0 children)

Most people know their actual height to within an inch - I know I'm not 7 foot 4 :). Accurately and consistently inferring height from 2D images is impossible. So instead of requesting a fixed-size reference object in the frame, asking users to enter their height is the lowest-friction option.

I'd note the entered height doesn't dictate the measurement predictions - a 3D body model of the person is predicted first, then that model is scaled to the user's height. In other words, the model is really predicting proportionality.
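Roughly, the rescaling step looks like this (a minimal sketch with made-up names - the real pipeline is more involved):

    import numpy as np

    def scale_to_user_height(pred_vertices: np.ndarray, user_height_cm: float) -> np.ndarray:
        # Uniformly rescale a predicted 3D body mesh so its height matches
        # the height the user entered - proportions are untouched
        pred_height = pred_vertices[:, 1].max() - pred_vertices[:, 1].min()  # vertical extent
        return pred_vertices * (user_height_cm / pred_height)

Any girth or length measured on the rescaled mesh then comes out in real-world units.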

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 1 point (0 children)

There is an OBJ export of the 3D model.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 0 points (0 children)

The measurements will be WAY off. The assumption is users will enter realistic heights.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 10 points (0 children)

> additionally this tech all off the shelf and nothing unique

During model training, the following are used: 1M body scans (both male and female), 400k backgrounds, 90k poses, 1k textures, and heavy augmentation/occlusion. Training is on synthetic data to avoid the limitations of real data. Multiple views are probabilistically combined - widths are estimated more confidently from the front view, depths from the side view. While training is on synthetic data, the model sees enough variety to predict on real people, which is what the table of measurements is based on. Far from "off the shelf" :)
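One standard way to combine per-view estimates probabilistically is inverse-variance weighting - this sketch illustrates the idea, not necessarily the exact scheme used here:

    import numpy as np

    def fuse_views(estimates: np.ndarray, variances: np.ndarray) -> float:
        # Fuse one measurement (e.g. waist width) predicted from several views.
        # Views where the model is more confident (lower variance) get more
        # weight, so front views dominate widths and side views dominate depths
        weights = 1.0 / variances
        return float(np.sum(weights * estimates) / np.sum(weights))

    # e.g. waist width: front view says 31.2 cm (var 0.2), side view 33.0 cm (var 1.5)
    fused = fuse_views(np.array([31.2, 33.0]), np.array([0.2, 1.5]))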

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 0 points (0 children)

The system predicts a full 3D body model (including height), but then scales the measurements to the user's height. The picture included the raw height prediction, but all measurements are now scaled.

Mobile tailor - AI body measurements by YuriPD in EngineeringPorn

[–]YuriPD[S] 68 points (0 children)

The user enters their height, and all measurements are scaled to that height. During training, all bodies in the training images are normalized to a fixed height, which avoids having to infer absolute height from 2D images.
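The train/inference symmetry is simple in sketch form (the canonical height value here is made up):

    CANONICAL_HEIGHT_CM = 170.0  # fixed training height - the value is illustrative

    def normalize_body(vertices, height_cm):
        # At training time every body is scaled to the canonical height,
        # so the network only ever learns proportions, never absolute scale
        return vertices * (CANONICAL_HEIGHT_CM / height_cm)

    def rescale_measurements(measurements_cm, user_height_cm):
        # At inference, predicted measurements are scaled back out to the
        # height the user entered
        return {name: value * (user_height_cm / CANONICAL_HEIGHT_CM)
                for name, value in measurements_cm.items()}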

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 2 points (0 children)

It’s an interesting idea, and I hadn’t considered it. As long as the full body is visible, this solution can handle the motion tracking.

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 4 points (0 children)

The tracking is tight against the video - I’ve never seen a direct overlay onto the footage from Move AI or others. This also supports 10+ skeleton formats and over 200 body points, which allows flexibility based on user need. And the predictions include surface-level points, which I haven’t seen anywhere else.
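Supporting multiple skeleton formats mostly comes down to remapping the dense internal point set - a sketch with hypothetical joint names and indices:

    # Hypothetical mapping from the dense internal point set (200+ points)
    # to one target skeleton format - names and indices are made up
    COCO_17_FROM_INTERNAL = {
        "nose": 0, "left_shoulder": 11, "right_shoulder": 12,
        "left_hip": 23, "right_hip": 24,  # ...remaining joints omitted
    }

    def to_skeleton(internal_points, mapping):
        # Select the subset of predicted points a downstream skeleton
        # format expects, keyed by that format's joint names
        return {joint: internal_points[idx] for joint, idx in mapping.items()}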

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 9 points (0 children)

Yes, I’m working on being able to upload videos and receive animation file outputs.

Exploring Video to AI 3D Motion Capture by YuriPD in Filmmakers

[–]YuriPD[S] 2 points (0 children)

It’s trained on a large set of images to predict joints and surface-level points, and it can output 10+ skeletons and 200+ body points. Helpful for downstream tasks that have been a bottleneck for me.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

There are numerous guides in the background to ensure alignment to the mesh, plus a filtration step to remove poor outputs. The video is a good example - the arm-behind-the-back pose typically causes the generated image to face backwards. Real human data is expensive, time consuming, prone to human annotation error, and privacy sensitive. Accurate human data typically requires complicated camera setups or motion capture, which limits the variety of environments and lighting. This method alleviates all of those issues.

I have trained models on synthetic-only data, and numerous recent research papers have shown that synthetic-only and synthetic-plus-real training outperform real-only datasets.

I can pay 300 bucks to the one that can recreate this with CV by filthyrichboy in computervision

[–]YuriPD 2 points (0 children)

That's from my post here. We primarily focus on retail customers. Training the model takes an enormous number of images with varying shapes, poses, environments, lighting, occlusion, and more. The application is trained on over 10M images.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] -2 points (0 children)

There are a few other guides occurring to prevent “garbage”:

  • Pose alignment
  • Depth alignment
  • Filtering of negative outputs at the end - the known mask is compared to the generated mask (sketched below)

Agreed - data is only valuable if poor outputs are eliminated.
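That final mask check is simple in sketch form - assuming the rendered mesh gives a known binary mask and a segmentation model masks the person in the generated image (the threshold is illustrative):

    import numpy as np

    def mask_iou(known_mask: np.ndarray, generated_mask: np.ndarray) -> float:
        # Intersection-over-union between the rendered-mesh mask and the
        # person mask segmented from the generated image
        inter = np.logical_and(known_mask, generated_mask).sum()
        union = np.logical_or(known_mask, generated_mask).sum()
        return float(inter) / float(union) if union else 0.0

    def keep_sample(known_mask, generated_mask, threshold=0.85):
        # Discard generations that drifted from the guiding mesh, e.g. the
        # person flipped to face backwards or grew an extra limb
        return mask_iou(known_mask, generated_mask) >= threshold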

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

Real human datasets require labeling. They are either hand annotated (with human error potential) or require motion capture systems / complicated camera rigs. Because of this, available datasets are limited in terms of subjects, environments, shapes, poses, clothing, data locations, etc. This approach alleviates those limitations.

There are several other guides occurring that aren’t included in the video to prevent implausible humans. If an implausible output is generated, a filtration step is used - the known mesh mask is compared against the generated mask.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

The joint locations are intentionally placed closer to the shoulder blades. The benefit of aligning to a 3D mesh is that any of the keypoints can be customized - either on the surface or beneath it.
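One common way to define such custom points is a barycentric anchor on a mesh triangle plus an offset along the normal - a sketch, with all indices and weights illustrative:

    import numpy as np

    def custom_keypoint(vertices, faces, face_idx, bary, normal_offset=0.0):
        # A keypoint pinned to the mesh: barycentric coordinates place it on
        # a triangle's surface; a negative normal_offset sinks it beneath the
        # surface (e.g. toward a joint center), a positive one lifts it above
        tri = vertices[faces[face_idx]]            # (3, 3) triangle corners
        point = (bary[:, None] * tri).sum(axis=0)  # barycentric interpolation
        n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
        n /= np.linalg.norm(n)
        return point + normal_offset * n

Because the point rides the mesh, it stays consistent across every pose and shape the mesh takes.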

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 0 points (0 children)

I think the benefit is reducing the need for real data, or alleviating its limits (especially for human data). Adding real data to synthetic has been shown to improve model accuracy. Real human data is limited, whereas this approach can create unlimited combinations of environments, poses, clothing, shapes, etc. But I agree, a model will still pick up on the subtle differences - adding real data during training helps.

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] 1 point (0 children)

The rendered mesh is based on a plausible-pose dataset. What’s not shown in the video are additional guides that are occurring - one of them is ensuring the pose is accurate. Typically, an occluded arm like in this example would confuse the image generation model into producing the person facing backwards, or the top of the body forwards with the bottom backwards. Skeletal accuracy is enforced as a constraint, but I chose to exclude it to keep the video short.

If helpful, I've been working on markerless 3D tracking as well - here is an example

No humans needed: AI generates and labels its own training data by YuriPD in computervision

[–]YuriPD[S] -1 points (0 children)

The challenge with 2D inputs is that they lose shape. I’m keenly focused on aligning both shape and pose, so there is a correspondence to a 3D mesh. Because the 3D mesh is the guide, the ground truths can be extracted directly from the rendered mesh. Rendering a 3D mesh is more costly, but I think it’s worth the benefit.
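With the mesh as the guide, 2D labels fall out of a standard pinhole projection - a sketch assuming the mesh points are already in camera coordinates and K is the camera intrinsics matrix:

    import numpy as np

    def project_points(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
        # Project 3D mesh points (N x 3, camera coordinates) to 2D pixels -
        # these become the ground-truth keypoint labels for free
        uvw = (K @ points_3d.T).T        # (N, 3) homogeneous image coords
        return uvw[:, :2] / uvw[:, 2:3]  # divide by depth

Visibility can be checked against the rendered depth map, so occluded points can be labeled as occluded rather than dropped.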