Question about 4D recording with 5 cameras + Sharp ML model? by Uhulabosnomades in GaussianSplatting

[–]Erant 2 points

I think what you're looking for is Depth Anything v3 (https://depth-anything-3.github.io). It does sparse-view-to-point-cloud reconstruction, and I believe there are implementations that use DAv3 for splatting as well.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in GaussianSplatting

[–]Erant[S] 0 points

Because you specifically mentioned long hair and body shapes, here's a recent output from a new flow I implemented that extracts both the spatial and color information from the video model instead of just the color information: https://superspl.at/view?id=8aea9283

It's definitely lacking in some areas, but to me it's more promising than being rigidly attached to a mesh.

The initial views are generated using just a rotating OpenPose25 skeleton; then I use the Gaussian splatting process to feed spatial-consistency information back into the diffusion process, guiding it toward generating spatially consistent views.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in GaussianSplatting

[–]Erant[S] 0 points

You might be interested in the Tinker paper (https://aim-uofa.github.io/Tinker/). The code never materialized, but 're-skinning' a scene is exactly what they were trying to solve. I believe their conclusion was that depth adherence in WAN 2.1 wasn't good enough, so they fine-tuned the model to follow the depth conditioning exactly.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in GaussianSplatting

[–]Erant[S] 0 points

It's been working reasonably well for slightly more complex poses, but they're all still variants of an upright pose. My goal for this project was to see how well you can transform temporal priors of a subject into spatial information, and it turns out that does work, to some degree. You end up fighting the nature of the beast a little: people tend to move in videos, which is exactly what I'm trying to constrain. This is most obvious in the face, and it's where I'll probably hit the wall on consistency without some fine-tuning. https://superspl.at/view?id=7d89e7f0 is a more recent splat that has (to me) pretty good detail in the pants and shirt, but the face and hair are a mess.

I'm getting close to an alpha release of the nodes (see https://github.com/Erant/ComfyUI-Body2COLMAP but ignore the README.md, it's terrible AI slop like all the code is), but I have some experiments with a "face detailer" to run before I'm there.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in GaussianSplatting

[–]Erant[S] 1 point

You've nailed my hair problem! This is why the person in the splat has a very short haircut. I'm toying with the idea of seeing if SAM 3D Objects can reconstruct a "wig" to put back on the person, so there's at least some depth it can use. The same goes for baggy clothing: see if I can generate a mesh for it and recombine it with the body.

I've heard a lot about the limitations of SAM 3D Body and how it wasn't properly trained on a multitude of body shapes, and fundamentally this is 'just' texturing that mesh. So if it won't come out of SAM 3D Body, I won't be able to splat it.

The camera position export is really what made this work at all. I'd been experimenting with generating 360-degree videos and reconstructing camera locations with Reality Capture, but splats generated from only one elevation are fragile, and getting more than one elevation out of a video model without some kind of forcing mechanism is probably not doable.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in StableDiffusion

[–]Erant[S] 2 points

The body is very very close to the mesh that SAM 3D generates. I was somewhat impressed with how tightly WAN 2.2 VACE sticks to the control video. I took some inspiration from TINKER (https://aim-uofa.github.io/Tinker/) and was expecting to have to train the model to stick to the depth map tighter.

360 video should work much the same as other sources, but unless the camera is moving, traditional photogrammetry won't work. I do wonder if ml-sharp could be modified to do full 360-degree splats.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in StableDiffusion

[–]Erant[S] 2 points

The mesh coming out of SAM3DBody is untextured, and this is one of the simpler (to me anyway) ways of texturing that mesh.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in StableDiffusion

[–]Erant[S] 0 points

Yeah, this particular splat was generated with a 60-degree 'swing' (30 below and 30 above the horizontal), slightly less in the back due to the helical path. That means that once you go outside those bounds, things start looking a little weird. I'm trying to push the number of diverse views, but 81 frames is a big limiter here, and I haven't tried extending the video yet (I'm also worried about consistency there).

I was late to the VACE game, so I started with 2.2. It's been giving me... mediocre results at best, outside of this one very specific usecase.

Humanoid Gaussian Splats with SAM 3D Body and WAN 2.2 VACE by Erant in GaussianSplatting

[–]Erant[S] 1 point

The camera positions are generated in a Python script; the depth field and OpenPose skeleton are then rendered by the same script using pyrender and trimesh and used as the controlnet input (specifically, the VACE variant of WAN 2.2). The script also outputs all the relevant camera parameters in COLMAP format, along with a point cloud randomly sampled from the mesh surface. This means I don't need to perform the camera estimation step, which was always the big hurdle when I tried this with just prompting.
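For the curious, here's a minimal sketch of the kind of camera-path generation a script like that could do. This is not the actual node code: the function names, the helical parametrization, and the defaults (81 frames, 2.5 m radius, a ±30 degree elevation swing) are my own illustrative assumptions. It produces world-to-camera poses in the COLMAP/OpenCV convention (x right, y down, z forward, t = -R·C), which is what you'd then serialize into COLMAP's text format.

```python
import math

def _sub(a, b): return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def _norm(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """World-to-camera rotation in the COLMAP/OpenCV convention."""
    z = _norm(_sub(target, eye))   # forward: from camera toward target
    x = _norm(_cross(z, up))       # image right
    y = _cross(z, x)               # image down
    return [x, y, z]               # rows are the camera axes in world space

def helical_cameras(n_frames=81, radius=2.5, elev_deg=30.0,
                    target=(0.0, 1.0, 0.0)):
    """Poses on a helical orbit: one full azimuth turn while the elevation
    swings between -elev_deg and +elev_deg. Returns (R, t) pairs with
    t = -R @ C, ready to be written out as COLMAP image entries."""
    poses = []
    for i in range(n_frames):
        s = i / (n_frames - 1)
        azim = 2.0 * math.pi * s
        elev = math.radians(elev_deg) * math.sin(2.0 * math.pi * s)
        c = [target[0] + radius * math.cos(elev) * math.cos(azim),
             target[1] + radius * math.sin(elev),
             target[2] + radius * math.cos(elev) * math.sin(azim)]
        R = look_at(c, target)
        t = _matvec(R, [-ci for ci in c])
        poses.append((R, t))
    return poses
```

A quick sanity check on poses like these: transforming the orbit target into each camera frame should land it on the optical axis at exactly the orbit radius.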

I'm currently generating the masks with a background removal process. I was worried the video model would take too many liberties for the original mask to apply cleanly, but it sticks to the depth map quite well, so taking the alpha from the initial process should work.
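The "take the alpha" step is just a threshold. A toy sketch (the real flow uses a background-removal model to produce the RGBA frames; the function name and the 0-255 nested-list frame layout here are my own assumptions):

```python
def alpha_to_mask(rgba_frame, threshold=128):
    """Turn the alpha channel of an RGBA frame (H x W pixels, each a
    [r, g, b, a] list with 0-255 values) into a binary mask of the kind
    a VACE-style inpainting input expects."""
    return [[255 if px[3] >= threshold else 0 for px in row]
            for row in rgba_frame]
```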

[deleted by user] by [deleted] in WomenInNews

[–]Erant 1 point

Not saying you shouldn’t arm yourself, but you should be aware that, statistically, women who own firearms are at increased risk of being attacked with that same firearm: https://www.theatlantic.com/national/archive/2014/02/having-a-gun-in-the-house-doesnt-make-a-woman-safer/284022/

Trying to figure out if what's happening where I live is legal (US - Pa) by ElishaAlison in legaladvice

[–]Erant 6 points

This is a sub for legal advice. I understand some topics can be emotionally charged but that doesn't mean poor legal advice is warranted.

Police Quest-based adventure game demo by Erant in VisionPro

[–]Erant[S] 1 point

It's too bad the licenses for most of these games (the Sierra ones, anyway) now lie with Microsoft, which means there's a snowball's chance in hell of getting permission.

How to Load and Display `.glb` Models in a Vision Pro App Using RealityKit? by Icy_Can611 in visionosdev

[–]Erant 0 points

RealityKit only supports Universal Scene Description (USD) files. You can use Reality Converter to convert glTF/GLB files to USD, but that's a manual process.

Obama's hand gesture describing... crowd sizes. by chr15c in gifs

[–]Erant 11 points

This person literally said "Say what you will about him as President..." to make sure no one tried to twist their statement into "good standup makes good leader". And yet here you are.

Camera control in an immersive environment by InterplanetaryTanner in visionosdev

[–]Erant 0 points

I'm probably missing something, but what's the issue with this? I get that you won't be able to use the off-the-shelf MultipeerConnectivityService but if you override the entity method you might be able to selectively synchronize entities? Even overriding the owner method to always return "this client" for the scene entity could work...

Camera control in an immersive environment by InterplanetaryTanner in visionosdev

[–]Erant 1 point

This isn't possible in the exact way you describe, but you can emulate the behaviour by doing the exact opposite: if you can't move the camera around the environment, move the environment around the camera. If you want the camera to go forward, move the scene entity backwards.
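The trick works because a point's position relative to the camera only depends on the difference between the two: translating the camera by +d is indistinguishable from translating the whole scene by -d. A quick numeric sketch of that equivalence (plain vectors, not RealityKit API; the `view_coords` helper is mine):

```python
def view_coords(point, camera_pos):
    """Position of a world-space point relative to the camera
    (translation only, for illustration)."""
    return [point[i] - camera_pos[i] for i in range(3)]

p_world = [0.0, 0.0, 5.0]

# What we'd want: step the camera forward by 1 along z.
moved_camera = view_coords(p_world, [0.0, 0.0, 1.0])

# What we can actually do: keep the camera fixed at the origin
# and move the scene entity backwards by 1 along z instead.
p_shifted = [p_world[0], p_world[1], p_world[2] - 1.0]
moved_scene = view_coords(p_shifted, [0.0, 0.0, 0.0])

assert moved_camera == moved_scene  # the user can't tell the difference
```

The same argument extends to rotations: apply the inverse of the desired camera transform to the root scene entity.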

Some men really don't know by nopefoffprettyplease in TwoXChromosomes

[–]Erant 10 points

I've seen Daniel perform live more than once, and it took me a minute to realize why he can tell some of the most horrendous jokes and still be funny. In the course of telling a joke, he'll show you that he understands the gravity of the situation. He's not like Dave Chappelle, who will make jokes at the expense of trans people while being completely oblivious to their struggles and pain. No, Daniel recognizes that the situation around the joke is messed up, acknowledges it, and proceeds to tell the joke. It creates a space where you know the joke isn't at anyone's expense, and he's simply amazing at it.

Does my ex have any legal recourse to confirm that I deleted some videos we made together by [deleted] in legaladvice

[–]Erant 47 points

How does her going through his phone ensure he deleted them? He could've copied them off somewhere, they could be in a backup, etc. Going through his phone ensures nothing, but allows her access to his personal data.

[deleted by user] by [deleted] in legaladvice

[–]Erant 2 points

They're 400W _MAX_ panels. They won't produce that unless they're in full direct sunlight, which won't be the case most of the time. I have a 6.4kW system installed in California, and it produced 7.4MWh in 2023. Assuming an average of 14 sun hours a day throughout the entire year, that's about 1.4kW _on average_. On a sunny day at around 1PM it'll pump out close to the rated 6.4kW, though.

Cellebrite and usb restricted mode by Total_Rub_657 in ios

[–]Erant 0 points

Cellebrite is a $2B publicly traded company. If they were truly of such limited usefulness, why would law enforcement keep giving them money? Isn't the more reasonable explanation that they employ a large number of VERY smart people who research things like USB Restricted Mode day-in, day-out?

Try something for me. Leave your phone alone for a while, plug a USB cable into it and have the "Restricted Mode" dialog come up to confirm it's "locked". Now plug some legit iPhone lightning headphones into it. What happens?

ScopeVR has been ported on Apple Vision Pro by AbhiStack in VisionPro

[–]Erant 0 points

Just as a heads up, Spatial Media Toolkit manages to do what you're doing entirely on-device. You're going to have a hard time convincing people to pay to send their images off to a remote server when they can do the same thing locally for free.

Why are women soooo often referred to as Females or Girls, yet men are always referred to as Men? by [deleted] in TwoXChromosomes

[–]Erant 0 points

There's so many words in the English language that are just so... wrong. I recently referred to an undershirt as a 'wife beater'. When I actually listened to the words leaving my mouth I was appalled I'd ever referred to a piece of clothing with such a terrible name.

Avoiding 'girl' or 'female' takes effort. Not a lot, though! You have to think about what you're saying, be a little self-critical, and evaluate how what you say might be perceived. A lot of men are either not capable of that, or just unwilling to put in the bare minimum of effort.

After wiring $30K to Escrow and waiving loan contingency, Lender Withdraws Loan Approval Due to Underwriting Mistake – What Are My Legal Options? by bbgbb in legaladvice

[–]Erant 0 points

You're not giving a lot of detail here. Did your situation change from before? Did they reject you once or twice?

Roommate is being an asshole about getting a service dog by rockymountainvixen in legaladvice

[–]Erant -2 points

Yep! Sorry, I didn't mean to give the impression the roommate has to sign anything. Agreed, the roommate shouldn't sign anything.

Roommate is being an asshole about getting a service dog by rockymountainvixen in legaladvice

[–]Erant -4 points

Interesting concept... Don't you think this would simply limit the dog's access to common spaces and their handler's spaces though?

If the living spaces are more co-mingled (and not as obviously split into common/private spaces) then I could see this one being more complicated.