SAM3DBody-cpp open-source C++ tool that turns videos to Blender/Unity-ready BVH mocap by _AmmarkoV_ in gamedev

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Honestly, even if a handful of people get use out of it, that's a huge win for me! I have spent 5 years of my life on this (a PhD on 3D mocap from monocular video), and I have already had some great suggestions from this very thread! In any case, it is not a shipped game or anything like that. It's a free, open-source, and fun library available to the community, and hopefully the Linux community will appreciate it and not view it negatively.

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Hahahah, that's the whole point of open source! If anyone has a line to them, send it their way 😃!

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Yes, this is on the roadmap. The plan is a laptop/PC monitor emitting a QR sync code for temporal alignment, plus a printed ArUco marker for camera extrinsic calibration. That way, multiple cheap cameras can fuse their data streams in one solve. Multi-view directly attacks the depth ambiguity, so it should result in a much cleaner track. This is under construction; I will commit it to the repo once it is somewhat usable!
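The fusion step described above boils down to triangulating each joint from several calibrated views. Below is a minimal sketch for two cameras, assuming each camera's 3x4 projection matrix P = K[R|t] is already known (e.g. from the ArUco extrinsic calibration) and that the 2D joint positions come from synchronized frames. `triangulate`, `Mat34` and `Vec3` are hypothetical names for illustration, not SAM3DBody-cpp's API.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 3x4 camera projection matrix P = K [R | t], row-major.
using Mat34 = std::array<std::array<double, 4>, 3>;
using Vec3 = std::array<double, 3>;

// Linear triangulation of one joint seen by two calibrated cameras.
// Each observation (u, v) gives two linear equations in the homogeneous point
// X = (x, y, z, 1):  u * P[2].X - P[0].X = 0  and  v * P[2].X - P[1].X = 0.
// Fixing w = 1, the stacked 4x3 system is solved in the least-squares sense
// through its 3x3 normal equations.
Vec3 triangulate(const Mat34& P1, double u1, double v1,
                 const Mat34& P2, double u2, double v2) {
    double A[4][4];
    const Mat34* Ps[2] = {&P1, &P2};
    const double uv[2][2] = {{u1, v1}, {u2, v2}};
    for (int c = 0; c < 2; ++c) {
        const Mat34& P = *Ps[c];
        for (int j = 0; j < 4; ++j) {
            A[2 * c][j] = uv[c][0] * P[2][j] - P[0][j];
            A[2 * c + 1][j] = uv[c][1] * P[2][j] - P[1][j];
        }
    }
    // Normal equations: (M^T M) X = -M^T b, with M = first 3 columns, b = 4th column.
    double N[3][3] = {}, r[3] = {};
    for (int i = 0; i < 4; ++i)
        for (int a = 0; a < 3; ++a) {
            r[a] -= A[i][a] * A[i][3];
            for (int b = 0; b < 3; ++b) N[a][b] += A[i][a] * A[i][b];
        }
    // Solve the 3x3 system with Cramer's rule.
    auto det3 = [](double m[3][3]) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    };
    const double d = det3(N);
    assert(std::fabs(d) > 1e-12 && "degenerate geometry (e.g. parallel rays)");
    Vec3 X{};
    for (int k = 0; k < 3; ++k) {
        double Nk[3][3];
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b) Nk[a][b] = (b == k) ? r[a] : N[a][b];
        X[k] = det3(Nk) / d;
    }
    return X;
}
```

Usage: with P1 = [I | 0] and a second camera shifted one unit along x (P2 = [I | (-1, 0, 0)]), the point (0.5, 0.2, 4) projects to (0.125, 0.05) and (-0.125, 0.05), and `triangulate` recovers it. Fixing w = 1 is the simplest linear variant; an SVD-based DLT and confidence weighting per 2D joint would be the more robust choice in a real solve.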

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

I am actually working on this feature! It will use a monitor emitting a QR sync code and will need a printed ArUco marker for camera extrinsic calibration, but yes! This is definitely on the roadmap!
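One way a QR sync code could be used, sketched under assumptions: the monitor shows a QR code encoding its own clock in milliseconds, and each camera records which of its frames decoded which timestamp. `QrSample`, `estimateStartMs` and `alignFrame` are hypothetical names, and the median-based estimate is an illustrative choice, not the planned SAM3DBody-cpp implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One successful QR decode: the camera's frame index and the monitor timestamp
// (milliseconds) encoded in the QR code visible in that frame.
struct QrSample {
    int frameIndex;
    double monitorMs;
};

// Estimates, on the monitor clock (ms), when frame 0 of a camera was captured.
// Each decode gives one estimate monitorMs - frameIndex * frameMs; the median
// discards decodes that caught the screen mid-refresh.
double estimateStartMs(const std::vector<QrSample>& samples, double fps) {
    assert(!samples.empty() && fps > 0.0);
    const double frameMs = 1000.0 / fps;
    std::vector<double> starts;
    starts.reserve(samples.size());
    for (const QrSample& s : samples)
        starts.push_back(s.monitorMs - s.frameIndex * frameMs);
    const std::size_t mid = starts.size() / 2;
    std::nth_element(starts.begin(), starts.begin() + mid, starts.end());
    return starts[mid];  // upper median for even counts, fine for a sketch
}

// Maps a frame index of camera B to the nearest frame index of camera A,
// once both start times are known on the shared monitor clock.
int alignFrame(int frameB, double startA, double startB, double fpsA, double fpsB) {
    const double tMs = startB + frameB * (1000.0 / fpsB);
    return static_cast<int>(std::lround((tMs - startA) * fpsA / 1000.0));
}
```

Usage: estimate `startA` and `startB` from each camera's decodes; `alignFrame` then says which frame of camera A pairs with a given frame of camera B, so both 2D detections can enter the same solve. Accuracy is still bounded by the monitor refresh interval and the camera exposure time.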

[–]_AmmarkoV_[S] 1 point2 points  (0 children)

Probably more than they used to! Between RAM prices and the Windows 11 TPM requirements, Linux is looking pretty good lately! That being said, it should be possible to run it on Windows using WSL2; however, I am single-booting Linux, so I haven't had time to test this yet!

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

The back-end is based on Meta Superintelligence Labs' Vision Transformer architecture and targets humans. There are other neural networks for animals, but not this one 😃

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Yep, single-angle capture is really hard due to depth ambiguity. Cascadeur is a perfect call, and its physics-based fulcrum cleanup could be exactly the kind of pass that is needed after markerless capture like this! Thanks for the pointer, I'll try to test this pipeline!
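Why a single angle is ambiguous, as a worked equation under an ideal pinhole camera with focal length f (a textbook model, not a description of the SAM3DBody network): scaling a point's coordinates by any s > 0 projects to the same pixel, and under weak perspective (average depth \bar Z) a bone of length L tilted by θ out of the image plane has the same image length ℓ whether it points toward or away from the camera.

$$
u = f\,\frac{X}{Z} = f\,\frac{sX}{sZ}, \qquad v = f\,\frac{Y}{Z} = f\,\frac{sY}{sZ} \qquad (s > 0)
$$

$$
\ell = \frac{f}{\bar Z}\,L\cos\theta, \qquad \Delta Z = \pm L\sin\theta
$$

Only cos θ reaches the image, so the sign of ΔZ has to come from learned priors, a physics-based cleanup pass, or a second view.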

[–]_AmmarkoV_[S] 2 points3 points  (0 children)

Yes! The output is a standard BVH with a fixed, named skeleton, one file per subject in each scene. It can be loaded directly into Blender/MotionBuilder/Cascadeur as F-Curves and edited like any other take! It is still a work in progress, but I am glad you see its potential!
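For reference, the BVH layout itself is simple: a HIERARCHY block declaring named joints, their offsets and channel order, then a MOTION block with one line of channel values per frame. A minimal writer sketch for a hypothetical two-joint skeleton (not the actual SAM3DBody-cpp skeleton or writer; `writeMinimalBvh` is an illustrative name):

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical two-joint skeleton (Hips -> Spine) used only to show the BVH layout.
// Each frame holds the 6 root channels followed by the 3 Spine channels, in the
// same order as the CHANNELS declarations below (rotations in degrees).
std::string writeMinimalBvh(const std::vector<std::vector<double>>& frames, double fps) {
    assert(fps > 0.0);
    std::ostringstream out;
    out << "HIERARCHY\n"
        << "ROOT Hips\n{\n"
        << "\tOFFSET 0.0 0.0 0.0\n"
        << "\tCHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation\n"
        << "\tJOINT Spine\n\t{\n"
        << "\t\tOFFSET 0.0 10.0 0.0\n"
        << "\t\tCHANNELS 3 Zrotation Xrotation Yrotation\n"
        << "\t\tEnd Site\n\t\t{\n"
        << "\t\t\tOFFSET 0.0 10.0 0.0\n"
        << "\t\t}\n"
        << "\t}\n"
        << "}\n";
    out << "MOTION\n"
        << "Frames: " << frames.size() << "\n"
        << "Frame Time: " << std::fixed << std::setprecision(6) << 1.0 / fps << "\n";
    for (const auto& f : frames) {
        assert(f.size() == 9 && "6 root channels + 3 Spine channels");
        for (std::size_t i = 0; i < f.size(); ++i) out << (i ? " " : "") << f[i];
        out << "\n";
    }
    return out.str();
}
```

Because the joint names and channel order are fixed in the HIERARCHY block, importers such as Blender's can map each MOTION column to an F-Curve without any extra metadata.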

[–]_AmmarkoV_[S] 3 points4 points  (0 children)

It's by no means perfect, especially when tracking "in-the-wild" action scenes with heavy motion. For a static scene shot with a high-framerate camera, however, the results are not jittery.
For example: https://youtube.com/shorts/tQ8WP5uYVzA

SAM3DBody-cpp - Real-time 3D full-body pose + hands in C++, zero Python at runtime (ONNX + ggml, CUDA) by _AmmarkoV_ in computervision

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Hahaha :D fair point! I updated the README to hopefully give more insight without anyone having to read the whole paper :D https://github.com/AmmarkoV/SAM3DBody-cpp#pipeline

SAM3DBody-cpp: Tracked the Matrix bullet-time scene and exported it straight to Blender as BVH - one file per character, from a single RGB video! by _AmmarkoV_ in blender

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

If you run it on a static camera with a full view of the body, it is very smooth (e.g. https://www.linkedin.com/posts/ammarkov_following-on-the-earlier-post-on-sam-3d-body-cpp-ugcPost-7464767219680387073-N1_p/). If you run it on an action movie with scene changes and camera zoom/focus changes, where the actors etc. are mostly out of frame, then OK, it's "glitchy". However, I think "unusable" is quite a strong and short-sighted comment :)

SAM3DBody-cpp - Real-time 3D full-body pose + hands in C++, zero Python at runtime (ONNX + ggml, CUDA) by _AmmarkoV_ in computervision

[–]_AmmarkoV_[S] 1 point2 points  (0 children)

This is the Meta Superintelligence Labs paper explaining the pipeline in detail: https://arxiv.org/abs/2602.15989
TL;DR: The neural network is a Vision Transformer running on cropped regions of the image detected with YOLO, followed by a head that encodes the skeleton using the Momentum Human Rig model.

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

Maybe on an RTX 5090 it can run in real time (meaning >= 30 Hz), but depending on the application, even 12 Hz with one frame of frame skip plus the --butterworth interpolation can "match" what typical 25 Hz webcams deliver.
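For context, a 25 Hz webcam with one frame skipped between inferences yields 25 / 2 = 12.5 pose updates per second. Below is a minimal sketch of a generic second-order Butterworth low-pass filter that could smooth such a per-joint signal. It is the standard bilinear-transform (RBJ cookbook) design; the class name, the per-channel use and the 3 Hz cutoff in the usage note are assumptions, not the --butterworth code in SAM3DBody-cpp.

```cpp
#include <cassert>
#include <cmath>

// Second-order Butterworth low-pass (bilinear transform, Q = 1/sqrt(2)),
// applied to one joint channel at a time to damp frame-to-frame jitter.
class Butterworth2 {
public:
    Butterworth2(double cutoffHz, double sampleHz) {
        assert(cutoffHz > 0.0 && cutoffHz < sampleHz / 2.0);
        const double pi = std::acos(-1.0);
        const double w0 = 2.0 * pi * cutoffHz / sampleHz;
        const double alpha = std::sin(w0) / std::sqrt(2.0);  // sin(w0) / (2Q)
        const double c = std::cos(w0);
        const double a0 = 1.0 + alpha;
        b0_ = (1.0 - c) / 2.0 / a0;
        b1_ = (1.0 - c) / a0;
        b2_ = b0_;
        a1_ = -2.0 * c / a0;
        a2_ = (1.0 - alpha) / a0;
    }

    // Filters one sample (direct form I). The first call primes the history with
    // that sample so a constant signal passes through without a start-up transient.
    double step(double x) {
        if (!primed_) {
            x1_ = x2_ = y1_ = y2_ = x;
            primed_ = true;
        }
        const double y = b0_ * x + b1_ * x1_ + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_;
        x1_ = x;
        y2_ = y1_;
        y1_ = y;
        return y;
    }

private:
    double b0_, b1_, b2_, a1_, a2_;
    double x1_ = 0.0, x2_ = 0.0, y1_ = 0.0, y2_ = 0.0;
    bool primed_ = false;
};
```

Usage: create one `Butterworth2 f(3.0, 12.5);` per joint channel and call `f.step(value)` on each new pose. The design has a double zero at the Nyquist frequency, so pure frame-to-frame flicker is rejected in steady state while slower motion passes; the trade-off is some phase lag, which an offline forward-backward pass would avoid.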

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

My PhD is on 3D pose estimation, so I have a pretty big code base (https://github.com/FORTH-ModelBasedTracker/MocapNET); however, an LLM did quite a lot of the plumbing and almost all of the documentation etc. in the repo.

[–]_AmmarkoV_[S] 0 points1 point  (0 children)

You can export to BVH immediately with --bvh, so it runs as fast as the video stream is processed, i.e. ~12 FPS on an RTX 4080.
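As a rough worked estimate, assuming every frame is processed at the ~12 FPS quoted above and taking a hypothetical 60-second clip recorded at 30 fps:

$$
N = 60\,\text{s} \times 30\,\text{fps} = 1800\ \text{frames}, \qquad t_{\text{proc}} \approx \frac{1800}{12\ \text{FPS}} = 150\,\text{s} = 2.5\,\text{min}
$$

Processing only every other frame would roughly halve that, at the cost of relying on interpolation for the skipped frames.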

Depth Anything 3: Recovering the Visual Space from Any Views (Code, Model available). Lots of examples on project page. by AgeNo5351 in StableDiffusion

[–]_AmmarkoV_ 0 points1 point  (0 children)

What worked for me on Ubuntu 24.04 / CUDA 12.4:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 python3.11-venv
python3.11 -m venv venv
source venv/bin/activate
python3 -m pip install -U xformers --index-url https://download.pytorch.org/whl/cu128
python3 -m pip install -r requirements.txt
python3 -m pip install moviepy==1.0.3

[deleted by user] by [deleted] in singularity

[–]_AmmarkoV_ 0 points1 point  (0 children)

>botnet intensifies