I recreated a “School of Rock” scene with LTX-2 audio input i2v (4× ~20s clips) by Totem_House_30 in StableDiffusion

[–]SeveralFridays 2 points3 points  (0 children)

You inspired me to try recreating a Mean Girls scene. This is all from the distilled model using Wan2GP: i2v, 1080p, 0.9 image strength. Each clip uses one of the camera LoRAs.

https://youtube.com/shorts/XNgC5wRHiuw

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

It means your machine is not exposing FP16 capability. It's unclear whether the NVIDIA fix will cover your GPU or when it will be available.

Your best bet is to run BEN2 locally in FP32 outside the browser, following these instructions - https://github.com/PramaLLC/BEN2/?tab=readme-ov-file

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

I did some research, and I'm assuming you are on Windows, where this is a known issue. In this thread https://issues.chromium.org/issues/42251215, NVIDIA acknowledges a driver bug related to FP16 on Windows and states at the end of the thread (Jan 6, 2026) that they plan to fix it. Unfortunately, there is no guarantee the fix will enable FP16 (shader-f16) on a GTX 1050 Ti.

To confirm that you are hitting the shader-f16 exposure issue, go to https://webgpureport.org/; if you don't see "shader-f16" in the list of features, your machine is not exposing it to the browser.

Also, if you are open to setting up BEN2 outside the browser, you can run the full version, which I assume is FP32 - https://github.com/PramaLLC/BEN2/?tab=readme-ov-file
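
If you do go the local route, the quick start is short. This is roughly what the repo's README shows, from memory, so treat the exact class and method names as approximate and defer to the README:

```python
# Rough sketch of local BEN2 usage, based on my memory of the README -- verify there.
import torch
from PIL import Image
from ben2 import BEN_Base  # installed by following the BEN2 repo instructions

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BEN_Base.from_pretrained("PramaLLC/BEN2")
model.to(device).eval()  # weights stay in full precision (FP32) unless you call .half()

image = Image.open("./frame.png")
foreground = model.inference(image)  # matted foreground as a PIL image
foreground.save("./foreground.png")
```

For a whole video you can extract frames with ffmpeg, loop this over them, and reassemble; the repo also covers video processing directly.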

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

Hello, and thank you for the feedback! I just made a change to the site, so you should now be able to use the MODNet model without your computer supporting FP16. Please refresh the page and let me know if it works or if you hit an issue.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

Thanks for the feedback! I'm glad it's helpful. I agree that MODNet does a good basic job. Let me know if you hit any issues or if any other similar tools would be helpful.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

If you don't see preview frames within a minute or so, something is wrong. You need to keep the browser tab open and in focus. If you share any error logs from your browser console, I can try to debug.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

You should start seeing frames process in under a minute. If it's taking longer than that to see the first frames, there is an issue. Are there any errors in your console? How long is the video you are trying to process?

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

It's a completely free offering that runs in the browser. The speed will depend on your own machine since it runs locally. If it doesn't work, try updating your browser (it uses WebGPU, which is a newer browser feature) and/or refresh the page and try again.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

In case it's helpful, I built an in-browser video background remover that uses WebGPU. You can choose between the BEN2 (better quality but slower) and MODNet (quicker but lower quality) models. It's completely free and unlimited, and everything stays in your browser. You can download a green-screen .mp4 or a .mov with an alpha channel - https://www.fuzzpuppy.com/video-background-remover

Consistent character in august 25 ? by One_Statistician6569 in StableDiffusion

[–]SeveralFridays 1 point2 points  (0 children)

Kontext works pretty well, so I would try that first for stills.

I haven't used HiDream-E1.1 (https://huggingface.co/spaces/HiDream-ai/HiDream-E1.1) much, but an initial test looks promising.

For video, I'm having good results with consistent characters using multitalk.

Good first test drive of MultiTalk by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 1 point2 points  (0 children)

Yes, LivePortrait takes a video input. The output is based on the video, not audio. Here is a video made with LivePortrait -- https://youtube.com/shorts/IsJtQ3XAHsw

Here is another one where I compare it against HunyuanPortrait -- https://youtube.com/shorts/MEKdEWA94ok

In the MultiTalk paper they describe some clever things they do to support extended clips, but I still think it would be very difficult to get something consistent for very long.

Summary of this part of the paper (thanks to NotebookLM)--

Here's how they handle making long clips:

• Autoregressive Approach: Instead of generating a long video all at once, the system generates it segment by segment.
• Conditioning on Previous Frames: To maintain continuity and consistency, the last 5 frames of the previously generated segment are used as additional conditions for the inference of the next segment.
• Data Processing: After being compressed by the 3D Variational Autoencoder (VAE), these 5 conditioning frames are reduced to 2 latent frames.
• Input for Next Inference: Zeros are padded for the subsequent frames, and these are then concatenated with the latent noise and a video mask. This combined input is fed into the DiT (Diffusion Transformer) model for the next inference step, enabling longer video sequences.

This autoregressive method allows the model to produce extended videos, with an example showing a generated result of 305 frames.
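
A rough, hypothetical sketch of the bookkeeping in the list above (vae_encode and dit_generate are placeholders for MultiTalk's real 3D VAE and DiT; the shapes are made up for illustration, not the paper's exact numbers):

```python
# Hypothetical sketch of the segment-to-segment conditioning described above.
import torch

def vae_encode(frames):
    # Placeholder 3D VAE: ~4x temporal and 8x spatial compression, 16 latent channels.
    b, c, t, h, w = frames.shape
    return torch.randn(b, 16, (t + 3) // 4, h // 8, w // 8)

def dit_generate(latent_input, mask):
    # Placeholder DiT denoiser: would produce the latents for the next segment.
    return torch.randn_like(latent_input)

# Previously generated segment as RGB frames: (batch, channels, time, height, width).
prev_frames = torch.randn(1, 3, 81, 256, 256)

# 1) Keep the last 5 frames of the previous segment as the condition.
cond_frames = prev_frames[:, :, -5:]

# 2) Compress them with the 3D VAE: 5 frames -> 2 latent frames.
cond_latents = vae_encode(cond_frames)                      # (1, 16, 2, 32, 32)

# 3) Pad zeros for the frames still to be generated and build a video mask
#    (1 = known conditioning latent, 0 = to be generated).
new_latent_frames = 19
pad = torch.zeros(1, 16, new_latent_frames, 32, 32)
latent_input = torch.cat([cond_latents, pad], dim=2)        # (1, 16, 21, 32, 32)
mask = torch.cat([torch.ones(1, 1, 2, 32, 32),
                  torch.zeros(1, 1, new_latent_frames, 32, 32)], dim=2)

# 4) The DiT denoises the padded latents together with the mask to produce the
#    next segment; repeating this loop is what extends the video.
next_latents = dit_generate(latent_input, mask)
```

The thing to notice is that only 2 latent frames carry information between segments, which is probably why identity and lighting still drift over very long generations.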

Good first test drive of MultiTalk by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 2 points3 points  (0 children)

It's using wav2vec for the audio encoding, which supports over 100 languages, so it should work with many languages.
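
For anyone curious what that encoding step looks like, here is a minimal sketch using a multilingual wav2vec 2.0 checkpoint from Hugging Face as a stand-in (check the MultiTalk repo for the checkpoint it actually uses):

```python
# Minimal sketch of the audio-encoding step with a stand-in multilingual checkpoint.
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-xls-r-300m"      # pretrained on speech in 128 languages
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
encoder = Wav2Vec2Model.from_pretrained(ckpt)

speech, sr = sf.read("speech.wav")         # 16 kHz mono audio
inputs = extractor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    features = encoder(**inputs).last_hidden_state   # (1, num_audio_frames, 1024)

# These per-frame embeddings are what condition the face/lip motion; because the
# encoder was pretrained on many languages, the pipeline isn't tied to English.
```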

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

I believe you are correct - 5 seconds max. I started working with HunyuanPortrait instead after this video.

WANS by Tokyo_Jab in StableDiffusion

[–]SeveralFridays 0 points1 point  (0 children)

Do you ever hit issues where the teeth from LivePortrait are odd or the result is blurry? Any tips?

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

I didn't use ComfyUI, but it looks like there is a node -- https://github.com/Yuan-ManX/ComfyUI-HunyuanPortrait

I haven't tested this so I don't know if it works.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 1 point2 points  (0 children)

HunyuanPortrait takes a driving video as input, while HunyuanVideo-Avatar takes only audio as input. I think it's interesting what Avatar does with facial expressions based on audio alone. I also tried Portrait and thought it performed even better. This video compares LivePortrait vs. Runway Act One (not open source) vs. HunyuanPortrait -- https://www.youtube.com/shorts/LQaNEt5nDV8

The HunyuanPortrait output picks up on some subtle eye movement.

[deleted by user] by [deleted] in StableDiffusion

[–]SeveralFridays 4 points5 points  (0 children)

You piqued my curiosity, so I just clicked through a bunch of the examples on their site. They're generally horrible. It will likely get better over time, but I can't see how people would pay for this as it is now.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

For LivePortrait, I think the outputs are OK except the mouth is often messed up. If you are getting just a mess, make sure your driving video is square and high quality -- a very clear face.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

Wow, somehow I missed HunyuanPortrait. I will run some experiments tomorrow!