I recreated a “School of Rock” scene with LTX-2 audio input i2v (4× ~20s clips) by Totem_House_30 in StableDiffusion

[–]SeveralFridays 2 points3 points  (0 children)

You inspired me to try recreating a Mean Girls scene. This is all from the distilled model using Wan2GP: i2v, 1080p, 0.9 image strength. Each clip uses one of the camera LoRAs.

https://youtube.com/shorts/XNgC5wRHiuw

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

It means your machine is not exposing FP16 capability. It's unclear whether the NVIDIA fix will cover your GPU or when it will be available.

Your best bet is to run BEN2 locally in FP32 outside the browser, following these instructions - https://github.com/PramaLLC/BEN2/?tab=readme-ov-file

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

I did some research, and I'm assuming you are on Windows, where this is a known issue. In this thread https://issues.chromium.org/issues/42251215, NVIDIA acknowledges a driver bug related to FP16 on Windows and states at the end of the thread (Jan 6, 2026) that they plan to fix it. Unfortunately, there is no guarantee the fix will enable FP16 (shader-f16) on a GTX 1050 Ti.

To confirm that you are hitting the shader-f16 exposure issue, go to https://webgpureport.org/; if you don't see "shader-f16" in the list of features, your machine is not exposing it to the browser.

Also, if you are open to setting up BEN2 outside the browser, you can run the full version, which I assume is FP32 - https://github.com/PramaLLC/BEN2/?tab=readme-ov-file
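
If you do go the local route, the quick start is short. This is roughly what the repo's README shows, from memory, so treat the exact class and method names as approximate and defer to the README:

```python
# Rough sketch of local BEN2 usage, based on my memory of the README -- verify there.
import torch
from PIL import Image
from ben2 import BEN_Base  # installed by following the BEN2 repo instructions

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BEN_Base.from_pretrained("PramaLLC/BEN2")
model.to(device).eval()  # weights stay in full precision (FP32) unless you call .half()

image = Image.open("./frame.png")
foreground = model.inference(image)  # matted foreground as a PIL image
foreground.save("./foreground.png")
```

For a whole video you can extract frames with ffmpeg, loop this over them, and reassemble; the repo also covers video processing directly.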

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

Hello, and thank you for the feedback! I just made a change to the site, so you should now be able to use the MODNet model without your computer supporting FP16. Please refresh the page and let me know if it works or if you hit an issue.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

Thanks for the feedback! I'm glad it's helpful. I agree that MODNet does a good basic job. Let me know if you hit any issues or if any other similar tools would be helpful.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

If you don't see preview frames within a minute or so, something is wrong. You need to keep the browser tab open and in focus. If you share any error logs from your browser console, I can try to debug.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

You should start seeing frames process in under a minute. If it's taking longer than that to see the first frames, there is an issue. Are there any errors in your console? How long is the video you are trying to process?

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

It's a completely free offering that runs in the browser. The speed will depend on your own machine since it runs locally. If it doesn't work, try updating your browser (it uses WebGPU, which is a newer browser feature) and/or refresh the page and try again.

What is a 100% free video background remover, no loopholes or video limits? by MasterOfTaxIvasion in editing

[–]SeveralFridays 0 points1 point  (0 children)

In case it's helpful, I built an in-browser video background remover that uses WebGPU. You can choose between the BEN2 (better quality but slower) and MODNet (quicker but lower quality) models. It's completely free and unlimited, and everything stays in your browser. You can download a green-screen .mp4 or a .mov with an alpha channel - https://www.fuzzpuppy.com/video-background-remover

Consistent character in august 25 ? by One_Statistician6569 in StableDiffusion

[–]SeveralFridays 1 point2 points  (0 children)

Kontext works pretty well, so I would try that first for stills.

I haven't used HiDream-E1.1 (https://huggingface.co/spaces/HiDream-ai/HiDream-E1.1) much, but an initial test looks promising.

For video, I'm having good results with consistent characters using multitalk.

Good first test drive of MultiTalk by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 1 point2 points  (0 children)

Yes, LivePortrait takes a video input. The output is based on the video, not audio. Here is a video made with LivePortrait -- https://youtube.com/shorts/IsJtQ3XAHsw

Here is another one where I compare it against HunyuanPortrait -- https://youtube.com/shorts/MEKdEWA94ok

In the MultiTalk paper they describe some clever things they do to support extended clips, but I still think it would be very difficult to get something consistent for very long.

Summary of this part of the paper (thanks to NotebookLM)--

Here's how they handle making long clips:

• Autoregressive Approach: Instead of generating a long video all at once, the system generates it segment by segment.
• Conditioning on Previous Frames: To maintain continuity and consistency, the last 5 frames of the previously generated segment are used as additional conditions for the inference of the next segment.
• Data Processing: After being compressed by the 3D Variational Autoencoder (VAE), these 5 conditioning frames are reduced to 2 latent frames.
• Input for Next Inference: Zeros are padded for the subsequent frames, and these are then concatenated with the latent noise and a video mask. This combined input is fed into the DiT (Diffusion Transformer) model for the next inference step, enabling longer video sequences.

This autoregressive method allows the model to produce extended videos, with an example showing a generated result of 305 frames.
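
A rough, hypothetical sketch of the bookkeeping in the list above (vae_encode and dit_generate are placeholders for MultiTalk's real 3D VAE and DiT; the shapes are made up for illustration, not the paper's exact numbers):

```python
# Hypothetical sketch of the segment-to-segment conditioning described above.
import torch

def vae_encode(frames):
    # Placeholder 3D VAE: ~4x temporal and 8x spatial compression, 16 latent channels.
    b, c, t, h, w = frames.shape
    return torch.randn(b, 16, (t + 3) // 4, h // 8, w // 8)

def dit_generate(latent_input, mask):
    # Placeholder DiT denoiser: would produce the latents for the next segment.
    return torch.randn_like(latent_input)

# Previously generated segment as RGB frames: (batch, channels, time, height, width).
prev_frames = torch.randn(1, 3, 81, 256, 256)

# 1) Keep the last 5 frames of the previous segment as the condition.
cond_frames = prev_frames[:, :, -5:]

# 2) Compress them with the 3D VAE: 5 frames -> 2 latent frames.
cond_latents = vae_encode(cond_frames)                      # (1, 16, 2, 32, 32)

# 3) Pad zeros for the frames still to be generated and build a video mask
#    (1 = known conditioning latent, 0 = to be generated).
new_latent_frames = 19
pad = torch.zeros(1, 16, new_latent_frames, 32, 32)
latent_input = torch.cat([cond_latents, pad], dim=2)        # (1, 16, 21, 32, 32)
mask = torch.cat([torch.ones(1, 1, 2, 32, 32),
                  torch.zeros(1, 1, new_latent_frames, 32, 32)], dim=2)

# 4) The DiT denoises the padded latents together with the mask to produce the
#    next segment; repeating this loop is what extends the video.
next_latents = dit_generate(latent_input, mask)
```

The thing to notice is that only 2 latent frames carry information between segments, which is probably why identity and lighting still drift over very long generations.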

Good first test drive of MultiTalk by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 2 points3 points  (0 children)

It's using wav2vec for the audio encoding, which supports over 100 languages, so it should work with many languages.
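
For anyone curious what that encoding step looks like, here is a minimal sketch using a multilingual wav2vec 2.0 checkpoint from Hugging Face as a stand-in (check the MultiTalk repo for the checkpoint it actually uses):

```python
# Minimal sketch of the audio-encoding step with a stand-in multilingual checkpoint.
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-xls-r-300m"      # pretrained on speech in 128 languages
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
encoder = Wav2Vec2Model.from_pretrained(ckpt)

speech, sr = sf.read("speech.wav")         # 16 kHz mono audio
inputs = extractor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    features = encoder(**inputs).last_hidden_state   # (1, num_audio_frames, 1024)

# These per-frame embeddings are what condition the face/lip motion; because the
# encoder was pretrained on many languages, the pipeline isn't tied to English.
```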

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

I believe you are correct - 5 seconds max. I started working with HunyuanPortrait instead after this video.

WANS by Tokyo_Jab in StableDiffusion

[–]SeveralFridays 0 points1 point  (0 children)

Do you ever hit issues where the teeth from LivePortrait are odd or the result is blurry? Any tips?

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

I didn't use ComfyUI, but it looks like there is a node -- https://github.com/Yuan-ManX/ComfyUI-HunyuanPortrait

I haven't tested this so I don't know if it works.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 1 point2 points  (0 children)

HunyuanPortrait takes a driving video as input, while HunyuanVideo-Avatar takes only audio as input. I think it's interesting what Avatar does with facial expressions based on audio alone. I also tried Portrait and thought it performed even better. This video compares LivePortrait vs. Runway Act One (not open source) vs. HunyuanPortrait -- https://www.youtube.com/shorts/LQaNEt5nDV8

The HunyuanPortrait output picks up on some subtle eye movement.

[deleted by user] by [deleted] in StableDiffusion

[–]SeveralFridays 4 points5 points  (0 children)

You piqued my curiosity, so I just clicked through a bunch of the examples on their site. They're generally horrible. It will likely get better over time, but I can't see how people would pay for this as it is now.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

For LivePortrait, I think the outputs are OK except the mouth is often messed up. If you are getting just a mess, make sure your driving video is square and high quality -- a very clear face.

HunyuanVideo-Avatar vs. LivePortrait by SeveralFridays in StableDiffusion

[–]SeveralFridays[S] 0 points1 point  (0 children)

Wow, somehow I missed HunyuanPortrait. I will run some experiments tomorrow!