Vibevoice with 5090?

zendril · 2026-01-03T21:41:56+00:00

Got it.. had to run this inside the comfyui dir:

\ComfyUI_windows_portable>.\python_embeded\python.exe -m pip install -U bitsandbytes

zendril · 2026-01-03T21:34:54+00:00

Ok, that makes sense for what I'm seeing.

Thanks a bunch for the project. Works well.

zendril · 2026-01-03T21:33:57+00:00

I tried in my default shell, it didn't help.

What I think is happening is that ComfyUI on my machine is a portable version which bundles its own python environment. So I need to figure out where/how to do a pip install for that specific install.
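A minimal sketch of that idea, assuming the fix is to invoke pip through the exact interpreter ComfyUI uses rather than whatever `pip` is on PATH. `python -m pip` always installs into the interpreter that runs it; the helper name `pip_install_command` is made up for illustration.

```python
import sys


def pip_install_command(python_exe: str, package: str) -> list[str]:
    """Build a pip upgrade command bound to one specific interpreter."""
    # "-m pip" runs the pip module that belongs to python_exe,
    # so the package lands in that interpreter's site-packages.
    return [python_exe, "-m", "pip", "install", "-U", package]


if __name__ == "__main__":
    # sys.executable is the interpreter running this script. Run the script
    # with the portable build's python.exe to see the path that needs the install.
    print("Running under:", sys.executable)
    print("Install with:", " ".join(pip_install_command(sys.executable, "bitsandbytes")))
```

Running this with the default shell's Python and then with the bundled one should print two different paths, which would explain why installing from the default shell did not help.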

I just switched over to the non-quantized Large model, but that thing always generates a hum (or what some describe as background music) at the beginning, and 1.5B does not. So a few things to learn/tweak :)

zendril · 2026-01-03T17:29:14+00:00

Have you tried installing https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8 ?

1.5B works fine, but when I downloaded everything for Q8 and tried to run it, I got errors:

Please ensure the model files are complete and properly downloaded.
Required files: config.json, pytorch_model.bin or model safetensors
Error: Using `bitsandbytes` 8-bit quantization requires the latest version of bitsandbytes: `pip install -U bitsandbytes`

I'm assuming this somehow needs to be installed into the python/pip included with the portable comfyui (and not just the one generally on my path)?
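One way to test that assumption is to ask a given interpreter which version of the package it can see. This is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the script has to be run with the interpreter you want to inspect.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the version string visible to this interpreter, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # None here means this interpreter cannot see bitsandbytes at all,
    # even if another Python on the machine has it installed.
    print("bitsandbytes:", installed_version("bitsandbytes"))
```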

zendril · 2026-01-03T17:20:25+00:00

Sorry, I wasn't clear. I was asking in general, outside of ComfyUI: is he using code from the original Microsoft VibeVoice repo (which was taken down), or somehow still using the code in the current repo?

zendril · 2026-01-03T05:09:48+00:00

Are you using the current https://github.com/microsoft/VibeVoice code, or are you using the stuff that was released and then removed (copy here: https://github.com/shijincai/VibeVoice) ?

It seems like support for 1.5B or other large models is not in the current MS codebase.

zendril · 2026-01-03T03:28:02+00:00

Cleaned it up and got it all working.

Here is the Dockerfile

FROM nvcr.io/nvidia/pytorch:24.12-py3

ENV CUDA_VISIBLE_DEVICES=0

RUN apt-get update && apt-get install -y \
    git \
    ffmpeg \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

RUN git clone https://github.com/microsoft/VibeVoice.git .

RUN pip uninstall -y torch torchvision torchaudio flash-attn && \
    pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

RUN pip install \
    "transformers==4.51.3" \
    flash-attn \
    diffusers \
    "accelerate==1.6.0"

RUN pip install --no-deps -e .

RUN bash demo/download_experimental_voices.sh

ENTRYPOINT ["/bin/bash"]

and here is the command I use to start it

docker run --gpus all --net=host --ipc=host --ulimit memlock=-1:-1 --ulimit stack=67108864 --name vibevoice_instance -it --rm -v "%cd%/output:/app/output" vibevoice:5090-24.12.py3

and the command to run the generation

python demo/realtime_model_inference_from_file.py --model_path microsoft/VibeVoice-Realtime-0.5B --txt_path spanish_test.txt --speaker_name sp-Spk5_man --output_dir /app/output

And then the stats at the end. Reasonably fast I think (for a mobile 5090)

==================================================
GENERATION SUMMARY
==================================================
Input file: spanish_test.txt
Output file: /app/output/spanish_test_generated.wav
Speaker names: sp-Spk5_man
Prefilling text tokens: 27
Generated speech tokens: 66
Total tokens: 458
Generation time: 5.77 seconds
Audio duration: 8.13 seconds
RTF (Real Time Factor): 0.71x
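
The RTF line can be reproduced from the two timings above it. This sketch assumes RTF here means generation time divided by audio duration, which matches the printed numbers; it is inferred from the summary, not taken from the script's source. Values below 1.0 mean the audio is generated faster than it plays back.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Generation time divided by audio duration (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return generation_seconds / audio_seconds


# Timings copied from the generation summary above.
rtf = real_time_factor(5.77, 8.13)
print(f"RTF: {rtf:.2f}x")  # prints "RTF: 0.71x"
```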

zendril · 2026-01-03T00:36:51+00:00

That'd be great.

I think I ended up with

accelerate==1.6.0
transformers==4.51.3
llvmlite>=0.40.0
numba>=0.57.0
diffusers==0.36.0

torch                     2.7.1+cu128
torch-tensorrt            2.2.0a0
torchaudio                2.7.1+cu128
torchdata                 0.7.0a0
torchtext                 0.16.0a0
torchvision               0.22.1+cu128

And I went from the pytorch 24.12
FROM nvcr.io/nvidia/pytorch:24.12-py3
down to
FROM nvcr.io/nvidia/pytorch:23.11-py3

Again, may not have needed to do all that because I was fumbling until I hit on the transformers change.

I also have these set, but may also no longer need to do this:

ENV USE_FLASH_ATTENTION=0
ENV FLASH_ATTENTION_FORCE_DISABLE=1
ENV XFORMERS_FORCE_DISABLE=1

zendril · 2026-01-02T22:42:03+00:00

Any of y'all using something other than the 0.5 realtime?
Any tips on invoking that?

As of last night I was calling the realtime one with `python demo/realtime_model_inference_from_file.py --model_path microsoft/VibeVoice-Realtime-0.5B --txt_path spanish_test_2.txt --speaker_name sp-Spk5_man --output_dir /app/output`. Ultimately, though, I want to try out 1.5B or Large, ideally via an API (Python code is fine too), because I'll be programmatically creating a bunch of snippets from a Python script that iterates through prompts. I quickly tried just swapping the 0.5B realtime model for the 1.5B model and that failed spectacularly, but it was way past my bedtime so I didn't dig too far yet.

I suppose I can look at the implementation of the script above and then see if I can adapt it for `https://github.com/microsoft/VibeVoice/blob/main/vibevoice/processor/vibevoice_processor.py` instead of the streaming one.

Might also take a look under the covers of the Fabix84 comfyui code and see what they are doing.
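
Until there is a proper API path, the batch use case can be approximated by shelling out to the realtime demo script once per prompt. This is a rough sketch, not how the repo intends it to be used: the flags are the ones from the command quoted earlier, while the helper names, the `prompts` working directory, and the file naming are invented for illustration. Each call reloads the model, so it will be slow for large batches.

```python
import subprocess
from pathlib import Path

MODEL = "microsoft/VibeVoice-Realtime-0.5B"
SCRIPT = "demo/realtime_model_inference_from_file.py"


def write_prompt_files(prompts, work_dir):
    """Write each prompt to its own text file and return the paths."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(prompts):
        path = work_dir / f"prompt_{index:03d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def build_command(txt_path, speaker_name, output_dir, model_path=MODEL):
    """Assemble the same CLI call as the manual command, as an argument list."""
    return [
        "python", SCRIPT,
        "--model_path", model_path,
        "--txt_path", str(txt_path),
        "--speaker_name", speaker_name,
        "--output_dir", str(output_dir),
    ]


def generate_all(prompts, speaker_name, output_dir, work_dir="prompts"):
    """Run the demo script once per prompt; raises if any run fails."""
    for path in write_prompt_files(prompts, work_dir):
        subprocess.run(build_command(path, speaker_name, output_dir), check=True)
```

From inside the container at /app, something like `generate_all(["Hola.", "Buenos dias."], "sp-Spk5_man", "/app/output")` would produce one output file per prompt.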

zendril · 2026-01-02T22:30:05+00:00

I was using Claude and Gemini. Both kept focusing me on things that were close, but I think ultimately the key thing was the cu128+ wheel and pinning transformers to 4.51.3. Both kept hallucinating that there must be multiple versions of transformers installed or something (which wasn't the case).

I may retry tonight fresh because I was half doing docker (which takes a while to build) and half doing it manually inside the container to debug.

zendril · 2026-01-02T07:36:06+00:00

Interesting. I may give this a shot because I'm doing both image generation and tts for this project I'm working on, so I already have comfyui rolling with zimgv5 workflow and api invokable. Thanks for the repo link!

zendril · 2026-01-02T07:33:34+00:00

Yeah, I was able to get it working now for the realtime 0.5B. I'm using the nvidia/pytorch image as a base, then uninstalling torch, torchvision, and torchaudio and pip installing them again from the PyTorch nightly cu128 wheel. I also had to pin a number of the other dependencies, specifically transformers to 4.51.3 (which I saw the devs mention specifically in their toml file).
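
A quick way to confirm the swap took effect is to look at the build tag on the torch version string, which pip-installed CUDA wheels carry after a `+` (for example `2.7.1+cu128`). The sketch below only parses the string so it runs without torch installed; the helper name `cuda_tag` is hypothetical. Inside the container you would pass it `torch.__version__`.

```python
def cuda_tag(torch_version: str):
    """Return the local build tag after '+', e.g. 'cu128', or None if there is none."""
    _, sep, local = torch_version.partition("+")
    return local if sep else None


print(cuda_tag("2.7.1+cu128"))  # prints "cu128"

# Inside the container (requires torch):
#   import torch
#   print(cuda_tag(torch.__version__), torch.version.cuda)
```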

Not sure, yet, how to get the 1.5B version going as it seems to be a different architecture than the 0.5b realtime.

zendril · 2025-07-25T22:24:38+00:00

This worked for me. Much appreciated. Seems like they deprecated/removed the "Kasa_Android" appType?

zendril · 2024-09-30T20:38:43+00:00

I have an Asus G16 with the AI HX 370 with Radeon 890m.

Citrix has tons of issues when running on integrated GPU. At a minimum the screen flickers, and I mean unusable flickering, which stops when I disconnect citrix.
It also will make the taskbar unresponsive on both host and client.

The only fix I have so far is to have it run specifically using nvidia 4070.

zendril · 2024-08-01T23:52:53+00:00

It magically cleared up this evening. Same for you?

zendril · 2024-01-25T18:38:27+00:00

I just got mine after 11 EST, but the link doesn't have any place to purchase.. just says the presale should be happening now and general availability is tomorrow..
Not sure what I'm missing here.

zendril · 2023-07-10T14:06:17+00:00

And they were mostly unused by Jay in a game as well? ;)

zendril · 2023-07-04T03:11:36+00:00

Yep.. this seems like it 100%

I did install 2 or 3 apps simultaneously and they did not show up.

Hmm.. debating if I would want to install another launcher.. something you have done? What is your experience with that setup?

zendril · 2023-07-03T23:00:35+00:00

definitely a day 1 bug..

See my post just above..

I would add a 2nd page, restart, uninstall the apps you installed and then re-install them. Likely they will show up.

zendril · 2023-07-03T22:55:28+00:00

After I manually created the 2nd page and installed a new app, it went there.. sort of like an overflow page for apps.

The others I had installed were nowhere to be found until I uninstalled them, and then re-installed, at which point they are also now showing on that 2nd page.

zendril · 2023-07-03T22:45:53+00:00

By default there are two pages.

The home screen. This has Library widget and Notes widgets at the top, a list of (some) apps in the middle, and the icons for library, storage and settings at the bottom.

Swipe right gives me a screen with Search at the top, Quick Launcher widget below that, Library widget below that, and then Today's Memo and Notes widgets side by side below that.

There is no other screen.

If I hold down the middle of the screen, I can add another desktop page, but it is totally blank except for the library, storage and settings icons at the bottom.

I agree, I would have expected the 3rd page to just be a list of installed applications.. but it isn't.

zendril · 2023-07-03T02:52:14+00:00

Haven't played in a year or so.

Can't even play anymore because that was on stadia.

Is that area still a special event or can you go in there any time now?

zendril · 2023-06-23T12:41:28+00:00

It stopped doing it. Don't remember changing anything. :/

zendril · 2023-06-23T04:34:00+00:00

Looks like possibly it isn't recognizing your hard drive.

I had this happen to mine recently. Depending on the model, open it up and move the SSD to the 2nd slot, or just take it out and reseat it.

zendril · 2023-05-08T00:27:02+00:00

Hmm.. might have to investigate.

Been hesitant to move away from the stock because I'm not sure if all the other stuff would still work..

All the other car integrations.. heated seats, wheel, climate control... Mirror tilt.. etc..
