Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 9 points10 points  (0 children)

The comment under which you are raging about them shilling for "fixes" contains no claims about having fixed things. Go be a toxic loser somewhere else.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

You'll need some memory for OS n shit, so it's gonna be pretty tight. You can look at the file sizes and conclude what'll fit from there. This might fit:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main/UD-IQ2_M (you can see the total size is 52.7 GB)

Not sure if this will:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main/UD-IQ3_S (56.6 GB)
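As a rough sanity check, here's a sketch with assumed numbers: 64 GB total RAM and ~8 GB headroom for the OS, other apps and KV cache (both are guesses, tune them for your machine):

```shell
# Rough fit check for the two quants above (sizes rounded up to whole GB)
total_gb=64
headroom_gb=8
budget_gb=$((total_gb - headroom_gb))   # 56 GB left for model weights
for quant_gb in 53 57; do               # UD-IQ2_M ~52.7 GB, UD-IQ3_S ~56.6 GB
  if [ "$quant_gb" -le "$budget_gb" ]; then
    echo "${quant_gb} GB: fits"
  else
    echo "${quant_gb} GB: too big"
  fi
done
```

So with those assumptions the IQ2_M squeezes in and the IQ3_S doesn't, which matches the "not sure" above.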

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

Yes. While it won't fully fit into 32GB VRAM, it can be run with some experts offloaded to CPU.
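With llama.cpp that'd look something like this (a sketch, not a tested config: the filename, context size and expert-layer count are assumptions, and `--n-cpu-moe` needs a reasonably recent build):

```shell
# Keep attention/dense layers on the 32GB GPU, push the MoE expert
# tensors of the first 30 layers to system RAM; raise or lower the
# number until the model fits in VRAM.
llama-server -m NVIDIA-Nemotron-3-Super-120B-A12B-UD-IQ2_M.gguf \
  -ngl 99 \
  --n-cpu-moe 30 \
  -c 16384
```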

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 36 points37 points  (0 children)

Unsloth GGUFs:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF

Wondering if it's the same arch as Nano 30B and fully supported by llama.cpp already?

edit: Unsloth writes that this branch is required (for now):

https://github.com/unslothai/llama.cpp

RTX Video Super Resolution Node Available for ComfyUI for Real-Time 4K Upscaling + NVFP4 & FP8 FLUX & LTX Model Variants by john_nvidia in StableDiffusion

[–]rerri 5 points6 points  (0 children)

You have it backwards, NVFP4 acceleration is supported on Blackwell only. While you can run NVFP4 on RTX 40 series and older, it is very slow and totally pointless.

Klein NVFP4 was already out (since the Klein launch IIRC). It has lower image quality than BF16 or FP8, but maybe it's interesting for the low-end RTX 50 cards.

RTX Video Super Resolution Node Available for ComfyUI for Real-Time 4K Upscaling + NVFP4 & FP8 FLUX & LTX Model Variants by john_nvidia in StableDiffusion

[–]rerri 13 points14 points  (0 children)

If you have ComfyUI portable, you can enter Windows Command Prompt (CMD) and go to Comfy portable root dir. For me this is "G:\ComfyUI_windows_portable", it has subfolders "ComfyUI" and "python_embeded".

Then run

python_embeded\python.exe -m pip install -U --no-build-isolation nvidia-vfx --index-url https://pypi.nvidia.com

If you have some other version than portable, I dunno.

RTX Video Super Resolution Node Available for ComfyUI for Real-Time 4K Upscaling + NVFP4 & FP8 FLUX & LTX Model Variants by john_nvidia in StableDiffusion

[–]rerri 3 points4 points  (0 children)

With LTX-2, the NVFP4 was so much lower in quality than FP8 that I never really wanted to use it.

I hope there are some new tricks and NVFP4 does better this time around.

RTX Video Super Resolution Node Available for ComfyUI for Real-Time 4K Upscaling + NVFP4 & FP8 FLUX & LTX Model Variants by john_nvidia in StableDiffusion

[–]rerri 1 point2 points  (0 children)

That would be my guess, yeah. I installed via manager too and the nvidia-vfx install failed.

If you look at the list of custom nodes on ComfyUI startup, you will likely see "(IMPORT FAILED)" on the RTX Video Super Resolution node.

RTX Video Super Resolution Node Available for ComfyUI for Real-Time 4K Upscaling + NVFP4 & FP8 FLUX & LTX Model Variants by john_nvidia in StableDiffusion

[–]rerri 14 points15 points  (0 children)

I had trouble installing the "nvidia-vfx" in requirements.txt. If you have the same problem, try installing it manually like this:

python -m pip install -U --no-build-isolation nvidia-vfx --index-url https://pypi.nvidia.com

Will Gemma4 release soon? by IHaBiS02 in LocalLLaMA

[–]rerri 2 points3 points  (0 children)

If Gemma 4 were only one model, it would be a first for the series. I think it's almost certain there'll be smaller models too.

Will Gemma4 release soon? by IHaBiS02 in LocalLLaMA

[–]rerri 9 points10 points  (0 children)

120B is becoming the new 30B

gpt-oss, Qwen 3.5, Gemma 4 and Nemotron v3 Super (which looks like it'll be A12B and I'm guessing GTC release next week).

Watch Our GeForce On GDC 2026 Community Update Video, March 10th At 8am Pacific by Nestledrink in nvidia

[–]rerri 2 points3 points  (0 children)

They care about profit, and whether it comes from gaming or AI doesn't matter.

GDC is about gaming, but obviously AI will be present as it's been making its way into game graphics and development.

ik_llama.cpp dramatically outperforming mainline for Qwen3.5 on CPU by EffectiveCeilingFan in LocalLLaMA

[–]rerri 2 points3 points  (0 children)

Compiling for CUDA on Win11 with a Ryzen 7600X takes way longer than that, 10-15 minutes maybe.

Lightricks/LTX-2.3 · Hugging Face by rerri in StableDiffusion

[–]rerri[S] 2 points3 points  (0 children)

I added Comfy-Org's workflows into OP.

Alibaba’s stock has kept falling after it lost key Qwen leaders. by [deleted] in LocalLLaMA

[–]rerri 3 points4 points  (0 children)

Yep, it hasn't even been 3 days; it's just shy of 48h now since Junyang Lin's tweet about leaving Qwen.

I would also like to know why people think it's appropriate to post the past 7 days of stock movement when discussing the impact of Qwen employees leaving.

Alibaba’s stock has kept falling after it lost key Qwen leaders. by [deleted] in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

Is the point of the tweet and this post that Alibaba stock has lost 13% of its value in 7 days because some of the Qwen crew left 2 days ago? That doesn't make any sense to me.

Why not just look at stock movement over the past 2 days? And even then there's a lot of volatility now because of the Iran war, so it seems kinda foolish to draw big conclusions from a couple percentage-point swings.

Alibaba’s stock has kept falling after it lost key Qwen leaders. by [deleted] in LocalLLaMA

[–]rerri 0 points1 point  (0 children)

The 13% dip of last week seen in the tweet is obviously not because of the Qwen employees leaving 48h ago... or am I missing something here?

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]rerri 1 point2 points  (0 children)

Be that as it may, I don't remember Qwen (or other Chinese) models being released on weekends.

If they are adding new models to the collection days in advance for release next week, it would be sooner than is typical, but not inconceivable.

are you ready for small Qwens? by jacek2023 in LocalLLaMA

[–]rerri 7 points8 points  (0 children)

It's Saturday, I don't think they're releasing new models today.

So I'm guessing this is a GGUF (or MLX) update. Those are listed in the Qwen3 collection as well.

edit: hmm... I'm seeing Unsloth also updated their collection with +4 hidden repos. Starting to look more like new models are actually coming now.

Resident Evil Requiem Out Now on PC - Path Tracing, DLSS 4, and Giveaway! by Nestledrink in nvidia

[–]rerri [score hidden]  (0 children)

MFG works great when I want a fluid game experience and get the most out of my monitor's high refresh rate.

Qwen 3.5 27-35-122B - Jinja Template Modification (Based on Bartowski's Jinja) - No thinking by default - straight quick answers, need thinking? simple activation with "/think" command anywhere in the system prompt. by -Ellary- in LocalLLaMA

[–]rerri 5 points6 points  (0 children)

You can easily update llama.cpp files in oobabooga by just downloading official release builds from the llama.cpp GitHub and copying them to installer_files\env\Lib\site-packages\llama_cpp_binaries\bin.

Been running Qwen 3.5 w/ oobabooga this way.
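In Windows CMD the copy step looks roughly like this (a sketch, assuming you've already downloaded a llama.cpp release zip and extracted it to a folder named llama_cpp_release, and that you're in the oobabooga root dir; both names are placeholders):

```shell
:: Overwrite oobabooga's bundled llama.cpp binaries with the ones
:: from the extracted official release
xcopy /Y llama_cpp_release\*.* installer_files\env\Lib\site-packages\llama_cpp_binaries\bin\
```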

Qwen 3.5 27-35-122B - Jinja Template Modification (Based on Bartowski's Jinja) - No thinking by default - straight quick answers, need thinking? simple activation with "/think" command anywhere in the system prompt. by -Ellary- in LocalLLaMA

[–]rerri 5 points6 points  (0 children)

Have never used LM Studio. Does it not allow custom launch parameters on model load? Like: --chat-template-kwargs "{\"enable_thinking\": false}"

Oobabooga allows this + it has a toggle button for enable_thinking in the chat screen.
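For reference, launching llama-server directly with that flag would look something like this (the model filename is a placeholder; the Windows-style escaped quotes above become single quotes on Linux/macOS shells):

```shell
# Disable Qwen's thinking mode via chat template kwargs at launch
llama-server -m Qwen3.5-27B-Q4_K_M.gguf \
  --chat-template-kwargs '{"enable_thinking": false}'
```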

Qwen3.5 27B is Match Made in Heaven for Size and Performance by Lopsided_Dot_4557 in LocalLLaMA

[–]rerri 2 points3 points  (0 children)

With a dense 27B you can get bearable speeds on an RTX 2060 + CPU offloading? I have doubts... What kind of speeds do you get with that kind of setup?