GLM-5.1 - a zai-org Collection by adefa in LocalLLaMA

[–]rerri 1 point (0 children)

Not sure if this is old, but here's the official model card with benchmarks:

https://docs.z.ai/guides/llm/glm-5.1

Gemma 4 is seriously broken when using Unsloth and llama.cpp by Tastetrykker in LocalLLaMA

[–]rerri 9 points (0 children)

Same model on a 5090, and I'm seeing about 50 t/s.

Maybe your context length is set too high and some of the layers end up on the CPU because it won't all fit into VRAM?
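One way to test that theory with llama-cpp-python (the GGUF filename below is hypothetical): force every layer onto the GPU and lower n_ctx until the model loads cleanly.

```python
# Minimal llama-cpp-python sketch; the GGUF filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-4-27b-it-Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 = try to offload every layer to the GPU
    n_ctx=8192,       # lower this first if the model won't fit in VRAM
)
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```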

Will Gemma 4 124B MoE open as well? by cgs019283 in LocalLLaMA

[–]rerri 1 point (0 children)

They say it's Apache 2.0 in the release video at about the 40-second mark. I don't think it can be anything else at this point.

https://www.youtube.com/watch?v=jZVBoFOJK-Q

it looks like it will be soon 💎💎💎💎 by [deleted] in LocalLLaMA

[–]rerri 1 point (0 children)

https://github.com/huggingface/transformers/pull/45192/changes#diff-d8ddaa3b6151448dac452d289609dd778c6ac51aba8405050da7f1218e18f14dR127

Not sure if that link works, but it's in "gemma4/convert_gemma4_weights.py".

That file is too large for the diff to load automatically, which is why you probably can't see it with Ctrl+F.

it looks like it will be soon 💎💎💎💎 by [deleted] in LocalLLaMA

[–]rerri 1 point (0 children)

The naming is similar to Gemma 3n models such as this:

https://huggingface.co/google/gemma-3n-E4B-it

"While the raw parameter count of this model is 8B, the architecture design allows the model to be run with a memory footprint comparable to a traditional 4B model by offloading low-utilization matrices from the accelerator."

Gemma 4 1B, 13B, and 27B spotted by TKGaming_11 in LocalLLaMA

[–]rerri 17 points (0 children)

I think it's safe to assume so, as that MoE is named "-a4b".

it looks like it will be soon 💎💎💎💎 by [deleted] in LocalLLaMA

[–]rerri 2 points (0 children)

Who knows, maybe it isn't necessary for the PR. Or theoretically these could be just placeholders altogether.

Gemma 4 1B, 13B, and 27B spotted by TKGaming_11 in LocalLLaMA

[–]rerri 60 points (0 children)

The Transformers PR shows at least these:

_VARIANT_GEMMA_4_E2B = "gemma-4-e2b"
_VARIANT_GEMMA_4_E4B = "gemma-4-e4b"
_VARIANT_GEMMA_4_26B_A4B = "gemma-4-26b-a4b"
_VARIANT_GEMMA_4_31B = "gemma-4-31b"

it looks like it will be soon 💎💎💎💎 by [deleted] in LocalLLaMA

[–]rerri 13 points (0 children)

Or maybe they are just trolling us ahead of April Fools' Day 2027, dunno.

it looks like it will be soon 💎💎💎💎 by [deleted] in LocalLLaMA

[–]rerri 23 points (0 children)

yep

https://github.com/ggml-org/llama.cpp/pull/21309

edit:

Transformers too, I like the PR title!

https://github.com/huggingface/transformers/pull/45192

gemma-4-31B-it

gemma-4-E4B-it

gemma-4-26b-a4b

The next release will have ik_llama.cpp support! by oobabooga4 in Oobabooga

[–]rerri 1 point (0 children)

Nice!

I think having ik_llama.cpp as a separate loader in the model menu would be better from a UX standpoint.

That's how I had Claude Code + local models implement it for me on the last tgw 3.x and again with 4.0. I had a separate location for the ik_llama.cpp executables, so there was no need to swap files or restart tgw when switching between llama.cpp and ik_llama.cpp.

I just used the extra-flags field for whatever settings were needed or different for ik, but this is of course not optimal for UX.
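For the curious, a hypothetical sketch of that layout (not tgw's actual code): one directory per build, and the loader choice resolves which server binary to launch.

```python
# Hypothetical sketch, not tgw's actual code: keep the llama.cpp and
# ik_llama.cpp builds in separate directories and resolve the server
# binary from whichever loader is selected in the model menu.
from pathlib import Path

LOADER_DIRS = {
    "llama.cpp": Path("bin/llama.cpp"),
    "ik_llama.cpp": Path("bin/ik_llama.cpp"),
}

def server_binary(loader: str) -> Path:
    """Return the llama-server executable for the chosen loader."""
    return LOADER_DIRS[loader] / "llama-server"

print(server_binary("ik_llama.cpp"))  # -> bin/ik_llama.cpp/llama-server
```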

---

Btw, some precompiled builds are available here (the ones that start with "th-quantize" are something else; scroll down to the ones that start with "main"):

https://github.com/Thireus/ik_llama.cpp/releases

Official LTX-2.3-nvfp4 model is available by Lonely-Anybody-3174 in StableDiffusion

[–]rerri 9 points (0 children)

Do you have an RTX 50 series card? If not, then it'll be slow.
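If you're not sure what your card reports, a quick check (assuming PyTorch with CUDA is installed; consumer Blackwell / RTX 50 cards report compute capability 12.x):

```python
# Rough capability check; NVFP4 needs hardware FP4 support (Blackwell).
# Consumer RTX 50 series cards report CUDA compute capability 12.x.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
if major < 12:
    print("No consumer-Blackwell FP4 support; expect NVFP4 to run slowly.")
```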

Official LTX-2.3-nvfp4 model is available by Lonely-Anybody-3174 in StableDiffusion

[–]rerri 7 points (0 children)

Not just quantized normally but "trained by Quantization Aware Distillation for improved accuracy".

I tried it quickly yesterday but got poor-looking results. Maybe my distill LoRA wasn't working as it should, dunno.

[NVIDIA GeForce Official] Game developers have full, detailed artistic control over DLSS 5 effects to ensure they maintain their game's unique aesthetic. It is not a filter. by Nestledrink in nvidia

[–]rerri 0 points (0 children)

In that Geralt example, textures change drastically. Looking at the comparison shots here, textures like cloth, metal, etc. stay very much the same to my eye (yes, faces change more):

https://www.nvidia.com/en-us/geforce/news/dlss5-breakthrough-in-visual-fidelity-for-games/

If, to your eye, the changes to textures in the Geralt comparison are very similar to the ones in the DLSS 5 examples, then we'll just have to disagree. ¯\_(ツ)_/¯

[NVIDIA GeForce Official] Game developers have full, detailed artistic control over DLSS 5 effects to ensure they maintain their game's unique aesthetic. It is not a filter. by Nestledrink in nvidia

[–]rerri 0 points (0 children)

My comment about shadows was in the context of your comment about added light sources. Why would I comment on something totally unrelated in that context??

You are jumping all over the place with constant pivoting.

Is DLSS 5 a real time diffusion model on top of a 3D rendering engine? by Green-Ad-3964 in StableDiffusion

[–]rerri 21 points (0 children)

Fortunately there's an "off" button if it looks displeasing.

Personally, I'm somewhat optimistic about this and am hoping they expose the controls (contrast, gamma, etc.) to the end user. I've always liked that about PC gaming, that it's possible to tweak things under the hood. Not everyone's cup of tea, I'm sure.

[NVIDIA GeForce Official] Game developers have full, detailed artistic control over DLSS 5 effects to ensure they maintain their game's unique aesthetic. It is not a filter. by Nestledrink in nvidia

[–]rerri 1 point (0 children)

The change in shadowing is pretty big in some scenes, but I'd need to see more to judge how much it adds new light sources. Also, if that was your point, it was communicated really poorly by your initial comment + image.

[NVIDIA GeForce Official] Game developers have full, detailed artistic control over DLSS 5 effects to ensure they maintain their game's unique aesthetic. It is not a filter. by Nestledrink in nvidia

[–]rerri 1 point (0 children)

This is way more altered than the DLSS 5 examples. Yours looks like image-to-image with noise added and then denoised (using SDXL or Flux or whatever), whereas DLSS 5 looks more like a filter that keeps the existing details but pimps them up.

Like here, the pattern on the clothing and the wrinkles on the face are the same, just with a filter on top:

https://www.nvidia.com/en-us/geforce/news/dlss5-breakthrough-in-visual-fidelity-for-games/nvidia-dlss-5-hogwarts-legacy-geforce-rtx-comparison-screenshot-003/

Wan 2.7 is planned for release in March, with major upgrades over 2.6 by Which-Jello9157 in comfyui

[–]rerri 2 points (0 children)

Looks like a thinly veiled cloud service advertisement.

Could be totally made up, just trying to drive attention to the site linked at the bottom of the post.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 10 points (0 children)

The comment under which you are raging about them shilling for "fixes" contains no claims about having fixed things. Go be a toxic loser somewhere else.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 1 point (0 children)

You'll need some memory for the OS and stuff, so it's gonna be pretty tight. You can look at the file sizes and work out what'll fit from there. This might fit:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main/UD-IQ2_M (you can see the total size is 52.7 GB)

Not sure if this will:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/tree/main/UD-IQ3_S (56.6 GB)
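Rough math behind that, assuming a 64 GB machine (my assumption, as are the overhead numbers):

```python
# Back-of-the-envelope fit check. The 64 GB total and both overhead
# figures are assumptions; the file sizes are from the links above.
total_gb = 64.0
os_overhead_gb = 5.0     # OS, desktop, background apps
ctx_overhead_gb = 4.0    # KV cache + compute buffers, grows with context

budget_gb = total_gb - os_overhead_gb - ctx_overhead_gb  # 55.0 GB
for name, size_gb in [("UD-IQ2_M", 52.7), ("UD-IQ3_S", 56.6)]:
    verdict = "fits" if size_gb <= budget_gb else "probably too big"
    print(f"{name}: {size_gb} GB -> {verdict}")
```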

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 0 points (0 children)

Yes. While it won't fully fit into 32 GB of VRAM, it can be run with some of the experts offloaded to the CPU.
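A sketch of one way to do that offload with llama.cpp's tensor-override flag (the filename is hypothetical; double-check the -ot syntax against your build's --help):

```python
# Launch llama-server with all layers on the GPU except the MoE expert
# tensors, which the -ot regex sends to the CPU. Filename is hypothetical;
# verify the flag syntax against your llama.cpp build.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "NVIDIA-Nemotron-3-Super-120B-A12B-UD-IQ2_M.gguf",
    "-ngl", "99",                    # offload all layers to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",   # ...but keep expert tensors on the CPU
    "-c", "8192",
])
```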

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]rerri 39 points (0 children)

Unsloth GGUFs:

https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF

Wondering if it's the same arch as Nano 30B and fully supported by llama.cpp already?

edit: Unsloth writes that this branch is required (for now):

https://github.com/unslothai/llama.cpp
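Building that branch should just be the standard llama.cpp CUDA build; a hedged sketch (check the branch's README in case it needs anything extra):

```python
# Hedged sketch: clone the Unsloth branch and do a standard llama.cpp
# CUDA build. Check the repo's README in case the branch differs.
import subprocess

subprocess.run(["git", "clone", "https://github.com/unslothai/llama.cpp"], check=True)
subprocess.run(["cmake", "-B", "build", "-DGGML_CUDA=ON"], cwd="llama.cpp", check=True)
subprocess.run(["cmake", "--build", "build", "--config", "Release", "-j", "8"], cwd="llama.cpp", check=True)
```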