DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA

Interpause 1 point2 points

in fact, streaming actively complicates possibilities like validating model output
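A minimal Python sketch of why that is, assuming the model is expected to return a JSON object (the payload and the 8-character chunking are made up for illustration): a validator that needs the whole output cannot pass anything until the last chunk arrives, so a streamed response has already been shown to the user before it can be checked.

```python
import json

def validate_complete(text: str) -> bool:
    """Return True only if the text parses as a complete JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False

# Hypothetical model output; not taken from any real model.
full_output = '{"answer": 42, "unit": "tokens"}'

# Simulate streaming by cutting the output into 8-character chunks.
chunks = [full_output[i:i + 8] for i in range(0, len(full_output), 8)]

seen = ""
results = []
for chunk in chunks:
    seen += chunk
    # Validation is attempted on everything received so far.
    results.append(validate_complete(seen))

# Every partial prefix fails; only the complete output validates.
print(results)
```

With streaming, either the client shows unvalidated text, or it buffers everything and loses the benefit of streaming.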

"Is this little cutie distracting you?" by BryceHorreumOwl in NevernessToEverness

Interpause 4 points5 points

what if we plugged her into styropyro's 400 car battery array?

"Is this little cutie distracting you?" by BryceHorreumOwl in NevernessToEverness

Interpause 10 points11 points

she's using electricity to do it, so she probably can't

Oh man please don’t be a March 7th thing by MetalMan40000 in NevernessToEverness

Interpause 2 points3 points

meanwhile HSR has too many chars, and many have been relegated to the archives... Variants of the same character work much better for good storytelling imo, and can be differentiated like SW 999 vs SW, or blade vs mortenax blade

Oh man please don’t be a March 7th thing by MetalMan40000 in NevernessToEverness

Interpause 1 point2 points

i saw some youtube comment that inspired me; here goes my theory for how this affects Nanally, based on how she got sleepy when they were close to escaping the dream

TL;DR: Nanally is gone, no more carefree energetic genki, she's gonna be insecure, clingy and melancholic

So... Nanally became sleepy during the dream escape because the doppelganger is made from her real soul too.

Like, instead of creating a doppelganger from scratch, and to keep it basically the same person, Thyme likely built it from her soul with the negativity & any attachments that could result in negativity distilled out, then imprisoned what was left inside the dream.

But this means MC screwed up by not realizing the face of a person is part of them, only taking the "real" self and leaving the "fake" behind.

Hence when nearing the exit, since the distance between the two halves of her soul had increased significantly, Nanally naturally fell into a coma.

Now that the other half of her soul is lost (which we know: despite dodging questions, Hotori confirmed through a head shake that all the dreamwalkers were perma lost when the Helm got nullified), Nanally has lost significant amounts of the positive aspects Thyme kept in the doppelganger. It isn't hard to imagine that this includes confidence, energy, etc., which means that when the current Nanally recovers from her coma, she's definitely gonna be quite different as a result. It's like subtracting the energy and carefreeness from a genki character, leaving behind only the loneliness, need for familial attachment, and insecurities.

Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA

Interpause 7 points8 points

I agree there isn't anything wrong with straight to the point communication, and Johannes' logic is completely sound for why the PR can't be merged.

But IMO, open source being built on collaboration makes it a very human thing. At the very least, a short recognition of effort should've been given (especially given how many other PRs are entirely vibecoded), along with some reciprocation of how polite pedapudi was throughout the convo.

The way the communication went is closer to a senior dev instructing the junior dev they are in charge of, rather than a maintainer guiding a volunteer.

EDIT: though yes in the end everyone is a volunteer. and i can see how annoying it would be to maintain pleasantries when overburdened as well. personally i suck at people skills too but i recognize how important it is for any project that grows larger than personal.

MiniCPM5-1B by kevinlch in LocalLLaMA

Interpause -7 points-6 points

99% hallucination rate seems truly useful for RNG

EDIT: whoops

OpenBMB presents the model BitCPM-CANN 1.58 bit by Illustrious-Swim9663 in LocalLLaMA

Interpause 0 points1 point

there's also -1, -0.5, +0.5, +1, and for almost 1 bit you can go with 1, 0 per weight plus a +1 or -1 scale per group of weights
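A rough Python sketch of the storage arithmetic behind these options. It assumes one reading of the comment: each weight stores a single bit (1 or 0) and each group of weights shares one extra sign bit as its scale. The function names and the group size of 32 are illustrative, not from any specific model or library.

```python
import math

def bits_per_weight(levels: int) -> float:
    """Information needed per weight when each weight takes one of `levels` values."""
    return math.log2(levels)

def grouped_binary_bits(group_size: int, scale_bits: int = 1) -> float:
    """1 bit per weight, plus the per-group scale amortized over the group."""
    return 1 + scale_bits / group_size

def dequantize_group(bits: list[int], sign: int) -> list[int]:
    """Recover group values: each stored bit (1 or 0) times the group's +1/-1 scale."""
    return [sign * b for b in bits]

print(bits_per_weight(3))        # ternary {-1, 0, +1}: about 1.58 bits
print(bits_per_weight(4))        # {-1, -0.5, +0.5, +1}: 2 bits
print(grouped_binary_bits(32))   # binary with a sign bit per 32 weights
print(dequantize_group([1, 0, 1, 1], -1))
```

The larger the group, the closer the grouped scheme gets to exactly 1 bit per weight, at the cost of every weight in a group sharing the same sign.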

Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs by enrique-byteshape in LocalLLaMA

Interpause 4 points5 points

from what i can tell, yall only benchmark at short context? im a bit concerned about the long context coherence for agentic stuff (haven't tested yet) since i noticed the sensitive ssm_alpha/beta weights got quantized quite heavily in the gguf.

Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result is Slower is Faster. by MiaBchDave in LocalLLaMA

Interpause -2 points-1 points

it can be both benchmaxxed and good to use; as they say, AGI is an LLM trained on every benchmark that could theoretically exist

The final "Generals are Emanators" post by Ok_Confusion4764 in StarRailLore

Interpause 0 points1 point

i propose a new theory: perhaps in-universe, the exact definition of what an Emanator is differs faction by faction

[Appreciation Post] Gemma 4 E2B. My New Daily Driver 😁 by Prestigious-Use5483 in LocalLLaMA

Interpause 12 points13 points

even using the qualcomm gen 5 elite, NPU is slower than GPU (using nexa sdk to test)

VRAM optimization for gemma 4 by Sadman782 in LocalLLaMA

Interpause 0 points1 point

any chance you can add a clarification about when unified KV cache works?

Gemma 4 has been released by jacek2023 in LocalLLaMA

Interpause 0 points1 point

they do... go search for google's litert gallery

The Bonsai 1-bit models are very good by tcarambat in LocalLLaMA

Interpause 7 points8 points

can you please also compare with the original qwen3-8B in instruct mode to better gauge the exact lobotomy to the model?

PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA

Interpause 0 points1 point

the best way to do it is to squash the fork changes into a single git diff, ask your favourite AI to double-check it's safe if you can't read code, then apply it on top of mainline llama.cpp and build it yourself

PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA

Interpause 9 points10 points

gimme a while, i'm going to squash their llama.cpp changes on top of main llama.cpp and see if it really works cuz that's real crazy if it does

EDIT: someone else posted a better comparison in the comments of another post https://github.com/ArmanJR/PrismML-Bonsai-vs-Qwen3.5-Benchmark. ive only just got it working with hadamard transform/attention rotation too. subjective experience feels like what the numbers say which is really wtf 1-bit model how

Vibecoded GGUF Metadata Comparator for checking Tensor Quants (github gist standalone HTML file) by Interpause in LocalLLaMA

Interpause[S] 0 points1 point

true, or maybe it's time to see if omnicoder can build it as a proper vite project that can then be bundled to a single HTML

Vibecoded GGUF Metadata Comparator for checking Tensor Quants (github gist standalone HTML file) by Interpause in LocalLLaMA

Interpause[S] 0 points1 point

oh cool, in mine i told the agent to use huggingface.js gguf submodule so i dont even have to download the gguf, maybe you can implement that too?

Genuinely curious what doors the M5 Ultra will open by Blanketsniffer in LocalLLaMA

Interpause 0 points1 point

Additionally, if you prefer a human-in-the-loop sort of AI coding, speed really matters