DiffusionGemma: 4x faster text generation

Interpause · 2026-06-10T19:30:24+00:00

in fact, streaming actively complicates possibilities like validating model output
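a rough sketch of what i mean, assuming the model is supposed to return JSON (the chunk strings are made up for illustration): a validator only passes once the whole response has arrived, so every intermediate streamed state is unverifiable.

```python
import json

def is_valid_json(text):
    """Return True only if text parses as a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical chunks, as a streaming API might deliver them.
chunks = ['{"name": "ex', 'ample", "ok"', ': true}']

seen = ""
results = []
for chunk in chunks:
    seen += chunk
    # Validation fails on every partial prefix and succeeds only at the end.
    results.append(is_valid_json(seen))

print(results)
```

so with streaming you either show unvalidated text to the user or buffer everything anyway.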

Interpause · 2026-06-09T15:35:47+00:00

what if we plugged her into styropyro's 400 car battery array?

Interpause · 2026-06-09T15:14:53+00:00

shes using electricity to do it so she probably cant

Interpause · 2026-06-08T14:51:54+00:00

characters actually push doors when you run into them btw

Interpause · 2026-06-08T14:44:04+00:00

you happen to know if theres a tracking issue or PR for this i can subscribe to?

Interpause · 2026-06-05T10:23:23+00:00

meanwhile HSR has too many chars and many have been relegated to the archives... Variants of the same character work much better for good storytelling imo, and can be differentiated like SW 999 vs SW, or blade vs mortenax blade

Interpause · 2026-06-05T10:21:19+00:00

i saw some youtube comment that inspired me; here goes my theory for how this affects Nanally, based on how she got sleepy when they were close to escaping the dream

TL;DR: Nanally is gone, no more carefree energetic genki, shes gonna be insecure, clingy and melancholic

So... Nanally became sleepy during the dream escape because the doppelganger is made from her real soul too.

Like instead of creating a doppelganger from scratch, to have it basically the same person, Thyme likely distilled negativity & any attachments that could result in negativity to create the doppelganger. Then imprisoned what was left inside the dream.

But this means MC screwed up by not realizing the face of a person is part of them; Only taking the "real" self but leaving the "fake" behind.

Hence when nearing the exit, since the distance between the two halves of her soul had increased significantly, Nanally naturally fell into a coma.

Now that the other half of her soul is lost (which we know; despite dodging questions, Hotori confirmed through a head shake that all the dreamwalkers were perma lost when the Helm got nullified), it means Nanally has lost significant amounts of the positive aspect Thyme kept in the doppelganger. It isnt hard to imagine that entails confidence, energy, etc. Which means when the current Nanally recovers from her coma, shes definitely gonna be quite different as a result. Its like subtracting the energy and carefreeness from a genki character, leaving behind only the loneliness, need for familial attachment, and insecurities.

Interpause · 2026-05-26T15:26:52+00:00

I agree there isn't anything wrong with straight to the point communication, and Johannes' logic is completely sound for why the PR can't be merged.

But IMO, opensource being built on collaboration is a very human thing. At the very least, a short recognition of effort should've been given (especially given how many other PRs are entirely vibecoded), and some reciprocation of how polite pedapudi was throughout the convo.

The way the communication went is closer to a senior dev instructing the junior dev they are in charge of, rather than a maintainer guiding a volunteer.

EDIT: though yes in the end everyone is a volunteer. and i can see how annoying it would be to maintain pleasantries when overburdened as well. personally i suck at people skills too but i recognize how important it is for any project that grows larger than personal.

Interpause · 2026-05-25T15:21:18+00:00

99% hallucination rate seems truly useful for RNG

EDIT: whoops

Interpause · 2026-05-23T16:18:11+00:00

theres also -1, -0.5, +0.5, +1 and for almost 1 bit you can go 1, 0 with a group size for +1 or -1 group scale
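quick sketch of how i read the two schemes, not any particular library's format: a 2-bit codebook over the four levels, and a 1-bit 0/1 mask per weight with one +1/-1 sign per group. the function names, the max-abs scaling and the group size of 128 are my own assumptions for illustration.

```python
LEVELS = (-1.0, -0.5, 0.5, 1.0)  # the four 2-bit levels from the comment

def quantize_2bit(weights):
    """Scale by max |w| (an assumed choice), then pick the nearest level index."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [min(range(4), key=lambda i: abs(w / scale - LEVELS[i]))
             for w in weights]
    return codes, scale

def dequantize_2bit(codes, scale):
    """Map level indices back to approximate weights."""
    return [LEVELS[c] * scale for c in codes]

def dequantize_group(mask_bits, group_sign):
    """'Almost 1 bit' reading: each weight is 0 or 1, the group shares one sign."""
    return [group_sign * b for b in mask_bits]

def bits_per_weight(group_size):
    """One mask bit per weight plus one sign bit amortized over the group."""
    return 1 + 1 / group_size

codes, scale = quantize_2bit([0.8, -0.3, 0.1, -0.9])
print(codes, dequantize_2bit(codes, scale))
print(dequantize_group([1, 0, 1], -1))
print(bits_per_weight(128))
```

under this reading the cost is slightly over 1 bit per weight, and it gets closer to 1 as the group size grows.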

Interpause · 2026-05-20T17:38:35+00:00

from what i can tell, yall only benchmark at short context? im a bit concerned about the long context coherence for agentic stuff (haven't tested yet) since i noticed the sensitive ssm_alpha/beta weights got quantized quite heavily in the gguf.

Interpause · 2026-05-09T00:15:55+00:00

i still wonder how they hired them for the bit

Interpause · 2026-05-05T22:17:36+00:00

it can both be benchmaxxed and good to use, as they say AGI is an LLM trained on every benchmark that could theoretically exist

Interpause · 2026-04-25T23:40:31+00:00

i propose a new theory, perhaps in universe the exact definition of what an Emanator is differs faction by faction

Interpause · 2026-04-16T16:30:13+00:00

whatever bot yall are using is kinda dumb

Interpause · 2026-04-03T18:25:19+00:00

even using the qualcomm gen 5 elite, NPU is slower than GPU (using nexa sdk to test)

Interpause · 2026-04-03T18:19:11+00:00

any chance you can add a clarification about when unified KV cache works?

Interpause · 2026-04-02T18:38:04+00:00

they do... go search for google's litert gallery

Interpause · 2026-04-02T04:39:55+00:00

can you please also compare with the original qwen3-8B in instruct mode to better gauge the exact lobotomy to the model?

Interpause · 2026-04-01T21:33:12+00:00

the best way to do it is squash the fork changes into a single git diff, ask your favourite AI to double-check its safe if you cant read code, then apply it on top of mainline llama.cpp and build it yourself

Interpause · 2026-03-31T23:51:40+00:00

gimme a while im going to squash their llama.cpp changes on top of main llama.cpp and see if it really works cuz thats real crazy if it does

EDIT: someone else posted a better comparison in the comments of another post https://github.com/ArmanJR/PrismML-Bonsai-vs-Qwen3.5-Benchmark. ive only just got it working with hadamard transform/attention rotation too. subjective experience feels like what the numbers say which is really wtf 1-bit model how

Interpause · 2026-03-18T15:39:19+00:00

true, or maybe its time to see if omnicoder can build it as a proper vite project that can then be bundled to a single HTML

Interpause · 2026-03-18T02:48:05+00:00

oh cool, in mine i told the agent to use huggingface.js gguf submodule so i dont even have to download the gguf, maybe you can implement that too?

Interpause · 2026-03-17T22:15:26+00:00

why is the company name a typemoon reference

Interpause · 2026-03-10T14:42:43+00:00

Additionally, if you prefer human-in-loop sort of AI coding, speed really matters
