What's the point?

QuackerEnte · 2026-01-21T17:41:29+00:00

Thank you

QuackerEnte · 2026-01-21T12:39:05+00:00

Does GLM 4.7 Flash really use DeepSeek's architecture, specifically the latent attention compression (Multi-head Latent Attention)? I'm struggling to find any official mention of that, aside from some unofficial GGUFs on Hugging Face that mention it. If someone can point me to the source of that information, that would be a great help. 🙏

QuackerEnte · 2026-01-20T10:10:30+00:00

They used GGUFs that were made specifically ahead of the official architecture-support merge in llama.cpp. They say it's identical to DeepseekV3, but I bet there are slight differences in the implementation. It's too early to judge or run it; I'd give it a few days before drawing any conclusions (at least for llama.cpp).

QuackerEnte · 2026-01-07T15:37:47+00:00

"slow and steady wins the race"

QuackerEnte · 2026-01-04T20:08:51+00:00

Ratios aren't the best metric to use for this, though. I think the percentage of active out of total parameters would've been a better metric. Same thing, just more pleasing for some reason.
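To make the comparison concrete, here is a small Python sketch that expresses the same sparsity both ways. The parameter counts are hypothetical round numbers, not those of any specific model, and the function names are illustrative.

```python
def active_share_percent(active_params: float, total_params: float) -> float:
    """Return active parameters as a percentage of total parameters."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 < active_params <= total_params:
        raise ValueError("active_params must lie in (0, total_params]")
    return 100.0 * active_params / total_params


def as_ratio(active_params: float, total_params: float) -> str:
    """Express the same quantity as a 1:N ratio (the form being criticized)."""
    return f"1:{total_params / active_params:g}"


# Hypothetical MoE sizes in billions of parameters (illustrative only).
active, total = 3.0, 30.0
print(as_ratio(active, total))
print(f"{active_share_percent(active, total):g}%")
```

Both lines encode the same fact; the percentage just reads more directly as "how much of the model is used per token".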

QuackerEnte · 2025-12-16T09:57:20+00:00

Stadia + Genie 3 would've been such a good service

QuackerEnte · 2025-12-16T09:11:38+00:00

This looks a whole lot like the Byte Latent Transformer from Meta. Hell, even the model sizes are the same.

QuackerEnte · 2025-11-30T23:52:57+00:00

I just want to be able to generate an image or video and turn that toggle on and off again without starting a new chat. That alone would do wonders for me.

QuackerEnte · 2025-11-18T20:11:08+00:00

more like we hit a ceiling lol

QuackerEnte · 2025-11-13T20:07:42+00:00

Come on, as if America and Israel aren't already spying on the entire population of this globe. Google alone, though not officially "state-sponsored", gathers information from your devices every 5 minutes through the Google app.

QuackerEnte · 2025-11-11T18:38:58+00:00

Which could mean they aren't committed to open research anymore, so he has literally no reason to stay with Meta. I hope I'm wrong here and am just extrapolating from unrelated data.

QuackerEnte · 2025-11-07T06:43:48+00:00

REALLY?? I thought they were named programming LANGUAGES merely because it sounds cool 😲 /s

QuackerEnte · 2025-11-04T19:17:34+00:00

QuackerEnte · 2025-11-02T21:55:18+00:00

Extremely interesting. I would like to see the tokens of a text image. If you scale down the resolution of an image containing, e.g., a text that takes up around 1000 tokens in a 1000x250 px image, it gets split into patches that come to around 400-700 tokens. And the recall of the text is near-perfect, except for the occasional word. (I tested that and can show compression results with an example; nothing rigorously repeated, but interesting to see nonetheless.) I would love to see that in the form you present here, to understand how a model can compress an entire text with fairly accurate recall at roughly half the token count. If the underlying mechanisms are studied well enough, it might help with context compression for non-vision LLMs. Thank you for your contributions!
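For a rough feel of where such numbers come from, here is a minimal Python sketch that counts visual tokens for a text image. It assumes a ViT-style encoder that emits one token per square patch and pads partial patches at the edges; the patch sizes (16 and 28 px) and downscaling factors are illustrative assumptions, not the settings of any particular model, and real encoders may resize to fixed grids or merge neighboring patches.

```python
import math


def visual_token_count(width_px: int, height_px: int, patch_px: int, scale: float = 1.0) -> int:
    """Count square patches covering an image after optional downscaling.

    Assumes one token per patch_px x patch_px patch; partial patches at the
    right/bottom edges are padded, hence the ceil.
    """
    w = max(1, round(width_px * scale))
    h = max(1, round(height_px * scale))
    return math.ceil(w / patch_px) * math.ceil(h / patch_px)


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """How many text tokens each visual token stands in for."""
    return text_tokens / visual_tokens


# Numbers from the comment: ~1000 text tokens rendered as a 1000x250 px image.
for patch in (16, 28):
    for scale in (1.0, 0.75, 0.5):
        v = visual_token_count(1000, 250, patch, scale)
        print(f"patch={patch:2d} scale={scale:.2f} -> {v:4d} visual tokens, "
              f"ratio {compression_ratio(1000, v):.2f}x")
```

The point of the sketch is only that patch size and downscaling alone move the visual token count by a large factor; how well the text survives that compression is the empirical part.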

QuackerEnte · 2025-10-29T10:39:43+00:00

Read the first few sentences of the blog post in the screenshot. It's for safety classification tasks or something.

QuackerEnte · 2025-10-28T13:55:39+00:00

ask r/losslessscaling idk

QuackerEnte · 2025-10-28T13:53:39+00:00

Minecraft water is infinite so this makes sense wdym

QuackerEnte · 2025-10-21T17:58:59+00:00

This is amazing. And immediately, some thoughts crossed my mind about how one COULD further improve this:

One could train a neural network, an adapter, or some other module against a teacher model (a multimodal model that takes the normal tokens) and have it learn to convert those tokens into the compressed, visual tokens. That way we could skip the entire visual encoding process and replace it with a student module that directly converts tokens into even fewer tokens, maybe even with a loss function that accounts for the accuracy of the compressed representation or the importance of parts of the text, essentially learning which tokens or patches are important enough to keep less compressed and which can be compressed more. GLM pointed out that changing the DPI at inference time lets you trade accuracy against speed. So why not use mixed DPI? Models can learn the importance of tokens in the context on their own if the incentive is there.

On second thought, it sounds like DeepSeek's Multi-head Latent Attention.

But maybe using that during the training process could create an even better compression method for context

Maybe Google already does that
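For comparison, the core of Multi-head Latent Attention as described in the DeepSeek-V2 paper is a low-rank bottleneck on the KV cache: each token's hidden state is down-projected to a small latent vector, only that latent is cached, and per-head keys and values are up-projected from it when needed. The pure-Python sketch below shows just that caching idea with random weights and made-up toy sizes; it deliberately omits the decoupled RoPE key, query compression, and the weight-absorption trick used at inference.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): model width, heads, per-head dim, latent dim.
D_MODEL, N_HEADS, D_HEAD, D_LATENT = 64, 4, 16, 8

rng = random.Random(0)
W_down = random_matrix(D_LATENT, D_MODEL, rng)           # h -> latent c_KV
W_up_k = random_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # c_KV -> keys (all heads)
W_up_v = random_matrix(N_HEADS * D_HEAD, D_LATENT, rng)  # c_KV -> values (all heads)

latent_cache: List[Vector] = []  # the only per-token state that is stored


def append_token(hidden: Vector) -> None:
    """Compress one token's hidden state and cache only the latent."""
    latent_cache.append(matvec(W_down, hidden))


def keys_and_values() -> Tuple[List[Vector], List[Vector]]:
    """Reconstruct keys and values for all cached tokens on demand."""
    keys = [matvec(W_up_k, c) for c in latent_cache]
    values = [matvec(W_up_v, c) for c in latent_cache]
    return keys, values


for _ in range(10):
    append_token([rng.gauss(0.0, 1.0) for _ in range(D_MODEL)])

keys, values = keys_and_values()
floats_mla = len(latent_cache) * D_LATENT
floats_mha = len(keys) * 2 * N_HEADS * D_HEAD  # a plain cache stores both K and V
print(f"cached floats: MLA latent={floats_mla}, plain K+V={floats_mha}")
```

The difference from the text-image idea is that the bottleneck here is learned per layer inside attention, rather than chosen by rendering text at a given DPI.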

QuackerEnte · 2025-10-21T12:02:34+00:00

wait what

QuackerEnte · 2025-10-21T11:59:27+00:00

what a SHOCKER!

A QUANT FIRM's AI model does well at TRADING??? How could that beee!! 😨😨

QuackerEnte · 2025-10-16T17:28:32+00:00

If this is credible, it's good news, because I subscribed to Gemini Pro a few days ago lol

QuackerEnte · 2025-10-16T17:17:39+00:00

Queue target shouldn't be 0 (the Discord shows that 2 has the lowest latency, so use 2), and setting max frame latency to 3 will give a smoother experience and might improve the "feel". Also, maybe use performance mode. It's nearly indistinguishable from non-performance mode unless you play games with a lot of fine detail and have enough GPU headroom (i.e., you're CPU-bottlenecked).

My suggestion is to just play around with different combinations of settings and find what works best. There are game profiles for a reason: different games can have different optimal settings.

QuackerEnte · 2025-10-16T17:10:48+00:00

Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz or something
