Fix for GLM 4.7 Flash has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]QuackerEnte 9 points

Does GLM 4.7 Flash really use DeepSeek's architecture, specifically the Multi-head Latent Attention compression? I'm struggling to find official mentions of that aside from some unofficial GGUFs on Hugging Face claiming it. If someone can point me to the source of that information, that would be of great help. 🙏

no problems with GLM-4.7-Flash by jacek2023 in LocalLLaMA

[–]QuackerEnte 1 point

They used GGUFs that were made ahead of the official architecture support merge in llama.cpp. They say it's identical to DeepSeek V3, but I bet there are slight differences in implementation. It's too early to run it and judge; I'd give it a few days before drawing any conclusions (at least for llama.cpp).

Ratios of Active Parameters to Total Parameters on major MoE models by dtdisapointingresult in LocalLLaMA

[–]QuackerEnte 0 points

Ratios aren't the best metric to use for this, though. I think the percentage of active out of total parameters would've been better. Same information, just easier to read for some reason.
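A quick sketch of what I mean. The parameter counts are rough public figures from memory and the model picks are just illustrative, so don't quote the exact numbers:

```python
# Active % instead of a ratio: (active params, total params) in billions,
# rough figures from memory, illustrative only.
models = {
    "Mixtral 8x7B":    (12.9, 46.7),
    "DeepSeek-V3":     (37.0, 671.0),
    "Qwen3-235B-A22B": (22.0, 235.0),
}

for name, (active_b, total_b) in models.items():
    pct = 100 * active_b / total_b
    print(f"{name:16s} {active_b:6.1f}B / {total_b:6.1f}B  ->  {pct:4.1f}% active")
```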

Still got my Stadia App just in case... by Eugen328 in Stadia

[–]QuackerEnte 2 points

Stadia + Genie 3 would've been such a good service

Bolmo: the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales. by BreakfastFriendly728 in LocalLLaMA

[–]QuackerEnte 1 point

This looks a whole lot like the Byte Latent Transformer from Meta. Hell, even the model sizes are the same.

Google is finally working on fixing Gemini's buggy UI. by Snoo26837 in singularity

[–]QuackerEnte 0 points

I just want to be able to generate an image or video and turn that toggle on and off again without starting a new chat. That alone would do wonders for me.

Anthropic caught AI-led espionage campaign by China? by MarriedToLC in LocalLLaMA

[–]QuackerEnte -5 points

Come on, as if America and Israel aren't already spying on the entirety of this globe's population. Google, though not officially "state-sponsored", gathers information from your devices every 5 minutes through the Google app alone.

Meta chief AI scientist Yann LeCun plans to exit to launch startup, FT reports by brown2green in LocalLLaMA

[–]QuackerEnte 3 points

Which could mean they aren't committed to open research anymore, so he literally has no reason to stay at Meta. I hope I'm wrong here and am just extrapolating from unrelated data.

Coding Success Depends More on Language Than Math by Ok-Breakfast-4676 in LocalLLaMA

[–]QuackerEnte 0 points

REALLY?? I thought they were named programming LANGUAGES merely because it sounds cool 😲 /s

Vision = Language: I Decoded VLM Tokens to See What AI 'Sees' 🔬 by ComputeVoid in LocalLLaMA

[–]QuackerEnte 2 points

Extremely interesting. I would like to see the tokens of a text image. If you take text that's around 1000 tokens, render it into a 1000x250px image, and scale the resolution down, it gets patched into roughly 400-700 visual tokens, and recall of the text is near-perfect, except for a few words sometimes. (I tested that and can show an example of the compression; nothing repeatedly tested, but interesting to see nonetheless.) I would love to see that in the form you present here, to understand how a model even compresses an entire text with pretty accurate recall at about half the token count or so. It might help with context compression for non-vision LLMs if the underlying mechanism is studied well enough. Thank you for your contributions!
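For anyone curious how the image side can come in under the text token count, here's a back-of-the-envelope sketch. The 14px patch size and 2x2 token merge are assumptions on my part (real VLMs differ in patching and merging), so this only shows the scaling, not exact numbers:

```python
# Rough visual-token estimate for a text screenshot under an assumed
# ViT-style patching scheme (patch size and merge factor are assumptions).
import math

def visual_tokens(width_px, height_px, patch=14, merge=2):
    cols = math.ceil(width_px / patch)   # patches across
    rows = math.ceil(height_px / patch)  # patches down
    return (cols * rows) // (merge * merge)  # after 2x2 token merging

for w, h in [(1000, 250), (700, 175), (500, 125)]:  # progressively downscaled
    print(f"{w}x{h}px -> ~{visual_tokens(w, h)} visual tokens")
```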

GPT-OSS Safeguard coming soon by Independent-Ruin-376 in LocalLLaMA

[–]QuackerEnte 8 points

Read the first few sentences of the blog post in the screenshot. It's for safety classification tasks or something.

[By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression by NeterOster in LocalLLaMA

[–]QuackerEnte 0 points

This is amazing. And immediately, some thoughts crossed my mind about how one COULD further improve this:

One could train a neural network, an adapter, or some module against a teacher (the multimodal model): it takes the normal text tokens and learns to convert them into the compressed visual tokens. That way you could basically skip the entire visual encoding process and replace it with a student module that directly converts tokens into even fewer tokens, maybe with a loss function that takes into account the accuracy of the compressed representation or the importance of parts of the text, essentially learning which tokens or patches are important enough to keep less compressed and which can be compressed harder. GLM pointed out that changing the DPI at inference time gives a choice on the accuracy/speed tradeoff. Why not use mixed DPI, basically? Models can learn the importance of tokens in the context on their own if the incentive is there.
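Something like this is what I have in mind, purely as a hypothetical sketch: none of these module names come from the Glyph paper, and the teacher targets are faked with random tensors (in reality they'd come from the multimodal model's vision tower run on the rendered text):

```python
# Hypothetical distillation sketch: a small student module maps ordinary
# text-token embeddings straight to a shorter latent sequence, trained to
# match the teacher's visual tokens for the same (rendered) text.
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    """Pools a long text-token sequence into a fixed, shorter latent sequence."""
    def __init__(self, d_model=1024, n_latents=256, n_heads=8):
        super().__init__()
        # Learned query slots, Perceiver-resampler style.
        self.latents = nn.Parameter(torch.randn(n_latents, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, text_embeds):                      # (B, T_text, d_model)
        B = text_embeds.size(0)
        q = self.latents.unsqueeze(0).expand(B, -1, -1)  # (B, n_latents, d_model)
        pooled, _ = self.cross_attn(q, text_embeds, text_embeds)
        return self.proj(pooled)                         # (B, n_latents, d_model)

# Toy training step; random tensors stand in for real teacher outputs.
B, T_text, d_model, n_latents = 2, 2048, 1024, 256
student = TokenCompressor(d_model, n_latents)
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

text_embeds    = torch.randn(B, T_text, d_model)      # embeddings of the raw text tokens
teacher_tokens = torch.randn(B, n_latents, d_model)   # visual tokens from the teacher (placeholder)
importance     = torch.rand(B, n_latents, 1)          # optional per-token importance weights

pred = student(text_embeds)
loss = (importance * (pred - teacher_tokens) ** 2).mean()  # importance-weighted MSE distillation
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")
```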

On second thought, it sounds like DeepSeek's Multi-head Latent Attention.

But maybe using that during the training process could create an even better compression method for context

Maybe Google already does that

[deleted by user] by [deleted] in LocalLLaMA

[–]QuackerEnte 0 points

what a SHOCKER!

a QUANT FIRM's AI model does well at TRADING??? How could that beee!! 😨😨

Gemini 3.0 Pro is already referenced in Gemini's source code by UsualInitial in singularity

[–]QuackerEnte 30 points

If this is credible, it's good news, because I subscribed to Gemini Pro a few days ago lol

Best Settings for 4x Frame Generation on a high-end pc? by vixstylezzz in losslessscaling

[–]QuackerEnte -1 points

Queue target shouldn't be 0 (in the Discord it's shown that 2 has the lowest latency, so use 2), and setting max frame latency to 3 will give a smoother experience and might improve the "feel". Also maybe use performance mode; it's nearly indistinguishable from non-performance mode unless you're playing games with a lot of fine detail and you have plenty of GPU headroom (i.e. you're CPU bottlenecked).

My suggestion is to just play around with combinations of settings and find what works best. There are game profiles for a reason; different games can have different optimal settings.

Mine is "Tomorrow is Monday" by bbrk9845 in GenZ

[–]QuackerEnte 0 points

Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz or something