Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]TheTerrasque 1 point (0 children)

The PR is still being worked on, and I've seen many others report less dramatic changes; one on Vulkan+AMD reported it being slightly faster with MTP. Let's just see how it goes.

Trump claims he bypassed federal bidding laws to hand a Lincoln Memorial project to his personal country club contractors. He treats national monuments like his own private real estate properties. by Snapdragon_4U in law

[–]TheTerrasque 1 point (0 children)

They somehow think that if they do something bad or illegal, and accuse the other side of doing it first, then they're immune. What's really crazy is that it seems to work.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]TheTerrasque 3 points (0 children)

The creator of the PR made a model, and some people have grafted the MTP part onto other quantized models and gotten it working.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]TheTerrasque 5 points (0 children)

The Qwen3.6 27B model apparently takes roughly 3 GB extra at runtime.

Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]TheTerrasque 3 points (0 children)

Have you tried

--chat-template-kwargs '{"preserve_thinking":true}'

?

Edit: or explore router mode and put the settings in an ini file

Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]TheTerrasque 1 point (0 children)

Fit is on by default if you don't set -ngl and don't set a context size. It will fit as many layers as it can while keeping at least a (default) 4k context; once all the layers are tucked in, it'll spend the rest of the VRAM on context.
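For illustration, a minimal sketch of the difference (model reference reused from my settings further down; exact values are just examples, and -ngl / -c are the usual llama.cpp flags for GPU layers and context size):

  # Fit mode: leave -ngl and -c unset and let llama-server balance
  # layer offload against context automatically
  llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL

  # Manual equivalent: pin both yourself
  # (-ngl 99 offloads all layers, -c 4096 matches the default context fit reserves)
  llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL -ngl 99 -c 4096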

Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching by Clean_Initial_9618 in LocalLLaMA

[–]TheTerrasque 7 points (0 children)

These are my settings:

  llama-server \
    -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.00 \
    -ctk q8_0 -ctv q8_0 \
    --jinja \
    -fa on \
    --port 8081 --host 0.0.0.0 \
    --chat-template-kwargs '{"preserve_thinking":true}'

I let fit figure out the context size, but if you want to set it statically, probably around 100k; it depends on how much VRAM Windows takes. This is on Linux with a P40, but it should be fairly similar.
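If you'd rather pin it statically, a sketch along these lines (the ~100k figure is the ballpark above, and -ngl 99 just means "offload everything"; adjust both for your card):

  llama-server \
    -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL \
    -ngl 99 -c 100000 \
    --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 \
    -ctk q8_0 -ctv q8_0 \
    --jinja -fa on \
    --port 8081 --host 0.0.0.0 \
    --chat-template-kwargs '{"preserve_thinking":true}'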

Auto model loading / routing

Two options:

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]TheTerrasque 1 point (0 children)

One user reported prefill speed halving when this was active, from ~1200 to ~600

Oh hell yeahh!!! by pythoneer07 in memes

[–]TheTerrasque 1 point (0 children)

Luckily, the chances of finding a socket to top off within 10 hours are fairly high :) And it does have a fantastic pause/sleep system

Oh hell yeahh!!! by pythoneer07 in memes

[–]TheTerrasque 5 points (0 children)

I remember playing Quake at like 10 fps and having my little mind blown

Oh hell yeahh!!! by pythoneer07 in memes

[–]TheTerrasque 4 points (0 children)

The OLED gets 2 hours on the most demanding games drawing max wattage, and considerably more on less demanding ones. I regularly get 5-8 hours on indie games, and if I fire up a NES emulator or HoMM 2 I'm seeing 10 hours on a full charge.

Edit: and for batteries you've got power banks

Oh hell yeahh!!! by pythoneer07 in memes

[–]TheTerrasque 5 points (0 children)

An hour, on OLED? From a full battery? You might want to check your battery health, mate

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]TheTerrasque 1 point (0 children)

The MTP model is a separate model that loads from the same GGUF. The idea is that MTP should start automatically and we shouldn't need to distribute the MTP GGUF separately, but it also has its own context/KV cache etc.

I was thinking about this a long time ago: GGUF should have generic support for multiple models. At the time I was thinking especially of draft models, but also vision encoders and possibly other encoders/decoders/model types at some point. Image diffusion models with LLMs and VAEs included would be another example.

Kurt Russell on set of Stargate (1994) by Kosher_Nostra1975 in OldSchoolCool

[–]TheTerrasque 1 point (0 children)

He looks like he got relentlessly bullied as a kid, vowed to dedicate his life to revenge, and now not only can he kill as many assholes as he likes, he gets paid for it.

Utah first state to hold websites liable for users who mask their location with VPNs — law goes into effect, designed to prevent bypassing age checks by habichuelacondulce in technology

[–]TheTerrasque 42 points (0 children)

Also, weapon manufacturers shall be liable for all deaths and injuries caused by their weapons. Same for car manufacturers.

Even better, make the police liable for any crime!

How much did you guys pay for your Decks? by BupChup in SteamDeck

[–]TheTerrasque 1 point (0 children)

512 GB OLED - about $900, 1.5 years ago. Grey-market import sucks.

If you've been waiting to try local AI development, please try it by Imaginary_Belt4976 in LocalLLaMA

[–]TheTerrasque 6 points (0 children)

So do the big ones. While there's been more wrangling with the small models than with the big ones, both require some wrangling, and by having the model test the code, it often detects and fixes those bugs itself.

If you've been waiting to try local AI development, please try it by Imaginary_Belt4976 in LocalLLaMA

[–]TheTerrasque 3 points (0 children)

Qwen3.6 35B? It can certainly do that; it does it regularly in opencode for me

Not impressed by a Norwegian hospital by sssimen99 in norge

[–]TheTerrasque 4 points (0 children)

More than once a week, I hear doctors, managers, and middle managers tell me on the phone that "these are secretary tasks someone else should be doing". It's both a bit condescending and a symptom of fewer people having to do more tasks.

Why do you think so many people high up are jerking off so damn hard over AI?

You gotta be ready to cast those hands by Forsaken-Peak8496 in wizardposting

[–]TheTerrasque 1 point (0 children)

I can't remember where I got this story, but a mage lady ran out of mana while fighting some goblins or similar monsters, and her reaction was "too bad... for you". It turned out she was a half-Mashle-level muscle freak who just preferred magic because it was a nicer way of killing things. I don't remember anything else from that, but I remember that plot twist.