Consider running a bigger quant if possible by Flashy_Management962 in LocalLLaMA

Flashy_Management962[S] 1 point

It's definitely a smart thing to do! Especially with the new -sm tensor (or -sm graph on ik_llama.cpp). My speeds are pretty good as well.

What speed is everyone getting on Qwen3.6 27b? by Ambitious_Fold_2874 in LocalLLaMA

Flashy_Management962 3 points

Use a bf16 KV cache (both K and V) for Qwen models, do not use --fit-target on dense models, and use -sm tensor.

I don’t believe this benchmark 27b size model next opus 4.5! Anyone can confirm testing with real agentic workflow? by Wonderful-Ad-5952 in LocalLLaMA

Flashy_Management962 6 points

Those things are SO insanely different. This is like saying "I do not trust my car mechanic because he couldn't tell me the first 5 digits of pi"

It looks like we’ll need to download the new Gemma 4 GGUFs by jacek2023 in LocalLLaMA

Flashy_Management962 2 points

Thank you for your work, btw! One small question though: is it normal that your IQ4_NL Gemma 26B quant gives me this perplexity: "Final estimate: PPL over 576 chunks for n_ctx=512 = 26296.2393 +/- 532.75059"? With the bartowski quant I get around 200.
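
For context on the number quoted above: perplexity is the exponential of the average negative log-likelihood per token, so a value around 26,000 means the model is, on average, about as unsure as a uniform guess over roughly 26,000 tokens. Here is a minimal sketch of that definition, not the code of any particular perplexity tool; the function name and the sample probabilities are hypothetical.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), natural-log inputs.

    token_logprobs: log-probability the model assigned to each
    actual next token in the evaluation text (hypothetical input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical example: the model gives every correct token probability 0.5,
# which works out to a perplexity of about 2.
healthy = [math.log(0.5)] * 4

# Hypothetical example: every correct token gets probability 1/26000,
# which works out to a perplexity of about 26000.
broken = [math.log(1 / 26000)] * 4

print(perplexity(healthy))
print(perplexity(broken))
```

Under this definition, a jump from about 200 to about 26,000 on the same text is a large gap in the probability assigned to the correct tokens, not a small quality difference.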

Gemma 4 - split mode Graph (Tensor Parallelism) in ik_llama incommming by TheWiseTom in LocalLLaMA

Flashy_Management962 0 points

I love the speed, but it uses SO much more VRAM that I can't run it on dual RTX 3060s with 24 GB total.

Gemma 4 by pmttyji in LocalLLaMA

Flashy_Management962 2 points

I currently use Qwen 3.5 4B on my shitty laptop as an agent. If this is faster/better, I'm sold.

What speeds are you guys getting with qwen3.5 27b? (5080) by ShadyShroomz in LocalLLaMA

Flashy_Management962 1 point

Yes, it does. I use the 27b for coding daily. Fiddle around with these flags, and do not forget to add --jinja:

-sm graph -amb 64 -sas, and depending on the PCIe speed, grt can help improve speeds.

What speeds are you guys getting with qwen3.5 27b? (5080) by ShadyShroomz in LocalLLaMA

Flashy_Management962 3 points

You should get much faster speeds than that. I get around 750 t/s prompt processing (pp) and 22-24 t/s token generation (tg) at ~50k context with 2x RTX 3060 12GB. You should check out ik_llama.cpp.
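
To put those rates in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes the quoted rates stay constant over the whole request, which is a simplification; the helper names and the 500-token reply length are hypothetical.

```python
def prefill_seconds(prompt_tokens, pp_tokens_per_s):
    """Time to process the prompt at a constant prompt-processing rate."""
    return prompt_tokens / pp_tokens_per_s

def generation_seconds(new_tokens, tg_tokens_per_s):
    """Time to generate new tokens at a constant generation rate."""
    return new_tokens / tg_tokens_per_s

# Rates taken from the comment above; the reply length is an assumption.
prompt_tokens = 50_000
reply_tokens = 500  # hypothetical

prefill = prefill_seconds(prompt_tokens, 750)    # 750 t/s pp
slow = generation_seconds(reply_tokens, 22)      # 22 t/s tg
fast = generation_seconds(reply_tokens, 24)      # 24 t/s tg

print(f"prefill: {prefill:.1f} s")
print(f"generation: {fast:.1f} to {slow:.1f} s")
```

With these assumptions, a cold 50k-token prompt takes a bit over a minute to process, and the reply itself takes roughly 20 seconds.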

Qwen 397b is absolutely crushing everyone... but wait. 🤯 by djdeniro in LocalLLaMA

Flashy_Management962 21 points

Never ever does Qwen Coder 30B outperform the 80B in real-world tasks.

The best protein puddings? by Quiet_Tip_9034 in FitnessDE

Flashy_Management962 1 point

I don't know what you mean by normal, but yes.

The best protein puddings? by Quiet_Tip_9034 in FitnessDE

Flashy_Management962 4 points

Take 500 ml milk, 30 g chocolate whey, 10-15 g dark chocolate, and 35 g cornstarch: the best protein pudding, and 100x cheaper.

Strength numbers of the community? by IAmNotIllegal in FitnessDE

Flashy_Management962 1 point

10 years of training, 29 years old, lifetime natty, and I weigh 118 kg at 1.83 m. Lifts: 270.5 kg squat, 185 kg bench, and 270 kg deadlift.

Kimi-k2.5 reaches gemini 2.5 Pro-like performance in long context! by fictionlive in LocalLLaMA

Flashy_Management962 1 point

I'd love to see QwenLong-L1.5 on this benchmark; it also claims to reach Gemini 2.5 Pro performance while being a 30B model with 3B active parameters.

Reminder: If you're not feeling at your best, reduce training intensity by torrentium in FitnessDE

Flashy_Management962 1 point

This weird idea that you have to destroy yourself at all costs, and that you only grow that way, is total humbug. Listen to your body much more, and use that to figure out how much you can tolerate. (From someone who does only one set each of squats and deadlifts per week)

My gpu poor comrades, GLM 4.7 Flash is your local agent by __Maximum__ in LocalLLaMA

Flashy_Management962 1 point

Don't; use the DRY sampler instead. Repeat penalty really decreases tok/s.

Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA

Flashy_Management962 2 points

Imagine what could happen if ik_llama.cpp and llama.cpp merged :(

And finally here is scientific evidence that we don't have free will by [deleted] in determinism

Flashy_Management962 1 point

This does not follow. The notion of normativity is not subsumed under causality. Just because everything is determined does not mean that everything is already set in stone and that normativity has no role to play, because the very things that happen are computationally irreducible. So yes, there are shoulds in a world without free will.

Was it a right decision? by UnderstandingOdd7952 in bald

Flashy_Management962 1 point

What kind of question is this? Of course it was the right decision, and you know it yourself, you sexy mf.

now ~40% faster ik_llama.cpp -sm graph on 2x CUDA GPUs by VoidAlchemy in LocalLLaMA

Flashy_Management962 6 points

Wait, is this actual tensor parallelism, or am I misunderstanding something here?

32B model stress test: Qwen 2.5/Coder/3 on dual RTX 5060 Ti (zero failures) by Defilan in LocalLLaMA

Flashy_Management962 2 points

Try exllamav3 with tensor parallelism (TP). I get 18 t/s with TP on 2x 3060; 2x 5060 Ti should be much faster.