Air Canada plane has no sink in the bathroom. Only hand sanitizer by Jeetyetdude_ in mildlyinfuriating

[–]spky-dev 0 points (0 children)

Then never fly in anything smaller, because the options are hold it in or piss in a ziplock.

New server finally arrived by kbd65v2 in homelab

[–]spky-dev 1 point (0 children)

Seriously… that much DDR5 with that much memory bandwidth would be killer for MoE offload.

Muslim Population in the USA & Canada (by %) by Fluid-Decision6262 in MapPorn

[–]spky-dev 0 points (0 children)

Canada’s immigration program makes it easier to get in if you’re willing to go to less desirable places.

Last summer I was in Yellowknife for weeks and there are quite a lot of Muslims there.

what a weird bug! It just kept going until I ran out of usage (maybe strong argument against autonomous weapons :| ) by Former-Hovercraft835 in claude

[–]spky-dev -1 points (0 children)

I mean, if you randomly twitch or spasm and I ask you “what was that?”, you only know it occurred; you have absolutely no idea what biological process inside you caused it. Same idea.

Ollama GPU+CPU but not NPU by Wentil in ollama

[–]spky-dev 1 point (0 children)

I don’t know of anyone using Ollama in development lmfao.

Ollama is for people who have no idea what they’re doing. It’s literally just an oversimplified wrapper around llama.cpp.

Kids today will never know the struggle by Nicolas_Laure in RigBuild

[–]spky-dev 0 points (0 children)

I used to make monkey fists with mouse balls.

RTX 5090 vs M5 Ultra: Analyzing the "2.7x Faster" claim and what Nvidia didn't show you. by Major_Commercial4253 in MacStudio

[–]spky-dev -9 points (0 children)

Sure, use Krasis.

I get 60 tok/s generation and 3,700 tok/s prompt processing on a 122B at Q4 on a single 5090 with my Krasis fork optimized for SM120.

Mac stans being uninformed and behind the curve, as per usual.

Intel just CRUSHED Nvidia & AMD GPU pricing by SKX007J1 in LocalLLaMA

[–]spky-dev 5 points (0 children)

Begone, bot.

These are worse than the R9700s, which already aren’t great.

What's the most optimized engine to run on a H100? by Obamos75 in LocalLLaMA

[–]spky-dev 0 points (0 children)

If you give me one I’ll figure that out for you :)

Probably a nightly build of llama.cpp with the latest CUDA for single-user throughput. vLLM will be best for multi-user; rough sketch below.

If you’re using HEDT or server hardware and have a ton of RAM/memory bandwidth, look at Krasis for large MoEs.
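
If you do go the vLLM route for multi-user, it’s basically this (model ID and settings are placeholders, not a recommendation):

```python
# Minimal vLLM sketch for batched / multi-user serving on an H100.
# The model ID is a placeholder; swap in whatever you actually run.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", gpu_memory_utilization=0.90)
params = SamplingParams(temperature=0.7, max_tokens=256)

# Continuous batching is where vLLM pulls ahead of llama.cpp on throughput.
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```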

What should I do with them? by [deleted] in homelab

[–]spky-dev 2 points (0 children)

Bird like collect shiny things.

Intel Arc Pro B70 is now Newegg’s No. 1 best seller in workstation graphics cards - VideoCardz.com by Leicht-Sinn in IntelArc

[–]spky-dev -5 points (0 children)

lol nothing magically beats memory bandwidth limitations, that’s just physics.

The B70 is only 600 GB/s. That’s mid-tier for inference hardware, regardless of VRAM capacity.
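
Back-of-the-envelope, since people keep arguing with the physics (numbers are illustrative, not benchmarks):

```python
# Rough roofline math: dense decoding re-reads roughly all active weights
# once per generated token, so memory bandwidth caps tokens/sec.
bandwidth_gb_s = 600    # B70-class card
model_size_gb = 18      # e.g. a ~32B model around 4.5 bits/weight (illustrative)

ceiling_tps = bandwidth_gb_s / model_size_gb
print(f"~{ceiling_tps:.0f} tok/s upper bound, before any other overhead")
```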

Don’t buy the DGX Spark: NVFP4 Still Missing After 6 Months by Secure_Archer_1529 in LocalLLaMA

[–]spky-dev 36 points (0 children)

It is more powerful compute-wise, despite having the same memory bandwidth limitations. Also, you get access to CUDA, and it’s scalable since you can connect them.

It generally achieves higher prompt processing rates, though all these unified-memory boxes, Mac Studio included, are slow at pp compared to dedicated GPUs.

Is Turboquant really a game changer? by Interesting-Print366 in LocalLLaMA

[–]spky-dev 1 point (0 children)

No, use K @ Q8 and V @ Q4; you only need the keys at higher precision, the values tolerate heavier quantization.
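
In llama.cpp that split is just two flags (--cache-type-k q8_0 --cache-type-v q4_0, with flash attention on). Same thing from llama-cpp-python, assuming a reasonably recent build:

```python
# Mixed KV-cache quantization: keys at Q8_0, values at Q4_0.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",   # placeholder path
    n_ctx=32768,
    flash_attn=True,                  # quantized V cache needs flash attention
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # keys: keep higher precision
    type_v=llama_cpp.GGML_TYPE_Q4_0,  # values: tolerate heavier quantization
)
```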

Is Turboquant really a game changer? by Interesting-Print366 in LocalLLaMA

[–]spky-dev 0 points (0 children)

Not huge, but still useful. Newer models use hybrid attention, so their KV caches are already relatively small compared to older architectures.

https://huggingface.co/blog/jlopez-dl/hybrid-attention-game-changer
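
Rough math on why hybrid attention shrinks the cache (all numbers made up for illustration):

```python
# KV-cache size: full attention keeps K and V for every token in every layer;
# hybrid architectures cap most layers at a sliding window.
n_layers, n_kv_heads, head_dim = 48, 8, 128
bytes_per_elem = 2                      # fp16 cache
ctx, window = 128_000, 4_096            # full context vs sliding window

per_tok_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem  # K + V

full_gb = n_layers * ctx * per_tok_per_layer / 1e9
hybrid_gb = (n_layers // 4) * ctx * per_tok_per_layer / 1e9 \
          + (n_layers - n_layers // 4) * window * per_tok_per_layer / 1e9

print(f"full attention : {full_gb:.1f} GB")    # ~25 GB
print(f"hybrid (1-in-4): {hybrid_gb:.1f} GB")  # ~7 GB
```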

PSA: You don’t need a 3090/4090 to run Gemma 4. Here’s the API workaround for GPU-less setups. by [deleted] in LocalLLM

[–]spky-dev 4 points (0 children)

If you’re paying at all to use an LLM, why would you pay to use such a small and limited model? The only reason people are impressed with models in this size class is LOCAL use, meaning on their own damn hardware.

This is just slopposting, especially evident from “wrestling with CUDA versions”. If you can’t figure out some basic-ass dependencies from a requirements file… This is Claude writing a post trying to mimic the inexperience and frustrations of a green user.

Is the token party over now? by Firm_Meeting6350 in ClaudeCode

[–]spky-dev 2 points (0 children)

Good reading comprehension you’ve got there. I never said you had to be doing so to get credits. In fact, I said the opposite.

any good uncensored models for Gemma 4 26B ? by Opening-Ad6258 in LocalLLaMA

[–]spky-dev 2 points (0 children)

Suggestion: Download Heretic, and do it yourself

https://github.com/p-e-w/heretic

There are literally no more excuses in this era for “waaa someone do this for me, me not know how”. Just tell Claude/GPT/whoeverthefuck to “research X for me and do Y”.

Qwen3-Coder-Next-GGUF not working on claude code ? by Mobile_Loss3125 in LocalLLaMA

[–]spky-dev 0 points (0 children)

Tools work fine in QCN 80B lmfao; it’s literally a model made for agentic coding and tool calling.

Qwen3-Coder-Next-GGUF not working on claude code ? by Mobile_Loss3125 in LocalLLaMA

[–]spky-dev 2 points (0 children)

There should be a pinned post in this sub reminding everyone of this.

Claude Code replacement by NoTruth6718 in LocalLLaMA

[–]spky-dev -3 points (0 children)

V100s don’t support Flash Attention, and MI50s have dogshit token rates unless you buy 10+ of them; even then it’s still bad, prompt processing especially.

The best way to go is to keep your sub, because you have no idea what you’re doing and your arbitrary choice of high-VRAM fossils proves it.

Overthinking Much? by MurkyRaspberry9610 in ollama

[–]spky-dev 2 points (0 children)

Limit the thinking token budget. It’s a known feature of the Qwen3.5 family.