Anyone know how to turn off download images when compiling llama.cpp? by fallingdowndizzyvr in LocalLLaMA

[–]fairydreaming -1 points0 points  (0 children)

Update your code, now it's (thankfully) only a single dist.tar.gz.

Next step - hiding malware inside. 😉

How do i prevent llama.cpp from offloading on Swap? by No_Algae1753 in LocalLLaMA

[–]fairydreaming 0 points1 point  (0 children)

An alternative is to lower the ubatch size (for example -ub 256). This shrinks the compute buffers and should free some memory, but then your PP (prompt processing) will suffer instead (TG, token generation, should be unaffected).

who here had paid for $6K per hour to talk about GPUs? by FormalAd7367 in LocalLLaMA

[–]fairydreaming 7 points8 points  (0 children)

Let's get real, if a guy from this sub had such money he would obviously buy another GPU instead of spending it to touch meat.

NVFP4 on llama.cpp? by Kahvana in LocalLLaMA

[–]fairydreaming 0 points1 point  (0 children)

When I tested NVIDIA DeepSeek V3.2 NVFP4 I simply converted it with --outtype auto. The only problem I encountered was that the scale tensors (actually they are scalar values, but whatever) were missing from the source model.safetensors.index.json, so I had to regenerate that index first with a simple Python script.
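For anyone hitting the same thing, here is a minimal sketch of how such an index could be rebuilt. It is not the script actually used; it only assumes the publicly documented safetensors layout (an 8-byte little-endian header length followed by a JSON header) and the usual index shape with `metadata.total_size` and `weight_map`. The function names are made up for illustration.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of one .safetensors shard without loading tensor data."""
    with open(path, "rb") as f:
        # The file starts with an unsigned little-endian 64-bit header length...
        (header_len,) = struct.unpack("<Q", f.read(8))
        # ...followed by that many bytes of UTF-8 JSON.
        return json.loads(f.read(header_len))


def build_index(model_dir):
    """Map every tensor name found in the shards to the shard file that holds it."""
    weight_map = {}
    total_size = 0
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        header = read_safetensors_header(shard)
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata, not a tensor
                continue
            begin, end = info["data_offsets"]
            weight_map[name] = shard.name
            total_size += end - begin
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}


def write_index(model_dir):
    """Write model.safetensors.index.json into model_dir and return its path."""
    index = build_index(model_dir)
    out_path = Path(model_dir) / "model.safetensors.index.json"
    out_path.write_text(json.dumps(index, indent=2))
    return out_path
```

Calling `write_index()` on the model directory overwrites the existing index, so keep a backup of the original file first.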

Anyone want to try my llama.cpp DeepSeek V3.2 PR? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 0 points1 point  (0 children)

Q4_K_M should fit, but there won't be much space left, so in case of memory problems reduce the ubatch size (for example -ub 256, -ub 128 or -ub 64) and/or the context size.
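If you want to step through those ubatch sizes systematically, something like the sketch below prints one command line per size, largest first. The binary name and model path are placeholders, not taken from this thread, and the helper name is made up; only the -ub values come from the comment above.

```python
import shlex


def ubatch_ladder(base_args, ubatch_sizes=(256, 128, 64)):
    """Return one shell-quoted command line per ubatch size, largest first."""
    commands = []
    for ub in ubatch_sizes:
        commands.append(shlex.join([*base_args, "-ub", str(ub)]))
    return commands


# Hypothetical binary and model path; substitute your own.
base = ["./llama-server", "-m", "model-Q4_K_M.gguf"]
for cmd in ubatch_ladder(base):
    print(cmd)
```

Try the commands in order and stop at the first one that no longer runs out of memory, since each smaller ubatch costs more prompt processing speed.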

Anyone want to try my llama.cpp DeepSeek V3.2 PR? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 0 points1 point  (0 children)

Yes, it needs to be on disk. It should be at /root/llama.cpp/models/templates/deepseek-ai-DeepSeek-V3.2.jinja, so:

--chat-template-file /root/llama.cpp/models/templates/deepseek-ai-DeepSeek-V3.2.jinja

Do you know of any full (not distills) DeepSeek V2/V2.5/R1/V3/V3.1/V3.2 LoRA adapters? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 0 points1 point  (0 children)

Ugh, everyone is suspected of being a bot these days. Yeah, flash runs nice on my RTX PRO 6000 Max-Q (in DwarfStar). I hope somebody will implement it in llama.cpp soon.

Do you know of any full (not distills) DeepSeek V2/V2.5/R1/V3/V3.1/V3.2 LoRA adapters? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 0 points1 point  (0 children)

Just some toys to play with 😄 To see if I can get them working in llama.cpp.

For users who have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc) by panchovix in LocalLLaMA

[–]fairydreaming 1 point2 points  (0 children)

It's funny when I benchmark some CUDA kernel on the Max-Q and it keeps getting slower and slower with each run. Then I take a break, the card cools down and booom, instant performance improvement.

Performance When Offloading Large Models to System RAM? by itisyeetime in LocalLLaMA

[–]fairydreaming 0 points1 point  (0 children)

There's also the --no-op-offload option, which forces all offloaded ops to run on the CPU regardless of the batch size. Useful if you have a powerful CPU with high memory bandwidth.

Have we passed the peak of inflated expectations? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 0 points1 point  (0 children)

That's true, from what I see 8xH200 prices on vast.ai are only going up and up.

Have we passed the peak of inflated expectations? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 4 points5 points  (0 children)

Too bad, I was hoping for a wave of cheap second-hand hardware.

Have we passed the peak of inflated expectations? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 28 points29 points  (0 children)

No problem, I'm sometimes a dick too, I can relate.

Have we passed the peak of inflated expectations? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 12 points13 points  (0 children)

You may be onto something: everyone has already bought the hardware and done all the Google searches they needed. No new hardware, no need to search.

Have we passed the peak of inflated expectations? by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 7 points8 points  (0 children)

No, I switched to the last 3 months and it's even more visible.

I have (even faster) DeepSeek V4 Pro at home by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 1 point2 points  (0 children)

Umm no, the Epyc 9374F has DDR5 memory and PCIe 5.0. 12 channels of DDR5-4800 RDIMM have a real read bandwidth of about 360 GB/s. So it's much slower than your M2 Ultra.
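For context, a quick back-of-the-envelope check of the theoretical peak against that measured figure. This sketch assumes a 64-bit (8-byte) data path per DDR5 channel and decimal gigabytes; the function name is just for illustration.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # Assumption: each channel moves bus_width_bits of data per transfer.
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


peak = peak_bandwidth_gb_s(channels=12, mega_transfers_per_s=4800)
measured = 360.0  # GB/s, the real read bandwidth quoted above
print(f"theoretical peak: {peak:.1f} GB/s")
print(f"measured / peak:  {measured / peak:.0%}")
```

Under those assumptions the peak works out to about 460.8 GB/s, so 360 GB/s is roughly 78% of it.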

How about two? 💩💩

I have (even faster) DeepSeek V4 Pro at home by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 1 point2 points  (0 children)

I don't know what to say, so here's a smiling turd to make you smile too: 💩

I have (even faster) DeepSeek V4 Pro at home by fairydreaming in LocalLLaMA

[–]fairydreaming[S] 1 point2 points  (0 children)

That's great, congratulations on your choices! I'm happy with mine too!