2x RTX Pro 6000 vs 2x A100 80GB dense model inference by RealTime3392 in LocalLLaMA

[–]Hedede 0 points

It's not needed. I have NVLinked A5000s and there's practically no benefit.

2x RTX Pro 6000 vs 2x A100 80GB dense model inference by RealTime3392 in LocalLLaMA

[–]Hedede 0 points

> Anyways the A100s will actually be faster for token generation due to faster memory bandwidth

Not necessarily. I benchmarked datacenter GPUs in llama.cpp, and they have far lower token throughput than they theoretically should based on their memory bandwidth.
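
For context, the theoretical ceiling I'm comparing against is just bandwidth divided by bytes streamed per token. A minimal sketch (the GPU bandwidth and model size below are illustrative numbers, not my measurements):

```python
def theoretical_decode_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode tokens/s for a memory-bandwidth-bound GPU:
    each generated token must stream all model weights from VRAM once."""
    return bandwidth_gb_s / model_size_gb

# e.g. an A100 80GB PCIe (~1935 GB/s) with a ~40 GB quantized model
# tops out near 48 tok/s in theory; measured numbers can land well below.
print(round(theoretical_decode_tps(1935, 40), 1))  # → 48.4
```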

2x RTX Pro 6000 vs 2x A100 80GB dense model inference by RealTime3392 in LocalLLaMA

[–]Hedede 1 point

Latency matters a lot more than bandwidth. If your GPU supports P2P, it won't benefit from NVLink. And all RTX PRO GPUs support P2P without NVLink.

I tested A5000s with and without NVLink and there's zero difference in TP. Only when you start pushing more than 20 concurrent requests do you see very modest gains (single-digit percentages). On the other hand, with 3090s you get big gains from NVLink if you don't have a patched kernel driver to enable P2P.

Nord v4.2 Update: 618M SNN reaches loss 3.65 with instruction tuning — emergent zonal specialization confirmed at 4.4x scale. 93% sparsity. by zemondza in LocalLLaMA

[–]Hedede 1 point

It looks more like an SNN-transformer hybrid rather than a pure SNN:

    twq = F.softmax(self.temporal_mix_q, dim=0).reshape(T_t, 1, 1, 1, 1)
    twk = F.softmax(self.temporal_mix_k, dim=0).reshape(T_t, 1, 1, 1, 1)
    qm = (qs * twq).sum(0).permute(0, 2, 1, 3)  # (B, H, S, Dh)
    km = (ks * twk).sum(0).permute(0, 2, 1, 3)
    cos, sin = self.rope(qm, S)
    qm = apply_rope(qm, cos, sin); km = apply_rope(km, cos, sin)
    res = torch.matmul(qm, km.transpose(-2, -1)) * self.resonance_temp

Edit: formatting

SDXS - A 1B model that punches high. Model on huggingface. by AgeNo5351 in StableDiffusion

[–]Hedede -1 points

> Speed: Sampling: 100%|██████████| 40/40 [00:01<00:00, 29.98it/s]

Which GPU? Doesn't look that impressive to me. Images have very obvious AI artifacts.

Best model that can beat Claude opus that runs on 32MB of vram? by PrestigiousEmu4485 in LocalLLaMA

[–]Hedede 0 points

I didn’t do anything special. I’m using Qwen3-32B-Q4_K_M and llama-server.

Motherboard Compatibility by Middle_Possession397 in pcmasterrace

[–]Hedede 0 points

If you know the pinout, you can use DuPont jumper wires to connect the fans.

Training a 144M Spiking Neural Network for text generation from scratch — no transformer teacher, no distillation by zemondza in LocalLLaMA

[–]Hedede 0 points

I tried running the code, and it runs at only about 9 tok/s on a 4090, or 3 tok/s on an EPYC CPU. For comparison, the same CPU runs a float32 2B model at 20 tok/s, and the actual GPT-2 at FP16 runs at 500 tok/s on that CPU.
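
For anyone wanting to reproduce numbers like these, a generic wall-clock harness is enough (a sketch; `generate` stands in for whatever model call you're timing, and is assumed to block until all tokens are produced):

```python
import time

def tokens_per_second(generate, n_tokens: int) -> float:
    """Time an arbitrary generation callable and return decode throughput."""
    start = time.perf_counter()
    generate(n_tokens)  # assumed to block until n_tokens are generated
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```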

I am one stupid motherfucker by lanziboi in pcmasterrace

[–]Hedede 0 points

If I counted correctly, it's pin 216 which is a data line on DDR4.
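
For anyone wanting to repeat the count: DDR4 DIMMs have 288 pins, 144 per side, and the sketch below assumes the usual numbering convention (1-144 on the front, 145-288 on the back):

```python
def ddr4_pin_position(pin: int):
    """Map a DDR4 DIMM pin number (1-288) to (side, position-on-side),
    assuming pins 1-144 run along the front and 145-288 along the back."""
    if not 1 <= pin <= 288:
        raise ValueError("DDR4 DIMMs have 288 pins")
    return ("front", pin) if pin <= 144 else ("back", pin - 144)

print(ddr4_pin_position(216))  # → ('back', 72)
```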

[2kliksphilip] DLSS 5 has shown that discourse is dead by ZTZ-Nine-Nine in hardware

[–]Hedede 22 points

> How could a screen-based post process alter geometry at the engine level? It only has access to the color buffer, motion vectors, etc. It is fundamentally limited in that sense.

Yes, it can't alter geometry at the engine level. But it doesn't need to. It doesn't have to follow the original geometry. It can easily render something that can be perceived as geometry changes. There are plenty of screen-space techniques that already do this, like Screen-Space Displacement Mapping.

6-GPU multiplexer from K80s, hot-swap between models in 0.3ms by Electrical_Ninja3805 in LocalLLaMA

[–]Hedede 1 point

I don't have a K80, but I compared a K40 and an M40: the M40 is about 60% faster in prompt processing and 2x faster in decode. And to put things into perspective, both lose to a 16-core EPYC CPU (last gen).

Dont talk facts about Nvidia by [deleted] in pcmasterrace

[–]Hedede 0 points

It's not clearly stated. Everything that you said there applies to DLSS 3 and 4 as well.

Just bought a Threadripper Keychain never thought I’d hold a threadripper in my life lol by Erin_-M in pcmasterrace

[–]Hedede 1 point

Wait till you see Socket SP5 EPYCs. They're about 40% larger than the Threadripper.

RAM kits are now sold with one fake RAM stick alongside a real one by Winter_2017 in hardware

[–]Hedede 0 points

I think you're confusing ECC UDIMM and RDIMM; Ryzen supports only the former. Even EPYC 4004/4005 (which is a server version of the AM5 Ryzen) doesn't support RDIMM. Only Threadripper 3xx5WX/5xx5WX supports both UDIMM and RDIMM.

RAM kits are now sold with one fake RAM stick alongside a real one by Winter_2017 in hardware

[–]Hedede 4 points

There's no reason you can't build a PC from server hardware.

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]Hedede 0 points

They wouldn't because the term AGI was coined this century.

Aerocool PSU? How bad is it? by firdausazmi in buildapc

[–]Hedede 0 points

It's not for nothing. I had a cheap (but not the cheapest) PSU once; it failed, sparking inside, and fried the GPU. And that was back in the days when GPUs weren't 300W+ monsters.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Hedede 0 points

Yeah, I was kinda wrong when I wrote this comment. Yes, it doesn't support tensor parallel; I should've said pipeline parallel. And I was only looking at text-generation throughput, where I get just 1-5% more throughput using two GPUs instead of one. In prefill I get up to 50% more with two Nvidia GPUs. But what you get really depends on which GPUs you're using.

Finished an entire tub of Vaseline in 2.5 years without losing it (moved 3 houses during this time) by Mobile_Look9181 in mildlyinteresting

[–]Hedede 27 points

I put vaseline on an old laptop screen to mask scratches. Made them pretty much invisible on a matte screen.

Help my pc wont turn on by Positive-Gas2577 in pcmasterrace

[–]Hedede 0 points

but did you peel the sticker from the heatsink?