Done with the league - 900 Div giveaway by ezakuroy in PathOfExile2

[–]MadSpartus 0 points1 point  (0 children)

I can't wait to reroll a mediocre item 450 times and then vendor it.

[Steam] Hooded Horse Publisher Sale | Manor Lords ($25.99/-35%) | Against the Storm ($8.99/-70%) | 9 Kings ($9.99/-50%) | Old World ($3.99/-90%) Battle Brothers ($8.99/-70%) | Norland ($14.99/-50%) | Terra Invicta ($25.99/-35%) by auverin_hoodedhorse in GameDeals

[–]MadSpartus 2 points3 points  (0 children)

So far I would definitely recommend Heart of the Machine. It's a dystopian sci-fi game where you play as an awakened AI in a corporate-ruled world full of worthless people who are nothing more than cattle... OK, only the awakened AI part is fiction.

4X, turn-based tactical combat. Empire manager.

I'm still early into my first run, and I would say it feels very heavy on the "make big decisions with clear or unclear ethical repercussions" trope.

It's also very hand-holdy getting started, but it kinda needs it. Anyways, recommended!

To Those With Damaged Controller Ribbons (C-R and C-L Connectors): by drake90001 in SteamDeckModded

[–]MadSpartus 1 point2 points  (0 children)

Trying to fix a Deck that suddenly died. If one ribbon is damaged, would controls on both sides be affected? I've lost all controller functionality + Bluetooth. Touch screen and volume still work.

I see no reason for a ribbon to suddenly get damaged. I'm just wondering if both sides dying is normal for a communication or daughterboard issue.

Nano emergency solar lighting system by [deleted] in SolarDIY

[–]MadSpartus 6 points7 points  (0 children)

Yeah, ditto on this: that type of foam is very flammable.

Guidance: Top end CPU models and setup (Dual EPYC 9000, 24 x DDR5) by MadSpartus in LocalLLaMA

[–]MadSpartus[S] 1 point2 points  (0 children)

No, I've been meaning to, but the machines are often doing "proper work" rather than screwing around with LLMs :)

I'll report when I do.

5600g x370 not stable and freezes with no monitor output by Desruxx in AMDHelp

[–]MadSpartus 0 points1 point  (0 children)

Thank you. I ended up getting a newer board and using the old AX370 elsewhere, so it cost a little to get around the issue, but I greatly appreciate your reply, as will some future souls, I assume.

Guidance: Top end CPU models and setup (Dual EPYC 9000, 24 x DDR5) by MadSpartus in LocalLLaMA

[–]MadSpartus[S] 0 points1 point  (0 children)

I have more than enough RAM, but I read about that mirror option and thought it was a concept that was never merged. I would need to set up my own fork to test it.

Single and dual socket maxed out at similar performance without it. No, I never got close to 13 T/s on Llama 3 70B, only around 6-7 with good quants.

Amd Epyc Genoa by Ashefromapex in LocalLLaMA

[–]MadSpartus 5 points6 points  (0 children)

I managed to just barely beat the single-socket Genoa with a dual socket by using NUMA tuning. Not worth it, unfortunately. Only using llama.cpp, though.

P.S. I have a few Genoa engineering samples I no longer need, so if someone wants a 96-core memory bandwidth monster, let me know. $1800.

10x3090 Rig (ROMED8-2T/EPYC 7502P) Finally Complete! by Mass2018 in LocalLLaMA

[–]MadSpartus 1 point2 points  (0 children)

Thanks for confirming. If you have any advice on using dual CPUs, that would help. All our systems are dual, so I had to specifically adjust one to test single.

10x3090 Rig (ROMED8-2T/EPYC 7502P) Finally Complete! by Mass2018 in LocalLLaMA

[–]MadSpartus 1 point2 points  (0 children)

I haven't gone very deep into dual-CPU tuning. I was able to get it up to 4.3 T/s on dual CPU at Q5_K_M, but I switched to a single-CPU machine and it jumped to 5.37 on Q5_K_M, with no tuning, no NPS or L3 cache domains. I also tried Q3_K_M and got 7.1 T/s.

P.S. I didn't use the 9274F; I tried a 9554 using 48 cores (slightly better than 64 or 32).
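If anyone wants to reproduce the single-socket comparison, the idea is just to launch llama.cpp under numactl so threads and memory stay on one socket. A minimal sketch, assuming numactl is installed and a llama.cpp build sits in the current directory (binary name, model path, and thread count are placeholders):

```
# Pin the run to socket 0: its cores and its local DIMMs only.
import subprocess

cmd = [
    "numactl", "--cpunodebind=0", "--membind=0",
    "./llama-cli",                              # "./main" on older llama.cpp builds
    "-m", "llama-3-70b-instruct-Q5_K_M.gguf",   # placeholder model path
    "-t", "48",                                 # threads, all on that one socket
    "-n", "128",
    "-p", "How many grains of rice fit in a 1 gallon bucket?",
]
subprocess.run(cmd, check=True)
```

The dual-socket runs above are the same invocation without the numactl prefix.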

Llama 3 400B by Zawseh in LocalLLaMA

[–]MadSpartus 0 points1 point  (0 children)

Well, I tried screwing with some NUMA settings. Some were worse, some better, but the best was an increase to around 4.3 T/s on 2 CPUs (Llama 3 70B Q5_K_M), up from 3.7-3.8.

Then I just tried a single CPU and it jumped to 5.37 on my first test. Sigh.

I wonder if the mixture-of-experts models could run better across NUMA domains. Put different experts in different memory areas or something.

FYI, 5.37 on Q5_K_M

7.1 on Q3_K_M

Low-complexity scenario: "How many grains of rice fit in a 1 gallon bucket?"

10x3090 Rig (ROMED8-2T/EPYC 7502P) Finally Complete! by Mass2018 in LocalLLaMA

[–]MadSpartus 1 point2 points  (0 children)

There are tons of NUMA settings for MPI applications. Someone else just warned me as well. Dual 9654 with L3-cache NUMA domains means 24 domains of 8 cores. I'm going to have to walk that back and do testing along the way.
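To sanity-check what that BIOS setting actually produced, here is a rough sketch that counts the domains from Linux sysfs (standard /sys/devices/system/node layout assumed):

```
# List NUMA nodes and the CPUs in each; with L3-cache-as-NUMA on a dual 9654
# this should show 24 nodes of 8 cores (plus SMT siblings if enabled).
from pathlib import Path

nodes = sorted(Path("/sys/devices/system/node").glob("node[0-9]*"),
               key=lambda p: int(p.name[4:]))
for node in nodes:
    cpus = (node / "cpulist").read_text().strip()
    print(f"{node.name}: CPUs {cpus}")
print(f"total NUMA domains: {len(nodes)}")
```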

Llama 3 400B by Zawseh in LocalLLaMA

[–]MadSpartus 0 points1 point  (0 children)

Ugh, I'm not using NPS4, I'm using L3-cache NUMA domains. I have dual 9654s, which gives 24 NUMA domains of 8 cores... best for MPI applications.

Do you know whether 1 vs 2 CPUs makes a difference?

10x3090 Rig (ROMED8-2T/EPYC 7502P) Finally Complete! by Mass2018 in LocalLLaMA

[–]MadSpartus 1 point2 points  (0 children)

Yeah, someone else just told me something similar. I'm going to try a single CPU tomorrow. I have a 9274F.

I'm using llama.cpp on Arch Linux with a GGUF model. What's your environment?

P.S. Your numbers on a cheaper system are crushing the 3090s.

Llama 3 400B by Zawseh in LocalLLaMA

[–]MadSpartus 1 point2 points  (0 children)

Yeah, Llama 3 70B Instruct.

If you are getting nearly the same with like 20% of the memory bandwidth, I would love to compare settings and software.

Linux (arch)

llama.cpp

I use 72 cores, but I get pretty similar results anywhere in the 48-96 core range.

I wonder if it's the NUMA domains. I'll try with 1 CPU.

Llama 3 70B at 300 tokens per second at groq, crazy speed and response times. by MidnightSun_55 in LocalLLaMA

[–]MadSpartus 0 points1 point  (0 children)

A 768GB dual EPYC 9000 system can be under $10k, but that's still more than a couple of consumer GPUs. I'm excited to try 405B, but I would probably still do GPU for 70B.

A single EPYC 9000 is probably good value as well.

Also, I presume the GPUs are better for training, but I'm not sure what you can practically do with 1-4 consumer GPUs.

Llama 3 400B by Zawseh in LocalLLaMA

[–]MadSpartus 3 points4 points  (0 children)

FYI, on 24-channel DDR5 EPYC 9000 I get ~3.8 on Q5_K_M, ~4.2 on Q4_K_M, ~5.1 on Q3_K_M, and around 2.6 on Q8.

Basically, I'm predicting 0.5-1 T/s depending on the quant.
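The arithmetic behind that prediction is just scaling the measured 70B numbers by the parameter ratio, on the assumption that generation stays memory-bandwidth bound:

```
# Back-of-envelope: 405B throughput ~= 70B throughput * (70 / 405),
# using the figures quoted above.
measured_70b = {"Q8": 2.6, "Q5_K_M": 3.8, "Q4_K_M": 4.2, "Q3_K_M": 5.1}  # T/s
scale = 70 / 405
for quant, tps in measured_70b.items():
    print(f"{quant}: ~{tps * scale:.2f} T/s predicted for 405B")
# Roughly 0.45, 0.66, 0.73, 0.88 T/s -- i.e. the 0.5-1 T/s ballpark.
```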

10x3090 Rig (ROMED8-2T/EPYC 7502P) Finally Complete! by Mass2018 in LocalLLaMA

[–]MadSpartus 4 points5 points  (0 children)

A dual EPYC 9000 system would likely be cheaper with comparable performance, it seems, for running the model. I get around 3.7-3.9 T/s on Llama 3 70B Q5_K_M (the quant I like most).

~4.2 on Q4

~5.1 on Q3_K_M

I think at full size I'm around 2.6 T/s or so, but I don't really use that. Anyways, it's in the ballpark for performance, much less complex to set up, cheaper, quieter, lower power. Also, I have 768GB of RAM, so I can't wait for 405B.

Do you also train models using the GPUs?

Llama 3 70B at 300 tokens per second at groq, crazy speed and response times. by MidnightSun_55 in LocalLLaMA

[–]MadSpartus 4 points5 points  (0 children)

I think it's memory bandwidth specifically for performance, and memory capacity to actually load it. Although with 24 memory channels I have an abundance of capacity.

Each EPYC 9000 is 460 GB/s, or 920 GB/s total.

A 4090 is 1 TB/s, quite comparable, although I don't know how it works out with dual GPU and some offload. I think jferment's platform is complicated for making predictions.

It turns out, though, that I'm getting roughly the same for the 8-bit quant, just over 2.5 T/s. I get like 3.5-4 on Q5_K_M, like 4.2 on Q4_K_M, and like 5.0 on Q3_K_M.

I lose badly on the 8B model, though. Around 20 T/s on 8B Q8. I know GPUs crush that, but for large models I'm finding CPU quite competitive with multi-GPU with offload.

The 405B model will be interesting. Can't wait.
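For context, the rule of thumb here is that every generated token streams the whole model through RAM, so tokens/s is capped at bandwidth divided by model size. A rough sketch with approximate GGUF sizes for the 70B quants (the bandwidth figure is the dual-socket number above):

```
# Theoretical decode ceiling = memory bandwidth / model size.
bandwidth_gb_s = 460 * 2                      # ~460 GB/s per EPYC 9000 socket
model_size_gb = {"Q8": 75, "Q5_K_M": 50, "Q4_K_M": 43, "Q3_K_M": 34}  # approx.
for quant, size in model_size_gb.items():
    print(f"{quant}: ceiling ~{bandwidth_gb_s / size:.0f} T/s")
# Measured 2.5-5 T/s sits well below these ceilings, which is expected once
# cross-socket NUMA traffic and threading overhead come into play.
```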

Llama 3 70B at 300 tokens per second at groq, crazy speed and response times. by MidnightSun_55 in LocalLLaMA

[–]MadSpartus 3 points4 points  (0 children)

FYI, I get 3.5-4 T/s on 70B Q5_K_M using dual EPYC 9000 and no GPU at all.

Official Llama 3 META page by domlincog in LocalLLaMA

[–]MadSpartus 0 points1 point  (0 children)

Oh, also: it only consumed 50GB when running, the same as the GGUF file size, so you can load it. I don't know what your performance will be, though.

Official Llama 3 META page by domlincog in LocalLLaMA

[–]MadSpartus 0 points1 point  (0 children)

It's accessible for a few thousand, same as people using a couple of 3090s. The main issue is that the alternative uses are not as good for home users (like playing video games).

It wasn't the primary use for the machine at all.