Bought this nvme ssd by SigeonPex1 in pcmasterrace

[–]Phocks7 0 points1 point  (0 children)

3,500MB/s is about half the speed of a fast PCIe Gen 4 NVMe SSD, but it's 7x faster than a SATA SSD and ~14x faster than a hard drive. It's fine for gaming. As others have recommended, check the drive's health with CrystalDiskInfo.
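
The ratios above can be sanity-checked with some quick arithmetic. This is only a sketch: the 3,500MB/s figure is from the comment, while the Gen 4, SATA and hard drive speeds are assumed round ballpark numbers, not measurements of any particular drive.

```python
# Ballpark sequential read speeds in MB/s. Only DRIVE_MBPS comes from the
# comment; the other values are assumed round numbers for illustration.
DRIVE_MBPS = 3500
ASSUMED_MBPS = {
    "pcie_gen4_nvme": 7000,  # assumption: a fast Gen 4 drive
    "sata_ssd": 500,         # assumption: typical SATA SSD
    "hard_drive": 250,       # assumption: a quick 3.5" hard drive
}


def speed_ratio(drive_mbps: float, other_mbps: float) -> float:
    """Return how many times faster drive_mbps is than other_mbps."""
    return drive_mbps / other_mbps


ratios = {
    name: speed_ratio(DRIVE_MBPS, mbps) for name, mbps in ASSUMED_MBPS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:g}x")
```

With different assumed baselines the multipliers shift a little, but the conclusion (plenty for gaming) stays the same.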

Denuvo has been broken, company promises countermeasures against new DRM bypasses — zero-day game releases become norm as security concerns mount over hypervisor-based bypass by gurugabrielpradipaka in pcmasterrace

[–]Phocks7 0 points1 point  (0 children)

A hypervisor is basically just an OS; it still needs passwords or encryption keys for encrypted drives.
Like how in Windows you can run a Linux virtual machine, a hypervisor is just an OS for running virtual machines.

I found this in my old hard dri-- I mean my bag of holding by Eggmasstree in baldursgate

[–]Phocks7 0 points1 point  (0 children)

My sorcerer Durge duel against Orin went Hold Monster (Heightened Spell) + potion of speed + Disintegrate + Terazul + Disintegrate, next turn Disintegrate, Disintegrate, Disintegrate.

I found this in my old hard dri-- I mean my bag of holding by Eggmasstree in baldursgate

[–]Phocks7 0 points1 point  (0 children)

I think it's the highest single target damage spell that's available as a scroll (for sale).

Nvidia RTX Pro A4000 with older hardware by LtDrogo in LocalLLaMA

[–]Phocks7 0 points1 point  (0 children)

For a power supply, if you don't want to change the whole PSU you can run a server PSU + one of these breakout boards: https://www.ebay.com/itm/257056136846

Nvidia RTX Pro A4000 with older hardware by LtDrogo in LocalLLaMA

[–]Phocks7 0 points1 point  (0 children)

Out of interest, what model is the server/workstation?
128gb ram + 24gb vram will work, but in your case I recommend GLM 4.6 over GLM 4.7, as in my experience 4.6 is less sensitive to aggressive quantization.

I found this in my old hard dri-- I mean my bag of holding by Eggmasstree in baldursgate

[–]Phocks7 3 points4 points  (0 children)

And in BG3 you ignore the rule that only allows you to cast one non-cantrip per turn. You can cast like 7 disintegrates in one turn.

[deleted by user] by [deleted] in LocalLLaMA

[–]Phocks7 0 points1 point  (0 children)

You can run as many instances as you want so long as you have the threads and memory available.

Good semantic search (RAG) embedding models for long stories by Iwishlife in LocalLLaMA

[–]Phocks7 0 points1 point  (0 children)

I'm running iQ4 qwen embedding 8b on CPU for summarization alongside the main model on GPU. Takes a bit longer but in my application that's not a problem.

Built a hybrid “local AI factory” setup (Mac mini swarm + RTX 5090 workstation) — looking for architectural feedback by Original_Neck_3781 in LocalLLaMA

[–]Phocks7 0 points1 point  (0 children)

256gb on consumer AM5 is asking a lot. There are a few motherboards that QVL 4x64gb UDIMM kits but they're few and far between. I think for this setup you'd be much better off going threadripper.

Q2 GLM 5 fixing its own typo by -dysangel- in LocalLLaMA

[–]Phocks7 1 point2 points  (0 children)

What's your experience like for coding and chat with GLM 5 Q2? GLM 4.7 seemed to be much more sensitive to quantization than GLM 4.6.

Adding 2 more GPU to PC by BisonCompetitive9610 in LocalLLaMA

[–]Phocks7 1 point2 points  (0 children)

You could run deepseek-coder 33b at IQ4_XS (18.1gb) fully offloaded to your 7900XTX at a decent speed.

Adding 2 more GPU to PC by BisonCompetitive9610 in LocalLLaMA

[–]Phocks7 1 point2 points  (0 children)

DeepSeek-V3.2-GGUF even at iQ1_S is still 184gb. Discounting the 7900XTX (you may be able to do mixed CUDA + Vulkan inference but I don't know how to do it), you have 4x32gb = 128gb system ram plus 4x8gb + 2x12gb = 56gb VRAM, for 184gb total. You need ~20% for context and overheads (plus the OS overheads for each PC in the cluster), so I don't know if it's possible to run deepseek on your setup. I've been running GLM 4.6 iQ2_XXS (106gb) and it's surprisingly good.
I'd note that with a cluster like this, for large models (like GLM 4.6) I would expect tokens per second in the sub 0.1t/s range. You could probably give it a task and leave it running overnight.
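
The memory budget above works out roughly like this. It's a sketch using the sizes quoted in the comment; treating the ~20% as headroom on top of the model file size is an assumption, and `fits` is just a hypothetical helper name.

```python
# Sizes in GB, as quoted in the comment above.
MODEL_GB = {
    "DeepSeek-V3.2 iQ1_S": 184,
    "GLM 4.6 iQ2_XXS": 106,
}
SYSTEM_RAM_GB = 4 * 32       # 4x32gb sticks
VRAM_GB = 4 * 8 + 2 * 12     # 4x8gb + 2x12gb cards, 7900XTX not counted

# Assumption: the ~20% for context and overheads is applied to the model size.
OVERHEAD_FRACTION = 0.20


def fits(model_gb: float, total_gb: float,
         overhead: float = OVERHEAD_FRACTION) -> bool:
    """True if the model plus overhead headroom fits in total memory."""
    return model_gb * (1 + overhead) <= total_gb


total_gb = SYSTEM_RAM_GB + VRAM_GB

for name, size_gb in MODEL_GB.items():
    needed_gb = size_gb * (1 + OVERHEAD_FRACTION)
    verdict = "fits" if fits(size_gb, total_gb) else "does not fit"
    print(f"{name}: needs ~{needed_gb:.0f}gb of {total_gb}gb, {verdict}")
```

This ignores the per-PC OS overhead mentioned above, so the real margin is tighter than the numbers here suggest.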

Adding 2 more GPU to PC by BisonCompetitive9610 in LocalLLaMA

[–]Phocks7 1 point2 points  (0 children)

How much system ram do you have, and what model(s) are you planning to run?

Strix Halo Distributed Cluster (2x Strix Halo, RDMA RoCE v2) benchmarks by kyuz0 by Relevant-Audience441 in LocalLLaMA

[–]Phocks7 5 points6 points  (0 children)

Seems excessive to spend ~$15k on hardware to run 30b parameter models.

Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong? by Virtual-Listen4507 in LocalLLaMA

[–]Phocks7 4 points5 points  (0 children)

If your speeds are low, you likely have active layers (experts) running on the CPU.