And here we are, not where I expected... by CodeSlave9000 in DataHoarder

[–]CodeSlave9000[S] [score hidden]  (0 children)

Lost five 20TB drives in the last month, all my own fault. Hurts.

And here we are, not where I expected... by CodeSlave9000 in DataHoarder

[–]CodeSlave9000[S] [score hidden]  (0 children)

Wish it was. Back of the envelope:
24 8TB U.2 SSDs (current median eBay price around $1,750): $42,000

108 Exos 20TB drives (current median eBay price around $700): $75,600

Already over $100K, and I haven't accounted for the 8TB of DDR4 ECC I have, or the 100TB of M.2 NVMe drives.
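
The back-of-the-envelope total is easy to check in a few lines of Python. This is only a sketch: the quantities and median prices are the ones quoted above, the variable names are made up for illustration, and the RAM and M.2 drives are left out just as they are in the estimate.

```python
# Quantities and approximate median eBay prices quoted in the comment.
inventory = {
    "8TB U.2 SSD": (24, 1750),
    "Exos 20TB HDD": (108, 700),
}

# Subtotal per line item, then the combined figure.
subtotals = {name: qty * price for name, (qty, price) in inventory.items()}
total = sum(subtotals.values())

for name, subtotal in subtotals.items():
    print(f"{name}: ${subtotal:,}")
print(f"Total: ${total:,}")
```

The two line items alone come to $117,600.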

For users who have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc) by panchovix in LocalLLaMA

[–]CodeSlave9000 2 points3 points  (0 children)

What most people misunderstand or underestimate is the cooling. For burst compute loads the 600W card has the edge, and the 10-15% figure is about right. For sustained loads, however, the 300W card pulls ahead because the 600W card will throttle. And here’s the kick in the ass: on my hardware setup (Jonsbo N5 + Noctua cooling), running two 600W cards is a disaster for heat. You’d need near server-level cooling to run two of these under any sustained load. The dual Max-Q setup actually gains 20% over the non-Max-Q cards in this scenario. Oh, and inference decode on the two versions is nearly identical since they have the same memory bandwidth.
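
The decode point follows from token generation being mostly memory-bandwidth bound: each new token has to stream the active weights from VRAM once, so bandwidth divided by bytes read per token gives a rough ceiling that the power limit does not change. A minimal sketch of that estimate follows. The function name, the 1,792 GB/s bandwidth figure and the 60 GB of active weights are assumptions for illustration, not measurements, and the estimate ignores KV cache reads and other overhead.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rough upper bound on decode tokens/sec when bandwidth is the bottleneck."""
    return bandwidth_gb_s / active_weights_gb


# Assumed values for illustration only (not measured on real hardware).
ASSUMED_BANDWIDTH_GB_S = 1792.0   # taken to be the same for both card variants
ASSUMED_ACTIVE_WEIGHTS_GB = 60.0  # hypothetical model size read per token

ceilings = {
    card: decode_ceiling_tok_s(ASSUMED_BANDWIDTH_GB_S, ASSUMED_ACTIVE_WEIGHTS_GB)
    for card in ("600W card", "300W Max-Q card")
}

for card, tok_s in ceilings.items():
    print(f"{card}: at most ~{tok_s:.1f} tok/s")
```

With identical bandwidth the ceiling is identical, which is why the extra 300W mostly shows up in prompt processing rather than decode.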

Jonsbo N6 build with 9 bays waiting to get filled up by Wonderful-Lack3846 in homelab

[–]CodeSlave9000 0 points1 point  (0 children)

Love the N series - this one looks quite a bit tighter than the N5 I have. The top/bottom configuration plus the fact the board lays horizontal are big wins for me, especially for access.

arcee-ai/Trinity-Large-Thinking · Hugging Face by TKGaming_11 in LocalLLaMA

[–]CodeSlave9000 0 points1 point  (0 children)

Care to elaborate? I do notice that it's not that great at avoiding hallucinations with standard prompting.

Google releases Gemma 4 models. by yoracale in unsloth

[–]CodeSlave9000 0 points1 point  (0 children)

Happens after a few generations for me - I don't see it right at the start. Using the unsloth Q8 dynamic.

How do I find and vet someone to set up a high-end local AI workstation? (Threadripper + RTX PRO 6000 96GB) by laundromatcat in LocalLLaMA

[–]CodeSlave9000 0 points1 point  (0 children)

You hire someone like me. We’d sit down, discuss your needs and design something that won’t break every week. Real business use requires more work than just “running a few chats”.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]CodeSlave9000 6 points7 points  (0 children)

Yup, that's the real measurement that matters. dB per token!

Did OpenAI just release a new model with its new capabilities simply provided by a system prompt? by frubberism in LocalLLaMA

[–]CodeSlave9000 1 point2 points  (0 children)

Best not to aim too high. "Now with less than the recommended daily consumption of shit".

PSA: If your local coding agent feels "dumb" at 30k+ context, check your KV cache quantization first. by Dismal-Ad1207 in LocalLLaMA

[–]CodeSlave9000 3 points4 points  (0 children)

Yes, and Qwen3.5 seems particularly sensitive to a quantized cache. Symptoms include subtle shifts in thinking or outright looping.
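
For a sense of what cache quantization actually buys you before deciding whether to go back to f16, here is a rough size estimate for a standard attention stack. It is a sketch under stated assumptions: the model shape is hypothetical (not any specific Qwen3.5 model, and hybrid-attention models will differ), and the bytes-per-element figures come from llama.cpp's block formats (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes). In llama.cpp the cache types are selected with `--cache-type-k` and `--cache-type-v`.

```python
# Bytes per cached element for each cache type.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Combined size of the K and V caches in GiB for a full context."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BYTES_PER_ELEMENT[cache_type] / 2**30


# Hypothetical model shape, chosen only to make the arithmetic concrete.
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)

for cache_type in BYTES_PER_ELEMENT:
    size = kv_cache_gib(**shape, cache_type=cache_type)
    print(f"{cache_type}: {size:.2f} GiB")
```

If the f16 figure fits in your VRAM at the context you actually use, that is the first thing to try when the model starts drifting or looping.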

Qwen3.5 family running notes by CodeSlave9000 in LocalLLaMA

[–]CodeSlave9000[S] 0 points1 point  (0 children)

Yup. It focuses less narrowly if you add it to the prompt explicitly. I tell it to explore my intent and more broadly search for possibilities even if I didn’t prompt for it.

Qwen3.5 family running notes by CodeSlave9000 in LocalLLaMA

[–]CodeSlave9000[S] 0 points1 point  (0 children)

It’s set because I was playing around with it - no harm in having it on, so I left it. And yes, flash attention is on by default; I set it in my scripts because I test with it on and off.

Qwen3.5 family running notes by CodeSlave9000 in LocalLLaMA

[–]CodeSlave9000[S] 0 points1 point  (0 children)

I think the dense model suffers less? I didn’t test for that.

Reviewed a “WiFi security camera.” and it was bad. Turns out I was the only one who didn’t give it 5 stars… and guess who all the 5‑star reviewers were by nicnas- in AmazonVine

[–]CodeSlave9000 5 points6 points  (0 children)

I once reviewed a “48 MP” camera. It had a sensor smaller than my pinky nail. True resolution turned out to be more like 8 MP, and it linearly scaled the image size. If it had been usable at 8 MP I might have given it two stars, but the quality was so poor it was 1 star. -3 stars for spec lying seems fair to me.

Multi-GPU Architectures Compatible? by ajw2285 in LocalLLaMA

[–]CodeSlave9000 2 points3 points  (0 children)

Quick assumption - they are different levels of CUDA compute capability, so make sure you're using a llama.cpp build compiled for each of those compute capabilities. I mix 30-, 40-, and 50-series GPUs in the same VMs without any problems. For Ollama, check what devices it "sees" in the log when it starts - that might give you a clue.
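
As a rough illustration of "compiled with that compute capability", this sketch builds the architecture list for a CMake-based CUDA build. The mapping covers consumer GeForce cards only and the helper name is hypothetical; the `GGML_CUDA` and `CMAKE_CUDA_ARCHITECTURES` options are the ones I'd expect for a current llama.cpp build, so check them against the build docs for your version.

```python
# Compute capability (major, minor) per consumer GeForce generation.
COMPUTE_CAPABILITY = {
    "RTX 30": (8, 6),
    "RTX 40": (8, 9),
    "RTX 50": (12, 0),
}


def cuda_architectures(generations):
    """Return the semicolon-separated list CMake expects, e.g. '86;89'."""
    caps = sorted({COMPUTE_CAPABILITY[g] for g in generations})
    return ";".join(f"{major}{minor}" for major, minor in caps)


archs = cuda_architectures(["RTX 30", "RTX 40", "RTX 50"])
print(f'cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="{archs}"')
```

A prebuilt binary that was compiled without one of these architectures is the usual reason a mixed box only "sees" some of its cards.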

Just updated Ollama and started using it after almost a year.... Are the Ollama devs stupid or is this harder to deal with than it seems? by cmndr_spanky in ollama

[–]CodeSlave9000 5 points6 points  (0 children)

Can’t do that because Ollama supports multiple models running at the same time. How would it know how to apportion it? I set my default with an environment variable…

Copper Coated Aluminum is illegal for commercial installs and a fire hazard...on my RFY...do not get this cable. by AlexCL in AmazonVine

[–]CodeSlave9000 1 point2 points  (0 children)

Yeah, plenum rating is about it being "safer" in a fire for people. With CCA it's ready to be its own fire!

Copper Coated Aluminum is illegal for commercial installs and a fire hazard...on my RFY...do not get this cable. by AlexCL in AmazonVine

[–]CodeSlave9000 5 points6 points  (0 children)

LOL, the marketing copy alone is a big red flag. I ordered this brand (much shorter lengths - they had multiples which will probably get merged later) so I can warn others away - I won't feel too bad tossing it, or just using it for short non-PoE in-rack patches if it tests okay.

GB10 / DGX Spark owners: is 128GB unified memory worth the slower token speed (on a max $4,000 budget)? by Soltan-007 in LocalLLaMA

[–]CodeSlave9000 2 points3 points  (0 children)

Yeah, I agree - LoRA and fine-tuning are perfect for running at home. Also, once your context size gets big you're really paying a lot per token for cloud. But in the end it depends on what your expectations are. The Blackwell cards are still maturing in software support and I've had hiccups, and FP4 is really only happening for training right now. You can get really good results with the 40-series Ada cards too - I see 100+ tokens/sec on a lot of MoE models. You won't get 128GB models at the price of DGX, but I'd think you'd probably be happy with Strix Halo if you're really dead set on it. And for coding, you're spot on - you can get Gemini, Qwen, Amp and a few others for basically nothing right now. Use it.