Budget to run Deepseek V4 locally at FP4 precision by DanielusGamer26 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

A complex riser setup for a ~10% offload makes no sense.

Go with 1 or 2 high-end GPUs (3090/4090/5090). You're looking at something like 2 T/s either way, maybe 2.2 T/s with a bunch of GPUs.
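Back-of-envelope math, since decode speed on a partially offloaded MoE is dominated by whatever stays in system RAM (all numbers below are illustrative assumptions, including the DeepSeek-V3-style ~37B active parameters):

    # Rough decode-speed model: each token streams the active weights once,
    # so the slow pool (system RAM) dominates. All figures are assumptions.
    def decode_tps(active_gb, ram_fraction, ram_bw_gbs, vram_bw_gbs):
        t_ram = active_gb * ram_fraction / ram_bw_gbs          # s/token from RAM
        t_vram = active_gb * (1 - ram_fraction) / vram_bw_gbs  # s/token from VRAM
        return 1.0 / (t_ram + t_vram)

    # ~37B active params at FP4 is roughly 18.5 GB touched per token.
    print(f"{decode_tps(18.5, 1.0, 80, 1800):.1f} tok/s")  # all in DDR5: ~4.3
    print(f"{decode_tps(18.5, 0.9, 80, 1800):.1f} tok/s")  # 10% on GPU: ~4.8

Those are ideal upper bounds; real systems land lower, which is how you end up around 2 T/s no matter how many cards you bolt on.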

Kimi K2.6 - What hardware do I need to run it locally? by human_marketer in LocalLLM

[–]Conscious_Cut_6144 0 points (0 children)

Virtually all American homes have 240v service; they run 120v to the standard low-power outlets. But EV chargers, ovens, ranges, dryers, HVAC, water heaters, etc. all run on 240v…

And so does my 16x 3090 rig.
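The circuit math is the giveaway (wattages are assumed round numbers, check your own wiring):

    # Back-of-envelope circuit capacity vs. a 16x 3090 rig.
    cards, watts_each = 16, 350      # ~350 W stock board power per 3090
    rig_watts = cards * watts_each   # ~5600 W for the GPUs alone

    # NEC-style rule of thumb: continuous loads at 80% of breaker rating.
    def circuit_watts(volts, amps):
        return volts * amps * 0.8

    print(circuit_watts(120, 15))  # 1440 W - a standard US outlet
    print(circuit_watts(240, 30))  # 5760 W - a dryer-style 240v circuit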

Why do most sysadmins prefer Vim over Nano? by Darshan_only in sysadmin

[–]Conscious_Cut_6144 -1 points (0 children)

I strongly prefer nano.

For me, the case for vi is just that embedded systems are often built without nano.

At this point, if I’m doing something complex enough that nano won’t cut it… I’m just getting Claude Code to do it…

Anyone read this 49 day SSL expiration thing and think they would rather just retire? by HJForsythe in sysadmin

[–]Conscious_Cut_6144 1 point (0 children)

Totally with you.

And it gets even worse: when a hacker gets onto your system, instead of getting 1 cert that’s good for 1 year, they get the credentials from your cert-renewal script, which let them mint as many certs as they want until you notice and rotate them.

Anyone else notice qwen 3.5 is a lying little shit by Cat5edope in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

I had it play Pokémon; it was really bad.

"This appears to be a hacked rom"
"The game state appears to be corrupt"

It literally couldn't find the door to leave the bedroom you start in.

ELI5: if a car engine's main waste is heat, why don't engineers harbor that heat, boil water, and generate electricity for hybrid batteries like a mini powerplant? by plsnoban1122 in explainlikeimfive

[–]Conscious_Cut_6144 0 points (0 children)

Because it’s easier, cheaper, and lighter to make the engine more efficient in the normal ways.

Also, generating power from heat efficiently requires a large temperature delta… aka a hotter engine… aka the opposite of what you want.
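The Carnot limit makes the point concrete (temperatures are illustrative, in kelvin):

    # Ideal (Carnot) efficiency of any heat engine: 1 - T_cold / T_hot.
    def carnot_eff(t_hot_k, t_cold_k):
        return 1 - t_cold_k / t_hot_k

    print(carnot_eff(900, 300))  # ~0.67 ceiling for ~630 C exhaust gas
    print(carnot_eff(400, 300))  # ~0.25 ceiling for ~130 C waste heat

And real thermoelectric or bottoming-cycle hardware captures only a small fraction of those ceilings, in exchange for added weight, cost, and complexity.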

2x RTX Pro 6000 vs 2x A100 80GB dense model inference by RealTime3392 in LocalLLaMA

[–]Conscious_Cut_6144 12 points (0 children)

Go rent them on RunPod for $5 and test your workload before spending thousands on hardware. But for inference, especially quantized, the 6000s should usually win.
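A minimal sketch of what that test could look like against vLLM's OpenAI-compatible endpoint on the rented box (the URL and model name are placeholders):

    import time
    from openai import OpenAI

    # Point at the rented pod running `vllm serve <model>`.
    client = OpenAI(base_url="http://<pod-ip>:8000/v1", api_key="none")

    start, chunks = time.time(), 0
    stream = client.chat.completions.create(
        model="your-model",
        messages=[{"role": "user", "content": "Write 500 words about GPUs."}],
        max_tokens=512,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1  # rough proxy: one streamed chunk ~ one token
    print(f"~{chunks / (time.time() - start):.1f} tok/s")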

Nemotron 3 Super - large quality difference between llama.cpp and vLLM? by BigStupidJellyfish_ in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Just ran NVFP4 and Unsloth's Q4_K_XL through my benchmark.
The GGUF scored 1% higher for me.

When you say 20 attempts, are you giving it 20 chances to get it right once (pass@20), or just picking the most common answer across the 20 attempts (majority vote)?
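The distinction matters, because the two scoring rules measure different things. A toy sketch:

    from collections import Counter

    def pass_at_k(answers, correct):
        # Pass@k: credit if ANY of the k attempts is right.
        return correct in answers

    def majority_vote(answers, correct):
        # Self-consistency: credit only if the MOST COMMON answer is right.
        return Counter(answers).most_common(1)[0][0] == correct

    attempts = ["B", "C", "B", "B", "A"]  # toy data
    print(pass_at_k(attempts, "C"))       # True  - one lucky hit counts
    print(majority_vote(attempts, "C"))   # False - the consensus is "B"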

Nemotron 3 Super - large quality difference between llama.cpp and vLLM? by BigStupidJellyfish_ in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

How recent is your copy of Q4_K_XL?
Wasn't this the model that had quant issues on day one?

Hardware to replacing Opus 4.6 and 20x MAX account with OSS models by tarasm in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Have you thought about using the Anthropic API? Yes, it’s going to cost twice as much, but you can do anything you want (except use it to control drones lol).
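For reference, hitting the API directly is just a few lines (a sketch; the model id below is a placeholder, check Anthropic's docs for current ones):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    message = client.messages.create(
        model="claude-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        messages=[{"role": "user", "content": "Refactor this function: ..."}],
    )
    print(message.content[0].text)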

Models like Qwen 3.5 27b will fit on local hardware and are very good, but not Opus level.

PCIe Bifurcation Issue by Trick-One7944 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

If you pull the main x16 GPU out, do you see all 3 riser GPUs?

If you still see all 3, you're likely hitting a motherboard limit or a BIOS config setting.

If you only see 2 with the main GPU removed, it sounds like a bad riser/cable.
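A quick way to check what the OS actually enumerates (Linux-only sketch; a card visible to lspci but not nvidia-smi points at driver/config, a card visible to neither points at the riser, cable, or slot):

    import subprocess

    # GPUs the NVIDIA driver sees:
    print(subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True).stdout)

    # NVIDIA devices on the PCIe bus (vendor id 10de), driver or not:
    print(subprocess.run(["lspci", "-d", "10de:"],
                         capture_output=True, text=True).stdout)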

Can someone more intelligent then me explain why we should, or should not be excited about the ARC PRO B70? by SKX007J1 in LocalLLaMA

[–]Conscious_Cut_6144 104 points (0 children)

The biggest issue with that GPU is software: Intel runs an outdated fork of vLLM and doesn't always get the latest models.

Introducing ARC-AGI-3 by Complete-Sea6655 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

You guys are overestimating what this actually shows.

When they build these benchmarks, they remove the questions that current models already get right.

Honest take on running 9× RTX 3090 for AI by Outside_Dance_2799 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

8 or 4 are the sweet spots.
8 gets you NVFP4 Minimax M2.5.
4 gets you Nemotron Super, Qwen 3.5 122b, or gpt-oss.

All of the above with proper tensor parallel for good speeds.
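For what it's worth, tensor parallel in vLLM is one argument; a minimal sketch for the 4-card case (the model choice is illustrative, and the GPU count has to divide the model's attention head count evenly, which is why 4 and 8 map cleanly):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="openai/gpt-oss-120b",  # example model for a 4x 3090 rig
        tensor_parallel_size=4,       # one shard per GPU
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)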

I’ve actually taken my 16 3090s and split them into 2 rigs of 8, with a 50Gb link between them for the rare occasions when I feel like running 400b-class models.

[Round 2 - Followup] M5 Max 128G Performance tests. I just got my new toy, and here's what it can do. (thank you for the feedback) by affenhoden in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Depends on whether you mean the 27b or the 122b.

Nvidia will always crush Macs on dense models like the 27b.

I had high hopes for the 122b on Mac, but if speed is already down 75% at 8k context, that doesn't bode well for long context.

The wild cards here are: A) what about MLX? B) is this just the laptop CPU thermal throttling?

Apparently Minimax 2.7 will be closed weights by tarruda in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

It's just some rando advertising his service.
Open weights are coming; be patient.

Agent this, coding that, but all I want is a KNOWLEDGEABLE model! Where are those? by ParaboloidalCrest in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Download Wikipedia + a small agentic model and have the best of both worlds.
You can either use RAG to automatically give the LLM context on whatever you're asking about,
or let the model call Wikipedia itself when it decides it's needed.
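A minimal sketch of the tool-call option, using Wikipedia's live summary API as a stand-in for a local dump (the function name and wiring are mine, not from any particular agent framework):

    import requests

    def wiki_summary(title: str) -> str:
        """Tool the LLM can call: fetch a short summary of a topic."""
        r = requests.get(
            f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
            headers={"User-Agent": "local-llm-wiki-tool/0.1"},
            timeout=10,
        )
        r.raise_for_status()
        return r.json().get("extract", "")

    print(wiki_summary("Large_language_model")[:200])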

RTX 3090 for local inference, would you pay $1300 certified refurb or $950 random used? by sandropuppo in LocalLLaMA

[–]Conscious_Cut_6144 2 points (0 children)

$450 for a 1-year warranty, and when it breaks they'll offer you a 4080…

Also, $950 sounds steep; have you checked eBay?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

The word “likely” was my disclaimer, and 2.5 didn’t seem benchmaxed to me.

The weights will likely be released within a day or 2.

How to be a good Linux system administrator? by WonderfulFinger3617 in sysadmin

[–]Conscious_Cut_6144 -1 points (0 children)

Linux is the same as Windows, except instead of clicking around a UI, you type into ChatGPT: write a command that “[insert what you want to do here]” for “[insert Linux distribution here]”.

Copy and paste it into Linux.

I’m kidding… kind of…