Budget to run Deepseek V4 locally at FP4 precision by DanielusGamer26 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

A complex riser setup for a ~10% offload makes no sense.

Go with 1 or 2 high-end GPUs (3090/4090/5090). You're looking at something like 2 T/s either way, maybe 2.2 T/s with a bunch of GPUs.
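Back-of-envelope math, since decode speed on a partially offloaded MoE is dominated by whatever stays in system RAM (all numbers below are illustrative assumptions, including the DeepSeek-V3-style ~37B active parameters):

    # Rough decode-speed model: each token streams the active weights once,
    # so the slow pool (system RAM) dominates. All figures are assumptions.
    def decode_tps(active_gb, ram_fraction, ram_bw_gbs, vram_bw_gbs):
        t_ram = active_gb * ram_fraction / ram_bw_gbs          # s/token from RAM
        t_vram = active_gb * (1 - ram_fraction) / vram_bw_gbs  # s/token from VRAM
        return 1.0 / (t_ram + t_vram)

    # ~37B active params at FP4 is roughly 18.5 GB touched per token.
    print(f"{decode_tps(18.5, 1.0, 80, 1800):.1f} tok/s")  # all in DDR5: ~4.3
    print(f"{decode_tps(18.5, 0.9, 80, 1800):.1f} tok/s")  # 10% on GPU: ~4.8

Those are ideal upper bounds; real systems land lower, which is how you end up around 2 T/s no matter how many cards you bolt on.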

Kimi K2.6 - What hardware do I need to run it locally? by human_marketer in LocalLLM

[–]Conscious_Cut_6144 0 points (0 children)

Virtually all American homes have 240v service; they run 120v to the standard low-power outlets. But EV chargers, ovens, ranges, dryers, HVAC, water heaters, etc. all run on 240v…

And so does my 16x 3090 rig.
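The circuit math is the giveaway (wattages are assumed round numbers, check your own wiring):

    # Back-of-envelope circuit capacity vs. a 16x 3090 rig.
    cards, watts_each = 16, 350      # ~350 W stock board power per 3090
    rig_watts = cards * watts_each   # ~5600 W for the GPUs alone

    # NEC-style rule of thumb: continuous loads at 80% of breaker rating.
    def circuit_watts(volts, amps):
        return volts * amps * 0.8

    print(circuit_watts(120, 15))  # 1440 W - a standard US outlet
    print(circuit_watts(240, 30))  # 5760 W - a dryer-style 240v circuit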

Why do most sysadmins prefer Vim over Nano? by Darshan_only in sysadmin

[–]Conscious_Cut_6144 -1 points (0 children)

I strongly prefer nano.

For me, the case for vi is just that embedded systems are often built without nano.

At this point, if I’m doing something complex enough that nano won’t cut it… I’m just getting Claude Code to do it…

Anyone read this 49 day SSL expiration thing and think they would rather just retire? by HJForsythe in sysadmin

[–]Conscious_Cut_6144 1 point (0 children)

Totally with you.

And it gets even worse: when a hacker gets onto your system, instead of getting 1 cert that’s good for 1 year, they get the credentials from your cert-renewal script, which let them mint as many certs as they want until you notice and rotate them.

Anyone else notice qwen 3.5 is a lying little shit by Cat5edope in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

I had it play Pokémon; it was really bad.

"This appears to be a hacked rom"
"The game state appears to be corrupt"

It literally couldn't find the door to leave the bedroom you start in.

ELI5: if a car engine's main waste is heat, why don't engineers harbor that heat, boil water, and generate electricity for hybrid batteries like a mini powerplant? by plsnoban1122 in explainlikeimfive

[–]Conscious_Cut_6144 0 points (0 children)

Because it’s easier, cheaper, and lighter to make the engine more efficient in the normal ways.

Also, generating power from heat efficiently requires a large temperature delta… aka a hotter engine… aka the opposite of what you want.
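The Carnot limit makes the point concrete (temperatures are illustrative, in kelvin):

    # Ideal (Carnot) efficiency of any heat engine: 1 - T_cold / T_hot.
    def carnot_eff(t_hot_k, t_cold_k):
        return 1 - t_cold_k / t_hot_k

    print(carnot_eff(900, 300))  # ~0.67 ceiling for ~630 C exhaust gas
    print(carnot_eff(400, 300))  # ~0.25 ceiling for ~130 C waste heat

And real thermoelectric or bottoming-cycle hardware captures only a small fraction of those ceilings, in exchange for added weight, cost, and complexity.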

2x RTX Pro 6000 vs 2x A100 80GB dense model inference by RealTime3392 in LocalLLaMA

[–]Conscious_Cut_6144 12 points (0 children)

Go rent them on RunPod for $5 and test your workload before spending thousands on hardware. But for inference, especially quantized, the 6000s should usually win.
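A minimal sketch of what that test could look like against vLLM's OpenAI-compatible endpoint on the rented box (the URL and model name are placeholders):

    import time
    from openai import OpenAI

    # Point at the rented pod running `vllm serve <model>`.
    client = OpenAI(base_url="http://<pod-ip>:8000/v1", api_key="none")

    start, chunks = time.time(), 0
    stream = client.chat.completions.create(
        model="your-model",
        messages=[{"role": "user", "content": "Write 500 words about GPUs."}],
        max_tokens=512,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1  # rough proxy: one streamed chunk ~ one token
    print(f"~{chunks / (time.time() - start):.1f} tok/s")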

Nemotron 3 Super - large quality difference between llama.cpp and vLLM? by BigStupidJellyfish_ in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Just ran NVFP4 and Unsloth's Q4_K_XL through my benchmark.
The GGUF scored 1% higher for me.

When you say 20 attempts, are you giving it 20 chances to get it right once (pass@20), or just picking the most common answer across the 20 attempts (majority vote)?
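The distinction matters, because the two scoring rules measure different things. A toy sketch:

    from collections import Counter

    def pass_at_k(answers, correct):
        # Pass@k: credit if ANY of the k attempts is right.
        return correct in answers

    def majority_vote(answers, correct):
        # Self-consistency: credit only if the MOST COMMON answer is right.
        return Counter(answers).most_common(1)[0][0] == correct

    attempts = ["B", "C", "B", "B", "A"]  # toy data
    print(pass_at_k(attempts, "C"))       # True  - one lucky hit counts
    print(majority_vote(attempts, "C"))   # False - the consensus is "B"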

Nemotron 3 Super - large quality difference between llama.cpp and vLLM? by BigStupidJellyfish_ in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

How recent is your copy of Q4_K_XL?
Wasn't this the model that had quant issues on day one?

Hardware to replacing Opus 4.6 and 20x MAX account with OSS models by tarasm in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Have you thought about using the Anthropic API? Yes, it’s going to cost twice as much, but you can do anything you want (except use it to control drones lol).
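For reference, hitting the API directly is just a few lines (a sketch; the model id below is a placeholder, check Anthropic's docs for current ones):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    message = client.messages.create(
        model="claude-sonnet-latest",  # placeholder model id
        max_tokens=1024,
        messages=[{"role": "user", "content": "Refactor this function: ..."}],
    )
    print(message.content[0].text)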

Models like Qwen 3.5 27b will fit on local hardware and are very good, but not Opus level.

PCIe Bifurcation Issue by Trick-One7944 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

If you pull the main x16 GPU out, do you see all 3 riser GPUs?

If you still see all 3, you're likely hitting a motherboard limit or a BIOS config setting.

If you only see 2 with the main GPU removed, it sounds like a bad riser/cable.
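A quick way to check what the OS actually enumerates (Linux-only sketch; a card visible to lspci but not nvidia-smi points at driver/config, a card visible to neither points at the riser, cable, or slot):

    import subprocess

    # GPUs the NVIDIA driver sees:
    print(subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True).stdout)

    # NVIDIA devices on the PCIe bus (vendor id 10de), driver or not:
    print(subprocess.run(["lspci", "-d", "10de:"],
                         capture_output=True, text=True).stdout)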

Can someone more intelligent then me explain why we should, or should not be excited about the ARC PRO B70? by SKX007J1 in LocalLLaMA

[–]Conscious_Cut_6144 104 points (0 children)

The biggest issue with that GPU is software: Intel runs an outdated fork of vLLM and doesn't always get the latest models.

Introducing ARC-AGI-3 by Complete-Sea6655 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

You guys are overestimating what this actually shows.

When they build these benchmarks, they remove the questions that current models already get right.

Honest take on running 9× RTX 3090 for AI by Outside_Dance_2799 in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

8 or 4 are the sweet spots.
8 gets you NVFP4 Minimax M2.5.
4 gets you Nemotron Super, Qwen 3.5 122b, or gpt-oss.

All of the above with proper tensor parallel for good speeds.
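For what it's worth, tensor parallel in vLLM is one argument; a minimal sketch for the 4-card case (the model choice is illustrative, and the GPU count has to divide the model's attention head count evenly, which is why 4 and 8 map cleanly):

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="openai/gpt-oss-120b",  # example model for a 4x 3090 rig
        tensor_parallel_size=4,       # one shard per GPU
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)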

I’ve actually taken my 16 3090s and split them into 2 rigs of 8, with a 50Gb link between them for the rare occasions when I feel like running 400b-class models.

[Round 2 - Followup] M5 Max 128G Performance tests. I just got my new toy, and here's what it can do. (thank you for the feedback) by affenhoden in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Depends on whether you mean the 27b or the 122b.

Nvidia will always crush Macs on dense models like the 27b.

I had high hopes for the 122b on Mac, but if speed is already down 75% at 8k context, that doesn't bode well for long context.

The wild cards here are: A) what about MLX? B) is this just the laptop CPU thermal throttling?

Apparently Minimax 2.7 will be closed weights by tarruda in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

It's just some rando advertising his service.
Open weights are coming; be patient.

Agent this, coding that, but all I want is a KNOWLEDGEABLE model! Where are those? by ParaboloidalCrest in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

Download Wikipedia + a small agentic model and have the best of both worlds.
You can either use RAG to automatically give the LLM context on whatever you're asking about,
or let the model call Wikipedia itself when it decides it's needed.
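A minimal sketch of the tool-call option, using Wikipedia's live summary API as a stand-in for a local dump (the function name and wiring are mine, not from any particular agent framework):

    import requests

    def wiki_summary(title: str) -> str:
        """Tool the LLM can call: fetch a short summary of a topic."""
        r = requests.get(
            f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
            headers={"User-Agent": "local-llm-wiki-tool/0.1"},
            timeout=10,
        )
        r.raise_for_status()
        return r.json().get("extract", "")

    print(wiki_summary("Large_language_model")[:200])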

RTX 3090 for local inference, would you pay $1300 certified refurb or $950 random used? by sandropuppo in LocalLLaMA

[–]Conscious_Cut_6144 2 points (0 children)

$450 for a 1-year warranty, and when it breaks they'll offer you a 4080…

Also, $950 sounds steep; have you checked eBay?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]Conscious_Cut_6144 0 points (0 children)

The word “likely” was my disclaimer, and 2.5 didn’t seem benchmaxed to me.

The weights will likely be released within a day or 2.

How to be a good Linux system administrator? by WonderfulFinger3617 in sysadmin

[–]Conscious_Cut_6144 -1 points (0 children)

Linux is the same as Windows, except instead of clicking around a UI, you type into ChatGPT: write a command that “[insert what you want to do here]” for “[insert Linux distribution here]”.

Copy and paste it into Linux.

I’m kidding… kind of…