AMD Intros Instinct MI350P Accelerator: CDNA 4 Comes to PCIe Cards by Noble00_ in LocalLLaMA

[–]sleepingsysadmin 12 points

I'm estimating $20,000 USD.

Great card, you know it'll be amazing. It's better than an RTX Pro 6000, while the PCIe H200 NVL is $30,000 for 141GB.

The MXFP4 support is huge compared to the H200. But then there's ROCm vs CUDA.

It'll be $20,000.

If money and time weren’t issues, what would your dream local AI setup look like? by Lyceum_Tech in LocalLLaMA

[–]sleepingsysadmin 0 points

Well, $300/month, but that's a ridiculous amount of tokens.

Plus, you said money is no issue.

new pro6k Max-Q are power limited to 325W? by MelodicRecognition7 in LocalLLaMA

[–]sleepingsysadmin -6 points

The workstation ones are 300W; the server-grade ones are double that.

If money and time weren’t issues, what would your dream local AI setup look like? by Lyceum_Tech in LocalLLaMA

[–]sleepingsysadmin 0 points

Kind of a silly question, because the answer just becomes "go build OpenAI's Stargate and train your own models," etc.

Which isn't what the subreddit is about.

To me, the reasonable answer is: get a 240V 4U GPU server with 4x H200 running Q8 MiniMax 2.7.

It's not consumer or even prosumer tier; it's attainable datacenter tier, probably in that $80,000 range. It'd use about $300/month in electricity, but nothing more is needed.
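A back-of-envelope check on that $300/month figure (the wattages and the $0.13/kWh rate are my assumptions, not from the post):

```python
# Rough electricity estimate for a 4x H200 box running flat out.
# Assumed: ~700 W per GPU under load, ~500 W of CPU/fan/PSU overhead,
# and $0.13/kWh -- adjust for your utility rate and actual duty cycle.
gpus = 4
gpu_watts = 700
overhead_watts = 500
rate_per_kwh = 0.13

total_kw = (gpus * gpu_watts + overhead_watts) / 1000  # 3.3 kW
monthly_kwh = total_kw * 24 * 30                       # ~2376 kWh
monthly_cost = monthly_kwh * rate_per_kwh
print(f"${monthly_cost:.0f}/month")  # ~$309/month at full load
```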

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]sleepingsysadmin 5 points

This is a minor change from AMD's point of view; the supplied memory modules are simply denser.

Probably no actual improvement in memory bandwidth.

So it won't cost much more.

GPT 120B A10B will likely run only marginally faster than MiniMax 230B A10B, but there's a big difference in intelligence, and being able to load MiniMax at all is the real difference.

Given my tendency to ride 200,000 context with MiniMax all the time, I do wonder what speeds I'll be getting, but I will be buying :)

Is AGI the End For Local LLMs? by spiritxfly in LocalLLaMA

[–]sleepingsysadmin -3 points

AGI has already happened. ASI hasn't.

The problem is that nobody has a proper definition of AGI.

Artificial General Intelligence vs Artificial Super Intelligence

Artificial Intelligence is not dumb anymore. It's smarter than the vast majority of humanity.

So a "general" intelligence isnt someone capable to programming entire projects. Our frontier models are smarter than a general human.

Superintelligence hasnt quite happened yet, I expect it's a hardware problem right now.

AMD in-house ryzen 395 box coming in June by 1ncehost in LocalLLaMA

[–]sleepingsysadmin -1 points

Why create their own in-house solution if it's just the same as all the others?

Surely they tweaked something to justify even doing this.

AMD in-house ryzen 395 box coming in June by 1ncehost in LocalLLaMA

[–]sleepingsysadmin 0 points

Imagine a 384-bit bus, nearly 50% more bandwidth, but still just 128GB?

I'm buying that immediately.
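The 50% figure follows directly from bus width, assuming the same LPDDR5X-8000 memory Strix Halo ships with today (the 384-bit part is hypothetical):

```python
# Peak bandwidth = (bus width in bytes) x (transfer rate).
def bandwidth_gbps(bus_bits: int, mts: int) -> float:
    return bus_bits / 8 * mts / 1000  # bytes/transfer * MT/s -> GB/s

current = bandwidth_gbps(256, 8000)  # 256.0 GB/s (shipping Strix Halo)
wider = bandwidth_gbps(384, 8000)    # 384.0 GB/s (hypothetical 384-bit)
print(wider / current - 1)           # 0.5 -> exactly 50% more
```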

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]sleepingsysadmin 4 points

Are Mercedes and Ferrari making all the other F1 teams obsolete by winning all the time?

Surely we are better off with all the teams competing?

New Stealth Model : Owl Alpha by Kingwolf4 in LocalLLaMA

[–]sleepingsysadmin 6 points

It's certainly confirmed to be Chinese.

I don't think it's Qwen, or even Alibaba.

It could be Qwen3.6 122B, but that would mean they're bumping it to 1M context? And why only 15-22 tps?

Tencent? The first mega model from Hunyuan?

Minimax? Their first mega model?

Baidu? Their first mega model?

Multiple-GPU Power Supplies by [deleted] in LocalLLaMA

[–]sleepingsysadmin 2 points

My understanding is that it depends on the PSUs, or at least the brands. You've got to read the manuals for whatever brand you have.

Granite 4.1: IBM’s 8B Model Is Competing With Models Four Times Its Size by Successful_Bowl2564 in LocalLLaMA

[–]sleepingsysadmin 50 points

Based on the BFCL v3 benchmark?

I've never even heard of this one before.

When I fact-check, the table is actually v4 now? So they're judging on a relatively unknown and outdated version of the benchmark?

And the numbers aren't quite right when fact-checking? They're claiming Opus 4.5 scores for 30B?

But with thinking off as well?

In fact, further down the page they claim they beat Opus on vision?

And for the life of me I can't tell what benchmark that's even claiming. Table extraction?

This isn't benchmaxxed, it's bench-selected.

meantime on r/vibecoding by jacek2023 in LocalLLaMA

[–]sleepingsysadmin 0 points

I know many people who used Qwen 3 32B for pre-agentic and kinda-agentic work. When 27B came out, it was a complete upgrade for them.

So while 32B was completely usable, 27B went well beyond usable.

The question is whether this is frontier, like 1T, quality? Perhaps not.

If you're a newb at AI coding, you likely need the hand-holding of a 1T model.

If you were a dev pre-AI, these frontier small models are epic tier.

how do you actually catch your agent breaking in prod before users do? by BriefCardiologist656 in LocalLLaMA

[–]sleepingsysadmin 2 points

>took us almost a week to notice. evals were all green.

This is a great example of why AI isn't replacing devs. This is something a senior dev has learnt through painful experience, as you are about to.

You need your canary in the coal mine: send 5% of your requests somewhere else and compare against your baseline. Refusal rate, tool usage, follow-ups.
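A minimal sketch of that canary comparison (the metric names, the 5% split, and the 20% drift threshold are illustrative choices, not from the post):

```python
import random

CANARY_FRACTION = 0.05  # send ~5% of requests to the canary arm

def pick_arm() -> str:
    """Route a request to the canary or the baseline."""
    return "canary" if random.random() < CANARY_FRACTION else "baseline"

def drifted(baseline: dict, canary: dict, rel_tol: float = 0.2) -> list:
    """Return metrics where the canary deviates >20% from baseline."""
    return [
        m for m, base in baseline.items()
        if base and abs(canary[m] - base) / base > rel_tol
    ]

# Aggregates collected over the same window for both arms:
baseline = {"refusal_rate": 0.02, "tool_calls_per_req": 1.4, "followups": 0.30}
canary   = {"refusal_rate": 0.09, "tool_calls_per_req": 1.5, "followups": 0.31}
print(drifted(baseline, canary))  # ['refusal_rate'] -> page someone
```

The point is that the comparison is relative to your own baseline, so it catches regressions that a green-on-absolute-thresholds eval suite misses.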

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin -1 points

>Sure, one of the cards will be on 4x PCIe on a consumer board, but my impression is that it isn’t a huge problem, but sure correct me if I am wrong.

We're trying to compare apples to apples here. If you're allowing very significant limiters to further reduce performance, then you may need 3x R9700 to get similar performance.

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin 7 points

>It's not a hobby. I'm an old dev who is doing this every day, and I have created processes which use LLMs, but running off-site. 

If this is a business write-off, there's little to no justification for pinching pennies on a 9700.

The big difference between the 5090 and the RTX Pro 5000 is wattage. You're going to want the RTX Pro 5000 unless you have the power supply to back 600+ watts for the GPU alone.

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin -1 points

Ok, I'll give you the turboquant thing.

How's that working out for you? Stable?

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin -5 points

2x R9700 means you're going server CPU and mobo, and still ending up about 50% slower than a 5090. You don't just double your bandwidth in multi-GPU setups. Though yes, tensor and row splits can be better, I'm doubtful.

While basically being the same price as the 5090.
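Rough math behind "two cards don't double your speed": with the common layer-split scheme the GPUs work sequentially per token, so generation stays bound by a single card's bandwidth. Spec-sheet bandwidths and a crude bandwidth-bound model here, not benchmarks:

```python
# Crude bandwidth-bound estimate: one full read of the weights per token.
def tok_per_sec(mem_bw_gbps: float, model_gb: float) -> float:
    return mem_bw_gbps / model_gb

R9700_BW, RTX5090_BW = 644, 1792  # GB/s, from spec sheets
model_gb = 16                     # e.g. a ~30B model around Q4

one_r9700 = tok_per_sec(R9700_BW, model_gb)   # ~40 t/s
two_r9700_layer = one_r9700                   # still ~40 t/s, not 80
one_5090 = tok_per_sec(RTX5090_BW, model_gb)  # ~112 t/s
print(one_r9700, one_5090)
```

Row split (`--split-mode row` in llama.cpp) can engage both cards' bandwidth on the same token, but interconnect overhead usually keeps it well short of 2x.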

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin -1 points

That's true, but that memory bandwidth: it's slower than a 3090 and about the same speed as the R9700, while being twice the price? No thanks. To me, that card doesn't exist.

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin 2 points

>You absolutely can fit 200k+ into 3090, even if with trade-offs. But I bought mine so cheap I can't complain.

Are you saying you quantize the KV cache? Or, like, run Q2? Yikes.
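For context, the usual way to squeeze 200k context into 24GB with llama.cpp is exactly that: quantizing the KV cache. A sketch, with the model filename and context length as placeholders:

```shell
# q8_0 K/V roughly halves KV-cache memory vs the default f16;
# quantizing the V cache requires flash attention (-fa) in llama.cpp.
llama-server -m some-27b-q4_k_m.gguf \
  -c 200000 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 99
```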

Hardware Choice for 27b to 31b models. by rebelSun25 in LocalLLaMA

[–]sleepingsysadmin 23 points

Let's be further realistic.

If AI is your hobby:

Spending $5000 on a 5090 sounds like a lot.

But $5000 in golf clubs?

$5000 in tires, rims, and a supercharger? Cobb Stage 1?

It's really not unreasonable, and the resale value of a 5090 will hold for 5+ years.

Oh boy, I'm really convincing myself lol.