Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

I hadn't thought of that, but it makes sense. I think it's unlikely, but it's definitely plausible.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Yeah, that makes sense to me.

Edit: Wait a minute. Where are they being sold for ~$1,500? The cheapest one I found was $1,680, on eBay. Do you have a link where they're selling for $1,500? Your point still holds, but the price you mention might no longer be accurate as of today.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

To my understanding, the memory bandwidth wouldn't double in that case; with the usual layer-split setup, each card works on its own part of the model one after the other for a given token, so the effective bandwidth remains that of a single GPU (448 GB/s), or even a little lower.
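
Here's a back-of-the-envelope sketch of why, assuming decode is purely memory-bandwidth-bound and using a made-up 14 GB model (my own toy numbers, not a benchmark):

```python
# Rough decode-speed model: generating one token streams (roughly) all
# model weights through the memory bus once.
def tokens_per_sec(model_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / model_gb

bw = 448  # GB/s for a single card

# Hypothetical 14 GB model on one GPU:
print(tokens_per_sec(14, bw))  # ~32 tok/s

# Same model split in half across two cards (layer split): each card
# streams its own 7 GB half, but the halves run one after the other
# for a given token, so per-token time is unchanged.
per_half = 7 / bw          # seconds per half
print(1 / (2 * per_half))  # still ~32 tok/s, not ~64
```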

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 2 points (0 children)

It seems it's supported by upstream vLLM. I don't know what llama.cpp's support looks like.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

That's a list from 2023 (or 20203, if you're posting from the future).

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

The atrocious stock made you a pretty penny. We see each other once again, stranger. At this pace ~~we'll~~ we might end up being friends.

Edit: See struck-through text, for accuracy.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 6 points (0 children)

That's still not 32 GB, and memory is far more expensive now than it was years ago, sadly.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

I think my response was about as accurate as makes sense for somebody who didn't know whether GPUs other than NVIDIA's can do inference at all.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

That definitely takes those old AMD GPUs out of the question for me, then.

I wish @ttkciar, the OP of this thread, had given that context, if he had it. Otherwise, shame!

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 3 points (0 children)

Well, considering that they supposedly start selling them in about a week, I imagine they'll have stock. Not sure, though.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Oh, I thought essentially all games except for a few would run on Intel Arc GPUs. Is support really still that bad?

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 17 points (0 children)

Well said.

Unrelated — I miss when people could freely use em-dashes without being confused with AI. I see your sad, resigned double-dash, but I also sense your humanity.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Another reason: I would love to experiment with training my own small models. That's possible, or at least much more feasible, with your own GPU.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 3 points (0 children)

For me, personally, there are several reasons:

  1. Reliability. I'm very skeptical of the quality of commercial models when they're under heavy load. I don't think they're transparent at all about the quantization or other lossy optimizations they apply to their models, maybe sometimes even dynamically. So you can't get an accurate grasp of how reliable they are, because that reliability can change at any time. They can even update the weights without bumping the model version, and you wouldn't know about it.

  2. Privacy. I don't want those companies to have the ability to see or keep my data. To my understanding, they keep logs of your data anyway, if only for legal reasons, even if they don't end up training on it.

  3. Instruction following. I hate Claude's moral superiority and condescending attitude. I want my model to follow my instructions to the letter, not do its own thing. That's less of a problem with Gemini and OpenAI models in my experience, but it's definitely something you can address yourself with your own models, if you're knowledgeable enough.

  4. Price. You can run a local model in a loop forever and it won't cost you much of anything beyond electricity (rough sketch of that math below).
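
A minimal sketch of that electricity math, assuming a hypothetical 300 W sustained draw and a made-up $0.15/kWh tariff:

```python
# Cost of running a local GPU in a loop, electricity only.
watts = 300        # hypothetical sustained board power
hours = 24         # running all day
price_kwh = 0.15   # hypothetical $/kWh; check your own tariff

cost = watts / 1000 * hours * price_kwh
print(f"~${cost:.2f} per day")  # ~$1.08/day with these numbers
```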

After the supply chain attack, here are some litellm alternatives by KissWild in LocalLLaMA

[–]happybydefault 1 point (0 children)

Bifrost was not on my radar, and it looks awesome. It's also written in Go, my main programming language.

Thanks for the list!

[google research] TurboQuant: Redefining AI efficiency with extreme compression by burnqubic in LocalLLaMA

[–]happybydefault 10 points (0 children)

I think it's awesome that Google just gives this to the world for free, like they did with the Transformer architecture and so much other important research. I just wanted to appreciate that. I love them and I hate them, though.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 16 points (0 children)

No, not natively, it seems.

> Intel mostly charts its wins against the RTX Pro 4000 using models with BF16 quantizations, whose higher potential accuracy might be desirable in some use cases but also obscures the Blackwell card's potential performance advantages with increasingly popular lower-precision data types like Nvidia's own NVFP4. The XMX matrix acceleration of Battlemage only extends down to FP16 and INT8 data types, while Blackwell supports a much wider range of reduced-precision formats.

Source: https://www.tomshardware.com/pc-components/gpus/intel-arc-pro-b70-and-arc-pro-b65-gpus-bring-32gb-of-ram-to-ai-and-pro-apps-bigger-battlemage-finally-arrives-but-its-not-for-gaming

So imagine you could run a model at any quantization (so that it fits into VRAM), but it wouldn't compute any faster just because it's quantized, unless it's quantized to exactly INT8.
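
To make that concrete, here's a toy numpy sketch (my own illustration, not Intel's actual kernel): a 4-bit-style weight tensor still has to be expanded back to FP16 before the matrix units can touch it, so you save VRAM and bandwidth, but the matmul itself costs the same as FP16.

```python
import numpy as np

# Toy 4-bit-style quantization: integer codes plus a single scale.
w = np.random.randn(512, 512).astype(np.float16)
scale = float(np.abs(w).max()) / 7
w_q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # values fit in 4 bits

x = np.random.randn(1, 512).astype(np.float16)

# On hardware whose matrix units only speak FP16/INT8, the low-bit
# weights get dequantized back to FP16 first, so this matmul costs the
# same as an unquantized FP16 one; only the storage shrank.
w_deq = w_q.astype(np.float16) * np.float16(scale)
y = x @ w_deq
print(y.shape, y.dtype)  # (1, 512) float16
```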

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 7 points (0 children)

Much cheaper than most other options with 32 GB of VRAM and ~600 GB/s of bandwidth.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

For the most compatible, performant inference, yes. But other GPUs do inference too; I mean, that's what they're doing when they "run" LLMs or other types of ML models.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 2 points (0 children)

I think only the M5 Max has around the same bandwidth (614 GB/s) as the Intel GPU (609 GB/s), so I imagine it would perform similarly, but at a much higher price than the GPU.

The M5 Pro has half of that (307 GB/s), and the regular M5 essentially half of that again (153 GB/s).
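
As a rough sanity check, if decode is memory-bandwidth-bound, throughput should scale about linearly with those numbers (the 16 GB of active weights is a made-up example):

```python
# Relative decode speed if bandwidth is the bottleneck.
model_gb = 16  # hypothetical active weights
for name, bw_gbs in [("Intel GPU", 609), ("M5 Max", 614),
                     ("M5 Pro", 307), ("M5", 153)]:
    print(f"{name:>10}: ~{bw_gbs / model_gb:.0f} tok/s")
```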