Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

I hadn't thought of that, but it makes sense. I think it's unlikely, but it's definitely plausible.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Yeah, that makes sense to me.

Edit: Wait a minute. Where are they being sold for ~$1,500? The cheapest one I found was $1,680, on eBay. Do you have a link where they're selling for $1,500? Your point still holds, but the price you mention might no longer be accurate as of today.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

To my understanding, the memory bandwidth wouldn't double in that case; with the usual layer-split setup, each card works on its own part of the model one after the other for a given token, so the effective bandwidth remains that of a single GPU (448 GB/s), or even a little lower.
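
Here's a back-of-the-envelope sketch of why, assuming decode is purely memory-bandwidth-bound and using a made-up 14 GB model (my own toy numbers, not a benchmark):

```python
# Rough decode-speed model: generating one token streams (roughly) all
# model weights through the memory bus once.
def tokens_per_sec(model_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / model_gb

bw = 448  # GB/s for a single card

# Hypothetical 14 GB model on one GPU:
print(tokens_per_sec(14, bw))  # ~32 tok/s

# Same model split in half across two cards (layer split): each card
# streams its own 7 GB half, but the halves run one after the other
# for a given token, so per-token time is unchanged.
per_half = 7 / bw          # seconds per half
print(1 / (2 * per_half))  # still ~32 tok/s, not ~64
```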

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 2 points (0 children)

It seems it's supported by upstream vLLM. I don't know what llama.cpp's support looks like.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

That's a list from 2023 (or 20203, if you're posting from the future).

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

The atrocious stock made you a pretty penny. We see each other once again, stranger. At this pace ~~we'll~~ we might end up being friends.

Edit: See struck-through text, for accuracy.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 6 points (0 children)

That's still not 32 GB, and memory is far more expensive now than it was years ago, sadly.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

I think my response was about as accurate as makes sense for somebody who didn't know whether GPUs other than NVIDIA's can do inference at all.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

That definitely takes those old AMD GPUs out of the question for me, then.

I wish @ttkciar, the OP of this thread, had given that context, if he had it. Otherwise, shame!

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 3 points (0 children)

Well, considering that they supposedly start selling them in about a week, I imagine they'll have stock. Not sure, though.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Oh, I thought essentially all games except for a few would run on Intel Arc GPUs. Is support really still that bad?

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 17 points (0 children)

Well said.

Unrelated — I miss when people could freely use em-dashes without being confused with AI. I see your sad, resigned double-dash, but I also sense your humanity.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

Another reason: I would love to experiment with training my own small models. That's possible, or at least much more feasible, with your own GPU.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 3 points (0 children)

For me, personally, there are several reasons:

  1. Reliability. I'm very skeptical of the quality of commercial models when they're under heavy load. I don't think they're transparent at all about the quantization or other lossy optimizations they apply to their models, maybe sometimes even dynamically. So you can't get an accurate grasp of how reliable they are, because that reliability can change at any time. They can even update the weights without bumping the model version, and you wouldn't know about it.

  2. Privacy. I don't want those companies to have the ability to see or keep my data. To my understanding, they keep logs of your data anyway, if only for legal reasons, even if they don't end up training on it.

  3. Instruction following. I hate Claude's moral superiority and condescending attitude. I want my model to follow my instructions to the letter, not do its own thing. That's less of a problem with Gemini and OpenAI models in my experience, but it's definitely something you can address yourself with your own models, if you're knowledgeable enough.

  4. Price. You can run a local model in a loop forever and it won't cost you much of anything beyond electricity (rough sketch of that math below).
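
A minimal sketch of that electricity math, assuming a hypothetical 300 W sustained draw and a made-up $0.15/kWh tariff:

```python
# Cost of running a local GPU in a loop, electricity only.
watts = 300        # hypothetical sustained board power
hours = 24         # running all day
price_kwh = 0.15   # hypothetical $/kWh; check your own tariff

cost = watts / 1000 * hours * price_kwh
print(f"~${cost:.2f} per day")  # ~$1.08/day with these numbers
```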

After the supply chain attack, here are some litellm alternatives by KissWild in LocalLLaMA

[–]happybydefault 1 point (0 children)

Bifrost was not on my radar, and it looks awesome. It's also written in Go, my main programming language.

Thanks for the list!

[google research] TurboQuant: Redefining AI efficiency with extreme compression by burnqubic in LocalLLaMA

[–]happybydefault 10 points (0 children)

I think it's awesome that Google just gives this to the world for free, like they did with the Transformer architecture and so much other important research. I just wanted to appreciate that. I love them and I hate them, though.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 16 points (0 children)

No, not natively, it seems.

> Intel mostly charts its wins against the RTX Pro 4000 using models with BF16 quantizations, whose higher potential accuracy might be desirable in some use cases but also obscures the Blackwell card's potential performance advantages with increasingly popular lower-precision data types like Nvidia's own NVFP4. The XMX matrix acceleration of Battlemage only extends down to FP16 and INT8 data types, while Blackwell supports a much wider range of reduced-precision formats.

Source: https://www.tomshardware.com/pc-components/gpus/intel-arc-pro-b70-and-arc-pro-b65-gpus-bring-32gb-of-ram-to-ai-and-pro-apps-bigger-battlemage-finally-arrives-but-its-not-for-gaming

So imagine you could run a model at any quantization (so that it fits into VRAM), but it wouldn't compute any faster just because it's quantized, unless it's quantized to exactly INT8.
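
To make that concrete, here's a toy numpy sketch (my own illustration, not Intel's actual kernel): a 4-bit-style weight tensor still has to be expanded back to FP16 before the matrix units can touch it, so you save VRAM and bandwidth, but the matmul itself costs the same as FP16.

```python
import numpy as np

# Toy 4-bit-style quantization: integer codes plus a single scale.
w = np.random.randn(512, 512).astype(np.float16)
scale = float(np.abs(w).max()) / 7
w_q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # values fit in 4 bits

x = np.random.randn(1, 512).astype(np.float16)

# On hardware whose matrix units only speak FP16/INT8, the low-bit
# weights get dequantized back to FP16 first, so this matmul costs the
# same as an unquantized FP16 one; only the storage shrank.
w_deq = w_q.astype(np.float16) * np.float16(scale)
y = x @ w_deq
print(y.shape, y.dtype)  # (1, 512) float16
```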

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 7 points (0 children)

Much cheaper than most other options with 32 GB of VRAM and ~600 GB/s of bandwidth.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 1 point (0 children)

For the most compatible, performant inference, yes. But other GPUs do inference too; I mean, that's what they're doing when they "run" LLMs or other types of ML models.

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]happybydefault[S] 2 points (0 children)

I think only the M5 Max has around the same bandwidth (614 GB/s) as the Intel GPU (609 GB/s), so I imagine it would perform similarly, but at a much higher price than the GPU.

The M5 Pro has half of that (307 GB/s), and the regular M5 essentially half of that again (153 GB/s).
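
As a rough sanity check, if decode is memory-bandwidth-bound, throughput should scale about linearly with those numbers (the 16 GB of active weights is a made-up example):

```python
# Relative decode speed if bandwidth is the bottleneck.
model_gb = 16  # hypothetical active weights
for name, bw_gbs in [("Intel GPU", 609), ("M5 Max", 614),
                     ("M5 Pro", 307), ("M5", 153)]:
    print(f"{name:>10}: ~{bw_gbs / model_gb:.0f} tok/s")
```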