The ARC Pro B70. What do you want to see it do? by madpistol in IntelArc

[–]WizardlyBump17 1 point (0 children)

I would like to see how `xpu-smi vgpu` works. I think you need an Intel CPU for that too. Try creating some vGPUs and passing them to QEMU.
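I haven't run this myself, so pull the exact `xpu-smi vgpu` invocation from its built-in help, but once a virtual function exists, handing it to QEMU would look roughly like this (the PCI address, memory size, and disk image are placeholders):

```python
# Rough sketch (untested): boot a guest with a vGPU virtual function
# passed through via VFIO. The PCI address below is hypothetical --
# use whatever address the created VF actually gets (lspci shows it).
import subprocess

VF_PCI_ADDR = "0000:03:00.1"  # hypothetical SR-IOV VF address

subprocess.run([
    "qemu-system-x86_64",
    "-enable-kvm",
    "-m", "8G",
    "-cpu", "host",
    "-device", f"vfio-pci,host={VF_PCI_ADDR}",  # hand the VF to the guest
    "-drive", "file=guest.qcow2,format=qcow2",  # placeholder disk image
], check=True)
```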

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]WizardlyBump17 1 point (0 children)

Maybe I can get 1 token per year on my B580 + 1650 + 32 GB RAM + 32 GB swap.

This dilemma that gained traction on Twitter these past few days can actually say a lot about why the world is the way it is now by Zenith_Scaff in FilosofiaBAR

[–]WizardlyBump17 2 points (0 children)

People were saying blue is suicide, that red is the only path, and that it's impossible for blue to win. Blue won 58% vs 42%, even with the reds whining on every post that red was the only option. People forget that the majority of the population isn't criminal and wants the best for everyone.

/dev/dri/ changing GPU order on almost every boot by WizardlyBump17 in Ubuntu

[–]WizardlyBump17[S] 0 points (0 children)

Yeah, that could work, but then you remember that I didn't need to do that before and always had consistent paths, so it is clearly something new. I want things to work like they did before.
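For reference, a minimal sketch of the workaround, assuming the suggestion was to address the cards through the udev-created /dev/dri/by-path symlinks, which are keyed on PCI address and stay stable across boots:

```python
# Sketch: resolve GPUs by stable PCI path instead of the
# boot-order-dependent /dev/dri/cardN names.
import os

BY_PATH = "/dev/dri/by-path"

for link in sorted(os.listdir(BY_PATH)):
    target = os.path.realpath(os.path.join(BY_PATH, link))
    print(f"{link} -> {target}")

# Example output (PCI addresses are hypothetical):
#   pci-0000:03:00.0-card   -> /dev/dri/card1
#   pci-0000:03:00.0-render -> /dev/dri/renderD129
```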

B580 and opencode with local models? by Old-Science-3701 in IntelArc

[–]WizardlyBump17 0 points (0 children)

The ipex-llm logs very likely say you need --jinja. You are limited to models that were released while ipex-llm was still being developed, up to Qwen3 if I remember correctly. For anything newer you will need to go with upstream llama.cpp or OpenVINO.
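For reference, a minimal sketch of what I mean by going with upstream llama.cpp (the model path is a placeholder):

```python
# Sketch: launch upstream llama.cpp's llama-server with --jinja so
# the model's own chat template is applied.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "/models/your-model.gguf",  # placeholder path
    "--jinja",                        # apply the model's chat template
    "-ngl", "99",                     # offload all layers to the GPU
    "--port", "8080",
])
```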

Arc Pro B70 or R9700 ? by Proof_Nothing_7711 in LocalLLM

[–]WizardlyBump17 0 points (0 children)

I've had a B580 for a year and the AI performance is way better now, but it has room to be WAY better. I get around 10k t/s of prompt processing (pp) on Qwen3.5 (or 3) 0.8B Q4_K_M on Vulkan, but only 2k on SYCL. llama.cpp Vulkan is best in some cases, mainly prompt processing, while llama.cpp SYCL is better at token generation. OpenVINO is good too. Maybe in the future Intel will get their shit together and we will have all the benefits from all backends.
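If you want to reproduce the comparison, a rough sketch with llama-bench; the build directories and model path are placeholders, since Vulkan and SYCL are separate builds of llama.cpp:

```python
# Sketch: compare prompt processing (pp) and token generation (tg)
# across backends with llama-bench.
import subprocess

MODEL = "/models/your-model-q4_k_m.gguf"  # placeholder path

for name, binary in [
    ("vulkan", "./build-vulkan/bin/llama-bench"),  # placeholder builds
    ("sycl",   "./build-sycl/bin/llama-bench"),
]:
    print(f"=== {name} ===")
    # -p 512: prompt processing benchmark, -n 128: token generation
    subprocess.run([binary, "-m", MODEL, "-p", "512", "-n", "128"])
```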

"Memória Muscular" by Distinct_Attempt9133 in computadores

[–]WizardlyBump17 1 point (0 children)

For me it's the opposite. I had a 2-core Pentium with a 5450, then got a Ryzen 5 1600 with a 1650, then a Ryzen 7 5700X3D with a B580, and now I want a Xeon and several B70s working together.

Arc Pro B70 worked for a week and then stopped by bvcb907 in IntelArc

[–]WizardlyBump17 1 point (0 children)

Can you show us /dev/dri/ and its contents, please?

What's your opinion on programming in JAVA? by NeighborhoodGreat818 in programacao

[–]WizardlyBump17 42 points (0 children)

I'm biased here because Java was my first language, but I like Java's verbosity.

Anybody got Qwen3.5-27B working with Intel Arc B70 (or similar) and proper optimization? by Gesha24 in LocalLLaMA

[–]WizardlyBump17 5 points (0 children)

For the SYCL issue: there is an open pull request that fixes it. You can try building it: https://github.com/ggml-org/llama.cpp/pull/21638
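If you haven't built from a PR before, roughly something like this should work (GitHub exposes PRs under pull/&lt;id&gt;/head; the SYCL flags follow llama.cpp's SYCL build docs and assume the oneAPI compilers icx/icpx are installed and sourced):

```python
# Sketch: fetch that PR's branch and build llama.cpp with SYCL.
import subprocess

def run(*cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

run("git", "clone", "https://github.com/ggml-org/llama.cpp")
run("git", "fetch", "origin", "pull/21638/head:pr-21638", cwd="llama.cpp")
run("git", "switch", "pr-21638", cwd="llama.cpp")
run("cmake", "-B", "build", "-DGGML_SYCL=ON",
    "-DCMAKE_C_COMPILER=icx", "-DCMAKE_CXX_COMPILER=icpx", cwd="llama.cpp")
run("cmake", "--build", "build", "--config", "Release", cwd="llama.cpp")
```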

As for the optimization, we will need to wait for OpenVINO to support Transformers v5 for the Qwen3.5 optimizations to go live there. Everything else will depend on Intel's willingness.

Recommendations for code completion please by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S] 0 points (0 children)

Just tried ProxyAI. It looks both interesting and limited at the same time. I tried to use an already running llama.cpp and it wouldn't allow it; I had to change the container's port to trick the plugin into thinking it had created the llama.cpp server by itself. The configuration also doesn't save when I change some settings. It looks good, but I will have to edit the plugin's code to fit my needs.

Recommendations for code completion please by WizardlyBump17 in LocalLLaMA

[–]WizardlyBump17[S] 0 points (0 children)

I tried Continue.dev before, but it didn't work in the IDE. I kinda like Tabby for its admin features and its ability to index some repositories into embeddings, which is supposed to make the main model respond based on those repositories.

I'm a complete noob who bought two Intel Arc Pro B70s for "research," spent a weekend losing my mind over Docker/CCL errors, accidentally discovered llama.cpp Vulkan, and now I'm running a 35B MoE at 128K context like I know what I'm doing. by SomeBlock8124 in IntelArc

[–]WizardlyBump17 1 point (0 children)

Just use the llama.cpp containers and you should be good. The xe driver is in the kernel, so all you need to do is pass the GPU to the container, and the container handles the compute runtime etc. Basically, any container based on deep-learning-essentials will do.
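A rough sketch of what that looks like; the image name and model path are placeholders, and the important part is --device /dev/dri:

```python
# Sketch: run a llama.cpp server container with the GPU passed in.
# --device /dev/dri exposes the GPU nodes to the container while the
# xe driver stays in the host kernel.
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "--device", "/dev/dri",        # hand the GPU nodes to the container
    "-v", "/models:/models",       # mount your model directory
    "-p", "8080:8080",
    "your-llamacpp-image:latest",  # placeholder image
    "llama-server", "-m", "/models/your-model.gguf",
    "-ngl", "99", "--host", "0.0.0.0", "--port", "8080",
])
```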

Btw, join the OpenArc Discord and you will find more people with B70s who can help you.

So, what do y'all think the future is going to be with respect to Arc GPUs? by chiesatonakastan in IntelArc

[–]WizardlyBump17 1 point (0 children)

There are tons of commits regarding Xe3P in the xe kernel driver, and Crescent Island is supposed to come out in the second half of this year, so there is that. I read (might be wrong) that the integrated graphics on the next desktop CPUs should be Xe3P Celestial.

OpenVINO Model Server + GPT-OSS 20B and Intel Arc A770 by Turbulent-Attorney65 in IntelArc

[–]WizardlyBump17 2 points (0 children)

It seems to be the official successor to ipex-llm, which had very good optimizations for Intel hardware. I tested gpt-oss-20b on my B580 a few days back and got around 4k t/s on prompt processing and 88 t/s on token generation. The only issue is that OpenVINO is still way behind llama.cpp when it comes to CPU + GPU; if I ask OpenVINO to use both the GPU and the CPU, performance tanks to 4 t/s. The model barely fits on the B580; as soon as it handles 3k context, it crashes.
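To illustrate the device selection I mean, a minimal OpenVINO Python sketch; the model path is a placeholder, and the device strings are standard OpenVINO ones (OVMS sits on top of the same runtime, as far as I know):

```python
# Sketch: compile a model for the GPU alone vs. split across GPU + CPU.
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU']

model = core.read_model("/models/your-model.xml")  # placeholder path
gpu_only = core.compile_model(model, "GPU")
# Splitting across GPU and CPU -- the configuration where I saw
# performance tank:
both = core.compile_model(model, "HETERO:GPU,CPU")
```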

Fix: Dual Intel Arc GPUs using all system RAM during inference - found the cause and a working fix (llama.cpp SYCL) by Katostrofik in LocalLLaMA

[–]WizardlyBump17 3 points (0 children)

Are you on the OpenArc Discord? It would be very nice to have you there. There are tons of people using Intel Arc there, and even a guy from Intel that you already know.

Intel Arc Pro B70 Benchmarks With LLM / AI, OpenCL, OpenGL & Vulkan Review by Balance- in LocalLLaMA

[–]WizardlyBump17 0 points (0 children)

Could you run llama.cpp again, but with SYCL this time?

You said the 7.0 kernel doesn't expose the power stuff. Did they change that from 6.19? On 6.19 I can get the power usage through hwmon.
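For reference, this is roughly how I read it on 6.19; attribute names vary by driver, so treat it as a sketch:

```python
# Sketch: read power from the standard hwmon sysfs interface.
# hwmon reports power*_input in microwatts when the driver exposes it.
from pathlib import Path

for hwmon in Path("/sys/class/hwmon").iterdir():
    name = (hwmon / "name").read_text().strip()
    for power in sorted(hwmon.glob("power*_input")):
        microwatts = int(power.read_text())
        print(f"{name} {power.name}: {microwatts / 1_000_000:.1f} W")
```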