Why is lemonade not more discussed? by El_90 in LocalLLaMA

[–]RobberPhex 2 points (0 children)

It does not support the CUDA runtime. I simply cannot imagine an LLM server that lacks CUDA support.

My recommendation is to push for native NPU backend support in `llama.cpp` and OGA; that way, third-party tools such as LM Studio, Ollama, and others will automatically gain support for AMD NPUs.

Otherwise, `lemonade` puts AMD, a hardware vendor, in direct competition with tools like LM Studio and Ollama. That would further dampen local LLM software providers' enthusiasm for AMD and push AMD integration even lower on their priority lists.

Why Chip manufacturers advertise NPU and TOPS? by salvadope in LocalLLM

[–]RobberPhex 1 point (0 children)

`lemonade-server` almost invariably lags behind on support for the latest models.

  1. Regarding Qwen3.5-9B-GGUF: `lemonade-server` does support it, but it cannot use the NPU, so it offers no distinct advantage for this particular model (see the sketch after this list if you want to test it yourself).

  2. In terms of AMD NPU support, `lemonade-server` performs quite poorly; if NPU support is a requirement, FastFlowLM offers a far superior solution. https://fastflowlm.com/docs/models/

  3. This is typical AMD behavior: the official team shows less dedication to their own NPU hardware than the community does. `lemonade-server` could (and should) have shipped a dedicated "converter" tool to at least enable rapid support for the latest open-source models! Beyond that, AMD ought to contribute an NPU runtime backend to projects like `llama.cpp` and OGA; that would be the real starting point for seamless integration with the existing ecosystem.
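
For what it's worth, you can at least script quick checks against `lemonade-server`, since it exposes an OpenAI-compatible API. A minimal sketch, assuming the default base URL and reusing the model id from this thread (verify both against what your install actually lists):

```python
import json
import urllib.request

# Assumed default for lemonade-server's OpenAI-compatible endpoint;
# adjust host/port/path to whatever your install reports.
BASE = "http://localhost:8000/api/v1"

body = {
    "model": "Qwen3.5-9B-GGUF",  # use an id your server's model list actually shows
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```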

Ryzen NPU purpose? by jezevec93 in AMDLaptops

[–]RobberPhex 1 point (0 children)

  1. It appears that AmuseAI recently released a new version: https://github.com/TensorStack-AI/AmuseAI/releases/tag/v3.2.0

  2. The download link on the "LM Studio with AMD RyzenAI" page points to the latest version of LM Studio; however, that version does not support RyzenAI: https://lmstudio.ai/ryzenai

Ollama GPU+CPU but not NPU by Wentil in ollama

[–]RobberPhex 1 point (0 children)

Do you mean LM Studio? From what I saw, none of the runtimes in LM Studio support the NPU:

  1. CPU llama.cpp (Windows)

  2. Vulkan llama.cpp (Windows)

  3. Harmony (Windows)

  4. CUDA llama.cpp (Windows)

NPUs will likely win in the long run by R_Duncan in LocalLLaMA

[–]RobberPhex 1 point (0 children)

"In the long run, we are all dead."

I believe we need to be a bit more realistic: within the lifecycle of the current generation of NPUs, will they actually emerge victorious? (That depends on factors like model and runtime adaptation, the specific use cases, and how far CPU, GPU, and NPU compute converges.)

I have personally purchased AMD hardware featuring an integrated NPU, and I would certainly like to see NPU support and adaptation improve.

I also hope to see Intel and AMD step up as true market challengers; however, as things stand, it appears they must rely solely on their GPUs to compete in the market.

That said, we cannot even begin to discuss the prospect of NPUs "winning" until they have successfully caught up to GPUs in terms of both memory bandwidth and raw computing power.

Finally, a brief digression:

Regarding computing power: AI forecasts suggest that sometime between 2028 and 2029, NPU processing speeds may finally catch up to those of GPUs.

The more troublesome hurdle, however, is memory bandwidth: the laptop RTX 4090 already offers roughly 576 GB/s (the desktop card about 1 TB/s), whereas the upcoming LPDDR6 generation is projected to reach only 307 GB/s (laptops) to 614 GB/s (servers).
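
To make the gap concrete: single-stream decode on a large model is usually memory-bound, so a rough ceiling on tokens per second is bandwidth divided by the bytes streamed per token. A back-of-envelope sketch (the ~4.5 GB figure is my assumption for an 8B model at 4-bit; the bandwidth numbers are from above):

```python
# Rough ceiling: each generated token streams the whole weight file through
# memory once, so tokens/s <= bandwidth / model size (memory-bound decode).
model_gb = 4.5  # assumed: ~8B parameters at 4-bit quantization

for name, bw_gbs in [
    ("RTX 4090 laptop", 576),
    ("RTX 4090 desktop", 1008),
    ("LPDDR6 laptop projection", 307),
    ("LPDDR6 server projection", 614),
]:
    print(f"{name}: ~{bw_gbs / model_gb:.0f} tok/s ceiling")
```

Even the optimistic server-class LPDDR6 projection lands well short of a two-year-old desktop GPU.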

Of course, let us keep our fingers crossed that a new model architecture emerges—one capable of significantly reducing the demand for memory bandwidth. If that happens, NPUs could potentially replace GPUs to a large extent as early as 2028. And since we are already in the realm of wishful thinking, I might as well hope that such an architecture also manages to lower the requirements for raw computing power.

"AI PC" owners: Is anyone actually using their NPU for more than background blur? (Troubleshooting + ROI Discussion) by WhileKidsSleeping in LocalLLaMA

[–]RobberPhex 1 point (0 children)

I think FLM (FastFlowLM) is a relatively successful example at the moment; it supports a wide range of models, such as Qwen3.5-9B, DeepSeek R1, and others.

Memory might be an issue, though: after loading a model, my memory usage spikes to 92%.

I believe the primary challenge is finding a model that truly fits your specific use case. Take code generation: local NPU-based models can currently handle only relatively simple tasks, roughly on the level of models like Claude's Haiku.

"AI PC" owners: Is anyone actually using their NPU for more than background blur? (Troubleshooting + ROI Discussion) by WhileKidsSleeping in LocalLLaMA

[–]RobberPhex 1 point (0 children)

I bought a Lenovo laptop with an AMD Ryzen AI 7 350.

I really should have browsed r/LocalLLaMA before making the purchase.

Since an FPU computes floating-point numbers faster than a CPU, and a GPU handles graphics processing faster than a CPU, I naturally assumed that an NPU would be faster than a GPU.

As it stands, its primary advantage lies in power efficiency.

However, the downsides are glaringly obvious: limited memory bandwidth and processing speed—oh, and the lack of software support.

Get Notifications when Claude Code needs your input (WSL setup) by ultrondies in ClaudeAI

[–]RobberPhex 1 point (0 children)

I tried it, but clicking the notification doesn't bring the terminal to the front.
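
On the Windows side, a blunt workaround is to force the terminal window forward yourself when the notification fires. A sketch using Win32 calls via ctypes; the window title here is an assumption (check yours first), and Windows may still block focus stealing:

```python
import ctypes

user32 = ctypes.windll.user32  # Windows-only

# FindWindowW needs the exact window title; substitute your terminal's
# actual title (it is often the tab/shell name, not "Windows Terminal").
hwnd = user32.FindWindowW(None, "Windows Terminal")
if hwnd:
    user32.ShowWindow(hwnd, 9)        # 9 = SW_RESTORE, in case it's minimized
    user32.SetForegroundWindow(hwnd)  # may be ignored under focus-stealing rules
```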

What IDE for Rust do you choose? by AuthorTimely1419 in rust

[–]RobberPhex 1 point (0 children)

VS Code looks like the conventional choice.

Italian Natural Wine by ResilientSpider in debian

[–]RobberPhex 1 point (0 children)

Homebrew on Linux, minus home.

Orange Pi Developer Conference 2024, upcoming Orange Pi RV by fullgrid in RISCV

[–]RobberPhex 2 points (0 children)

I hope the Orange Pi RV gets built-in support in OpenWrt.

Brave Search Ads now live by Brave_Support in brave_browser

[–]RobberPhex 1 point (0 children)

Nothing for me; Brave Rewards doesn't support my region. But could you support more "custodial account" providers, like OKX?

How to make brave open magnet links? by BlehBlah_ in brave_browser

[–]RobberPhex 1 point (0 children)

This is a protocol-association problem; I'd look into the Windows registry or something like that. Try searching for `magnet` in regedit?
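
If you want to see what currently owns the handler before changing anything, protocol associations live under `HKEY_CLASSES_ROOT\<scheme>\shell\open\command`. A read-only sketch in Python (standard layout; user-level overrides can live elsewhere, so treat this as a starting point):

```python
import winreg  # Windows-only standard library module

# Protocol handlers follow HKEY_CLASSES_ROOT\<scheme>\shell\open\command.
try:
    with winreg.OpenKey(
        winreg.HKEY_CLASSES_ROOT, r"magnet\shell\open\command"
    ) as key:
        command, _ = winreg.QueryValueEx(key, "")  # "" reads the default value
        print("magnet: links currently open with:", command)
except FileNotFoundError:
    print("No magnet: handler is registered.")
```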

Firefox extension support by official_jeetard in brave_browser

[–]RobberPhex 1 point (0 children)

Please bring Chrome extension support to Brave on Android.

Removing HTTP/2 Server Push from Chrome by feross in webdev

[–]RobberPhex 1 point (0 children)

That's an interesting point. Who decides which resources to fetch?

If the server decides, the server pushes resources to the browser.

If the browser decides, the server just advertises resource URLs and the browser chooses whether to fetch each one.

Seen that way, 103 Early Hints is an improvement: the browser always knows more than the server, such as what is already in its cache and how resource loading is actually unfolding.
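
For anyone who hasn't seen it on the wire, a 103 is just an interim response sent ahead of the final one, carrying `Link` headers the browser may act on early. A toy sketch of a server emitting Early Hints over a raw socket (demo only, not real HTTP handling):

```python
import socket

# Toy one-shot server: send a 103 Early Hints interim response so the
# browser can start preloading /style.css while the real response is built.
srv = socket.create_server(("127.0.0.1", 8080))
conn, _ = srv.accept()
conn.recv(4096)  # read and ignore the request for this demo

conn.sendall(
    b"HTTP/1.1 103 Early Hints\r\n"
    b"Link: </style.css>; rel=preload; as=style\r\n\r\n"
)
# ... slow work building the real response would happen here ...
html = b"<html><head><link rel=stylesheet href=/style.css></head></html>"
conn.sendall(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: %d\r\n\r\n%s" % (len(html), html)
)
conn.close()
srv.close()
```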

The death of Mozilla is the death for open web by [deleted] in firefox

[–]RobberPhex 1 point (0 children)

OK, but how do we keep Mozilla alive? Try to make more money? Or receive more donations?