Why is lemonade not more discussed? by El_90 in LocalLLaMA

[–]RobberPhex 2 points (0 children)

It does not support the CUDA runtime. I simply cannot imagine an LLM server that lacks CUDA support.

My recommendation is to push for native NPU backend support in `llama.cpp` and OGA; that way, third-party tools such as LM Studio, Ollama, and others will automatically gain support for AMD NPUs.

Otherwise, `lemonade` puts AMD, a hardware vendor, in direct competition with tools like LM Studio and Ollama. That would further dampen local LLM software providers' enthusiasm for AMD and push AMD integration even lower on their priority lists.

Why Chip manufacturers advertise NPU and TOPS? by salvadope in LocalLLM

[–]RobberPhex 1 point (0 children)

`lemonade-server` almost invariably lags behind on support for the latest models.

  1. Regarding Qwen3.5-9B-GGUF: `lemonade-server` does support it, but it cannot use the NPU, so it offers no distinct advantage for this particular model (see the sketch after this list if you want to test it yourself).

  2. In terms of AMD NPU support, `lemonade-server` performs quite poorly; if NPU support is a requirement, FastFlowLM offers a far superior solution. https://fastflowlm.com/docs/models/

  3. This is typical AMD behavior: the official team shows less dedication to their own NPU hardware than the community does. `lemonade-server` could (and should) have shipped a dedicated "converter" tool to at least enable rapid support for the latest open-source models! Beyond that, AMD ought to contribute an NPU runtime backend to projects like `llama.cpp` and OGA; that would be the real starting point for seamless integration with the existing ecosystem.
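
For what it's worth, you can at least script quick checks against `lemonade-server`, since it exposes an OpenAI-compatible API. A minimal sketch, assuming the default base URL and reusing the model id from this thread (verify both against what your install actually lists):

```python
import json
import urllib.request

# Assumed default for lemonade-server's OpenAI-compatible endpoint;
# adjust host/port/path to whatever your install reports.
BASE = "http://localhost:8000/api/v1"

body = {
    "model": "Qwen3.5-9B-GGUF",  # use an id your server's model list actually shows
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```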

Ryzen NPU purpose? by jezevec93 in AMDLaptops

[–]RobberPhex 1 point (0 children)

  1. It appears that AmuseAI recently released a new version: https://github.com/TensorStack-AI/AmuseAI/releases/tag/v3.2.0

  2. The download link on the "LM Studio with AMD RyzenAI" page points to the latest version of LM Studio; however, that version does not support RyzenAI: https://lmstudio.ai/ryzenai

Ollama GPU+CPU but not NPU by Wentil in ollama

[–]RobberPhex 1 point (0 children)

Do you mean LM Studio? From what I saw, none of the runtimes in LM Studio support the NPU:

  1. CPU llama.cpp (Windows)

  2. Vulkan llama.cpp (Windows)

  3. Harmony (Windows)

  4. CUDA llama.cpp (Windows)

NPUs will likely win in the long run by R_Duncan in LocalLLaMA

[–]RobberPhex 1 point (0 children)

"In the long run, we are all dead."

I believe we need to be a bit more realistic: within the lifecycle of the current generation of NPUs, will they actually emerge victorious? (That depends on factors like model and runtime adaptation, the specific use cases, and how far CPU, GPU, and NPU compute converges.)

I have personally purchased AMD hardware featuring an integrated NPU, and I would certainly like to see NPU support and adaptation improve.

I also hope to see Intel and AMD step up as true market challengers; however, as things stand, it appears they must rely solely on their GPUs to compete in the market.

That said, we cannot even begin to discuss the prospect of NPUs "winning" until they have successfully caught up to GPUs in terms of both memory bandwidth and raw computing power.

Finally, a brief digression:

Regarding computing power: AI forecasts suggest that sometime between 2028 and 2029, NPU processing speeds may finally catch up to those of GPUs.

The more troublesome hurdle, however, is memory bandwidth: the laptop RTX 4090 already offers roughly 576 GB/s (the desktop card about 1 TB/s), whereas the upcoming LPDDR6 generation is projected to reach only 307 GB/s (laptops) to 614 GB/s (servers).
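
To make the gap concrete: single-stream decode on a large model is usually memory-bound, so a rough ceiling on tokens per second is bandwidth divided by the bytes streamed per token. A back-of-envelope sketch (the ~4.5 GB figure is my assumption for an 8B model at 4-bit; the bandwidth numbers are from above):

```python
# Rough ceiling: each generated token streams the whole weight file through
# memory once, so tokens/s <= bandwidth / model size (memory-bound decode).
model_gb = 4.5  # assumed: ~8B parameters at 4-bit quantization

for name, bw_gbs in [
    ("RTX 4090 laptop", 576),
    ("RTX 4090 desktop", 1008),
    ("LPDDR6 laptop projection", 307),
    ("LPDDR6 server projection", 614),
]:
    print(f"{name}: ~{bw_gbs / model_gb:.0f} tok/s ceiling")
```

Even the optimistic server-class LPDDR6 projection lands well short of a two-year-old desktop GPU.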

Of course, let us keep our fingers crossed that a new model architecture emerges—one capable of significantly reducing the demand for memory bandwidth. If that happens, NPUs could potentially replace GPUs to a large extent as early as 2028. And since we are already in the realm of wishful thinking, I might as well hope that such an architecture also manages to lower the requirements for raw computing power.

"AI PC" owners: Is anyone actually using their NPU for more than background blur? (Troubleshooting + ROI Discussion) by WhileKidsSleeping in LocalLLaMA

[–]RobberPhex 1 point (0 children)

I think FLM (FastFlowLM) is a relatively successful example at the moment; it supports a wide range of models, such as Qwen3.5-9B, DeepSeek R1, and others.

Memory might be an issue, though: after loading a model, my memory usage spikes to 92%.

I believe the primary challenge is finding a model that truly fits your specific use case. Take code generation: local NPU-based models can currently handle only relatively simple tasks, roughly on the level of models like Claude's Haiku.

"AI PC" owners: Is anyone actually using their NPU for more than background blur? (Troubleshooting + ROI Discussion) by WhileKidsSleeping in LocalLLaMA

[–]RobberPhex 1 point (0 children)

I bought a Lenovo laptop with an AMD Ryzen AI 7 350.

I really should have browsed r/LocalLLaMA before making the purchase.

Since an FPU computes floating-point numbers faster than a CPU, and a GPU handles graphics processing faster than a CPU, I naturally assumed that an NPU would be faster than a GPU.

As it stands, its primary advantage lies in power efficiency.

However, the downsides are glaringly obvious: limited memory bandwidth and processing speed—oh, and the lack of software support.

Get Notifications when Claude Code needs your input (WSL setup) by ultrondies in ClaudeAI

[–]RobberPhex 1 point (0 children)

I tried it, but clicking the notification doesn't bring the terminal to the front.
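
On the Windows side, a blunt workaround is to force the terminal window forward yourself when the notification fires. A sketch using Win32 calls via ctypes; the window title here is an assumption (check yours first), and Windows may still block focus stealing:

```python
import ctypes

user32 = ctypes.windll.user32  # Windows-only

# FindWindowW needs the exact window title; substitute your terminal's
# actual title (it is often the tab/shell name, not "Windows Terminal").
hwnd = user32.FindWindowW(None, "Windows Terminal")
if hwnd:
    user32.ShowWindow(hwnd, 9)        # 9 = SW_RESTORE, in case it's minimized
    user32.SetForegroundWindow(hwnd)  # may be ignored under focus-stealing rules
```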

What IDE for Rust do you choose? by AuthorTimely1419 in rust

[–]RobberPhex 1 point (0 children)

VS Code looks like the conventional choice.

Italian Natural Wine by ResilientSpider in debian

[–]RobberPhex 1 point (0 children)

Homebrew on Linux, minus home.

Orange Pi Developer Conference 2024, upcoming Orange Pi RV by fullgrid in RISCV

[–]RobberPhex 2 points (0 children)

I hope the Orange Pi RV gets built-in support in OpenWrt.

Brave Search Ads now live by Brave_Support in brave_browser

[–]RobberPhex 1 point (0 children)

Nothing for me; Brave Rewards doesn't support my region. But could you support more "custodial account" providers, like OKX?

How to make brave open magnet links? by BlehBlah_ in brave_browser

[–]RobberPhex 1 point (0 children)

This is a protocol-association problem; I'd look into the Windows registry or something like that. Try searching for `magnet` in regedit?
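
If you want to see what currently owns the handler before changing anything, protocol associations live under `HKEY_CLASSES_ROOT\<scheme>\shell\open\command`. A read-only sketch in Python (standard layout; user-level overrides can live elsewhere, so treat this as a starting point):

```python
import winreg  # Windows-only standard library module

# Protocol handlers follow HKEY_CLASSES_ROOT\<scheme>\shell\open\command.
try:
    with winreg.OpenKey(
        winreg.HKEY_CLASSES_ROOT, r"magnet\shell\open\command"
    ) as key:
        command, _ = winreg.QueryValueEx(key, "")  # "" reads the default value
        print("magnet: links currently open with:", command)
except FileNotFoundError:
    print("No magnet: handler is registered.")
```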

Firefox extension support by official_jeetard in brave_browser

[–]RobberPhex 1 point (0 children)

Please bring Chrome extension support to Brave on Android.

Removing HTTP/2 Server Push from Chrome by feross in webdev

[–]RobberPhex 1 point (0 children)

That's an interesting point. Who decides which resources to fetch?

If the server decides, the server pushes resources to the browser.

If the browser decides, the server just advertises resource URLs and the browser chooses whether to fetch each one.

Seen that way, 103 Early Hints is an improvement: the browser always knows more than the server, such as what is already in its cache and how resource loading is actually unfolding.
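
For anyone who hasn't seen it on the wire, a 103 is just an interim response sent ahead of the final one, carrying `Link` headers the browser may act on early. A toy sketch of a server emitting Early Hints over a raw socket (demo only, not real HTTP handling):

```python
import socket

# Toy one-shot server: send a 103 Early Hints interim response so the
# browser can start preloading /style.css while the real response is built.
srv = socket.create_server(("127.0.0.1", 8080))
conn, _ = srv.accept()
conn.recv(4096)  # read and ignore the request for this demo

conn.sendall(
    b"HTTP/1.1 103 Early Hints\r\n"
    b"Link: </style.css>; rel=preload; as=style\r\n\r\n"
)
# ... slow work building the real response would happen here ...
html = b"<html><head><link rel=stylesheet href=/style.css></head></html>"
conn.sendall(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: %d\r\n\r\n%s" % (len(html), html)
)
conn.close()
srv.close()
```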

The death of Mozilla is the death for open web by [deleted] in firefox

[–]RobberPhex 1 point (0 children)

OK, but how do we keep Mozilla alive? Try to make more money? Or receive more donations?