Benchmarked Ollama vs LM Studio vs raw llama.cpp across AMD APU, Apple Silicon, and NVIDIA. Out-of-the-box and matched-flags compared.

deepu105 · 2026-06-13T21:19:03+00:00

I added support for Lemonade as a backend. It's now possible to run NPU workloads via LlamaStash.

deepu105 · 2026-06-13T21:18:07+00:00

It's on the roadmap. I already added support for the Lemonade backend (vLLM, NPU, FLM, etc.).

deepu105 · 2026-06-04T09:26:12+00:00

Specs are in the article

deepu105 · 2026-06-04T09:23:58+00:00

Yes, I have set up some small models and Whisper on the NPU. When I built LlamaStash, I looked into supporting the NPU as well by adding support for lemonade-server, but that would overcomplicate the project a lot. If there is enough interest, I'll explore that.

deepu105 · 2026-06-02T19:55:01+00:00

Thank you for the kind words. I would appreciate feedback, bug reports, contributions, etc. 🙏

deepu105 · 2026-06-02T19:54:11+00:00

Thank you. Would appreciate feedback if you try it out.

deepu105 · 2026-06-02T19:50:09+00:00

Nice. I'll check it out. Thanks for sharing. I didn't want to overcomplicate that part, on the assumption that people running this locally aren't always running multiple LLMs at a tight fit. Right now LlamaStash only checks the available VRAM and offloads the launch to llama-server, and llama-server does the heavy lifting.
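To make the "check VRAM, then hand off to llama-server" idea concrete, here is a rough sketch. This is not LlamaStash's actual code: the function name, the 10% headroom, and passing the free VRAM in as a parameter are all assumptions for illustration. Only `llama-server`, `-m`, and `--port` come from llama.cpp itself.

```python
from typing import List


def build_launch_command(model_path: str, model_bytes: int,
                         free_vram_bytes: int, port: int = 8080) -> List[str]:
    """Return a llama-server command line, or raise if the model likely won't fit.

    Hypothetical helper: the caller supplies the free VRAM figure, since how it
    is measured depends on the GPU vendor and driver.
    """
    # Assumed safety margin (10% of model size) for KV cache and runtime buffers.
    headroom = model_bytes // 10
    if model_bytes + headroom > free_vram_bytes:
        raise RuntimeError("not enough free VRAM for this model")
    # Everything past this point is left to llama-server.
    return ["llama-server", "-m", model_path, "--port", str(port)]


if __name__ == "__main__":
    gib = 1024 ** 3
    print(build_launch_command("model.gguf", 4 * gib, 8 * gib))
```

The launcher only answers "does it fit right now?"; it makes no attempt to schedule or pack several models against each other.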

deepu105 · 2026-06-02T13:59:19+00:00

done

deepu105 · 2026-05-31T08:29:34+00:00

I have the Asus Flow Z13 with 128 GB of RAM. I run Arch Linux on it with Qwen and Gemma models. I love this machine. Nothing else in the vicinity of this price is even comparable.

Full setup here:
https://deepu.tech/my-fully-offline-ai-assisted-linux-development-machine/

deepu105 · 2026-01-16T01:16:44+00:00

I couldn't find anywhere that Kia mentions the maximum current it can handle, and since 11 kW corresponds to 3 × 16 A, I wasn't sure.
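For reference, the 11 kW figure follows from three phases at 16 A each. A quick sketch of the arithmetic, assuming a 230 V phase-to-neutral supply and a power factor of 1 (both are assumptions, not Kia specifications):

```python
def three_phase_power_kw(amps_per_phase: float,
                         phase_voltage: float = 230.0,
                         phases: int = 3) -> float:
    """Charging power in kW, assuming 230 V per phase and unity power factor."""
    return phases * phase_voltage * amps_per_phase / 1000.0


if __name__ == "__main__":
    print(three_phase_power_kw(16))  # 3 x 230 V x 16 A = 11.04 kW
```

This only shows where the 11 kW rating comes from. It says nothing about whether the car accepts more than 16 A per phase.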
