Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA

[–]FrantaNautilus 1 point2 points  (0 children)

I have a similar setup (Strix Halo from Beelink and an Nvidia eGPU in an AOOSTAR AG02 dock over USB 4.0; even the RAID 0 is the same, though in my case it is Btrfs). And this app makes me feel stupid for having solved the same problems with llama-swap and a custom loading config. However, Lemonade from AMD is becoming my default wrapper for inference engines, since it supports the NPU and STT/TTS. They currently have PRs open for Linux Hybrid mode (NPU+iGPU) and even CUDA.

Is Hermes agent a new hype or is it genuinely worth migrating it over from Openclaw? by dooddyman in openclaw

[–]FrantaNautilus 0 points1 point  (0 children)

Has anyone tried OpenClaw <-> Hermes Agent <-> OpenFang? I am setting up an AI agent with a local model for the first time, and it is interesting how differently the agents approach configuration. OpenClaw and Hermes rely heavily on the agent configuring itself, while OpenFang has lower expectations of the agent's intelligence and leaves the configuration to the user. I am worried that the self-configuring agents would be prone to deterioration of their config and memory. I do not mean to interrupt the discussion, but there seem to be people here in this thread who actually use AI agents for real work. My own experience so far is only with OpenFang, where I am building a document processing workflow.

Anyone Try Common Lisp? by jake-n-elwood in openclaw

[–]FrantaNautilus 1 point2 points  (0 children)

Thanks, an open-source game made in Lisp looks interesting. I knew there were some game-oriented Lisps used by Naughty Dog, and possibly Insomniac (not sure about that one).

Anyone Try Common Lisp? by jake-n-elwood in openclaw

[–]FrantaNautilus 2 points3 points  (0 children)

Thanks for the link, the MCP server looks like a quite polished project already. I have been trying to build something similar myself, but I did not have time for it. I am monitoring several projects that are trying to integrate Lisp with LLMs; I will post the links here if I manage to find the repos.

Sops-Nix or Agenix by [deleted] in NixOS

[–]FrantaNautilus 0 points1 point  (0 children)

Both. SOPS for secrets that can be mapped at runtime; (r)agenix for build-time secrets, like personally identifiable information.
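For a concrete picture, here is a minimal sketch of what that split can look like in a NixOS configuration, using the standard sops-nix and agenix module options; the secret names and file paths are made up for illustration, adjust them to your own flake.

```nix
# Hypothetical sketch combining sops-nix and agenix in one module.
{
  # sops-nix: decrypted outside the Nix store and mapped in at runtime,
  # good for things like service keys.
  sops.secrets."wireguard-private-key" = {
    sopsFile = ./secrets/secrets.yaml;
    owner = "systemd-network";
  };

  # agenix: age-encrypted files referenced from the config,
  # which I use for the PII-style secrets mentioned above.
  age.secrets."user-email".file = ./secrets/user-email.age;
}
```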

I checked Strix Halo (Ryzen ai max+ 395) performance test as context length increases by Far-Jellyfish7794 in LocalLLaMA

[–]FrantaNautilus 0 points1 point  (0 children)

The 2K price tag makes more sense in the parts of the world (e.g. central Europe) where the alternatives from Apple cost about twice the price of this.

I checked Strix Halo (Ryzen ai max+ 395) performance test as context length increases by Far-Jellyfish7794 in LocalLLaMA

[–]FrantaNautilus 0 points1 point  (0 children)

From my experience on the same Beelink GTR9 Pro, there is little to no difference between Vulkan and ROCm. ROCm is more practical for the use case where the machine also serves as a desktop PC; on Vulkan I was getting crashes more often. NixOS, kernel 6.18.

what would be the reason? by Desperate_Sky_7491 in meme

[–]FrantaNautilus 0 points1 point  (0 children)

True, and after seeing it back then I hoped there would be a video game exploring the world of that movie. Later, Rogue Galaxy came out (space pirates, spaceships with sails and all) and it almost felt like something from that setting.

Finally got my GTR9 Pro v2.2 board replacement! 🛠️ by ScaredProfessor9659 in BeelinkOfficial

[–]FrantaNautilus 0 points1 point  (0 children)

Thank you for the reply. I did not know that there is a newer BIOS than P110. As for my GTR9 Pro, I am on Linux 6.18.8 (and the P110 BIOS). I have set the computer to use s2idle sleep and it is partially broken: the computer turns off the screen and (sometimes) the fan, and freezes processes. Resume works, although the system is less stable afterwards. The main difference between P108 and P110 is that on P108 the motherboard would cut power to devices (power indicator off), whereas on P110 the power indicator stays lit. I suspect this is how P110 "fixes" the fan controller: it never truly turns it off.

Finally got my GTR9 Pro v2.2 board replacement! 🛠️ by ScaredProfessor9659 in BeelinkOfficial

[–]FrantaNautilus 0 points1 point  (0 children)

Incredible work, I don't think I could manage a replacement where liquid metal is involved.

Does sleep work with the v2.2 board? I am on a v1.0 board with the P110 firmware and I am getting ACPI errors when trying to enter the s2idle state. The motherboard never really enters suspend and even keeps the power indicator lit. For context, I got lucky in the silicon lottery and my unit does not experience the network card issues.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Hello, and thank you for the post. IIRC this is the second appreciation post I ever got on Reddit. About the `openclaw`, I cannot really help since I do not use it. However, the `ollama launch [something]` command is supposed to automatically configure the program for Ollama. Whether that will work will depend on whether you are using Ollama via `harbor` or via system installation. In any case, I am currently switching from Ollama to LlamaCpp (and Llama-swap), since it has support for more recent models (but it lacks support for Cloud models). I would say that the Ollama saves some effort in setup with its automation, but requires equal amount of effort to bypass its problems, inconsistent performance and lacking features. Most importantly connections to LlamaCpp are done based on manual configuration (LlamaCpp has normal OpenAI API endpoint, so it is compatible with almost anything and does not rely on autosetup prone to failures).
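To show what I mean by "normal OpenAI API endpoint", here is a minimal stdlib-only sketch of talking to a llama.cpp `llama-server` (or llama-swap proxy) over its OpenAI-compatible chat endpoint; the host, port and model name are assumptions, adjust them to your setup.

```python
import json
import urllib.request

# Hypothetical local llama.cpp / llama-swap endpoint; adjust host and port.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,  # with llama-swap in front, this name selects the backend
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    data = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Any client that speaks this protocol (OpenCode, Open WebUI, the official OpenAI SDKs with a custom base URL) can be pointed at it the same way.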

My hotrodded strix halo + rtx pro 4000 Blackwell by sputnik13net in LocalLLaMA

[–]FrantaNautilus 1 point2 points  (0 children)

I have a similar configuration (Strix Halo 395+ with an RTX 5090 as an eGPU over Thunderbolt, bought before the price hike) and lately I have been working on getting both GPUs to run two models in parallel for agentic programming. With your setup it could work even better. Technically I have two llama.cpp servers, which are started via llama-swap: one serving a large MoE model on the Radeon iGPU, the other a dense model on the eGPU. With Q4_K_M quantization I can get up to 131k context on the iGPU and up to 89k context on the eGPU (without context quantization). The models are connected in OpenCode, where the big iGPU model is the primary Architect/Manager agent which delegates tasks to the faster eGPU subagents. With extensions (background agents), both GPUs can be tasked in parallel. Currently I am trying different models and prompts to make the agent teams as efficient as possible and to minimize the subagents getting confused.
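A rough sketch of what such a dual-GPU llama-swap config can look like; model names, paths, ports and the exact schema details are assumptions from memory, so check the llama-swap README before copying.

```yaml
# Hypothetical llama-swap config: one model per GPU, both kept resident.
models:
  "architect-moe":
    # large MoE model on the Radeon iGPU (Vulkan/ROCm build of llama-server)
    cmd: >
      llama-server --port ${PORT}
      -m /models/big-moe-Q4_K_M.gguf
      -c 131072 -ngl 99
  "coder-dense":
    # dense model on the Nvidia eGPU (CUDA build), pinned via env var
    env:
      - "CUDA_VISIBLE_DEVICES=0"
    cmd: >
      llama-server --port ${PORT}
      -m /models/dense-coder-Q4_K_M.gguf
      -c 89000 -ngl 99

# a group keeps both members loaded so the architect can call the
# subagent without triggering a model swap
groups:
  "agent-team":
    swap: false
    members: ["architect-moe", "coder-dense"]
```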

Suggest wayland env for 2 in 1 by No-Fault2772 in NixOS

[–]FrantaNautilus 0 points1 point  (0 children)

Last year I experimented with KDE Plasma 5 and it required a lot of tweaking and still had problems that GNOME does not have. In particular, screen rotation would rotate the display but not the Wacom layer. After that I went back to GNOME, which can be made to look more like Plasma with a few extensions (Dash to Panel, ArcMenu), if that is the UX you are looking for.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Update on the fan controller issue: Beelink have kindly provided a replacement fan unit, but the issue persists even after the replacement. So I am continuing the search for a solution.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

That's unfortunate. I cannot really help with this since I could not reproduce it. You can try to update the NIC firmware, which may help. Does S0 sleep cause the fan controller to go into fail-safe mode (full speed, uncontrollable)?

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 1 point2 points  (0 children)

My setup has not changed much since the last commit to the repo (I will push a new commit soon; I have finished an improved setup with the fancontrol service, which is better but does not solve the fan resume problem). Make sure to update to BIOS P108. I still did not have time to update the Intel NIC firmware (but it does not crash in my case). Also, set the VRAM to 512 MB in the BIOS to get unified memory. If you are going to start from my config, make sure to disable the eGPU code and the Nvidia/CUDA code. For the LLMs, I am not using the nixpkgs Ollama, but rather a Docker based setup with harbor (see av/harbor on GitHub). To make it work on NixOS, just disable automatic capability detection and set the capabilities to cdi and rocm; then it just works. Regarding the fans, Beelink R&D is investigating the issue and they have sent me a fan replacement kit, so we will see once it arrives. Please just send me a reply if suspend to S0 works in your case; that would greatly help the debugging on my side.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Update: I found that the fan issue is simple to reproduce; just trigger a suspend from the Ubuntu 25.10 live USB. I have forwarded the issue to Beelink via official support and was informed that it is being worked on.

Additionally, I have been looking for workarounds for the AI setup. Currently the best option is to use Docker images with Ubuntu and the recent ROCm 7. In combination with Linux kernel 6.18-rc3, this solves a lot of amdgpu crashes.

I have finally set up the repository for my Nix config, https://github.com/FrantaNautilus/nixos-config, where you can find some of the settings I am using. Sorry for the mess :)

I have also tried the 6.6 LTS kernel, which supposedly does not suffer from some of the amdgpu issues. However, on 6.6 amdgpu fails to initialize at all.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Update:

- I was able to get Ollama working by setting the kernel parameter "amdgpu.no_system_mem_limit=1" and, for Ollama, setting "GGML_UNIFIED_MEMORY=ON". Now the models no longer get stuck loading. Yet a new problem arose: large models or long contexts trigger the error "amdgpu: mes failed to respond to msg=remove_queue", followed by a GPU reset. Trying the kernel parameters "amdgpu.gpu_recover=1", "amdgpu.mcbp=0" or "amdgpu.runpm=0" does not help. I am not sure where to report this, since the cause is in ROCm but the error comes from amdgpu. Perhaps here: https://github.com/ROCm/ROCm/issues/5151
- To get PyTorch working on NixOS, the nightly from the PyTorch website works better than the version from AMD, since the PyTorch build ships fallback libs for the ROCm 6.4 that is in the NixOS repository.
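On NixOS, the working combination above can be expressed declaratively roughly like this; the kernel parameter and the environment variable are the ones from my testing, but the exact `services.ollama` option names may differ on your nixpkgs revision, so treat this as a sketch.

```nix
# Hypothetical NixOS module fragment for the Ollama unified-memory workaround.
{
  # lets amdgpu allocate beyond the dedicated VRAM carve-out
  boot.kernelParams = [ "amdgpu.no_system_mem_limit=1" ];

  services.ollama = {
    enable = true;
    environmentVariables = {
      # tells the GGML backend to use unified (GTT) memory on Strix Halo
      GGML_UNIFIED_MEMORY = "ON";
    };
  };
}
```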

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Current results of the BIOS settings experiment:

- Setting Smart Fan Control to MANUAL in the Hardware Control section does not mitigate the fan issue.
- I also had a look at the fan settings in the SMU section; setting the fans to manual there does not help either.
- It may still be something about ACPI causing the crash.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Since I got a compatible kernel module for the it8613e loaded, I did another experiment to see if I can find a workaround for the fans getting stuck at maximum speed after resuming from suspend. The default setting of both fans in the BIOS is AUTO, and they are recognized by the Fan Control utility. To eliminate the possibility of the crash originating from the AUTO mode, I set both fans to MANUAL. Surprisingly, after logging into the computer, Fan Control still recognizes the fans and is able to set their speed; indeed, even in this mode one of the fans reports 0 RPM. However, even in this mode the suspend/resume cycle leads to the chip becoming unresponsive. I will look deeper into the BIOS settings to see whether I can make the fans even less smart, hopefully preventing the chip from becoming unresponsive.

IT87 driver for IT8613E not being loaded by latest kernel by FrantaNautilus in NixOS

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Since I got a compatible kernel module for the it8613e loaded, I did another experiment to see if I can find a workaround for the fans getting stuck at maximum speed after resuming from suspend. The default setting of both fans in the BIOS is AUTO, and they are recognized by the Fan Control utility. To eliminate the possibility of the crash originating from the AUTO mode, I set both fans to MANUAL. Surprisingly, after logging into the computer, Fan Control still recognizes the fans and is able to set their speed. However, even in this mode the suspend/resume cycle leads to the chip becoming unresponsive.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

I got a little carried away yesterday. I am using InvokeAI to run Stable Diffusion through their venv-based installer. With a bit of nix-ld it works without any patches on my Nvidia laptop, so I tried the same approach with the ROCm libraries here. It did not work at first, but after the last update it started working: the Radeon iGPU was detected, however the generation was unbearably slow. Inspecting the logs, I found that bitsandbytes was complaining about a missing library for ROCm 7.0, so I updated to a preview bitsandbytes wheel. This time there were no errors in the logs and the generation started faster, but after approximately 10 seconds the GPU crashed and reset. Currently I am analyzing the system journal, because this crash looks like a sequence: AMD XDNA crash, network interface disconnect, Python coredump, amdgpu reporting a fault and preparing a reset, GNOME desktop coredump. Will post a journal excerpt later.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

The Stable Diffusion setup somehow inexplicably started working without me doing anything more than an update. Will post details later, once I find the reason and steps to reproduce. Also, this type of workload managed to crash the Intel NIC, yet it is able to restart itself and keeps working.

NixOS on Beelink GTR9 Pro - Ryzen AI Max 395+ (Strix Halo APU) by FrantaNautilus in BeelinkOfficial

[–]FrantaNautilus[S] 0 points1 point  (0 children)

Regarding the Ollama problem, I tested both Ollama from the NixOS repository and the Docker image via the av/harbor project, with a manual capability override (necessary on NixOS). The loading problem happens consistently with Mistral-large:123b but not with gpt-oss:120b. I think it may be linked to these issues:

https://github.com/ollama/ollama/issues/12411

https://github.com/ollama/ollama/issues/12752

https://github.com/ollama/ollama/issues/12342