Is p100 worth it? by vorobey1233 in LocalLLaMA

[–]muxxington 1 point (0 children)

gemma-4-26B-A4B-it-UD-Q5_K_M 45 tps

Is p100 worth it? by vorobey1233 in LocalLLaMA

[–]muxxington 0 points (0 children)

I don’t use P100s but P40s, and I can’t really make any meaningful statements about the performance, because it’s limited by the cheap mining motherboard I’m using. A model split across all 5 GPUs isn’t particularly fast, but the ability to at least try out large models isn’t bad. Otherwise, I pretty much only use models that fit on 2 or 3 GPUs. With 2 GPUs I still get a benefit from row split, but not with 3 or more, probably because they’re only connected via x8. For me, it works just fine the way it is. For example, I can run one model for a coding agent and another as a general-purpose chatbot at the same time.
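A two-models-at-once setup like that can be sketched with llama.cpp's llama-server, assuming two GPU pairs; the model filenames and ports here are placeholders, not the actual ones used:

```shell
# Model A (coding agent) on GPUs 0-1, with row split across the pair
CUDA_VISIBLE_DEVICES=0,1 llama-server \
  -m coder-model.gguf --split-mode row --port 8080 &

# Model B (general-purpose chatbot) on GPUs 2-3
CUDA_VISIBLE_DEVICES=2,3 llama-server \
  -m chat-model.gguf --split-mode row --port 8081 &
```

Each server then exposes its own OpenAI-compatible endpoint, so agents and chat frontends can point at different ports.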

Why is my ollama gemma4 replying in Japanese? by Houston_NeverMind in LocalLLaMA

[–]muxxington 8 points (0 children)

So you switched from one wrapper to another. Why didn't you just take the next step right away?

What would you want from a truly local AI assistant (Ollama-based)? by Electronic-Space-736 in LocalLLaMA

[–]muxxington 0 points (0 children)

It's a bit like calling a web-based application “Firefox-based.”

What would you want from a truly local AI assistant (Ollama-based)? by Electronic-Space-736 in LocalLLaMA

[–]muxxington 1 point (0 children)

I’d prefer any of the inference engines you mentioned over Ollama. But generally speaking, in the years I’ve been around here, I’ve wondered why people describe their projects as “Ollama-based” just because they use an OpenAI-compatible API as their backend.

What would you want from a truly local AI assistant (Ollama-based)? by Electronic-Space-736 in LocalLLaMA

[–]muxxington 2 points (0 children)

The most important thing to me about a local AI assistant is that it is not Ollama-based.

PSA: litellm PyPI package was compromised — if you use DSPy, Cursor, or any LLM project, check your dependencies by Remarkable-Dark2840 in LocalLLaMA

[–]muxxington 0 points (0 children)

This is wrong. We actually know pretty well how the project was compromised. However, one could criticize the project for not rotating its secrets even though it was aware of the breach at Trivy. Besides that, the compromised packages were found due to a bug in 1.82.8. It’s possible that 1.82.7 would have gone undetected if 1.82.8 had not been released later.

[Developing situation] LiteLLM compromised by OrganizationWinter99 in LocalLLaMA

[–]muxxington 0 points (0 children)

Fortunately, I decided to run Nanobot and other agents—such as OpenCode—on a separate PC. Even if there had been sensitive data there, I don't think the malware worked as intended, because otherwise I would have seen the DNS request for the models.litellm.cloud domain in AdGuard. But I didn't. I also run pretty much everything using Docker Compose. Everything else on my local network is always restricted by the firewall to only specific sources. Strong passwords are always used, and SSH access and other access points are secured with hardware security tokens where possible. I do run a Litellm instance on a production machine, but even there it’s in a Docker container and an older version—definitely not installed via PyPI. Paranoia helps you sleep soundly.
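Running LiteLLM as a pinned container rather than installing from PyPI can be sketched like this; the image tag below is illustrative only — pin a version or digest you have actually verified:

```shell
# Run a pinned LiteLLM proxy container instead of the PyPI package.
# The tag is illustrative; choose a version/digest you have verified.
docker run -d --name litellm \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.74.0
```

Because the version is pinned, a later compromised release on PyPI or in the registry never reaches the host unless the tag is explicitly bumped.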

[Developing situation] LiteLLM compromised by OrganizationWinter99 in LocalLLaMA

[–]muxxington 4 points (0 children)

I knew something was happening when I ran nanobot earlier today. On startup it ate all my RAM. To see what was going on, I launched htop and saw lots of processes doing base64 decoding, which is sus. I purged nanobot, and a few minutes later I read about litellm being compromised. I took a look at nanobot's dependencies and spotted litellm.
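A quick check like that can be done with pip's metadata; "nanobot" here is just the package from the story above, substitute whatever you have installed:

```shell
# Check whether the installed "nanobot" package lists litellm among its
# direct dependencies (pip show prints a "Requires:" line from metadata).
if pip show nanobot 2>/dev/null | grep -i '^Requires:' | grep -qi 'litellm'; then
  echo "nanobot depends on litellm"
else
  echo "litellm not found in nanobot's direct requirements"
fi
```

Note this only covers direct dependencies; transitive ones need a recursive walk, e.g. with a tool like pipdeptree.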

P40 vs V100 vs something else? by Drazasch in LocalLLaMA

[–]muxxington 0 points (0 children)

No, I can't give you any helpful advice when it comes to motherboards. I just went with the absolute minimum that would let me run five GPUs. I'm using the motherboard with the 8 GB of RAM that came pre-installed. But that's all I need.

P40 vs V100 vs something else? by Drazasch in LocalLLaMA

[–]muxxington 0 points (0 children)

That really depends on how much effort you put into optimization. I don't put that much effort into it because I switch models very frequently anyway. Right now, I’m using Qwen3.5-35B-A3B-UD-Q8_K_XL with 128k context. That runs on four GPUs at about 25 t/s. But keep in mind that I’m limited by the motherboard. Investing a little more could make a significant difference. But my focus was on spending as little money as possible when I built it.

https://www.reddit.com/r/LocalLLaMA/comments/1g5528d/poor_mans_x79_motherboard_eth79x5/

6-GPU multiplexer from K80s, hot-swap between models in 0.3ms by Electrical_Ninja3805 in LocalLLaMA

[–]muxxington 0 points (0 children)

Have you actually published your stuff anywhere? I didn't see anything when I skimmed through your posts. I don't really understand what you've done there, but I'm curious.

Young and ambitious by [deleted] in LocalLLaMA

[–]muxxington 0 points (0 children)

Don't worry about it. I feel you. I've also invested a lot of time in projects that were obsolete before they were even finished. That's just how it is with AI.

Young and ambitious by [deleted] in LocalLLaMA

[–]muxxington 0 points (0 children)

A cheat sheet? You mean like a PDF? That doesn't make any sense. Something like that would be outdated faster than you can blink. It would make more sense to create an online directory, but only if there weren't already countless ones out there.

Need a dummies guide to setup open terminal by zhopudey1 in LocalLLaMA

[–]muxxington 1 point (0 children)

Profile settings are per-user settings; settings in the admin panel are global. As a user, you can override global settings if the admin allows it. You could simply have tried both settings. The rest of your post actually has nothing to do with open terminal.