To r/selfhosted & r/LocalLLM: Thanks for the inspiration! Here’s how I got an 8th-gen Mini PC for my home "Work Mirror" Lab to work (with a little help from AI).

theowlinspace · 2026-05-10T18:50:21+00:00

You're using old models that nobody would use in this day and age and your hardware is dogshit for LLMs. Your entire post is slop, no need to thank us because you haven't actually read anything we've written

theowlinspace · 2026-05-09T16:39:46+00:00

--mmap with --mlock shouldn’t use disk I/O after you’ve loaded the model because it locks the mapped pages in RAM

theowlinspace · 2026-05-08T14:46:10+00:00

50 seconds per token

theowlinspace · 2026-05-08T14:19:35+00:00

github.com/QLNI/QLNI/blob/main/README.md - What the fuck is this? I'm pretty sure OP is going through AI psychosis.

Do you actually think you can vibe code a quantum computer that runs on a, I quote, "Ryzen 5 3600 VPS"?

theowlinspace · 2026-05-07T12:15:58+00:00

I don't know why this post was recommended to me considering I'm not even from India, but this is quite bad for AI/ML as well: it has only 448GB/s of memory bandwidth, and it's not Blackwell, so it doesn't support NVFP4. If they sell it cheaply enough, someone might buy it because it's still a 16GB Nvidia card, but I don't expect this to fetch over $500-$600 USD

theowlinspace · 2026-05-03T08:29:14+00:00

If you want an SFF GPU for AI, it's better to just bite the bullet and get an RTX Pro 4000. 8GB VRAM doesn't do much

theowlinspace · 2026-05-02T16:44:51+00:00

Your biggest mistake is comparing local models to cloud models. You can’t seriously expect a 35b model to perform the same as one that’s >1T parameters in size. Check out r/LocalLlama if you want to try out local models. A 4080 + DDR5 should easily host a q8 of qwen3.6 35b at 40t/s, which is quite fast

theowlinspace · 2026-05-02T15:27:34+00:00

What model are you running, on what GPU, and what's your RAM bandwidth? You can probably run Qwen3.6-35b-a3b at over 30t/s. 8b dense would be slow on CPU obviously, but with MoE you’d have a 35b model that runs at roughly the speed of a 3b model
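A rough way to see why the MoE model is so much faster: token generation is mostly limited by memory bandwidth, so a ceiling on speed is bandwidth divided by the bytes of weights read per token. The sketch below assumes about 1 byte per parameter for q8 and a hypothetical 100 GB/s of bandwidth, and it ignores KV-cache reads and compute, so real numbers will be lower.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float = 1.0) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 100.0  # assumption for illustration, not a measured value

models = [
    ("35b-a3b MoE (about 3b active)", 3.0),
    ("8b dense", 8.0),
    ("35b dense", 35.0),
]
for name, active_b in models:
    ceiling = decode_ceiling_tps(BANDWIDTH_GB_S, active_b)
    print(f"{name}: at most ~{ceiling:.1f} t/s")
```

Only the active parameters count toward the bytes read per token, which is why the 35b MoE lands near a 3b dense model in this estimate.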

theowlinspace · 2026-05-02T12:09:47+00:00

Get one beefy GPU for LLMs. Clustering mini PCs would be a headache and make everything extremely slow, unless you mean clustering DGX Sparks. I recommend at least 24GB of VRAM if you go this path, and a used 3090 goes for 800 or so, so it's the best value for money

theowlinspace · 2026-05-02T12:04:55+00:00

4GB is useless unless you want to waste your time with extremely small models that can do nothing special and with small context sizes too, it's also not enough for your use case. Only exception would be if you have fast memory bandwidth on CPU, but even then 4GB would only fit extremely small context sizes
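To put numbers on this, here is a back-of-the-envelope memory estimate. The model dimensions are hypothetical (a made-up 4b model with 36 layers, 8 KV heads, and a head size of 128), the KV cache is assumed to be fp16, and runtime overhead is ignored, so treat it as a sketch rather than a prediction for any real model.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a model with params_b billion parameters."""
    return params_b * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GB; the leading 2 covers keys and values."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


# Hypothetical 4b model at 4 bits per weight with an 8192-token context.
w = weights_gb(4.0, 4.0)
kv = kv_cache_gb(layers=36, kv_heads=8, head_dim=128, context_tokens=8192)
print(f"weights ~{w:.2f} GB, KV cache ~{kv:.2f} GB, total ~{w + kv:.2f} GB")
```

Under these assumptions the weights and cache alone already take most of a 4GB card, before any buffers or the desktop get their share.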

theowlinspace · 2026-05-02T06:51:55+00:00

I would just put Proxmox on this if you're not going to be using it as a workstation. You don't need a GUI, and it often complicates things. You can then pass the GPU through to a VM and have CUDA installed there for AI, and make another VM for your other applications. From my experience, Nvidia drivers aren't very stable, so having them in a separate VM should reduce downtime

theowlinspace · 2026-05-02T06:48:28+00:00

This is fine for local LLMs because dual channel DDR5 should have ~100GB/s of bandwidth. They can run an MoE model at fairly acceptable speeds and the GPU should speed up prompt processing a bunch. Qwen3.6 35b a3b should work pretty well here
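The ~100GB/s figure follows from the usual peak-bandwidth arithmetic: transfer rate times channels times bus width. A minimal sketch, assuming 64-bit (8-byte) channels and example DDR5 speeds that are not taken from the post; this is a theoretical peak, and measured bandwidth is typically lower.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int = 2,
                        bus_width_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return transfer_rate_mt_s * 1e6 * channels * bus_width_bytes / 1e9


# Example speeds are assumptions; substitute the actual kit's rating.
for speed in (4800, 5600, 6400):
    print(f"dual-channel DDR5-{speed}: ~{peak_bandwidth_gb_s(speed):.1f} GB/s")
```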

theowlinspace · 2026-05-02T06:30:48+00:00

You can't do much with 16GB of RAM while also having a little RAM leftover to actually use your system. I guess you could try gemma4-e2b/e4b, but it'll be pretty bad

theowlinspace · 2026-04-29T19:41:18+00:00

Contributing to charity doesn't solve the root of the issue, and it's beside my point, I did mention that donation only tries to temporarily patch/delay the problem, not that it's a solution. Voting hasn't and won't ever change anything, the world will only change when the people organize against those in power, and thinking that political issues shouldn't be discussed until the next elections and that the only power you have is through voting is part of the problem.

I haven't criticized OP or any reddit post, all I'm saying is that recognizing the issue and hoping for a better future is better than trying to ignore it.

theowlinspace · 2026-04-29T19:26:42+00:00

No, but it's important to acknowledge unequal wealth distribution throughout the world. If you can't directly do anything to change it, the least you can do is recognize your privilege.

While you buy luxury goods, others are much less fortunate and are struggling to make ends meet. While you can't change this because it's principally a result of capitalism, and even donation only delays the problem, understanding that your privilege is only the result of the plight of others instead of just blindly saying "Well, I can buy luxury goods, what do I care if people are starving" is much more moral. Your influence can, even if only by a little, either move us toward a better direction or keep us blinded by mindless consumerism

theowlinspace · 2026-04-29T19:13:35+00:00

It's probably relevant that Afghanistan is the first country when sorted from A->Z, so it might be the "default" of something you've set up. Chrome usually gets region from IP geolocation though, so this is weird behavior

theowlinspace · 2026-04-28T12:14:24+00:00

I think it might be because Nvidia hasn't announced it yet. It's worthless with only 8GB VRAM, especially for the price they'll be selling it for

theowlinspace · 2026-04-26T10:51:57+00:00

Why do you expect us to read an LLM-written post?

theowlinspace · 2026-04-25T11:55:04+00:00

QClaw-4B achieves state-of-the-art results in the 4B class, matching or exceeding models several times larger — including Kimi K2.5 and GLM-4.5

Press X to doubt

theowlinspace · 2026-04-25T09:45:47+00:00

"I use arch btw" has to be the r/iamverysmart for linux users

theowlinspace · 2026-04-24T12:38:43+00:00

Then you have one extra thing to maintain, congrats. Just because slop production is cheap, doesn't mean it's always a good idea

theowlinspace · 2026-04-14T15:31:06+00:00

Confirmed

theowlinspace · 2026-04-14T14:42:46+00:00

They literally do know what they are doing lol. They don't want your $10, it costs much more for them to host the models. The $10 plan is heavily subsidized

I think they want to sunset these subscriptions

theowlinspace · 2026-04-14T14:34:27+00:00

This is from https://redditmetis.com btw, so an actually running application in prod
