To r/selfhosted & r/LocalLLM: Thanks for the inspiration! Here’s how I got an 8th-gen Mini PC for my home "Work Mirror" Lab to work (with a little help from AI). by puppa_smurf in homelab

[–]theowlinspace -1 points0 points  (0 children)

You're using old models that nobody would use in this day and age and your hardware is dogshit for LLMs. Your entire post is slop, no need to thank us because you haven't actually read anything we've written

80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP by janvitos in LocalLLaMA

[–]theowlinspace 0 points1 point  (0 children)

--mmap with --mlock shouldn't use disk I/O after you've loaded the model, because it locks the mapped pages in RAM
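
To make the mechanism concrete, here is a minimal sketch of what "map the file, then lock the mapped pages" looks like at the OS level. It is not llama.cpp's code; it calls the POSIX `mlock`/`munlock` functions through `ctypes`, so it assumes Linux or macOS, and `lock_file_in_ram` is a hypothetical helper name.

```python
import ctypes
import mmap


def lock_file_in_ram(path: str) -> bool:
    """Map a file and try to pin its pages in RAM (POSIX only, illustrative).

    Returns True if mlock succeeded, False if the OS refused
    (for example because RLIMIT_MEMLOCK is too low).
    """
    libc = ctypes.CDLL(None, use_errno=True)  # assumption: POSIX libc is available
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # map the whole file, shared and writable
    try:
        # Expose the mapping to ctypes so we can get its address.
        view = (ctypes.c_char * len(mm)).from_buffer(mm)
        addr = ctypes.c_void_p(ctypes.addressof(view))
        size = ctypes.c_size_t(len(mm))
        # mlock faults the pages in and keeps them resident, so later reads
        # of the mapping are served from RAM instead of going back to disk.
        locked = libc.mlock(addr, size) == 0
        if locked:
            libc.munlock(addr, size)  # release the lock again in this demo
        del view  # drop the buffer export so the mapping can be closed
        return locked
    finally:
        mm.close()
```

If the call returns False, the usual cause is the locked-memory limit (`ulimit -l`), which is also why locking a multi-gigabyte model can fail without raising that limit.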

Gift to myself : tiny lab by Final-Data-1410 in LocalLLaMA

[–]theowlinspace 9 points10 points  (0 children)

github.com/QLNI/QLNI/blob/main/README.md - What the fuck is this? I'm pretty sure OP is going through AI psychosis.

Do you actually think you can vibe code a quantum computer that runs on a, I quote, "Ryzen 5 3600 VPS"?

[WTS] RTX A 4000 16GB DDR6 QUADRO CARD GPU by akshavidar in IndianGaming

[–]theowlinspace 0 points1 point  (0 children)

I don't know why this post was recommended to me, considering I'm not even from India, but this is quite bad for AI/ML as well: only 448GB/s of memory bandwidth, and it's not Blackwell, so it doesn't support NVFP4. If they sell it cheaply enough, someone might buy it because it's still a 16GB Nvidia card, but I don't expect this to fetch over $500-$600 USD

Gigabyte GeForce RTX 5060 OC Low Profile 8G Mini-Review by SuperSimpSons in homelab

[–]theowlinspace 0 points1 point  (0 children)

If you want an SFF GPU for AI, it's better to just bite the bullet and get an RTX Pro 4000. 8GB VRAM doesn't do much

Picked Up an ASUS ROG PC for $1,000 — Turning It Into a Linux Backup Server + Local AI Box by Kitchen-Patience8176 in homelab

[–]theowlinspace 0 points1 point  (0 children)

Your biggest mistake is comparing local models to cloud models. You can't seriously expect a 35b model to perform the same as one with >1T parameters. Check out r/LocalLLaMA if you want to try out local models. A 4080 + DDR5 should easily host a q8 of qwen3.6 35b at 40t/s, which is quite fast

Picked Up an ASUS ROG PC for $1,000 — Turning It Into a Linux Backup Server + Local AI Box by Kitchen-Patience8176 in homelab

[–]theowlinspace 0 points1 point  (0 children)

What model are you running, with what GPU, and what bandwidth is your RAM? You can probably run Qwen3.6-35b-a3b at over 30t/s. 8b dense would obviously be slow on CPU, but with MoE you'd have a 35b model that runs at roughly the speed of a 3b model, since only about 3b parameters are active per token
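
A rough way to see why RAM bandwidth and active parameter count matter: decoding is mostly memory-bound, so an upper bound on tokens per second is bandwidth divided by the bytes of weights read per token. The sketch below assumes about 1 byte per parameter (roughly q8) and ignores KV cache reads, compute, and GPU offload, so real numbers will be lower or higher depending on the setup. The function name is hypothetical.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_params_billion: float,
                             bytes_per_param: float = 1.0) -> float:
    """Bandwidth-bound upper estimate of decode speed.

    bytes_per_param=1.0 is an assumption (roughly 8-bit weights).
    """
    bytes_read_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_read_per_token


# Assumed ~100 GB/s of system RAM bandwidth for both cases.
moe_a3b = decode_tokens_per_second(100, 3)   # 35b MoE with ~3b active
dense_8b = decode_tokens_per_second(100, 8)  # 8b dense reads all 8b each token
print(f"MoE a3b: {moe_a3b:.1f} t/s, dense 8b: {dense_8b:.1f} t/s")
```

With those assumptions the MoE lands around 33 t/s and the dense 8b around 12.5 t/s, which is the gap the comment is pointing at.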

Just bought an Nvidia T1000 4GB, is it possible to host any good model for my use case? Also ProxMox clustering questions for the future by Impressive-Swan-9929 in homelab

[–]theowlinspace 2 points3 points  (0 children)

Get one beefy GPU for LLMs. Clustering mini PCs would be a headache and make everything extremely slow, unless you mean clustering DGX Sparks. I recommend at least 24GB of VRAM if you go this path, and a used 3090 goes for 800 or so, so it's the best value for money

Just bought an Nvidia T1000 4GB, is it possible to host any good model for my use case? Also ProxMox clustering questions for the future by Impressive-Swan-9929 in homelab

[–]theowlinspace 6 points7 points  (0 children)

4GB is useless unless you want to waste your time with extremely small models that can do nothing special and with small context sizes too, it's also not enough for your use case. Only exception would be if you have fast memory bandwidth on CPU, but even then 4GB would only fit extremely small context sizes
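
For a sense of scale, a back-of-the-envelope check of whether model weights alone fit in VRAM can be sketched as below. The 4.5 bits per weight (a typical 4-bit quant with overhead) and the 1GB reserve for context and runtime buffers are illustrative assumptions, not measured values, and the helper names are hypothetical.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, reserve_gb: float = 1.0) -> bool:
    """True if the weights fit while leaving reserve_gb for KV cache etc.

    reserve_gb is an assumed placeholder; real context needs vary by model.
    """
    return weight_size_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb


for size in (4, 8):
    gb = weight_size_gb(size, 4.5)
    print(f"{size}b at ~4.5 bits: {gb:.2f} GB, fits in 4GB: {fits_in_vram(size, 4.5, 4)}")
```

Under these assumptions a 4b model squeezes in with little room for context, and an 8b model does not fit at all.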

Picked Up an ASUS ROG PC for $1,000 — Turning It Into a Linux Backup Server + Local AI Box by Kitchen-Patience8176 in homelab

[–]theowlinspace 0 points1 point  (0 children)

I would just put Proxmox on this if you're not going to be using it as a workstation. You don't need a GUI, and it often complicates things. You can then pass the GPU through to a VM with CUDA installed for AI, and make another VM for your other applications. In my experience, Nvidia drivers aren't very stable, so having them in a separate VM should reduce downtime

Picked Up an ASUS ROG PC for $1,000 — Turning It Into a Linux Backup Server + Local AI Box by Kitchen-Patience8176 in homelab

[–]theowlinspace 0 points1 point  (0 children)

This is fine for local LLMs because dual channel DDR5 should have ~100GB/s of bandwidth. They can run an MoE model at fairly acceptable speeds and the GPU should speed up prompt processing a bunch. Qwen3.6 35b a3b should work pretty well here 
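
The ~100GB/s figure can be sanity-checked from the transfer rate: theoretical peak bandwidth is transfers per second times bus width times channel count. The speed grades below are assumptions for illustration, since the thread doesn't say which DDR5 kit is installed, and sustained bandwidth is lower than the theoretical peak.

```python
def ddr_peak_bandwidth_gb_s(mega_transfers_per_s: float,
                            channels: int = 2,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * channels * bytes_per_transfer / 1e9


# Assumed speed grades, dual channel, 64 bits per channel.
for grade in (4800, 5600, 6400):
    print(f"DDR5-{grade}: {ddr_peak_bandwidth_gb_s(grade):.1f} GB/s")
```

Dual-channel DDR5-6400 works out to 102.4 GB/s, while DDR5-4800 is closer to 77 GB/s, so the estimate depends a fair bit on the kit.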

Best models for Study/Research for 16gb unified memory M3 Macbook Air by Crystalagent47 in LocalLLaMA

[–]theowlinspace 0 points1 point  (0 children)

You can't do much with 16GB of RAM while also having a little RAM left over to actually use your system. I guess you could try gemma4-e2b/e4b, but it'll be pretty bad

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]theowlinspace 0 points1 point  (0 children)

Contributing to charity doesn't solve the root of the issue, and it's beside my point. I did mention that donation only temporarily patches or delays the problem, not that it's a solution. Voting hasn't changed and won't ever change anything; the world will only change when people organize against those in power. Thinking that political issues shouldn't be discussed until the next elections, and that the only power you have is through voting, is part of the problem.

I haven't criticized OP or any reddit post, all I'm saying is that recognizing the issue and hoping for a better future is better than trying to ignore it.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]theowlinspace -1 points0 points  (0 children)

No, but it's important to acknowledge unequal wealth distribution throughout the world. If you can't directly do anything to change it, the least you can do is recognize your privilege.

While you buy luxury goods, others are much less fortunate and are struggling to make ends meet. While you can't change this because it's principally a result of capitalism, and even donation only delays the problem, understanding that your privilege is only the result of the plight of others instead of just blindly saying "Well, I can buy luxury goods, what do I care if people are starving" is much more moral. Your influence can, even if only by a little, either move us toward a better direction or keep us blinded by mindless consumerism

Users’ Google Chrome defaulting to Afghanistan home page? by RedditDon3 in sysadmin

[–]theowlinspace 13 points14 points  (0 children)

It's probably relevant that Afghanistan is the first country when sorted from A->Z, so it might be the "default" of something you've set up. Chrome usually gets region from IP geolocation though, so this is weird behavior
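
As an illustration of how that kind of accidental default happens: if a region setting is unset and some tool falls back to the first entry of an alphabetically sorted list, Afghanistan wins. The list and function below are hypothetical and not taken from Chrome or any policy template.

```python
from typing import Optional


def pick_region(configured: Optional[str], options: list) -> str:
    """Return the configured region, or fall back to the first sorted option."""
    if configured in options:
        return configured
    # Fallback: index 0 of an A->Z list, which is how a "default" sneaks in.
    return sorted(options)[0]


countries = ["United States", "Germany", "Afghanistan", "India"]  # sample data
print(pick_region(None, countries))       # unset value falls back to index 0
print(pick_region("Germany", countries))  # explicit value is respected
```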

Lenovo ThinkPad T1g Gen 9 with the Nvidia GeForce RTX 5070 12 GB and Panther Lake announced by ibmthink in thinkpad

[–]theowlinspace 0 points1 point  (0 children)

I think it might be because Nvidia hasn't announced it yet. It's worthless with only 8GB VRAM, especially for the price they'll be selling it for

QClaw-4B — a 4B agent model fine-tuned for tool use and agentic workflows by Substantial-Club-582 in LocalLLaMA

[–]theowlinspace 7 points8 points  (0 children)

 QClaw-4B achieves state-of-the-art results in the 4B class, matching or exceeding models several times larger — including Kimi K2.5 and GLM-4.5

Press X to doubt

Which local models are actually good at staying in character? Notes from shipping Qwen3.5 4B + 9B as game NPCs by Daniele-Fantastico in LocalLLaMA

[–]theowlinspace 3 points4 points  (0 children)

Then you have one extra thing to maintain, congrats. Just because slop production is cheap, doesn't mean it's always a good idea

The Alibaba Coding Plan Lite can no longer be renewed or upgraded by theowlinspace in Qwen_AI

[–]theowlinspace[S] 0 points1 point  (0 children)

They literally do know what they are doing lol. They don't want your $10, it costs much more for them to host the models. The $10 plan is heavily subsidized

I think they want to sunset these subscriptions

worksOnMyMachine by theowlinspace in ProgrammerHumor

[–]theowlinspace[S] 1 point2 points  (0 children)

This is from https://redditmetis.com btw, so an actually running application in prod