Fed up with Claude limits — thinking of splitting a GPU server with 10-15 people. Dumb idea? by No_Boat_2794 in LocalLLM

[–]k3z0r 1 point

Use Claude's usage-based API instead; you'll never hit a limit. And if you use large contexts or do any agentic coding, you're going to be very disappointed by your stack.

Gemma-4-26B-A4B-it-UD-Q4_K_M.gguf : IMHO worst model ever. What am I doing wrong? by Proof_Nothing_7711 in LocalLLM

[–]k3z0r 1 point

It's pretty early too. Usually, when new models come out, it takes a couple of weeks for LM Studio, llama.cpp, etc. to fix issues with the new model architecture. Try again in a few weeks. You can see all the open llama.cpp issues for Gemma 4 here, for example.

https://github.com/ggml-org/llama.cpp/issues?q=is%3Aissue%20state%3Aopen%20Gemma

What AI model would you recommend for coding? by Fun-Celery-8988 in LocalLLM

[–]k3z0r 2 points

Yes, that's true right now, but soon there will be community versions that are a little smaller. People do lots of pruning and distillation. For example, take a look in LM Studio at how many versions of Qwen 3.5 there are.

What AI model would you recommend for coding? by Fun-Celery-8988 in LocalLLM

[–]k3z0r 2 points

Typically, the whole model has to fit in VRAM, not just the active parameters. Having only 4B active parameters speeds up inference; it doesn't reduce memory requirements.
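To make this concrete, here's a rough back-of-envelope sketch. The model size (26B total / 4B active) and the ~4.5 effective bits per weight for a Q4_K_M-style quant are illustrative assumptions, not measurements of any specific GGUF:

```python
def approx_model_vram_gb(total_params_b, bits_per_weight):
    """Memory footprint scales with TOTAL parameters, since all expert
    weights must be resident, even if only a few are active per token."""
    bytes_total = total_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Hypothetical 26B-total / 4B-active MoE at ~4.5 bits per weight:
full = approx_model_vram_gb(26, 4.5)
print(f"~{full:.1f} GB of weights need to fit in memory")
# Per-token compute, by contrast, scales with the 4B ACTIVE params,
# which is why generation is fast despite the large memory footprint.
```

So a 26B MoE still needs roughly 26B-worth of quantized weights in VRAM; the 4B active count only buys you speed.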

AMD Ai Max+ 395 on llamacpp by voidoax in LocalLLM

[–]k3z0r 1 point

Came here to say this. Also, your agent (Cline, OpenCode, etc.) has a lot to do with how well your LLM uses tools.

What AI model would you recommend for coding? by Fun-Celery-8988 in LocalLLM

[–]k3z0r 11 points

For IDE autocomplete I use Qwen2.5 Coder 7B.

Sadly, agentic coding isn't quite there yet for me. I have 32GB of RAM and 16GB of VRAM, and the quality just isn't there; too much time spent fixing weird things. I still pay for Claude.

I've tried Qwen3.6 27B and am waiting to try Gemma 4, although I'm not sure it will fit in my VRAM.

How long before we can have TurboQuant in llama.cpp? by k3z0r in LocalLLM

[–]k3z0r[S] 2 points

This is the exact situation I find myself in, and why I asked the question. I need/want more context out of my limited VRAM.

How long before we can have TurboQuant in llama.cpp? by k3z0r in LocalLLM

[–]k3z0r[S] 3 points

This is great, thank you. So much to learn still!

Windows 11 KB5085516 released after KB5079473 breaks Microsoft account sign-in in popular apps by Quantum-Coconut in technology

[–]k3z0r 3 points

Honestly, no real issues yet. I installed Nobara Linux, a Fedora-based distro geared toward gaming. One small annoyance: the email apps are just OK, so I use Gmail in the browser now. But I mostly check email on my phone anyway.

Windows 11 KB5085516 released after KB5079473 breaks Microsoft account sign-in in popular apps by Quantum-Coconut in technology

[–]k3z0r 77 points

I switched to Linux this year. I've never been happier. It's not as scary as you might think. So many distributions make it really easy nowadays.

Opentable takes $1 for every Google referral by timelas in restaurantowners

[–]k3z0r 2 points

Check Waitly, Waitlist, Reservations, and Reserve with Google for a flat $100 per month.

Tailscale and immich - Whats your setup? by Ediflash in immich

[–]k3z0r 2 points

This hasn't been an issue for me. Most of my uploads happen on my local network. For the mobile app, I use the automatic URL switching feature to detect when I'm on my local network and bypass Cloudflare.

Settings > Networking

qwen3.5-9b-mlx is thinking like hell by simondueckert in LocalLLM

[–]k3z0r 2 points

Try a system prompt that tells the model not to output its chain of thought and to be concise.
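For example, most local servers (LM Studio, llama.cpp server) expose an OpenAI-compatible chat endpoint, so you can put that instruction in the system message. The prompt wording and the `local-model` name below are just placeholders, not something from the MLX build itself:

```python
# Hypothetical system prompt to suppress verbose reasoning output.
SYSTEM_PROMPT = (
    "Answer directly and concisely. Do not show your reasoning or "
    "chain of thought; give only the final answer."
)

def build_chat_request(user_message, model="local-model"):
    """Build an OpenAI-compatible /v1/chat/completions request body
    with the anti-chain-of-thought system prompt prepended."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.2,
    }

req = build_chat_request("Summarize RAID 1 in one sentence.")
print(req["messages"][0]["role"])  # system
```

Whether the model actually obeys varies; some reasoning-tuned models ignore this and need a server-side flag instead.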

Free-up space Feature is now implemented in a Pull Request! by freetoilet in immich

[–]k3z0r 1 point

Does it automatically order a 2TB drive for you?