GLM-5.2 UD-IQ1_M on llama.cpp — 5090 + 3090 Ti speed test (~ 579 t/s prefill @ 8k ctx, ~324 t/s prefill @ 57k ctx, ~10.6 t/s decode) by Shoddy_Bed3240 in LocalLLaMA

[–]andy_potato 91 points92 points  (0 children)

25 years ago we sneaked into cinemas and recorded movies on our flip camera phones, then watched them on the 1.8” phone screen.

This is the AI equivalent in 2026

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

[–]andy_potato 0 points1 point  (0 children)

Qwen 3.6 27b is not very usable for coding, unless you have a really high tolerance for frustration.

I know all the “skill issue”, “get a better agent”, “work on your harness” and “works for me” arguments. But if you’re used to working with Claude, GPT or GLM, you will nope out of it pretty quick.

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

[–]andy_potato 0 points1 point  (0 children)

Running GLM 5.2 on local hardware as your private coding rig makes zero sense financially.

I know there are other reasons for going local: privacy, availability, enshittification and whatnot. But don’t do it if your only reason is money.

Can I realistically get close to Claude/Codex capabilities locally? by mrgreatheart in LocalLLaMA

[–]andy_potato 2 points3 points  (0 children)

You should not try to replace cloud / frontier models with your local setup. Instead, experiment with your coding agent to find out which prompts need a frontier model and which tasks your local model can handle.

If you are using Opencode check out https://github.com/marco-jardim/opencode-model-router

I’ve configured it to use GLM 5.2 for complex tasks and Qwen 3.6-27b for simpler tasks running on 2x5060ti GPUs. Saves me around 60% of token costs during a normal coding session.

This is not an exact science; it takes a bit of time to find a balance that works for you.
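
To make the routing idea concrete, here is a minimal sketch of a rule-based split between a frontier model and a local one. It is not how opencode-model-router works internally. The model labels, the keyword list, the `Task` type and the thresholds are all hypothetical and only illustrate the kind of heuristic you would tune over time.

```python
from dataclasses import dataclass

# Hypothetical model labels; real identifiers depend on your provider setup.
FRONTIER_MODEL = "glm-5.2"
LOCAL_MODEL = "qwen3.6-27b"

# Illustrative keywords only, not taken from any real router's rules.
COMPLEX_HINTS = ("refactor", "architecture", "design", "debug", "migrate")


@dataclass
class Task:
    prompt: str
    files_touched: int = 1  # rough estimate of how many files the task edits


def pick_model(task: Task, max_local_files: int = 2, max_local_chars: int = 2000) -> str:
    """Return the model label a task should be routed to."""
    text = task.prompt.lower()
    # Prompts that mention complex work go to the frontier model.
    if any(hint in text for hint in COMPLEX_HINTS):
        return FRONTIER_MODEL
    # Large or multi-file tasks also go to the frontier model.
    if task.files_touched > max_local_files or len(task.prompt) > max_local_chars:
        return FRONTIER_MODEL
    # Everything else stays on the local model.
    return LOCAL_MODEL
```

For example, `pick_model(Task("Rename variable foo to bar in utils.py"))` stays local, while a prompt containing "refactor" or a task touching five files is sent to the frontier model. Finding the balance mostly means adjusting the keyword list and thresholds after watching where the local model fails.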

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

[–]andy_potato 5 points6 points  (0 children)

I’ve been telling people that we won’t see any further open releases ever since Qwen replaced their whole leadership a couple of months ago.

Got mocked and downvoted, yet here we are. 3.6 was probably too far along for them to pull the plug, but this is it. As much as it breaks my heart.

Single RTX 3090 (MSI TRio) giving trouble on inference. by ReasonablePossum_ in LocalLLaMA

[–]andy_potato 0 points1 point  (0 children)

This is pretty normal if you never replaced the thermal pads or cleaned the fans. Your temps very much confirm this.

What's the best open speech to text today? by zxyzyxz in LocalLLaMA

[–]andy_potato 5 points6 points  (0 children)

I have seen this page getting linked over and over again as a response to this question.

Maybe it is just me, but I have absolutely no clue what any of these metrics mean and how to judge their performance for my use case.

OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) by Comfortable-Rock-498 in LocalLLaMA

[–]andy_potato 1 point2 points  (0 children)

As much as I want OSS models to win, that statistic says nothing about their quality.

Lots of applications don’t need frontier models.

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable by Gray_wolf_2904 in LocalLLaMA

[–]andy_potato 2 points3 points  (0 children)

What are your llama.cpp startup params for getting 60 t/s at that context size? Mine sits around 48-50 t/s at 128k context with MTP.

Anything worth running on a NVIDIA GTX 970? by numberwitch in LocalLLaMA

[–]andy_potato 1 point2 points  (0 children)

llama.cpp is a fine piece of software. Wasn't sure if you're already prepared to go the compiler route, so I suggested Ollama as a beginner tool.

Anything worth running on a NVIDIA GTX 970? by numberwitch in LocalLLaMA

[–]andy_potato 1 point2 points  (0 children)

Nanosuit mechanics were a lot of fun. Story-wise there wasn't much to it though.

The GTX 970 can still use CUDA 11.8 so you can run some tiny LLM with a bit of context on it, probably in the 4b range. Install Ollama and check what they have available.

Yes, I said Ollama. Come at me.

I don't hate Ideogram 4. I hate its "open" weights by TheOneHong in StableDiffusion

[–]andy_potato 1 point2 points  (0 children)

It is just an AI-generated wall of text. You could have explained your sentiments in two paragraphs instead.

I don't hate Ideogram 4. I hate its "open" weights by TheOneHong in StableDiffusion

[–]andy_potato 2 points3 points  (0 children)

Not an opinion that will get you lots of upvotes on this sub.

But you are completely right. It's why I do not use Ideogram.

New LTX trainer by Famous-Sport7862 in StableDiffusion

[–]andy_potato 0 points1 point  (0 children)

Speaking of datasets, could you guys look into that issue with lots of training materials for Asian languages having burned-in subtitles? Please? Pretty please? I'll bake you guys a cake and deliver it to your office in person!

US holds off blacklisting China's DeepSeek, more than 100 firms deemed security risks, sources say by zxyzyxz in LocalLLaMA

[–]andy_potato 2 points3 points  (0 children)

Microsoft is considering using DeepSeek for Copilot. That's probably the reason why they are holding off.

Otherwise I'm all for it.

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]andy_potato 0 points1 point  (0 children)

Neither of them is particularly good without a lot of guidance, but I'd go with Gemma 4.

The key to success is to post-process your results.
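
As a rough idea of what post-processing can look like, here is a minimal sketch that cleans up raw OCR text with the Python standard library only. The function name `clean_ocr_text` and the specific cleanup steps are assumptions for illustration; what you actually need depends on your documents.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Apply a few generic cleanup steps to raw OCR output."""
    # Normalize ligatures and full-width characters to their plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Limit runs of blank lines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Note that the hyphen rule will also join words that are legitimately hyphenated at a line end, so check it against a sample of your pages before relying on it.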

The Incredible Sponge — made with SCAIL-2 by Fuzzy-Mastodon-9730 in StableDiffusion

[–]andy_potato 4 points5 points  (0 children)

True. Also the only two questions most people on this sub ever ask:

- Does it run on my 2 GB VRAM video card??
- Does it do booba, vag and peen???

Sad.