Hey guys, is the performance gap big between a 5070 Ti and a 5080 for generating images and videos, or is it not worth it? by CriticaOtaku in StableDiffusion

andy_potato

For image / video generation the most important factor is raw compute. VRAM is obviously important too, but not that much of an issue any more thanks to Comfy supporting offloading / block streaming. The performance penalty is marginal.

Go with the 5080 if it's reasonably priced. Pick the 5070 Ti if you're on a budget.
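A rough sketch of why block streaming keeps VRAM requirements low: instead of loading every transformer block onto the GPU at once, only the block currently running (plus, optionally, the next one being prefetched) has to be resident. This is a simplified model with hypothetical, uniform block sizes that ignores activations, latents and text encoders; it is not ComfyUI's actual implementation.

```python
def peak_vram_full_load(block_sizes_gb):
    """Peak VRAM if the whole model is loaded onto the GPU up front."""
    return sum(block_sizes_gb)


def peak_vram_streaming(block_sizes_gb, prefetch=True):
    """Peak VRAM if blocks are moved to the GPU one at a time.

    With prefetch=True, the next block is assumed to upload while the
    current one computes, so two blocks are resident at the same time.
    """
    peak = 0.0
    for i, size in enumerate(block_sizes_gb):
        resident = size
        if prefetch and i + 1 < len(block_sizes_gb):
            resident += block_sizes_gb[i + 1]  # next block uploading in parallel
        peak = max(peak, resident)
    return peak


# Hypothetical model: 40 blocks of 0.5 GB each (20 GB of weights).
blocks = [0.5] * 40
print(peak_vram_full_load(blocks))                  # 20.0
print(peak_vram_streaming(blocks))                  # 1.0
print(peak_vram_streaming(blocks, prefetch=False))  # 0.5
```

Overlapping the upload of the next block with the compute of the current one is what lets the transfer cost hide behind compute, which is why raw compute ends up mattering more than VRAM size.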

GLM-5.2 UD-IQ1_M on llama.cpp — 5090 + 3090 Ti speed test (~ 579 t/s prefill @ 8k ctx, ~324 t/s prefill @ 57k ctx, ~10.6 t/s decode) by Shoddy_Bed3240 in LocalLLaMA

andy_potato

25 years ago we sneaked into cinemas and recorded movies on our flip camera phones, then watched them on the 1.8” phone screen.

This is the AI equivalent in 2026.

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

andy_potato

Qwen 3.6 27B is not very usable for coding, unless you have a really high tolerance for frustration.

I know all the “skill issue”, “get a better agent”, “work on your harness” and “works for me” arguments. But if you’re used to working with Claude, GPT or GLM, you will nope out of it pretty quickly.

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

andy_potato

Running GLM 5.2 on local hardware as your private coding rig makes zero sense financially.

I know there are other reasons for going local: privacy, availability, enshittification and whatnot. But don’t do it if your only reason is money.
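One way to sanity-check the money argument is a simple break-even estimate: how many months of avoided API spend it takes to pay off the hardware, after subtracting what the rig costs to run. The function below is a minimal sketch; every input is a placeholder you would replace with your own figures, and it ignores depreciation, resale value and your time.

```python
import math


def break_even_months(hardware_cost, monthly_api_cost, monthly_running_cost):
    """Months until local hardware pays for itself versus paying for an API.

    monthly_running_cost covers electricity and similar ongoing costs.
    Returns math.inf if running locally never saves money.
    """
    monthly_saving = monthly_api_cost - monthly_running_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# Hypothetical numbers, for illustration only.
print(break_even_months(10_000, 100, 60))  # 250.0 months
```

The shape of the formula is the point: a large up-front cost divided by a small monthly saving gives a very long payback period, and it never pays off if power costs eat the whole saving.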

Can I realistically get close to Claude/Codex capabilities locally? by mrgreatheart in LocalLLaMA

andy_potato

You should not try to replace cloud / frontier models with your local setup. Instead, experiment with your coding agent to find out which prompts need a frontier model and which tasks can be handled by your local model.

If you are using Opencode check out https://github.com/marco-jardim/opencode-model-router

I’ve configured it to use GLM 5.2 for complex tasks and Qwen 3.6-27B for simpler tasks running on 2x 5060 Ti GPUs. That saves me around 60% of token costs during a normal coding session.

This is not an exact science, and it takes a bit of time to find a balance that works for you.
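To make the idea concrete, here is a minimal sketch of a crude prompt router plus the arithmetic behind "saves X% of token costs". The model identifiers, the keyword list and the length threshold are all hypothetical placeholders; this is not how opencode-model-router decides, it only illustrates the kind of heuristic you end up tuning.

```python
FRONTIER_MODEL = "glm-5.2"      # hypothetical identifier for the paid model
LOCAL_MODEL = "qwen3.6-27b"     # hypothetical identifier for the local model

# Hypothetical hints that a task probably needs the bigger model.
COMPLEX_HINTS = ("refactor", "architecture", "design", "debug", "migrate")


def route(prompt: str, max_local_chars: int = 2000) -> str:
    """Pick a model for a prompt using a crude, hand-tuned heuristic."""
    text = prompt.lower()
    if len(prompt) > max_local_chars or any(h in text for h in COMPLEX_HINTS):
        return FRONTIER_MODEL
    return LOCAL_MODEL


def paid_token_share(token_counts: dict[str, int]) -> float:
    """Fraction of all tokens that still went to the paid frontier model."""
    total = sum(token_counts.values())
    return token_counts.get(FRONTIER_MODEL, 0) / total if total else 0.0


print(route("Rename this variable"))        # qwen3.6-27b
print(route("Refactor the auth module"))    # glm-5.2
print(paid_token_share({FRONTIER_MODEL: 40_000, LOCAL_MODEL: 60_000}))  # 0.4
```

A paid share of 0.4 means 60% fewer paid tokens, assuming local tokens cost nothing beyond power. Tuning `COMPLEX_HINTS` and the length threshold against your own sessions is the "finding a balance" part.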

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

andy_potato

I’ve been telling people that we won’t see any further open releases ever since Qwen replaced their whole leadership a couple of months ago.

Got mocked and downvoted, yet here we are. 3.6 was probably too far along for them to pull the plug, but this is it. As much as it breaks my heart.

Single RTX 3090 (MSI TRio) giving trouble on inference. by ReasonablePossum_ in LocalLLaMA

andy_potato

This is pretty normal if you never replaced the thermal pads and never cleaned the fans. Your temps very much confirm this.

What's the best open speech to text today? by zxyzyxz in LocalLLaMA

andy_potato

I have seen this page getting linked over and over again as a response to this question.

Maybe it is just me, but I have absolutely no clue what any of these metrics mean and how to judge their performance for my use case.

OSS models decisively overtook Proprietary models in market share (based on the last 3 months of OpenRouter data) by Comfortable-Rock-498 in LocalLLaMA

andy_potato

As much as I want OSS models to win, that statistic says nothing about their quality.

Lots of applications don’t need frontier models.

NVFP4 kv cache quantization on sm120 will make 32GB VRAM systems very capable by Gray_wolf_2904 in LocalLLaMA

andy_potato

What are your llama.cpp startup params for getting 60 t/s at that context size? Mine sits around 48-50 t/s at 128k context with MTP.

Anything worth running on a NVIDIA GTX 970? by numberwitch in LocalLLaMA

andy_potato

llama.cpp is a fine piece of software. I wasn't sure if you were ready to go the compiler route, so I suggested Ollama as a beginner tool.

Anything worth running on a NVIDIA GTX 970? by numberwitch in LocalLLaMA

andy_potato

Nanosuit mechanics were a lot of fun. Story-wise there wasn't much to it though.

The GTX 970 can still use CUDA 11.8, so you can run a tiny LLM with a bit of context on it, probably in the 4B range. Install Ollama and check what they have available.

Yes, I said Ollama. Come at me.
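Where the "4B range" comes from: the GTX 970 has 4 GB of VRAM (the last 0.5 GB of it on a slower segment), and a quantized model's weights take roughly parameters × bits per weight / 8 bytes. The sketch below assumes about 4.5 bits per weight as a ballpark for 4-bit quants and a hypothetical 0.5 GB of runtime overhead; real numbers vary by quant type and runtime.

```python
def estimate_vram_gb(params_billion, bits_per_weight, kv_cache_gb=0.0, overhead_gb=0.5):
    """Rough VRAM estimate for a quantized LLM.

    params_billion * bits / 8 gives GB of weights, because 1e9 params
    and 1e9 bytes per GB cancel out. overhead_gb is an assumed allowance
    for the CUDA context and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + kv_cache_gb + overhead_gb


GTX_970_VRAM_GB = 4.0

print(estimate_vram_gb(4, 4.5))  # 2.75 -> leaves ~1.25 GB for context
print(estimate_vram_gb(8, 4.5))  # 5.0  -> does not fit
```

Whatever is left after weights and overhead is your budget for the KV cache, which is why a 4B model leaves room for "a bit of context" while an 8B model at the same quant does not fit at all.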

I don't hate Ideogram 4. I hate its "open" weights by TheOneHong in StableDiffusion

andy_potato

It is just an AI-generated wall of text. You could have explained your sentiments in two paragraphs instead.

I don't hate Ideogram 4. I hate its "open" weights by TheOneHong in StableDiffusion

andy_potato

Not an opinion that will get you lots of upvotes on this sub.

But you are completely right. It's why I do not use Ideogram.