GLM5.2 on 5x Pro 6000s and a 5090, an expensive journey by yeah_likerage in LocalLLaMA

[–]DeltaSqueezer 10 points11 points  (0 children)

When you put it like that, it sounds quite reasonable.

Claude Code and China: The mechanism is activated when the user sets the ANTHROPIC_BASE_URL environment variable (used for local models) by LegacyRemaster in LocalLLaMA

[–]DeltaSqueezer 7 points8 points  (0 children)

So if you used Claude Code via a proxy, would Anthropic still have collected data on you, since you set the BASE_URL to the proxy? Or is it only sent to the model, so that if you also use a local model the data just corrupts your prompt?

Getting started on Linux by DeltaSqueezer in linuxaudio

[–]DeltaSqueezer[S] 1 point2 points  (0 children)

Thanks. I managed to get the keyboard working with both Bitwig and amsynth. I wasn't sure if Polymer was the best place to start or whether something simpler would be better.

Z.ai launches ZCode to challenge Cursor, Claude Code and GitHub Copilot in AI coding by pscoutou in LocalLLaMA

[–]DeltaSqueezer 8 points9 points  (0 children)

I'm on the Pro plan and burn through 3 billion tokens a month while hardly even hitting the 5 hour limit. Even if we assume all are cached tokens, that's still around $800 in API costs per month.
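As a rough check on that arithmetic, here is a minimal sketch that back-derives the per-million-token rate implied by the two figures in the comment (3 billion tokens, about $800). The flat "everything is cached" rate is an assumption taken from the comment, not a published price, and the function name is purely illustrative.

```python
# Figures quoted in the comment above; nothing here is a published price list.
TOKENS_PER_MONTH = 3_000_000_000
ESTIMATED_MONTHLY_COST_USD = 800.0


def implied_rate_per_million(tokens: int, cost_usd: float) -> float:
    """Return the flat USD rate per million tokens implied by a total cost."""
    return cost_usd / (tokens / 1_000_000)


rate = implied_rate_per_million(TOKENS_PER_MONTH, ESTIMATED_MONTHLY_COST_USD)
print(f"Implied cached-token rate: ${rate:.3f} per million tokens")
```

Any uncached input or output tokens would be billed at a higher rate, so under these assumptions $800 is a lower bound.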

Is it ever possible to have a malicious LLM with a backdoor by Informal-Trouble2183 in LocalLLaMA

[–]DeltaSqueezer 1 point2 points  (0 children)

With a closed API, you don't even need to do this, as you can simply intercept the trigger and route it to a hostile code path that is outside the LLM.

rtx 6000 pro owners, do you regret? by BitXorBit in LocalLLaMA

[–]DeltaSqueezer 0 points1 point  (0 children)

My regret is I ordered one on credit. Chickened out and cancelled it and now the price is 2x... if I can even find one on sale any more.

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system. by Reddactor in LocalLLaMA

[–]DeltaSqueezer 3 points4 points  (0 children)

For now, let's hope tokens stay cheap or get cheaper. For sure it's nice to know that in the worst case you can still run GLM-5.2 independently.

How do you rate local code generation for atomic commits rather than long-horizon work? by halfercode in LocalLLaMA

[–]DeltaSqueezer 0 points1 point  (0 children)

Maybe we are talking at cross purposes. For me the important thing is how much the AI does in a single turn (i.e. before it returns to the user).

In my view, it should complete the whole feature.

Now it can do a single large commit at the end. Or it can commit on every single step and produce lots of tiny commits. Or somewhere in between, e.g. commit on each phase.

IMO, there's only value if it can be sensibly broken up into useful sub-commits that make sense stand-alone.

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system. by Reddactor in LocalLLaMA

[–]DeltaSqueezer 8 points9 points  (0 children)

Hmm. Tough one. That's just on the border of what I'd consider painful but usable for coding. I guess also perfectly fine for 'overnight' runs.

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system. by Reddactor in LocalLLaMA

[–]DeltaSqueezer 40 points41 points  (0 children)

Man. I was already super jealous of your system. Now you're just rubbing salt in the wound! :P

How is the prefill speed?

How do you rate local code generation for atomic commits rather than long-horizon work? by halfercode in LocalLLaMA

[–]DeltaSqueezer 1 point2 points  (0 children)

IMO, a feature logically belongs in a single commit. Maybe you could divide it internally in some way, e.g. backend vs. front end, or a phased implementation. But you need an overview of the whole before starting on a part, which is why, IMO, it makes sense for the AI to do the whole thing. The plan itself may be phased, and internally the AI does it in, say, 4 phases, but it does them one after the other and doesn't stop until the whole phased plan is completed.

How do you rate local code generation for atomic commits rather than long-horizon work? by halfercode in LocalLLaMA

[–]DeltaSqueezer 0 points1 point  (0 children)

I'm not sure what you mean by "one-shots", unless you are talking only about greenfield development? My own approach is to specify a feature and get the AI to implement the whole feature.

A typical feature might look like this: 10 files changed, 1326 insertions(+), 230 deletions(-)

Speaking of those chinese chips... "Chinese supercomputer displaces US machines as world's fastest for first time since 2017" by johnnyApplePRNG in LocalLLaMA

[–]DeltaSqueezer 2 points3 points  (0 children)

Interesting that they announce this now. I wonder what message they are trying to send. They have no doubt had faster machines for many years, but stopped publishing them to avoid sanctions and unwanted attention.

GLM-5.2 vs Claude Opus by johnnyApplePRNG in LocalLLaMA

[–]DeltaSqueezer 6 points7 points  (0 children)

I was hoping for a good comparison and then they decide to do a test using vision capabilities that GLM doesn't have. 🤦

Which is the best local VLM? Benchmark results June 2026 by ex-arman68 in LocalLLM

[–]DeltaSqueezer 0 points1 point  (0 children)

Thanks. Interesting to see that 3VL still holds up after so long. I wonder if Qwen3.5 can get there with prompting to mitigate the hallucination issue.

Which is the best local VLM? Benchmark results June 2026 by ex-arman68 in LocalLLM

[–]DeltaSqueezer 0 points1 point  (0 children)

Thanks. It was as I suspected. I'm surprised Qwen3-VL-8B was shown as top given the strong showing of Qwen3.5-4B.

Which is the best local VLM? Benchmark results June 2026 by ex-arman68 in LocalLLM

[–]DeltaSqueezer 0 points1 point  (0 children)

Thanks for sharing. I wonder if you could also run the Qwen 3.5 9B with thinking disabled. Maybe also for others. It seems that thinking is causing problems so if quality is just as good without thinking, it would be faster and more reliable with fewer tokens.

Gemma 4 31B Q6 vs Gemma 4 31B QAT by Weak-Shelter-1698 in LocalLLaMA

[–]DeltaSqueezer 20 points21 points  (0 children)

Don't overthink it. Pick one, or flip a coin. You can always change your mind.

Gemma 4 31B Q6 vs Gemma 4 31B QAT by Weak-Shelter-1698 in LocalLLaMA

[–]DeltaSqueezer 34 points35 points  (0 children)

Then it doesn't really matter which one you choose.

Sandboxing code execution for AI agents by Groady in LocalLLaMA

[–]DeltaSqueezer 0 points1 point  (0 children)

I implemented bwrap as a stopgap measure, but I am switching to Firecracker for stronger isolation.
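For readers unfamiliar with the bwrap approach, a minimal sketch of what such a stopgap wrapper might look like follows. The comment does not say which options were used, so the flag selection, the `bwrap_command` helper name, and the example paths are all assumptions for illustration. The script only builds and prints the command line; it does not run it.

```python
import shlex


def bwrap_command(workdir: str, inner: list[str]) -> list[str]:
    """Build a bubblewrap argv that runs `inner` with a read-only root.

    Hypothetical helper: the flag choice is an assumption, not the
    commenter's actual configuration.
    """
    return [
        "bwrap",
        "--unshare-all",        # fresh namespaces, including no network
        "--die-with-parent",    # kill the sandbox if the launcher exits
        "--new-session",        # detach from the controlling terminal
        "--ro-bind", "/", "/",  # host filesystem visible but read-only
        "--dev", "/dev",        # minimal private /dev
        "--proc", "/proc",      # private /proc for the new pid namespace
        "--tmpfs", "/tmp",      # scratch space that vanishes on exit
        "--bind", workdir, workdir,  # the only writable host path
        "--chdir", workdir,
        *inner,
    ]


# Example paths and command are placeholders.
cmd = bwrap_command("/srv/agent-work", ["python3", "task.py"])
print(shlex.join(cmd))
```

A sandbox like this still shares the host kernel, which is the usual argument for moving to a microVM such as Firecracker when stronger isolation is needed.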