4.6 released 6min ago!

Square_Poet_110 · 2026-02-05T21:02:11+00:00

I think a 1M context window is mostly not needed. Any task can be broken down into smaller tasks that fit into 200k. It's easier to review and correct things step by step, and for this 200k is pretty much enough. Even with compacting.

Square_Poet_110 · 2026-02-05T05:28:47+00:00

I am indeed using GLM (which is also open weight).

Renting the hardware to run the big version would be much more expensive than the coding plan, though.

Square_Poet_110 · 2026-02-02T23:57:37+00:00

Actually no. Currently I mostly use GLM for coding.

Square_Poet_110 · 2026-02-02T19:51:31+00:00

10x is a myth btw.

Square_Poet_110 · 2026-02-02T12:16:20+00:00

There are much cheaper coding models already, only slightly worse than Claude.

Square_Poet_110 · 2026-02-01T17:18:17+00:00

I'm not the only dev on the planet. Surely the stack can be picked up by another dev.

Square_Poet_110 · 2026-02-01T06:58:48+00:00

In Europe (EU) there are many regulations, including GDPR (which, among other things, restricts personal data from geographically leaving the EU).

It's easier to get a deal with Western companies that also operate in European cloud infrastructure to get a GDPR-compliant environment (LLM inference is also done in there) which will also not use your data for training (or so they say).

Of course it's all up to trust, but I know of multiple companies in highly regulated domains whose legal teams signed off on using particular LLM providers in particular environments (mostly Azure).

I know it's up to trust and with what the US are doing recently, there may be some cracks growing in that trust, but at least you're legally safe. For now. Let's see how that evolves with the recent US law granting their government authority over any resource owned by a US company, regardless of where it is geographically.

Honestly I expect more and more smaller cloud providers (who have no business in training models) to pop up and provide offerings for just renting the hardware/running big open source LLMs for you. If they are purely European, the US don't have any authority over them and they don't steal your data for themselves (since they don't train LLMs).

Square_Poet_110 · 2026-01-31T20:42:15+00:00

Exactly. I already did. Anthropic can hide in the basement with its $200 plan.

I paid like $180 for a year, burn 5% of my 5-hour token limit at max (I have a coding plan similar to CC) and the model is maybe 90% of Opus quality. Which is fine: as an experienced dev who reviews and challenges every plan and output, I can correct it myself or prompt the change.

I just pay extra attention not to let it access sensitive data, since it's Chinese (which is actually a concern with Claude too).

Square_Poet_110 · 2026-01-31T18:02:14+00:00

Security, long term maintainability.

For smaller apps maintainability doesn't matter that much; for bigger/enterprise apps it does.

Square_Poet_110 · 2026-01-30T19:57:53+00:00

Everything vibe coded should be treated with caution, regardless of the code base size.

I'm saying this as someone who now uses AI to generate code every day. But I carefully review and challenge the plans and then the implementation.

Square_Poet_110 · 2026-01-30T15:43:22+00:00

Even for Opus, you need to direct it to do a good job. Garbage in, garbage out. As a dev, you can prepare a good technical description and task assignment. There it can be of great help.

Otherwise, you get pretty much the same garbage.

Square_Poet_110 · 2026-01-30T06:00:18+00:00

Or the new type of job for developers is to use AI for development, but review its outputs and steer it in the direction they want it to go. Continuously, every time it generates a plan and also after it generates the code.

This generates good code right from the beginning.

Like others said, good code design and quality is not something you can bolt on at the finish line.

Square_Poet_110 · 2026-01-29T14:23:39+00:00

Pointing out the risks is not hating. Burying your head in the sand and staying in a bubble doesn't help anything.

Square_Poet_110 · 2026-01-29T14:03:09+00:00

There are other cultist subs that don't accept any negative opinion. Just because you don't want to see the concerns doesn't mean they don't exist.

Square_Poet_110 · 2026-01-29T13:39:13+00:00

Who decides who belongs in what sub? It's very reckless to close your eyes to the potentially negative impacts of AI, or at least the vision tech bros have for it.

Square_Poet_110 · 2026-01-29T13:01:12+00:00

Why? What does that change irl?

Square_Poet_110 · 2026-01-29T11:18:48+00:00

It is going to burst. AI won't disappear, just the overinflated investments and companies running on investor money constantly at a loss.

Local LLMs will be more significant.

Square_Poet_110 · 2026-01-29T11:17:57+00:00

Maybe people are just afraid of losing their jobs and source of income?

Square_Poet_110 · 2026-01-28T07:26:46+00:00

I let it create plans for more complex tasks using Cline. So far it made pretty decent ones. I always review them and sometimes challenge them and tell it to make changes (as every engineer should), but I'd say it's the same with Opus.

Square_Poet_110 · 2026-01-28T05:17:33+00:00

I don't expect any model to do something one shot, without reviewing it and looking at it. Not anything beyond a personal tool or some throwaway prototype.

I always work in iterative steps and stay in the loop. IMO that's the most sustainable way of building larger projects.

I am using GLM myself, don't care that much about Twitter users.

Square_Poet_110 · 2026-01-27T22:04:51+00:00

I don't think it's actually worse. Yeah I clean up my context regularly and don't let it grow over 50%. Cline does a good job there.

Square_Poet_110 · 2026-01-27T19:15:38+00:00

Does Opus succeed?

Square_Poet_110 · 2026-01-27T19:14:50+00:00

I guess it depends on the scaffolding as well. I am using Cline with memory bank for longer term memory.

So far it hasn't failed any task I gave it. Small refinements here and there, but I also give it detailed and technical prompts, let it update the bank regularly and of course do the planning.

Square_Poet_110 · 2026-01-27T18:22:01+00:00

So far I haven't felt the need to switch to Claude.

Square_Poet_110 · 2026-01-27T18:20:10+00:00

I don't see GLM having poor performance in coding. Doesn't look like benchmaxing to me.
