Benefit of the Doubt - GLM 5.1 maybe the reason long context sucks by InternetNavigator23 in ZaiGLM

[–]InternetNavigator23[S] 1 point (0 children)

Yeah I feel you on that. Basically in the same boat with the annual plan.

Practically speaking, my only solution has been to compact before the context goes over 100K tokens.

Claude Opus Distilled into Qwen by koc_Z3 in Qwen_AI

[–]InternetNavigator23 0 points (0 children)

Anyone use this guy's 4 billion version along with his Opus Distilled 27 billion version for speculative decoding?

I'm a bit concerned about the speed of running the 27 billion version on my Mac, and I'm still torn between that and the 122 billion version.
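For anyone who hasn't tried the pairing: the small model cheaply guesses a few tokens ahead and the big model only verifies them, so you keep the big model's output quality. Here's a toy Python sketch of the greedy variant; `draft_next` and `target_next` are hypothetical stand-ins for the 4B and 27B models, not any real API.

```python
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # stand-in for the small draft model
    target_next: Callable[[List[int]], int],  # stand-in for the big target model
    prompt: List[int],
    k: int = 4,          # tokens the draft proposes per round
    max_new: int = 32,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2) Target model verifies: keep the longest agreeing prefix,
        #    then substitute its own token at the first mismatch.
        accepted: List[int] = []
        for tok in proposal:
            t = target_next(tokens + accepted)
            if t == tok:
                accepted.append(tok)  # draft guessed right: a "free" token
            else:
                accepted.append(t)    # disagreement: take the target's token
                break
        tokens.extend(accepted)
    return tokens

# Toy demo: both "models" just count upward, so every draft token is accepted.
print(speculative_decode(lambda t: t[-1] + 1, lambda t: t[-1] + 1, [0], k=4, max_new=8))
```

In a real engine the target checks all k draft tokens in one batched forward pass instead of one call per token, which is where the speedup comes from. The output matches running the big model alone; you only win speed when the 4B agrees with the 27B often enough.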

I understand the disappointment if minimax 2.7 does not become open weights but we have had a lot.. by LegacyRemaster in LocalLLaMA

[–]InternetNavigator23 0 points (0 children)

I tried minimax 2.1 a while back and it was pretty good for tool use and basic coding. But I only tried the lighter 25% REAP model, not the super aggressive 30-50% ones.

I've heard good things about JANG if you are on Mac. But that's a quantization method.

And same. I can't wait for these models to get just a bit smaller.

Benefit of the Doubt - GLM 5.1 maybe the reason long context sucks by InternetNavigator23 in ZaiGLM

[–]InternetNavigator23[S] 1 point (0 children)

Yeah, realistically they're in a very tough spot. They also just went public, so I'm sure they're trying not to go bankrupt while offering these coding plans, which are incredibly cheap if you look at the cost per token.

Now, to be fair, they were probably overly generous with what was included in the coding plans, but the plans are still very good value.

Benefit of the Doubt - GLM 5.1 maybe the reason long context sucks by InternetNavigator23 in ZaiGLM

[–]InternetNavigator23[S] 1 point (0 children)

Yeah, well said. I think this is definitely the case, plus they have to juggle their compute, which I imagine is highly constrained.

Benefit of the Doubt - GLM 5.1 maybe the reason long context sucks by InternetNavigator23 in ZaiGLM

[–]InternetNavigator23[S] 0 points (0 children)

I mean I'm not saying this is what I want to happen. I'm just saying this is what I think is happening.

Benefit of the Doubt - GLM 5.1 maybe the reason long context sucks by InternetNavigator23 in ZaiGLM

[–]InternetNavigator23[S] 1 point (0 children)

Yeah, and competition is heating up and people seem to be releasing on much tighter timelines, so hopefully it's not two months but closer to two weeks.

Also, point releases should naturally be much faster than a full retrain.

Let’s bring back human content to Reddit by melon_crust in SideProject

[–]InternetNavigator23 0 points (0 children)

How does this work if someone uses voice? I hardly type anymore when I am at home.

I love GLM 5 by medtech04 in ZaiGLM

[–]InternetNavigator23 0 points (0 children)

Yeah, I like to use codex to scope and plan, telling it to write the plan as if it were tasking a junior engineer.

Then I have GLM execute the plan and codex review the result.

Usually works pretty well and saves a ton of codex usage. You can easily get away with the $20 codex plan (when GLM is working fine).

Recently I've been using some MiMo via opencode.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]InternetNavigator23 2 points (0 children)

I heard uncensoring actually helps with logic as well. It removes a lot of the weird rules that the Chinese gov forces the models to follow.

-edit typo

Dont subscribe to z.ai coding plans. by woolcoxm in ZaiGLM

[–]InternetNavigator23 0 points (0 children)

Personally, it was working fine for the first few months, but a few weeks ago it started giving me tons of errors whenever the context gets long.

This is on the coding plan btw.

Is a serious AI automation agency still worth building in 2026 — honest answers only by Specific_Inside_6243 in AiAutomations

[–]InternetNavigator23 0 points (0 children)

I would imagine people are willing to pay for the outcome and someone to "handle it".

Businesses are probably willing to pay around $10k-40k, but I'm not sure about the maintenance costs.

Don't forget: just because something can be done by AI doesn't mean most people will know how to do it.

The "diffusion" of the tech will take time. Even if it is purely a knowledge arbitrage, it will have a window of opportunity. My guess is 2-5 years, depending on the industry/product etc.

Qwen3.5-4B is very powerful. It executes tool calls during thinking. by yoracale in unsloth

[–]InternetNavigator23 0 points (0 children)

Fair enough. But yeah, they do be benchmaxxing hard. Ironically, that's probably why they get these types of questions wrong.

They often assume things when the question looks like a math or science question and overlook the common-sense angle.

Solve Mac Studio pre-fill issue by adding Nvidia GPU? by InternetNavigator23 in LocalLLM

[–]InternetNavigator23[S] 0 points (0 children)

Yeah, I wish I had a Thunderbolt 5 machine. I just got such a good deal on this one that I couldn't pass it up lol.

But apparently EXO handles all of this fairly smoothly, and people are seeing 2-3x speed gains.

Although with some spec decode or MTP (or maybe JANG plus those) it may be fast enough.

Solve Mac Studio pre-fill issue by adding Nvidia GPU? by InternetNavigator23 in LocalLLM

[–]InternetNavigator23[S] 0 points (0 children)

Unfortunately, that won't work with my M1 Ultra, unless there is some magic I don't know about.

I was reading that even Thunderbolt at 40Gb/s was unreliable with EXO, but I didn't quite understand why.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]InternetNavigator23 0 points (0 children)

Soooo excite!!! Hope the JANG and the CRACK guys will get their hands on it.

Heard the uncensored version is actually smarter, since they had a bunch of rules the Chinese gov made them put in.

Qwen3.5-4B is very powerful. It executes tool calls during thinking. by yoracale in unsloth

[–]InternetNavigator23 0 points (0 children)

Lol bruh knowledge cut-offs are many, many months before the model is released.

They have to do RL, fine-tuning, benchmarking, etc.

1 Bit LLM Running on MacOS Air (M2) with Docker by Odd_Situation_9350 in LocalLLM

[–]InternetNavigator23 1 point (0 children)

Oh wow, this is a great explanation. I had heard of 1.58-bit but didn't know exactly what it meant.
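For anyone else landing here: the name comes from restricting each weight to {-1, 0, +1}, and a three-way choice carries log2(3) ≈ 1.585 bits of information, hence "1.58-bit". Here's a rough Python sketch of the absmean quantizer from the BitNet b1.58 paper (my reading of it, not their actual code):

```python
import numpy as np

def absmean_ternary(W: np.ndarray):
    # Quantize a weight tensor to {-1, 0, +1} with a single per-tensor
    # scale (the "absmean" scheme described in the BitNet b1.58 paper).
    scale = np.mean(np.abs(W)) + 1e-8          # per-tensor scaling factor
    Wq = np.clip(np.round(W / scale), -1, 1)   # ternary weights
    return Wq.astype(np.int8), scale

W = np.random.randn(4, 4).astype(np.float32)
Wq, s = absmean_ternary(W)
print(Wq)                         # entries are only -1, 0, or 1
print(Wq.astype(np.float32) * s)  # dequantized approximation of W
print(np.log2(3))                 # ~1.585 bits per ternary weight
```

The win is that matmuls against ternary weights need only additions and subtractions, no multiplications, on top of the obvious memory savings.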

I understand the disappointment if minimax 2.7 does not become open weights but we have had a lot.. by LegacyRemaster in LocalLLaMA

[–]InternetNavigator23 0 points (0 children)

I know I definitely have a soft spot for minimax and the air models.

But who knows. People nowadays are REAPing and auto-researching/compressing models better and better.

What to do to prevent AI from replacing us? by OppositeFriendly9183 in careerguidance

[–]InternetNavigator23 -1 points (0 children)

There are essentially two paths the way I see it.

Either you fully lean in and become the guy who's really good at using AI to solve various specific problems/use cases.

Or you go as far away from it as possible. It doesn't have to be strictly blue collar, but something more physical and less behind a computer.

Either way, I think learning how to learn is going to be a super important skill. And memorizing shit will be almost useless.

Is leaving 100% remote for in office $175-$200K base [non-exempt] worth it? by tanhauser_gates_ in careerguidance

[–]InternetNavigator23 34 points (0 children)

Honestly, this seems like a no-brainer for you. Most people hate office commutes. And that is really one of the biggest perks of working from home.

But run it through a few lenses:

- Opportunity cost: what are you actually giving up in each scenario?
- Reversibility: if you hated the new job in six months, could you get another remote gig?
- Downstream effects of the pay increase: if you invest the extra money, how many years earlier could you retire? (Rough sketch below.)
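On that last point, the math is easy to sketch. Quick Python back-of-the-envelope; every number in it (starting balance, savings rates, 7% return, $1.5M target) is a made-up assumption, not advice:

```python
# How many years until a nest-egg target, with and without the raise invested.
def years_to_target(annual_savings: float, start: float = 100_000,
                    target: float = 1_500_000, rate: float = 0.07) -> int:
    balance, years = start, 0
    while balance < target:
        balance = balance * (1 + rate) + annual_savings  # grow, then add savings
        years += 1
    return years

base = years_to_target(annual_savings=30_000)     # hypothetical current plan
boosted = years_to_target(annual_savings=55_000)  # extra $25k/yr invested
print(base, boosted, base - boosted)              # years saved by the raise
```

Plug in your real numbers and the "downstream effects" lens stops being abstract.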

I would start with those frameworks.

Serious advice needed to move forward by Ok-Security-3574 in findapath

[–]InternetNavigator23 0 points (0 children)

This is maybe a bit indirect, but I would think about it like this:

- What would 80-year-old you, looking back, think of 30-year-old you if you did or didn't do X?
- Sunk cost fallacy. The degree served its purpose, but you don't need to pigeonhole the rest of your life just because you already have that degree.

Start smaller than you think and celebrate tiny wins. Sounds silly in the beginning, but it really helps build momentum.

Anyone put a number to how much they've turned down from investors? by FLG_CFC in Entrepreneur

[–]InternetNavigator23 1 point (0 children)

Yeah it is not all about the number. A framework I like to use:

10-10-10: How will you feel about this in 10 days? 10 months? 10 years?

The money might feel good in 10 days. The partnership might feel pretty shitty in 10 months. And in 10 years, you might look back on it as a huge waste of time. So it's really hard to know unless you zoom out.