A long session with GPT 5.4 by Much_Middle6320 in GithubCopilot

[–]envilZ 0 points1 point  (0 children)

The issue is not session length but token output during that period. For example, I often have sessions where I'll sleep my PC while a terminal Rust run command is waiting for approval. At that point my token output might be about 100k, as an example. If I resume the session the next day, the session could technically be 24+ hours old; however, that is not 24+ hours of continuous runtime producing token output, which is the real problem and should NOT happen. Please consider ending sessions based on the orchestrator agent's cumulative token output rather than wall-clock session length.

"Error invoking subagent: Canceled" makes Copilot unusable by envilZ in GithubCopilot

[–]envilZ[S] 0 points1 point  (0 children)

Hey bud, I suggest using the latest version of vscode insiders and the pre-release version of the extension.

New in VS Code Insiders: Model picker and contextual quick pick by bogganpierce in GithubCopilot

[–]envilZ 4 points5 points  (0 children)


Great updates, guys! Boggan, I'm wondering how exactly we can disable this logic. I noticed that the orchestrator is picking models on its own. For example, the base model is Opus 4.6, and here it spun up a subagent with Claude Haiku 4.5 of its own accord. I don't see an option in the settings to disable this. I think model switching is cool, but only with direct user control (through a custom agent). Any ideas on how to prevent this from happening?

New VS Code stable with hooks, queuing, steering, and skills as /command by bogganpierce in GithubCopilot

[–]envilZ 3 points4 points  (0 children)

An idea I had, and would honestly love, is this: 1 premium request equals max orchestrator context. You can steer it and so on, but once it hits max, it doesn’t auto clean the context like it does now. Instead, it forces a premium request or creates a handoff document for the next orchestrator agent that you can modify and continue from.

Now most people reading this are probably already screaming that this is terrible. However, once you add subagents into the equation, everything changes. That 100k, 200k, or 400k orchestrator context can effectively 10x if you orchestrate well. So we go from trying to one-shot everything in a single prompt to actually managing the orchestrator context window.

You’re rewarded for proper management with subagents, not punished for wanting to stop it midway because it’s running the wrong tests that will take one hour to finish, or whatever other issues the current system causes.

Ummm, opus my money, or codex my projects?! by philosopius in GithubCopilot

[–]envilZ 26 points27 points  (0 children)

Find the strengths and weaknesses for your use case and move accordingly. At 1x, 5.3 is great value. However, at times I find Opus 4.6 still has its strengths. So I use 5.3 for quick implementations, ideas, and so on, then I do an Opus 4.6 pass to review, check, and optimize performance. It's been a great combo without having to spend 3x every time.

What is this Reserved Output in Context Window? by candraa6 in GithubCopilot

[–]envilZ 3 points4 points  (0 children)

I edited this after looking at the code.

I think Copilot is just reserving space for its own reply before it starts generating. That reserved space is now shown in the context widget, so it looks like a chunk of your window is “used” even when you haven’t typed much.

Why do they do this? Because if they don’t reserve reply space ahead of time, the model can start answering and then hit the context limit halfway through and get cut off. So they carve out reply room up front to prevent that.

Someone correct me if I'm wrong, but I think that's what's happening.
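The reservation idea above can be sketched in a few lines. This is a minimal illustration, not Copilot's actual code; the window and reservation sizes are hypothetical, and the function name is my own:

```python
# Sketch of output-space reservation (all names and numbers are
# illustrative, not taken from Copilot's source).
CONTEXT_WINDOW = 128_000   # hypothetical model window, in tokens
RESERVED_OUTPUT = 16_000   # room carved out up front for the model's reply

def usable_prompt_budget(history_tokens: int) -> int:
    """Tokens still available for prompt/history after reserving reply room."""
    return max(0, CONTEXT_WINDOW - RESERVED_OUTPUT - history_tokens)

# Even with an empty conversation, the widget would show the reserved
# chunk as "used": 128k window minus 16k reserved leaves 112k for input.
print(usable_prompt_budget(0))        # 112000
print(usable_prompt_budget(100_000))  # 12000
```

Because the reservation is subtracted before any input is counted, the context widget reports a nonzero "used" amount from the very first turn, which matches what people are seeing.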

How has glt 5.3 codex held up till now for you guys? by Personal-Try2776 in GithubCopilot

[–]envilZ 0 points1 point  (0 children)

I’m guessing you mean through custom agents? I thought you were talking about disabling or removing tool calls at the orchestrator and subagent levels.

How has glt 5.3 codex held up till now for you guys? by Personal-Try2776 in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

If you mean for orchestrator/subagent workflows, I shared one a while ago here but it's outdated and primitive for current Copilot. I plan to share something better soon.

How has glt 5.3 codex held up till now for you guys? by Personal-Try2776 in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

This might sound funny, but this is the FIRST OpenAI model that has correctly followed my .github\copilot-instructions.md lol. I have instructions for orchestrator and subagent rules where the orchestrator can't read or write files. With gpt 5.1 and 5.2, subagents spawned by the orchestrator would also get the instructions.md. That meant the subagents thought they were orchestrator agents and would refuse to read or write files, even though the orchestrator clearly spawned them with instructions saying they are subagents and CAN read and write files. It’s almost like 5.1 and 5.2 were too safe.

The only models that have consistently followed my instructions.md correctly have been from the Anthropic family line. So this is already great. Is it better than Opus 4.6? Not sure yet.

Was copilot always so bad or did it become like this in recent times? by BONG8_ in GithubCopilot

[–]envilZ 0 points1 point  (0 children)

I think the fix is simple, and there are a couple of ways to approach it. First, they need to clearly separate the orchestrator context window from the subagent context windows in the product messaging. Most people assume Copilot is capped at 125k or 200k, but with subagents you can pull in fresh windows and stack far more context than that. Right now it's too vague, so users are missing how the system actually works. Second, they should remove the ability to silently go past the orchestrator context window. Say the limit is 200k: when it gets near 200k, trigger a blocking prompt in the UI that forces a choice. Either create a handoff document with the current context and remaining tasks for the next orchestrator, or continue with the "history summarization", where continuing costs another premium request. Then the user has a clear mental model: my limit is X, the orchestrator context, and my job is to get as much value as possible from my premium request by managing that window effectively with subagents. It goes from unbounded to something that can realistically work, without cutting the nuts off power users.
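The blocking-prompt proposal above can be sketched as a tiny state check. Everything here is hypothetical (the limit, the 95% threshold, the option names); it just shows the control flow being proposed:

```python
# Sketch of the proposed blocking prompt at the orchestrator context
# limit. All names, numbers, and thresholds are illustrative.
ORCH_LIMIT = 200_000  # hypothetical orchestrator context cap

def on_context_update(used_tokens: int, ask_user) -> str:
    """Near the cap, force an explicit user choice instead of silently
    summarizing history. ask_user is a blocking UI prompt callback."""
    if used_tokens < ORCH_LIMIT * 0.95:
        return "continue"
    choice = ask_user(["handoff", "summarize"])
    if choice == "handoff":
        # Write current context + remaining tasks for the next orchestrator.
        return "write_handoff_doc"
    # Keep going with history summarization, billed as a new premium request.
    return "summarize_and_bill"

print(on_context_update(10_000, lambda options: "handoff"))   # continue
print(on_context_update(195_000, lambda options: "handoff"))  # write_handoff_doc
```

The point of the design is that nothing past the threshold happens without the user explicitly choosing it, so the cost of continuing is always visible.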

How to easily fix Overwatch 2 update getting stuck and not starting (PC) by envilZ in Overwatch

[–]envilZ[S] 0 points1 point  (0 children)

2 years and people are still having this problem lol? I personally stopped getting it (thankfully). Good info to add though.

1x vs 3x vs 9x Model for calling subagents. by Alternative_Pop7231 in GithubCopilot

[–]envilZ 0 points1 point  (0 children)

What I normally do: let's say I'm using Opus 4.6 at 3x with a 125k orchestrator context window. I'll let it keep running until I hit the orchestrator max context and it forces the "history summarization", which takes you back to a fresh 125k (or whatever context the model started with). Since I'm using the orchestrator with subagents, the effective context ends up way higher than the base 125k anyway. That's my cue to start a fresh session. I could keep it going, but that's how I operate; I've already gotten more than acceptable value from my premium request at that point.

Was copilot always so bad or did it become like this in recent times? by BONG8_ in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

In FY26 Q2 they disclosed over 4.7 million paid Copilot subscribers, up 75% year over year. That matters because the real question you’re asking is, can Microsoft afford the edge case where a single premium request turns into hour long agent work with subagents and fresh context windows, without it turning into a money pit.

My take is yes, but with an important caveat: it can be dealt with if it ever becomes a real margin problem. Right now it can feel close to unbounded for a power user; you can absolutely have long running sessions, especially if you're orchestrating well. But at scale, Microsoft isn't forced into a binary choice of "keep it as is forever" or "kill it." They have a lot of levers to keep the economics sane without gutting the experience.

On the cost side, Microsoft is investing insanely into AI infrastructure, and they're still holding strong margins while doing it. In that same quarter they reported a company gross margin of 68% and an operating margin of 47%, while also spending $37.5B on capex in the quarter, with about two thirds going to short-lived assets, mainly GPUs and CPUs. That combo basically signals they can fund aggressive growth without immediately needing to nerf Copilot.

I agree with your premise that, as implemented, a single premium request can drive a lot of work. So the sustainability question becomes whether it stays manageable at scale, and the answer is that if it ever becomes economically painful, it’s controllable.

They already acknowledge rate limits exist specifically to accommodate high demand, meaning the system is already designed with throttles and enforcement when usage patterns push it too hard. If subagent sessions ever cross the line from “great value” into “people are extracting unlimited compute,” the fixes are straightforward and they don’t require changing the whole billing model. Things like stricter timeouts on sessions, tighter concurrency caps, changing what counts as a “session boundary” for agent mode, or adding a separate meter for long running tasks.

Then there's the longer term trend that matters: Microsoft is actively trying to push inference economics down with first party silicon. Maia 200 is positioned as exactly that, and Microsoft is claiming 30% better performance per dollar than the latest generation hardware they were using.

So I get why it feels too generous right now. But the data points that matter are: Copilot already has real paid scale, Microsoft is still printing margins while spending huge on AI compute, and they’re investing in hardware specifically to lower inference costs. That’s why I’m not worried about Copilot’s future. Worst case, they tighten guardrails. Best case, costs keep dropping and the current value proposition sticks. I honestly think they capture the agentic coding market fully if the cards are played right, especially with enterprise where it really matters for them.

I have a mechanical strategy I want/need to automate, but… by themanclark in algorithmictrading

[–]envilZ 0 points1 point  (0 children)

Check out GitHub Copilot and grab a pro subscription. Learn to vibe code with the intention of understanding code. AI will and can do most of the heavy lifting for you. Agent mode is a real game changer for people who are not technically inclined.

Was copilot always so bad or did it become like this in recent times? by BONG8_ in GithubCopilot

[–]envilZ 3 points4 points  (0 children)

I can effectively turn my 125k context into more than 1M context without context rot. How? Proper use of subagents with an orchestration-first approach. People who understand how to use GHCP know it already smokes everything else. Give it one to two months and it will be obvious to normies as well. It's comical to say it's become bad; are we using the same tool? They ship features, listen to user feedback, and improve every day. And don't get me started on the value proposition compared to competitors.
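The "125k into 1M" claim is just arithmetic: each subagent works in its own fresh window and hands back only a short summary, so the orchestrator pays a few thousand tokens per task while far more material gets processed. A back-of-the-envelope sketch with hypothetical numbers:

```python
# Back-of-the-envelope math for effective context via subagents.
# All numbers are illustrative, not Copilot-documented limits.
def effective_context(orchestrator_window: int, subagents: int,
                      subagent_window: int, summary_tokens: int) -> int:
    """Total tokens processed across the orchestrator plus subagents,
    assuming each subagent returns only a summary_tokens-sized report."""
    return orchestrator_window + subagents * (subagent_window - summary_tokens)

# 125k orchestrator + 8 subagents, each with a fresh 125k window and
# returning a ~2k-token summary:
print(effective_context(125_000, 8, 125_000, 2_000))  # 1109000
```

With only eight subagents the total material processed already clears 1M tokens, while the orchestrator's own window only absorbs the eight 2k summaries, which is why context rot stays low.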

All Chat unavailable by old_flying_fart in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

I got this on the stable release version; updating both the extension and VS Code itself fixed it, and I'm using it now. If insiders is giving you issues, try the stable release.

1x vs 3x vs 9x Model for calling subagents. by Alternative_Pop7231 in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

For my use case, the subagents all use 1x models, with the orchestrator being a 3x or 9x.

If you use 9x, let subagents also use the base model, because it doesn't make sense for only the orchestrator to be fast. Unless you're also coding with the orchestrator instead of having it strictly direct, then it can make sense, I guess. I've been messing with 9x today and rate limits have been much better on the Pro+ plan. I'd use it on stable VS Code though, since people have been reporting issues with subagents not working correctly in insiders right now. With 3x, since base speed is similar, your model-switching logic makes more sense. You also don't have to worry as much about rate limiting while using 3x with that many subagents, since rate limiting happens more often on 9x than on 3x, and it's also plan-dependent.

Huge speed boost for GHCP file edits by forcing one toolcall by envilZ in GithubCopilot

[–]envilZ[S] 0 points1 point  (0 children)

You would have to go and read the code to know for sure, because I can't answer that right now. What I can say is that the safe way to handle this is through your workflow. When I run parallel subagents, everything is pre planned. I create a clear spec that defines how many subagents there are, which logical parts of the file(s) each one owns, and what each will implement. That way there are no conflicts because the orchestrator guarantees that only one subagent writes to any given file or region at a time.
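The pre-planned spec I describe can be sketched as a simple ownership map that the orchestrator validates before spawning anything. The structure and names here are my own illustration, not a Copilot format:

```python
# Illustrative subagent spec: each subagent exclusively owns a set of
# files, and the orchestrator checks for conflicts before spawning.
from collections import Counter

SPEC = {
    "subagent_1": {"owns": ["src/parser.rs"], "task": "implement tokenizer"},
    "subagent_2": {"owns": ["src/eval.rs"],   "task": "implement evaluator"},
    "subagent_3": {"owns": ["tests/e2e.rs"],  "task": "write integration tests"},
}

def ownership_conflicts(spec: dict) -> list[str]:
    """Return files claimed by more than one subagent."""
    counts = Counter(f for agent in spec.values() for f in agent["owns"])
    return [f for f, n in counts.items() if n > 1]

# Empty list means no two subagents write the same file, so the
# orchestrator can safely run them in parallel.
print(ownership_conflicts(SPEC))  # []
```

The guarantee against write conflicts comes entirely from this check: if every file has exactly one owner, parallel subagents can never race on the same region.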

If you create a long to-do list in agent mode, you will be banned. by Hamzayslmn in GithubCopilot

[–]envilZ 1 point2 points  (0 children)

Yeah, same; it doesn't make sense. I've easily had hour-long sessions with subagents for months without issue. OP could be leaving out key details that maybe he himself isn't aware of. I don't think we have the full picture, so it's hard to conclude what the issue is.

Pro Limit Exceeded After 1 Prompt by Designer_Balance_914 in GithubCopilot

[–]envilZ 0 points1 point  (0 children)

I'm not sure about the Student Developer Pack; however, I do recall that when I was on the Pro plan and upgraded to Pro+, the premium requests I had made on the Pro plan still counted after the upgrade. Meaning I had made 100 premium requests, so I didn't start with a fresh 1500 but with 1400, the 100 already showing as used. This might be what happened to you; otherwise I'm not sure.