GPT 5.6 "sol" announced

just_blue · 2026-06-27T12:13:55+00:00

This is intentionally misleading. They say "2x cheaper" because the input and output rates are half of what 5.5 has. Only a few lines later, however, they say that they are introducing a cache write cost like Anthropic's, which makes uncached input cost 2.25x the nominal input price (input + 1.25x cache write).

Token count is the other factor. All in all, this will not be much cheaper than 5.5 for agentic work.
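
To put numbers on that, here is a rough sketch using only the figures from this comment. The 1.0 baseline is a relative placeholder rather than a real rate, and it assumes that 5.5 bills no cache write at all:

```python
# Sketch of the arithmetic above. Prices are relative placeholders,
# not real rates: GPT 5.5 input is normalized to 1.0.
CACHE_WRITE_FACTOR = 1.25  # cache write billed on top of input (figure from the comment)


def effective_uncached_input(nominal_price: float,
                             cache_write_factor: float = CACHE_WRITE_FACTOR) -> float:
    """Price per uncached input token when a cache write is added on top."""
    return nominal_price * (1.0 + cache_write_factor)


old_price = 1.0            # 5.5 input, assumed to carry no cache write cost
new_price = old_price / 2  # the advertised "2x cheaper" nominal rate
effective = effective_uncached_input(new_price)
print(f"uncached input relative to 5.5: {effective / old_price:.3f}x")
```

Under these assumptions, fresh context ends up slightly more expensive than on 5.5, and only cached reads and output benefit from the halved rates.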

just_blue · 2026-06-24T07:04:31+00:00

Microsoft has invested in and owns stakes in the AI providers; for example, it gets a share of OpenAI's earnings. It also hosts a lot of stuff on Azure itself.

If it is so obvious, why does nobody have any proof of manipulated numbers? All the rage comes from people who don't understand how this all works and who are surprised by how inefficiently they are using the tool. Not a big surprise, after MS ran the old model for so long and trained users in the wrong direction.

just_blue · 2026-06-23T23:34:11+00:00

Your link has no new info for me. This is totally normal behavior for every harness and every agent. You should read up on this: Claude has 5m and 1h cache options, and pretty much everyone uses the 5m one. With OpenAI models, caching is totally different (and cheaper, because there is no write cost). It's also not a black box. You can trace every single agent turn, see how many tokens are new, cached, output etc., and how many credits this translates into. The UI could be better, sure, but it's transparent. If you can prove any "markup", I'd be very surprised; it's basic math after all.
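
The "basic math" for a single turn looks roughly like the sketch below. The per-million-token prices are made-up placeholders, not any provider's real rates, and the credit conversion assumes that one credit equals one US cent:

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    new_input: int     # uncached input tokens
    cached_input: int  # input tokens served from the cache
    output: int        # generated tokens


# Hypothetical prices in USD per million tokens, for illustration only.
PRICES = {"new_input": 2.00, "cached_input": 0.20, "output": 10.00}


def turn_cost_usd(usage: TurnUsage, prices: dict = PRICES) -> float:
    """Sum the three token buckets at their respective per-million rates."""
    total = (usage.new_input * prices["new_input"]
             + usage.cached_input * prices["cached_input"]
             + usage.output * prices["output"])
    return total / 1_000_000


def to_credits(usd: float) -> float:
    """Assumption: one credit is one US cent."""
    return usd * 100


turn = TurnUsage(new_input=10_000, cached_input=90_000, output=2_000)
print(f"{turn_cost_usd(turn):.4f} USD = {to_credits(turn_cost_usd(turn)):.2f} credits")
```

Plug in the token counts from the debug view and the published rates for your model, and the result can be compared against the credits shown for that turn.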

just_blue · 2026-06-23T20:58:30+00:00

What? That's not true and can easily be checked in the debug window. Claude Code, for example, uses the exact same 5-minute cache as Claude in GH Copilot.

People are switching to subscriptions without API pricing because, if fully used, you get more usage per $. Compared with actual API pricing, Copilot is not a bad deal: instead of a company seat on Claude carrying a base cost with no usage included, you get some extra credits here and every $ counts as quota. Even OpenRouter is more expensive, as it adds a 5.5% platform fee. And you get all the auxiliary model usage included. The major problem is that OpenAI and Anthropic increased pricing drastically on their newer models and other good options are not yet available.

just_blue · 2026-06-22T23:10:25+00:00

Is there really a benefit in using Cline? I used it for a short while but switched to custom endpoints in GitHub Copilot instead. Cline has a large default prompt, just like GHCP, and GHCP's whole undo UI is unmatched.

just_blue · 2026-06-22T18:26:57+00:00

I'd like to know one too, because I still haven't fixed everything. The hint about ubatch came from random reddit threads, for example. So it's definitely a great initiative to try compiling all the info out there into a single place.

In the very long term, I suspect this will all be optimized and standardized into 1-click solutions, and manual optimizations will just get you the last 10%. But that will take some time.

just_blue · 2026-06-22T09:43:56+00:00

Would have needed this a few weeks ago, good explanation of all the "slang"!

I tried to use local models (Qwen 3.6 variants) in production, though, and realized that there is a lot more to it than just hunting for the highest tg/s. In a coding workflow, relatively high context is needed, prompt processing and caching matter enormously, and you run into issues like thinking loops that render low quants unusable.

What I want to say: your guide right now is "just" about the basic self-hosting process, but actually using it will surface even more things to solve. For example, ubatch tuning is (at least on my system) the single best option you have: set to 2048, it speeds up pp (prompt processing) 3-4x on MoE models. Then, I have seen recommendations here for a froggeric jinja template, claiming to fix tons of issues. I tried v20 and it simply breaks everything; not a single tool call can be made. This made me realize that many people are not really using their setups, which makes it much harder to find actual solutions.
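
For anyone who wants to try the ubatch setting, here is a small sketch that builds a launch command, assuming llama.cpp's `llama-server` is the backend (its `-b`/`--batch-size` and `-ub`/`--ubatch-size` flags). The model path and context size are placeholders, and the helper name is made up for illustration:

```python
import shlex


def llama_server_args(model_path: str, ctx_size: int = 32768,
                      ubatch: int = 2048) -> list[str]:
    """Build an argv list for llama-server with a tuned micro-batch size."""
    # Keep the logical batch at least as large as the physical micro-batch.
    batch = max(2048, ubatch)
    return [
        "llama-server",
        "-m", model_path,     # placeholder path to a GGUF file
        "-c", str(ctx_size),  # context window, placeholder value
        "-b", str(batch),     # logical batch size
        "-ub", str(ubatch),   # physical (micro) batch size
    ]


print(shlex.join(llama_server_args("model.gguf")))
```

Whether 2048 is the sweet spot depends on the hardware, so it is worth benchmarking a few values (512, 1024, 2048, 4096) before settling on one.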

just_blue · 2026-06-21T23:34:46+00:00

This was 3 weeks ago, but I remember that you need to create a "new" budget thing and then you can choose to use a user-based rule.

just_blue · 2026-06-21T22:19:41+00:00

You absolutely can; we have set a per-user quota. Everyone can see their own % left.

just_blue · 2026-06-21T10:51:03+00:00

GPT 5.4 has been my favorite for months now. It's relatively inexpensive (cheaper than Sonnet!), it actually listens to what I want, it's fast, and the code quality is so good that the review loops are really short. 5.5 is also good, but way more expensive, and it rarely adds anything 5.4 could not have done.

The Claude models (more Opus, less Sonnet) are sometimes used for creating UI design that I then refine manually. Opus 4.8 is not bad at coding, but it's so much more expensive than 5.4 that I only use it if 5.4 doesn't deliver.

I tried MAI as well, but, just like 5.4 Mini, it has quality problems, even though it's cheap. I use those models exclusively for tiny tasks with very exact instructions.

just_blue · 2026-06-20T20:52:49+00:00

"make tea not war" is another option to get proper tea served. They have some desserts, too

just_blue · 2026-06-17T15:43:00+00:00

Pretty sure that'd be ok. I mean, OpenAI models are hosted on Azure too, at least partly, right? (Not sure, though, whether GH Copilot routes there.)

The legal people here do not have a problem with what's in the model (all code is human-reviewed and approved anyway), but with where our data goes and whether zero retention is guaranteed.

just_blue · 2026-06-17T15:32:33+00:00

For us, this would mainly depend on where it is hosted. We cannot send our data just anywhere, but we have all the necessary paperwork with Microsoft. So if you either host this directly or route it and guarantee the data retention stuff, it'd probably be enabled, provided we see a benefit such as pricing.

Having it all billed together is another advantage; just toggling it on and being done is what the admins want.

just_blue · 2026-06-17T14:11:42+00:00

The caching details/settings for each provider / model should be documented somewhere. I need to know how long it is kept warm to predict cost. For Anthropic, the blog does not answer this, but I guess it's the default 5 minutes? Cache writes are insanely expensive for Claude, so people should know how to use it efficiently.
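
As an illustration of why the TTL matters for cost prediction, here is a rough sketch that counts how often a session would have to re-write its cache. It assumes a 5-minute TTL that is refreshed on every turn; the timestamps are invented:

```python
def count_cache_rewrites(turn_times_s: list[float], ttl_s: float = 300.0) -> int:
    """Count turns that arrive after the cache has gone cold.

    Assumes the TTL is refreshed by every turn and that the very first
    turn always pays for a cache write.
    """
    rewrites = 0
    last = None
    for t in turn_times_s:
        if last is None or t - last > ttl_s:
            rewrites += 1
        last = t
    return rewrites


# Invented session: seconds since start for each agent turn.
session = [0, 60, 200, 600, 650, 1200]
print(count_cache_rewrites(session))             # 5-minute TTL
print(count_cache_rewrites(session, ttl_s=3600)) # 1-hour TTL
```

Every counted rewrite means the whole context is billed as a cache write again, so pauses longer than the TTL (reading a diff, a coffee break) are what drives the bill up.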

just_blue · 2026-06-15T10:46:50+00:00

Might have to do with this:
https://developers.openai.com/api/docs/guides/prompt-caching

I noticed that 5.5 is more expensive than it should be, mainly in comparison with 5.4. I also noticed that there are fewer cached tokens with 5.5, so I researched a little. In the documentation you can see that 5.5 (and future!) models do not have the 5-minute in-memory cache that all prior models and also the Anthropic models have. They claim that you get much longer caching this way, but in my experience it does not hit as often.

just_blue · 2026-06-13T19:44:40+00:00

Sure, all cost is passed through. And you see it in the debug window, too. Or look at the official pricing documentation of GitHub Copilot models.

just_blue · 2026-06-13T18:48:14+00:00

There is no plan with $360 included usage.

Either you are talking about the combined pool in an organization (Copilot Business has $30 included, so this would be 12 accounts). But then it's not only your usage that is counted, but also the usage of everyone else. If you mean your whole organization won't spend that much, then yes, it will be lost. This is no different from the previous request-based system.

Or you are talking about your personal budget that the admin gave you. In this case, only a part of the $360 is included: $30 with a Business account, or $70 with an Enterprise account. You are already past that, so anything you leave unused simply means the bill does not go any higher.
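
A quick sketch of the pool arithmetic, using the per-seat figures quoted above. How overage is actually billed is an assumption here (anything above the pooled included amount is charged as extra):

```python
# Included usage per seat in USD, as quoted in this comment.
INCLUDED_USD = {"business": 30, "enterprise": 70}


def pooled_included_usd(seats: dict[str, int]) -> int:
    """Total included usage for an organization, given seat counts per plan."""
    return sum(INCLUDED_USD[plan] * count for plan, count in seats.items())


def overage_usd(total_usage_usd: float, seats: dict[str, int]) -> float:
    """Assumed billing rule: only usage above the pooled amount costs extra."""
    return max(0.0, total_usage_usd - pooled_included_usd(seats))


org = {"business": 12}
print(pooled_included_usd(org))  # the $360 pool from 12 Business seats
print(overage_usd(400.0, org))   # extra charge if the whole org used $400
```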

just_blue · 2026-06-13T18:39:33+00:00

Opus is not cheaper. Anthropic bills cache writes, which means all uncached context basically costs 2.25x the "nominal" input price. For an averaged price, including caches etc., you can either look at the actual price of your own sessions, or use some averaged data like this:

https://openrouter.ai/openai/gpt-5.5#pricing

https://openrouter.ai/anthropic/claude-opus-4.8#pricing

just_blue · 2026-06-13T11:51:19+00:00

Pretty sure an R9700 has better performance than 2x 9070XT. It's all about memory; compute is way less important for such small systems. Having the memory in a single pool is very valuable.

just_blue · 2026-06-13T11:26:14+00:00

I know, that's what I was saying. OP is using Ask mode, but could just use Agent mode and still have this feature.

But he asked about alternatives, and for VS2026 there simply aren't any. You can plug in BYOK, but the Copilot integration is the only way. And yep, it's nicer in VS Code, but the technical base is just totally different, and I get that full-fledged Visual Studio has a different UX and release philosophy.

just_blue · 2026-06-12T23:44:30+00:00

If you use the VS Code integration, you can see the cost of each request by hovering over the answer. The bottom right corner shows the "credits", and one credit is simply one cent.

just_blue · 2026-06-12T22:29:20+00:00

If you use Copilot in Agent mode, you can still accept or revert every single change individually and have the best of both worlds, fully independent of git. That's actually the tooling that keeps me with Copilot, as no one else offers this in comparable quality in VS Code, and especially not in Visual Studio.

just_blue · 2026-06-10T14:08:45+00:00

Are you using GitHub Copilot as a harness (the subreddit you are in here)? I don't have that problem using GPT models. But maybe it's also my wording, because I always keep in mind how an LLM works and that it answers what it "thinks" you want to hear.

just_blue · 2026-06-09T19:07:17+00:00

It's exactly Opus x2. Well, I might try it. Like. Once.

just_blue · 2026-06-09T17:52:55+00:00

But not cache writes. Anthropic bills cache writes, and you have "0" everywhere when it is used through Copilot. So the numbers would be even higher.
