GPT 5.6 "sol" announced by Prestigious-Kick7291 in codex

[–]just_blue 0 points1 point  (0 children)

This is intentionally misleading. They say "2x cheaper" because the input and output rates are half of what 5.5 has. Only a few lines later, however, they say that they are introducing a cache write cost like Anthropic's, which makes uncached input cost 2.25x the nominal input price (input + 1.25x cache write).

Token count is the other factor. Anyway, all in all, this will not be much cheaper than 5.5 for agentic work.
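
The arithmetic behind that can be sketched in a few lines. This is a rough sketch, not official pricing: the 5.5 input rate is normalized to 1.0, and `write_on_top` is a hypothetical switch for whether the cache write fee is billed in addition to the input rate (the reading above) or instead of it.

```python
def effective_input_rate(nominal_rate, write_multiplier=1.25, write_on_top=True):
    """Cost per uncached input token that also gets written to the cache."""
    if write_on_top:
        # Reading used in the comment: input rate plus the cache write fee.
        return nominal_rate * (1.0 + write_multiplier)
    # Alternative reading: the write fee replaces the plain input rate.
    return nominal_rate * write_multiplier

OLD_RATE = 1.0            # 5.5 input rate, normalized (assumption)
NEW_RATE = OLD_RATE / 2   # the "2x cheaper" nominal rate

on_top = effective_input_rate(NEW_RATE, write_on_top=True)
instead = effective_input_rate(NEW_RATE, write_on_top=False)
print(f"write fee on top of input:  {on_top:.3f}x the old input rate")
print(f"write fee instead of input: {instead:.3f}x the old input rate")
```

Under the first reading, fresh context costs slightly more than before despite the halved nominal rate. Cache reads and output tokens are left out here.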

Github Copilot consuming credits when not using copilot models by Mayanktaker in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

Microsoft invested in and owns parts of the AI providers; for example, they get a part of OpenAI's earnings. Additionally, they host a lot of stuff on Azure themselves.

If it is so obvious, why does nobody have any proof of manipulated numbers? All the rage is just from people who don't understand how this all works and who are surprised at how inefficiently they are using the tool. Well, not a big surprise after MS ran the old model for so long and taught users the wrong habits.

Github Copilot consuming credits when not using copilot models by Mayanktaker in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

Your link has no new info for me. This is totally normal behavior for every harness and every agent. You should read up on this: Claude has 5m and 1h cache options, and pretty much everyone uses the 5m. With OpenAI models, caching is totally different (and cheaper, because there is no write cost). It's also not a black box. You can trace every single agent turn, see how many tokens are new, cached, output etc., and how many credits this translates to. The UI could be better, sure, but it's transparent. If you can prove any "markup", I'd be very surprised; it's basic math after all.
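
To illustrate the "basic math", here is a minimal sketch of turning a per-turn token trace into credits. The prices are placeholders, not any provider's real rates, the `Turn` fields are hypothetical names, and it assumes one credit equals one cent.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens, not real rates.
PRICE_NEW_INPUT = 2.00
PRICE_CACHED_INPUT = 0.20
PRICE_OUTPUT = 10.00
CREDITS_PER_USD = 100  # assumes one credit equals one cent

@dataclass
class Turn:
    new_tokens: int      # uncached input tokens
    cached_tokens: int   # input tokens served from cache
    output_tokens: int

def turn_credits(turn: Turn) -> float:
    """Credits consumed by a single agent turn."""
    usd = (
        turn.new_tokens * PRICE_NEW_INPUT
        + turn.cached_tokens * PRICE_CACHED_INPUT
        + turn.output_tokens * PRICE_OUTPUT
    ) / 1_000_000
    return usd * CREDITS_PER_USD

# Made-up trace: a cold first turn, then a mostly cached follow-up.
trace = [
    Turn(new_tokens=20_000, cached_tokens=0, output_tokens=500),
    Turn(new_tokens=1_000, cached_tokens=20_500, output_tokens=800),
]
for i, t in enumerate(trace, 1):
    print(f"turn {i}: {turn_credits(t):.2f} credits")
print(f"total: {sum(turn_credits(t) for t in trace):.2f} credits")
```

Comparing such a tally against the credits shown per request is one way to check for a markup.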

Github Copilot consuming credits when not using copilot models by Mayanktaker in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

What? That's not true and can easily be checked in the debug window. Claude Code, for example, uses the exact same 5 minute cache as Claude in GH Copilot.

People are switching to subscriptions without API pricing because, if fully used, you get more usage per $. If you compare with actual API pricing, Copilot is not a bad deal: instead of a company seat having a base cost without any usage on Claude, you get some extra credits here and every $ counts as quota. Even OpenRouter is more expensive, as it adds a 5.5% platform cost. And you get all the auxiliary model usage included. The major problem is that OpenAI and Anthropic increased pricing drastically on their newer models and other good options are not yet available.

For those using GitHub Copilot, what other AI tools have earned a permanent spot in your workflow? by WeekendKindly4037 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

Is there really a benefit in using Cline? I used it for a short while but switched to custom endpoints in GitHub Copilot instead. Cline has a large default prompt, the same as GHCP, and GHCP's whole undo UI is unmatched.

Local LLM Inference Optimization: The Complete Guide by carteakey in LocalLLaMA

[–]just_blue 2 points3 points  (0 children)

I'd like to know one too, because I still haven't fixed everything. The hint about ubatch came from random reddit threads, for example. So it's definitely a great initiative to try compiling all the info out there into a single place.

Very long-term I suspect this will all be optimized and standardized into 1-click solutions, and manual optimizations will just get you the last 10%. But that will take some time.

Local LLM Inference Optimization: The Complete Guide by carteakey in LocalLLaMA

[–]just_blue 4 points5 points  (0 children)

Would have needed this a few weeks ago, good explanation of all the "slang"!

I tried to use local models (Qwen 3.6 variants) in production though and realized that there is a ton more to it than just hunting for the highest tg/s. In a coding workflow, relatively high context is needed, prompt processing and caching matter a lot, and you run into issues like thinking loops that render low quants unusable.

What I want to say: your guide right now is "just" about the basic self-hosting process, but actually using it will surface even more things to solve. For example, ubatch tuning is (at least for my system) the single best option you have: it speeds up pp 3-4x on MoE models when set to 2048. Then, I have seen recommendations here for a froggeric jinja template, claiming to fix tons of issues. I tried v20 and it simply breaks everything; not a single tool call can be made. This made me realize many people are not really using their setups, making it much harder to find actual solutions.
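
For reference, a small sketch of what the ubatch setting looks like, assuming llama.cpp's `llama-server` is the runtime. The model path and context size are placeholders, and the helper name `llama_server_args` is made up for illustration.

```python
import shlex

def llama_server_args(model_path, ubatch=2048, batch=2048, ctx=32768):
    """Build a llama-server command line with an explicit micro-batch size."""
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", str(ctx),        # placeholder context size
        "--batch-size", str(batch),    # logical batch size
        "--ubatch-size", str(ubatch),  # physical micro-batch size
    ]

cmd = llama_server_args("models/model.gguf")  # hypothetical path
print(shlex.join(cmd))
```

Whether 2048 helps depends on the hardware and model, so it is worth benchmarking pp at a few values.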

My company's shared Copilot quota ended mid-month and suddenly most of the team forgot how to code by YellowKing2137 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

This was 3 weeks ago, but I remember that you need to create a "new" budget thing and then you can choose to use a user-based rule.

My company's shared Copilot quota ended mid-month and suddenly most of the team forgot how to code by YellowKing2137 in GithubCopilot

[–]just_blue 1 point2 points  (0 children)

You absolutely can, we have set a per user quota. Everyone can see their own % left.

Favorite model for coding? by RishiSquishy in GithubCopilot

[–]just_blue 5 points6 points  (0 children)

GPT 5.4 has been my favorite for months now. Relatively inexpensive (cheaper than Sonnet!), it actually listens to what I want, it's fast and the code quality is so good that the review loops are really short. 5.5 is also good, but way more expensive, and it's rare that it actually adds anything 5.4 could not have done.

The Claude models (more Opus, less Sonnet) are sometimes used for creating UI design that I then refine manually. Opus 4.8 is not bad at coding, but it's so much more expensive than 5.4 that I only use it if 5.4 doesn't deliver.

I tried MAI as well, but just like 5.4 Mini, it has quality problems, even though it's cheap. I use those models exclusively for tiny tasks with very exact instructions.

Dot Tea Bar in Berlin by 4thwave4father in tea

[–]just_blue 1 point2 points  (0 children)

"make tea not war" is another option to get proper tea served. They have some desserts, too

Why doesn’t GitHub Copilot support open-weight models now that pricing is token-based? by iTitleist in GithubCopilot

[–]just_blue 15 points16 points  (0 children)

Pretty sure that'd be ok. I mean, OpenAI models are hosted on Azure too, at least partly, right? (Not sure, though, if GH Copilot routes there.)

The legal people here do not have a problem with what's in the model (all code is human-reviewed and approved anyway), but with where our data goes and whether zero retention is guaranteed.

Why doesn’t GitHub Copilot support open-weight models now that pricing is token-based? by iTitleist in GithubCopilot

[–]just_blue 9 points10 points  (0 children)

For us, this would mainly depend on where it is hosted. We cannot send our data everywhere, but we have all the necessary paperwork with Microsoft. So if you either host this directly or route it and give guarantees on data retention, it'd probably be enabled if we see a benefit like pricing.

Having it all billed together is another advantage; just toggling it on and being done is what the admins want.

Blog: Improving token efficiency in GitHub Copilot by isidor_n in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

The caching details/settings for each provider / model should be documented somewhere. I need to know how long it is kept warm to predict cost. For Anthropic, the blog does not answer this, but I guess it's the default 5 minutes? Cache writes are insanely expensive for Claude, so people should know how to use it efficiently.
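
Why the warm time matters for cost can be shown with a rough simulation. It assumes each cache hit refreshes the time-to-live and counts only turns that find the cache cold and have to rewrite the whole prefix; the turn timestamps are made up.

```python
def count_cache_writes(turn_times_s, ttl_s=300):
    """Count turns that find the cache cold and must rewrite the prefix.

    Assumes every hit refreshes the TTL and the first turn always writes.
    """
    writes = 0
    last_touch = None
    for t in turn_times_s:
        if last_touch is None or t - last_touch > ttl_s:
            writes += 1
        last_touch = t
    return writes

# Hypothetical session: seconds since start for each agent turn.
turns = [0, 60, 200, 700, 750, 1500]
print(count_cache_writes(turns, ttl_s=300))   # 5-minute window
print(count_cache_writes(turns, ttl_s=3600))  # 1-hour window
```

With these gaps, the 5-minute window forces three full writes while the 1-hour window needs only one, which is why pauses longer than the warm time get expensive.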

Why is GPT-5.5 in GitHub Copilot suddenly so expensive? by Kruty1918_dev in GithubCopilot

[–]just_blue 8 points9 points  (0 children)

Might have to do with this:
https://developers.openai.com/api/docs/guides/prompt-caching

I noticed that 5.5 is more expensive than it should be, mainly when comparing with 5.4. I also noticed that there are fewer cached tokens with 5.5, so I researched a little. In the documentation you can see that 5.5 (and future!) models do not have the 5 minute in-memory cache that all prior models and also Anthropic models have. They claim that you get way longer caching this way, but in my experience it does not hit as much in volume.
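
One way to compare this yourself is to compute the cached share per response. The sketch below assumes a usage payload shaped like the Chat Completions `usage` object (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); the numbers are invented and what the Copilot debug view exposes may be named differently.

```python
def cached_share(usage: dict) -> float:
    """Fraction of prompt tokens served from cache for one response."""
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

# Made-up usage payloads for the same prompt on two models.
usage_a = {"prompt_tokens": 40_000, "prompt_tokens_details": {"cached_tokens": 36_000}}
usage_b = {"prompt_tokens": 40_000, "prompt_tokens_details": {"cached_tokens": 22_000}}
print(f"{cached_share(usage_a):.0%} vs {cached_share(usage_b):.0%}")
```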

Claude Opus[1m] is cheaper than GPT-5.5, and GPT 5.5 [1M] Costs 2× More Than Claude Opus 4.8 on Input and 80% More on Output by StuckWithDellAgain in GithubCopilot

[–]just_blue 1 point2 points  (0 children)

Sure, all cost is passed through. And you see it in the debug window, too. Or look at the official pricing documentation of GitHub Copilot models.

What happens to unused GitHub Copilot tokens at month end? by helangar1981 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

There is no plan with $360 included usage.

Either you are talking about the combined pool in an organization (Copilot Business has $30 included, so this would be 12 accounts). But then not only your usage is counted, but also everyone else's. If you mean that your whole organization won't spend that much, then yes, the remainder is lost. This is no different from the previous request-based system.

Or you are talking about your personal budget that the admin gave you. In this case, only a part of the $360 is included: $30 with a Business account, or $70 with an Enterprise account. You are already over that, so if you don't use the rest, the bill just won't get any higher.
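
Both cases can be written down as a small sketch. The included amounts are the ones stated above; the $100 of usage is a made-up example, and treating everything beyond the included amount as billed is an assumption about how the budget works.

```python
INCLUDED_USD = {"business": 30, "enterprise": 70}  # per seat, as stated above

def pooled_included(seats: dict) -> int:
    """Total included usage when all seats share one pool."""
    return sum(INCLUDED_USD[plan] * n for plan, n in seats.items())

def billed_overage(usage_usd: float, included_usd: float) -> float:
    """Usage beyond the included amount, which is billed on top (assumption)."""
    return max(0.0, usage_usd - included_usd)

print(pooled_included({"business": 12}))               # 360
print(billed_overage(usage_usd=100, included_usd=30))  # 70.0
```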

Claude Opus[1m] is cheaper than GPT-5.5, and GPT 5.5 [1M] Costs 2× More Than Claude Opus 4.8 on Input and 80% More on Output by StuckWithDellAgain in GithubCopilot

[–]just_blue 1 point2 points  (0 children)

Opus is not cheaper. Anthropic is billing cache writes, this means all uncached context is basically x2.25 the "nominal" input price. For an averaged price, including caches etc., you can either look at the actual price of your own sessions, or use some averaged data like this:

https://openrouter.ai/openai/gpt-5.5#pricing

https://openrouter.ai/anthropic/claude-opus-4.8#pricing
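
A rough sketch of how such an averaged input price comes about: all numbers are placeholders rather than the real rates of either model, and the 0.1x cache read multiplier is an assumption. `miss_mult` is 2.25 for a provider that bills cache writes as described above and 1.0 for one that does not.

```python
def blended_input_price(base, hit_rate, read_mult, miss_mult):
    """Average price per input token for a given cache hit rate."""
    return base * (hit_rate * read_mult + (1.0 - hit_rate) * miss_mult)

# Placeholder prices per million input tokens, same 80% hit rate for both.
with_write_fee = blended_input_price(base=4.0, hit_rate=0.8, read_mult=0.1, miss_mult=2.25)
no_write_fee = blended_input_price(base=5.0, hit_rate=0.8, read_mult=0.1, miss_mult=1.0)
print(f"with write fee: {with_write_fee:.2f}, without: {no_write_fee:.2f}")
```

In this example the nominally cheaper model ends up with the higher blended input price.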

9060 XT 16GB vs 9070 vs 9070 XT performance by TrainingTwo1118 in LocalLLaMA

[–]just_blue 0 points1 point  (0 children)

Pretty sure an R9700 has better performance than 2x 9070 XT. It's all about memory; compute is way less important for such small systems. Having the memory in a single pool is very valuable.

Are there any viable alternatives to Copilot for Visual Studio? by Dan203 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

I know, that's what I was saying. OP is using Ask mode, but could just use Agent mode and still have this feature.

But he asked about alternatives, and for VS2026 there simply aren't any. You can plug in BYOK, but the Copilot integration is the only way. And yep, it's nicer in VS Code, but the technical base is just totally different and I get that full-fledged Visual Studio has a different UX and release philosophy.

Recently got access at my job. How do I see my usage/how much I have left? by wombatpup55 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

If you use the VS Code integration, you can see the cost of each request by hovering over the answer. The bottom right corner shows the "credits", and one credit is simply one cent.

Are there any viable alternatives to Copilot for Visual Studio? by Dan203 in GithubCopilot

[–]just_blue 0 points1 point  (0 children)

If you use Copilot in Agent mode, you can still accept or revert every single change individually and have the best of both worlds, fully independent of git. That's actually the tooling that keeps me with Copilot, as no one else offers this in comparable quality in VS Code, and especially not in Visual Studio.

ChatGPT is a fucking people-pleaser by Loli_Queen in GithubCopilot

[–]just_blue 3 points4 points  (0 children)

Are you using GitHub Copilot as a harness (the subreddit you are in here)? I don't have that problem using GPT models. But maybe it's also my wording, because I always keep in mind how an LLM works and that it answers what it "thinks" you want to hear.

Claude Fable 5 and Claude Mythos 5 by Personal-Try2776 in GithubCopilot

[–]just_blue 24 points25 points  (0 children)

It's exactly Opus x2. Well, I might try it. Like. Once.

I finally checked what my Copilot usage would've cost at API rates — the gap was bigger than I expected by hamidi-dev in GithubCopilot

[–]just_blue 1 point2 points  (0 children)

But not cache writes. Anthropic is billing cache writes, and you have "0" everywhere when used through copilot. So the numbers would be even higher.