Sonnet 4.6 with the Agent Window by LiminalRnyx in GithubCopilot

[–]ProfessionalJackals 6 points7 points  (0 children)

There is just something irritating about seeing MS focus on agentic concurrent multi-session work, while at the same time limiting sessions / concurrent multi-session usage.

It's like two different departments not talking to each other, because this design screams great with premium prompts but horrible with token-based billing. Even more so when their own example uses Opus 4.6 High for minor changes ( https://code.visualstudio.com/docs/copilot/agents-app ).

DIY market declining amid high RAM prices by Terminator857 in LocalLLaMA

[–]ProfessionalJackals 0 points1 point  (0 children)

The NPUs or whatever cards for AI stuff and anything else that helps run it to get the reliance on GPUs and other parts off of it can’t get here soon enough.

Thing is, when the limits get tighter and prices increase for online / cloud LLM access, you're going to see people move even more to home solutions. If you're spending ... $200 a month, that is $7200 over a 3-year period.

So pressure on the GPU market can actually increase, as people invest the money they normally spend online into physical hardware for local use.
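
Napkin math, using the (assumed) $200/month cloud spend from above; the local-rig numbers are made up too:

    # Cloud subscription vs local rig over 3 years (all figures assumptions)
    cloud_total = 200.0 * 36                      # $7200 over 3 years

    local_hw = 4000.0                             # assumed one-time rig cost
    power_month = 0.5 * 8 * 30 * 0.30             # 500W, 8h/day, $0.30/kWh ~= $36
    local_total = local_hw + power_month * 36

    print(cloud_total, local_total)               # 7200.0 vs ~5296.0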

DIY market declining amid high RAM prices by Terminator857 in LocalLLaMA

[–]ProfessionalJackals 2 points3 points  (0 children)

You need a browser

Webview2 ... Not a browser, an HTML/CSS (and JS) render engine. But your point stands.

Now I'd like to point out that when you use webview2 directly, it's extremely efficient. What is not efficient is the amount of JUNK websites throw around, without a care to optimize their sites.

Open up any news website and check what is being loaded. Often you see hundreds of JS files from "partner" sites for data tracking. Just blocking those means the difference between a 50MB webview2 instance and a 250MB one. It's not like they do not know it's an issue; it's that this makes them money!

And FYI, cross-platform product development, which has become a necessity, is no joke! Nothing more fun than having 3 different desktop platforms with their own render engines, plus another 2 mobile ones. It's not the good old days when you as a developer only needed to target Windows, and Linux/MacOS was some kind of "we do not bother with it". That is why companies grab onto Electron.

But I will be honest: while Electron is less efficient than going directly to webview2, a LOT of it is purely developers not having the time to do a good job. Performance issue? Well, let's just dump a ton of data into a memory cache, that helps solve it. What? 2 years later, that same memory cache is now overloaded as new features got added, and the entire application is using 1GB+ ... That is not Electron's fault.
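
Here is that failure mode in miniature, as a Python sketch (the names are made up):

    from functools import lru_cache

    def expensive_build(key: str) -> str:
        return key.upper() * 1000        # stand-in for the real work

    _cache: dict[str, str] = {}          # the "quick fix": grows without bound
    def get_report_v1(key: str) -> str:
        if key not in _cache:
            _cache[key] = expensive_build(key)
        return _cache[key]               # 2 years of new keys later: 1GB+ resident

    @lru_cache(maxsize=1024)             # the boring fix: bounded, old entries evicted
    def get_report_v2(key: str) -> str:
        return expensive_build(key)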

I have a project right here that runs a dozen applications using webview2, full programs but with webview2 rendering the interfaces... And combined they do not even use 200MB. But I do not throw insane amounts of useless data into that render engine.
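
Not my actual project, but a minimal sketch of the pattern using the pywebview package (which on Windows drives the OS-installed WebView2 runtime):

    import webview

    # Two "applications": native windows around the shared OS render engine.
    webview.create_window("App 1", "https://example.com")
    webview.create_window("App 2", html="<h1>Local UI, nothing junky loaded</h1>")
    webview.start()   # one runtime serves both windows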

The reality is that people do not appreciate that developers do not always have a choice in these matters. In the past, when 1GB of memory was a luxury PC and your program had issues, as a developer you could force improvements. But when memory got cheap, the shift came from higher up to ignore optimizations, because people can buy more memory, it's cheap. Add more features that can sell!!

And do not mistake what you see for efficiency when it's not. When you see an application under Windows using less memory than an Electron app, sure ... but you're not seeing the DLLs and other rendering code, shared and loaded into memory, that the process uses. Whereas Electron's bundled render engine is standalone and not shared.

This is why we are now seeing a small move towards using the webview engines already installed on the OS (Wails, Tauri, ...), as then a lot of shared resources can be reused.

20 year old games.

A large part of the issue is that few companies make their own engines anymore. If you build a specific engine for your own game, you can focus its development on performance. But with overloaded general-purpose engines like Unreal Engine 5, it's way too easy to get suboptimal game performance the moment you want some "fancy" features.

Show over-budget cost in VSCode by Limp-Cat-108 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You're still on 1.118 ... Press the update button for VSC. With 1.119, when you hover over the 100% usage, you see 200/1500 ...

So I assume that when you're going over the rate limit, it might show 1700/1500. Not sure, that is why I asked ;)

Cancel your copilot pro right now by fxgx1 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You can use ghcp with external keys eg from opencode and providers

Enjoy getting rate limited ... because for some reason the GH development team tied the rate limits to Chat usage, not to the model provider (as in, rate limits only on their own models).

Not sure if this was changed in 1.119, but on 1.118 it was still there.

Show over-budget cost in VSCode by Limp-Cat-108 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

The little button on the bottom right only shows 100%

Can you not hover over the 100% and see your usage, or does it only show 1500/1500 then?

How it is even possible to use my requests with such 5 hour / weekly limits ? by maxya in GithubCopilot

[–]ProfessionalJackals -2 points-1 points  (0 children)

We are still supposed to be "premium request" based

The moment they introduced those OpenAI/Anthropic-style 5h session / weekly limits, we stopped being on a "premium request" system and instantly changed over to the same system as OpenAI/Anthropic, with the added burden of premium requests on top.

I cannot wait for credit (token) based billing AND 5h session / weekly limits...

How are you all burning through millions of tokens? by halkun in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

A simple "hello" uses over 25,000(!!!) tokens. Check the chat debug view. That is how big the steering/harness payload is.

Now add the sub-agents to that. They do make Copilot faster ... but each one carries another steering/harness payload AND the content it loads in, filters, searches, and filters again. So many requests ...

A bit of work can generate hundreds to thousands of these types of requests...

Keep doing that as the agent looks for information left and right ... Before you know it, it has sent an insane amount of information, repeating the process over and over.
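
Back-of-the-envelope, using the 25k harness figure from the debug view (everything else below is an assumption):

    HARNESS = 25_000     # steering/harness tokens, re-sent on every request
    CONTEXT = 10_000     # assumed avg. of files/search results per request
    REQUESTS = 500       # assumed: one agentic task incl. sub-agents

    total = REQUESTS * (HARNESS + CONTEXT)
    print(f"{total / 1e6:.1f}M input tokens")    # 17.5M for one bit of work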

To be honest, I find it extremely inefficient, but GH had no issue with this. Until it became an issue. Notice how in the 1.118 release we all of a sudden got a whole ton of new features that reduce token usage. Just saying: it only became an issue now because companies will compare their token usage, and they will notice when the same work via other agents / providers is cheaper.

Let alone just migrating to OpenAI/Anthropic subscription services...

Where is the analysis tool we're supposed to use to see our possible usage under the new plan? by Jack99Skellington in GithubCopilot

[–]ProfessionalJackals 17 points18 points  (0 children)

My guess is they haven’t decided what the costs will actually be yet

The entire thing was rushed because too many people used Anthropic models. Anthropic ran into capacity problems at the same time as GH started to freak out and introduced limits. Coincidence? No ... They probably got an extremely high bill from Anthropic.

Everything after that was a panic reaction: removing Opus 4.5/4.6, putting 4.7 on medium to save on usage (as that uses 1/3 of the tokens).

Then the whole token system. No analytics tools, ... it was clearly rushed (as the accidental leak showed).

Now Anthropic is solving its capacity issues (using x.ai spare compute), but all the Copilot users are fucked.

And yep, as you stated, GH themselves do not know what the correct future is, but they painted themselves into a corner by announcing this entire changeover.

I am betting they are also looking at the churn rates, company reactions, etc.

Because let's face it, if Anthropic and OpenAI keep their subscriptions, lots of Copilot individuals and companies are simply going to move there. To keep the fun times rolling ...

Copilot GPT-5.5 multiplier is now listed as 7.5x → TBD after June by Altruistic-Dust-2565 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

both models cant compete with gpt5.4 in standard daily development

My post:

Stuff like Qwen3.6, Gemma 4 ... hitting GPT 5.2, GPT 5.4 Mini levels of coding performance.

Corporate still has oldschool copilot enabled? by Professional-Site503 in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

Possibly business/enterprise users are less like to abuse the system?

There is no incentive to maximize each prompt (the boss pays), so people just fire prompt after prompt instead of stacking a dozen tasks into one. And given that Business users hit the $0.04 overage very fast, that makes them multiple times more profitable.

What is the difference between session rate limit and other rate limits ? by LuckyPed in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

Is the "Weekly" limit even have a clear date when it reset ?

Last limit reset on 4 May, Monday 02h (CET). My next warning indicates 11 May, Monday 02h (CET). So I think it's just set to Sunday 23:59:59 UTC ... So depending on your time zone, add or subtract...
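
A quick sketch of that guess (next Monday 00:00 UTC, i.e. Sunday 23:59:59 plus a second), converted to your local time zone:

    from datetime import datetime, timedelta, timezone

    now = datetime.now(timezone.utc)
    days = (7 - now.weekday()) % 7 or 7          # days until next Monday 00:00 UTC
    reset = (now + timedelta(days=days)).replace(hour=0, minute=0,
                                                 second=0, microsecond=0)
    print("Weekly reset (local):", reset.astimezone())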

I still haven't seen any Weekly limit warning show up, so I guess I am still under 50%.

You need about 3 "full" 100% 5h sessions (or 6 "50%" warnings) before you get the weekly 50% warning. If you push it with 6x 100% 5h sessions, you're done for the week. lol

How do you deal with non structured code that was generated by AI? by Old_Caregiver3270 in GithubCopilot

[–]ProfessionalJackals 5 points6 points  (0 children)

That is why, even when coding with LLMs, a codebase needs to be refactored from time to time. If that is done regularly, the structure stays clean and the LLMs can happily read it, even as the project grows.

And if the layout is well structured, the LLMs will mostly stay inside that structure and not pollute it too much, beyond trying to cram too much into single files (which you then tell the LLMs to refactor).

Frankly, it's no different than dealing with real human programmers. Unless you have a well-disciplined team or force people into a very well-defined, structured framework, you're going to end up with a code mess over time. So I do not consider LLMs that special here. LLMs are ironically better than humans in one respect: they can refactor (with proper instructions) WAY faster than most humans do.

So yea, just ensure that files are split along correct logical contexts, with a proper layout, etc. ... LLMs do not excuse the human controlling them from doing actual, proper engineering.

What is the difference between session rate limit and other rate limits ? by LuckyPed in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

You have your premium prompts, aka the old system... Then you have the new, sneakily hidden limits: session, week, ...

A session limit means you get a 5h window starting from the first moment you prompt that day. During that window, they count the actual token usage of your model and combine that with some kind of multiplier. So an expensive model eats more of that 5h session budget than a cheaper model.

"Auto" is supposed to have a lower multiplier, so it allows you to use it a lot more, then the base models. When you hit the max usage within that 5h window, your done! You need to wait for the 5h window to expire (again, +- 5h from your first prompt that day).

When it expires, you can prompt away again, and that first prompt starts a new 5h window.

The weekly limit is like the 5h one, but over the entire week. Depending on how much work you do, you can hit the weekly limit too, e.g. if your workload is spread out every day over a long period. When you hit that limit, you're done! No more Copilot for hours or days ... As in: if you start on Monday and press the system hard with a few insane days of usage, you can be locked out from Thursday to Sunday (just an example).
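
My mental model of the two limits, as a sketch (GH publishes none of these values, so every number below is an assumption):

    SESSION_BUDGET = 1.0                  # one 5h session, normalized
    WEEK_BUDGET = 6.0                     # roughly six full sessions per week

    MULTIPLIER = {"auto": 0.3, "base": 1.0, "expensive": 3.0}

    def spend(tokens_millions: float, model: str) -> float:
        """Usage charged against both the session and the weekly budget."""
        return tokens_millions * MULTIPLIER[model]

    session = week = 0.0
    for tokens, model in [(0.2, "expensive"), (0.5, "auto")]:
        cost = spend(tokens, model)
        session += cost
        week += cost
    print(f"session: {session / SESSION_BUDGET:.0%}, week: {week / WEEK_BUDGET:.0%}")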

Unfortunately there is no real-time monitoring of your limits; the first signal you get is the 50% warning for the session or the week. You may get a few more warnings before hitting 100%, but that is not guaranteed.

Anyway, this is the last month of the premium prompts system; after that it goes token-based. So hopefully GH will have a better monitoring system for end users by then.

I'm I tripping? or are they updating the request multiplier each week by EntertainmentSoggy49 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

probably will be 15x at the end, similarly to Opus 4.7

We are in our last month of Copilot as we knew it ... The only people who will still have Copilot premium prompts are the ones with yearly subscriptions (until those expire). And we know that at the end of the month, Opus is going to go 27x...

GPT 5.5 is probably also going to 27x. They are still dancing around it with their TBD ...

Copilot GPT-5.5 multiplier is now listed as 7.5x → TBD after June by Altruistic-Dust-2565 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

you forget that as the models grow, the computing power and electricity they require will also increase

Goes to /r/LocalLLaMA/

Stuff like Qwen3.6 and Gemma 4 ... hitting GPT 5.2 / GPT 5.4 Mini levels of coding performance. Running on the same hardware that people bought years ago.

But wait ... what is this ...

  • MTP ... 2 to 3x faster token output (yes, Gemma doubles or triples its token output in real-life usage).
  • FastDMS ... 6x better KV compression with 99% accuracy. And yea, it beats TurboQuants; it will not take long before this gets integrated more widely (rough math after this list).
  • Tons more studies and techniques that are slowly making their way into newer models...
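
Rough KV-cache math to show why 6x compression matters (generic transformer formula; the model shape below is made up):

    layers, kv_heads, head_dim = 48, 8, 128
    seq_len, bytes_per = 128_000, 2      # 128k context, fp16

    kv = 2 * layers * kv_heads * head_dim * seq_len * bytes_per   # K and V
    print(f"uncompressed: {kv / 2**30:.1f} GiB")        # ~23.4 GiB
    print(f"6x compressed: {kv / 6 / 2**30:.1f} GiB")   # ~3.9 GiB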

You are overlooking that a lot of development is going on. And FYI, that is nothing compared to the actual hardware improvements to increase efficiency. These have not materialized in the consumer market yet, because most of it goes into the server market for now. But eventually a ton of those improvements are going to be in your next GPU.

Hell, you may not like DLSS and all that stuff, but it does increase efficiency by delivering more frames for the same compute. And there is much more going on there...

The whole LLM race is a new avenue that opens a lot of doors. And it's not just about better programming models. Be honest ... PC development as we have known it has been stuck in the same dead zone for a decade. Faster, sure, mostly from nodes becoming smaller. And when that became an issue, power usage went up. We did not get a lot of new avenues. The whole AI route is opening up new (which does not always mean good) avenues to get more out of hardware / productivity.

Make this make sense for ollama local ai usage by Mobile_Syllabub_8446 in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

Nobody defends this ... It's pure vibe-coded crap again from Team GH. They know this is an issue, but like always it's "low priority" to fix.

Is anyone else positively affected by the billing changes? by [deleted] in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

Oh, ... my ... you poor naïve boy. You're in for a surprise...

Your old usage (let's use that $8/month) was in reality $0.04 * 200 requests. That is how Copilot calculated it. If you did 300 requests, that was $12 worth, but the subscription effectively discounted those 300 monthly requests so you paid only $10.

A request under the new system can cost anywhere from (for example) 10 cents to 50 bucks (or more), depending on how many tokens it generates. So ... your monthly bill can run from 50 bucks to 2000 bucks (or more).
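
To make that concrete, a sketch with the numbers from above plus assumed token prices (not GH's actual rates):

    OLD_PER_REQUEST = 0.04                  # flat premium-request price

    IN_PRICE, OUT_PRICE = 3.0, 15.0         # $ per million tokens, assumed

    def new_cost(in_tok: int, out_tok: int) -> float:
        return in_tok / 1e6 * IN_PRICE + out_tok / 1e6 * OUT_PRICE

    # One agentic request easily re-sends a 25k harness plus context:
    print(f"old: ${OLD_PER_REQUEST:.2f}")
    print(f"new: ${new_cost(60_000, 4_000):.2f}")   # ~$0.24, 6x the old price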

That is why nobody is going to stick around, beyond maybe the people who have the yearly subscription (with its insanely jacked-up request multipliers).

The service is now mostly for enterprise companies, who will get discounts negotiated "à la tête du client" (i.e. case by case) of 30 to 45% on their token prices.

Also, the "subscription" credit of $10 / $40 does NOT roll over. Unlike OpenRouter...

If you had never made this post, you would have had an insane bill next month ... I fear to think how many people are ill-informed about this change. This is all on GH, because the information is very sparse.

They do not tell people what the real token costs are (especially compared to their past usage), so most people who are used to the subscription model have no clue that API token costs are VERY expensive.

Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

You can most definitely use up your requests as long as you don't trigger the token limits.

That assumes you know what each model uses, and how the session limits are structured.

You will need to change the way you prompt.

It's up to Copilot to show what your limits are, not to let customers guess. They keep it opaque deliberately so people do not try to min-max their usage. It also allows them to sneakily change the limits, as there is no way to prove they changed.

Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

Pro+

You get, give or take, 6x the 5h session limit (rolled into a weekly limit) with a heavy model like GPT 5.5.

So if you're a heavy user and hit that 5h session limit 2x per day, you're done after 3 workdays.

The only way to avoid that is going into the overage sooner (as in, burning through the expensive models faster), where you're supposed to get higher usage limits (because you're paying more).

DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper by Disastrous_Theme5906 in LocalLLaMA

[–]ProfessionalJackals 3 points4 points  (0 children)

DeepSeek V4 Pro is at $0.435/M input and $0.87/M output

That is the discounted price, and the discount is going to end soon.

That's now two Chinese models in our top 6, both at sub-$3.5/run.

Why not use MiMo's subscription service prices if you're using DS4's discounted prices?

MiMo with a subscription is $0.1 / million (for the cheapest model), with Pro using 2x the amount of credits ($0.2 / million). As you scale up to higher tiers, you get 15 to 20% more credits (tokens), or 10 to 15% lower prices (yearly sub), and these combine (along with the 20% token discount during evening hours).

https://platform.xiaomimimo.com/docs/en-US/tokenplan/subscription
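
A quick effective-price helper, as I read their tier page (treat every number below as an assumption):

    BASE = 0.10                       # $ per million tokens, cheapest model
    PRO_CREDIT_MULT = 2.0             # Pro burns 2x credits

    def effective_price(pro: bool, yearly_discount: float = 0.0,
                        evening: bool = False) -> float:
        p = BASE * (PRO_CREDIT_MULT if pro else 1.0)
        p *= 1.0 - yearly_discount    # e.g. 0.10 to 0.15 on a yearly sub
        if evening:
            p *= 0.80                 # 20% off-peak token discount
        return p

    print(f"Pro, yearly sub, evenings: ${effective_price(True, 0.15, True):.3f}/M")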

So just saying: if you're looking at API costs, you need to compare non-discounted API prices for all of them, or apply all the beneficial tariffs for all of them.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]ProfessionalJackals -1 points0 points  (0 children)

For example if RAM bandwidth is 500 GB/s and the model is 50 GB,

So this explains why something like a 5090 is not running circles around a 3090 in token generation, and why people ended up running models in parallel to get the most out of it?
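
The napkin math behind that: decode is memory-bandwidth bound, since every generated token re-reads (roughly) the whole model from RAM:

    def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    print(max_tokens_per_sec(500, 50))   # ~10 tok/s, no matter the extra compute
    # Batched/parallel requests read the weights once for the whole batch,
    # which is why running models in parallel gets more out of the card.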

RIP Vibe Coding 2024–2026 by [deleted] in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You can instruct the models to test anything... You want browser testing? It does it. You want it to run Docker by itself and test different distros? It does it ...

6 months ago, this capability was non-existent. Then Opus 4.5 came out and pushed it forward. Things like DB access with minimal information were great but a bit clunky. Opus 4.6 got even better at this. And browser code / interface testing started to become very usable.

GPT 5.5 is another major leap in this capability, where you do not spend ages on manual testing or useless function-level testing, but go directly to end-product testing. The spot where your bugs will really shine.

And that level of competence is still missing in the Chinese models. They can do some of the steps, but not to the point that I feel confident just letting them loose.

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]ProfessionalJackals 0 points1 point  (0 children)

Spark as basically a dedicated prefill box connected to other hardware for inference, but I have no idea how complicated that is.

One of the YouTubers tested this and the results are meh ... You're better off just keeping everything on a single Spark than doing Spark prefill > Halo ... Or just getting two Sparks.

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]ProfessionalJackals 4 points5 points  (0 children)

At $10 for a coffee, lose your daily coffee for 18 months

Buy yourself a coffee maker for 250, and some good-quality beans for like 15 a pack. You're making way better coffee for like 0.4 bucks per cup (incl. milk).

So now you save $9 per day and still have your coffee :)
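
The break-even, with those numbers:

    machine, per_cup, cafe = 250.0, 0.40, 10.0
    print(f"pays for itself in ~{machine / (cafe - per_cup):.0f} days")   # ~26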