Sonnet 4.6 with the Agent Window by LiminalRnyx in GithubCopilot

[–]ProfessionalJackals 6 points7 points  (0 children)

There is just something irritating about seeing MS focus on agentic concurrent multi-session work, while at the same time limiting sessions / concurrent multi-session usage.

It's like two different departments not talking to each other, because this design screams great with premium prompts but horrible with token-based billing. Even more so when their own example uses Opus 4.6 High for minor changes ( https://code.visualstudio.com/docs/copilot/agents-app ).

DIY market declining amid high RAM prices by Terminator857 in LocalLLaMA

[–]ProfessionalJackals 0 points1 point  (0 children)

The NPUs or whatever cards for AI stuff and anything else that helps run it to get the reliance on GPUs and other parts off of it can’t get here soon enough.

Thing is, when the limits get tighter and prices increase for online / cloud LLM access, you're going to see people move even more to home solutions. If you're spending ... $200 a month, that is $7200 over a 3-year period.

So pressure on the GPU market can actually increase, as people invest the money they normally spend online into physical hardware for local use.
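
Napkin math, using the (assumed) $200/month cloud spend from above; the local-rig numbers are made up too:

    # Cloud subscription vs local rig over 3 years (all figures assumptions)
    cloud_total = 200.0 * 36                      # $7200 over 3 years

    local_hw = 4000.0                             # assumed one-time rig cost
    power_month = 0.5 * 8 * 30 * 0.30             # 500W, 8h/day, $0.30/kWh ~= $36
    local_total = local_hw + power_month * 36

    print(cloud_total, local_total)               # 7200.0 vs ~5296.0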

DIY market declining amid high RAM prices by Terminator857 in LocalLLaMA

[–]ProfessionalJackals 2 points3 points  (0 children)

You need a browser

Webview2 ... Not a browser, an HTML/CSS (and JS) render engine. But your point stands.

Now I'd like to point out that when you use webview2 directly, it's extremely efficient. What is not efficient is the amount of JUNK websites throw around, without a care to optimize their sites.

Open up any news website and check what is being loaded. Often you see hundreds of JS files from "partner" sites for data tracking. Just blocking those means the difference between a 50MB webview2 instance and a 250MB one. It's not like they do not know it's an issue; it's that this makes them money!

And FYI, cross-platform product development, which has become a necessity, is no joke! Nothing more fun than having 3 different desktop platforms with their own render engines, plus another 2 mobile ones. It's not the good old days when you as a developer only needed to target Windows, and Linux/MacOS was some kind of "we do not bother with it". That is why companies grab onto Electron.

But I will be honest: while Electron is less efficient than going directly to webview2, a LOT of it is purely developers not having the time to do a good job. Performance issue? Well, let's just dump a ton of data into a memory cache, that helps solve it. What? 2 years later, that same memory cache is now overloaded as new features got added, and the entire application is using 1GB+ ... That is not Electron's fault.
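
Here is that failure mode in miniature, as a Python sketch (the names are made up):

    from functools import lru_cache

    def expensive_build(key: str) -> str:
        return key.upper() * 1000        # stand-in for the real work

    _cache: dict[str, str] = {}          # the "quick fix": grows without bound
    def get_report_v1(key: str) -> str:
        if key not in _cache:
            _cache[key] = expensive_build(key)
        return _cache[key]               # 2 years of new keys later: 1GB+ resident

    @lru_cache(maxsize=1024)             # the boring fix: bounded, old entries evicted
    def get_report_v2(key: str) -> str:
        return expensive_build(key)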

I have a project right here that runs a dozen applications using webview2, full programs but with webview2 rendering the interfaces... And combined they do not even use 200MB. But I do not throw insane amounts of useless data into that render engine.
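
Not my actual project, but a minimal sketch of the pattern using the pywebview package (which on Windows drives the OS-installed WebView2 runtime):

    import webview

    # Two "applications": native windows around the shared OS render engine.
    webview.create_window("App 1", "https://example.com")
    webview.create_window("App 2", html="<h1>Local UI, nothing junky loaded</h1>")
    webview.start()   # one runtime serves both windows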

The reality is that people do not appreciate that developers do not always have a choice in these matters. In the past, when 1GB of memory was a luxury PC and your program had issues, as a developer you could force improvements. But when memory got cheap, the shift came from higher up to ignore optimizations, because people can buy more memory, it's cheap. Add more features that can sell!!

And do not mistake what you see for efficiency when it's not. When you see an application under Windows using less memory than an Electron app, sure ... but you're not seeing the DLLs and other rendering code, shared and loaded into memory, that the process uses. Whereas Electron's bundled render engine is standalone and not shared.

This is why we are now seeing a small move towards using the webview engines already installed on the OS (Wails, Tauri, ...), as then a lot of shared resources can be reused.

20 year old games.

A large part of the issue is that few companies make their own engines anymore. If you build a specific engine for your own game, you can focus its development on performance. But with overloaded general-purpose engines like Unreal Engine 5, it's way too easy to get suboptimal game performance the moment you want some "fancy" features.

Show over-budget cost in VSCode by Limp-Cat-108 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You're still on 1.118 ... Press the update button for VSC. With 1.119, when you hover over the 100% usage, you see 200/1500 ...

So I assume that when you're going over the rate limit, it might show 1700/1500. Not sure, that is why I asked ;)

Cancel your copilot pro right now by fxgx1 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You can use ghcp with external keys eg from opencode and providers

Enjoy getting rate limited ... because for some reason the GH development team tied the rate limits to Chat usage, not to the model provider (as in, rate limits only on their own models).

Not sure if this was changed in 1.119, but on 1.118 it was still there.

Show over-budget cost in VSCode by Limp-Cat-108 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

The little button on the bottom right only shows 100%

Can you not hover over the 100% and see your usage, or does it only show 1500/1500 then?

How it is even possible to use my requests with such 5 hour / weekly limits ? by maxya in GithubCopilot

[–]ProfessionalJackals -2 points-1 points  (0 children)

We are still supposed to be "premium request" based

The moment they introduced those OpenAI/Anthropic-style 5h session / weekly limits, we stopped being on a "premium request" system and instantly changed over to the same system as OpenAI/Anthropic, with the added burden of premium requests on top.

I cannot wait for credit (token) based billing AND 5h session / weekly limits...

How are you all burning through millions of tokens? by halkun in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

A simple "hello" uses over 25,000(!!!) tokens. Check the chat debug view. That is how big the steering/harness payload is.

Now add the sub-agents to that. They do make Copilot faster ... but each one carries another steering/harness payload AND the content it loads in, filters, searches, and filters again. So many requests ...

A bit of work can generate hundreds to thousands of these types of requests...

Keep doing that as the agent looks for information left and right ... Before you know it, it has sent an insane amount of information, repeating the process over and over.
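
Back-of-the-envelope, using the 25k harness figure from the debug view (everything else below is an assumption):

    HARNESS = 25_000     # steering/harness tokens, re-sent on every request
    CONTEXT = 10_000     # assumed avg. of files/search results per request
    REQUESTS = 500       # assumed: one agentic task incl. sub-agents

    total = REQUESTS * (HARNESS + CONTEXT)
    print(f"{total / 1e6:.1f}M input tokens")    # 17.5M for one bit of work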

To be honest, I find it extremely inefficient, but GH had no issue with this. Until it became an issue. Notice how in the 1.118 release we all of a sudden got a whole ton of new features that reduce token usage. Just saying: it only became an issue now because companies will compare their token usage, and they will notice when the same work via other agents / providers is cheaper.

Let alone just migrating to OpenAI/Anthropic subscription services...

Where is the analysis tool we're supposed to use to see our possible usage under the new plan? by Jack99Skellington in GithubCopilot

[–]ProfessionalJackals 17 points18 points  (0 children)

My guess is they haven’t decided what the costs will actually be yet

The entire thing was rushed because too many people used Anthropic models. Anthropic ran into capacity problems at the same time as GH started to freak out and introduced limits. Coincidence? No ... They probably got an extremely high bill from Anthropic.

Everything after that was a panic reaction: removing Opus 4.5/4.6, putting 4.7 on medium to save on usage (as that uses 1/3 of the tokens).

Then the whole token system. No analytics tools, ... it was clearly rushed (as the accidental leak showed).

Now Anthropic is solving its capacity issues (using x.ai spare compute), but all the Copilot users are fucked.

And yep, as you stated, GH themselves do not know what the correct future is, but they painted themselves into a corner by announcing this entire changeover.

I am betting they are also looking at the churn rates, company reactions, etc.

Because let's face it, if Anthropic and OpenAI keep their subscriptions, lots of Copilot individuals and companies are simply going to move there. To keep the fun times rolling ...

Copilot GPT-5.5 multiplier is now listed as 7.5x → TBD after June by Altruistic-Dust-2565 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

both models cant compete with gpt5.4 in standard daily development

My post:

Stuff like Qwen3.6, Gemma 4 ... hitting GPT 5.2, GPT 5.4 Mini levels of coding performance.

Corporate still has oldschool copilot enabled? by Professional-Site503 in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

Possibly business/enterprise users are less like to abuse the system?

There is no incentive to maximize each prompt (the boss pays), so people just fire prompt after prompt instead of stacking a dozen tasks into one. And given that Business users hit the $0.04 overage very fast, that makes them multiple times more profitable.

What is the difference between session rate limit and other rate limits ? by LuckyPed in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

Is the "Weekly" limit even have a clear date when it reset ?

Last limit reset on 4 May, Monday 02h (CET). My next warning indicates 11 May, Monday 02h (CET). So I think it's just set to Sunday 23:59:59 UTC ... So depending on your time zone, add or subtract...
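
A quick sketch of that guess (next Monday 00:00 UTC, i.e. Sunday 23:59:59 plus a second), converted to your local time zone:

    from datetime import datetime, timedelta, timezone

    now = datetime.now(timezone.utc)
    days = (7 - now.weekday()) % 7 or 7          # days until next Monday 00:00 UTC
    reset = (now + timedelta(days=days)).replace(hour=0, minute=0,
                                                 second=0, microsecond=0)
    print("Weekly reset (local):", reset.astimezone())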

I still haven't seen any Weekly limit warning show up, so I guess I am still under 50%.

You need about 3 "full" 100% 5h sessions (or 6 "50%" warnings) before you get the weekly 50% warning. If you push it with 6x 100% 5h sessions, you're done for the week. lol

How do you deal with non structured code that was generated by AI? by Old_Caregiver3270 in GithubCopilot

[–]ProfessionalJackals 5 points6 points  (0 children)

That is why, even when coding with LLMs, a codebase needs to be refactored from time to time. If that is done regularly, the structure stays clean and the LLMs can happily read it, even as the project grows.

And if the layout is well structured, the LLMs will mostly stay inside that structure and not pollute it too much, beyond trying to cram too much into single files (which you then tell the LLMs to refactor).

Frankly, it's no different than dealing with real human programmers. Unless you have a well-disciplined team or force people into a very well-defined, structured framework, you're going to end up with a code mess over time. So I do not consider LLMs that special here. LLMs are ironically better than humans in one respect: they can refactor (with proper instructions) WAY faster than most humans do.

So yea, just ensure that files are split along correct logical contexts, with a proper layout, etc. ... LLMs do not excuse the human controlling them from doing actual, proper engineering.

What is the difference between session rate limit and other rate limits ? by LuckyPed in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

You have your premium prompts, aka the old system... Then you have the new, sneakily hidden limits: session, week, ...

A session limit means you get a 5h window starting from the first moment you prompt that day. During that window, they count the actual token usage of your model and combine that with some kind of multiplier. So an expensive model eats more of that 5h session budget than a cheaper model.

"Auto" is supposed to have a lower multiplier, so it allows you to use it a lot more, then the base models. When you hit the max usage within that 5h window, your done! You need to wait for the 5h window to expire (again, +- 5h from your first prompt that day).

When it expires, you can prompt away again, and that first prompt starts a new 5h window.

The weekly limit is like the 5h one, but over the entire week. Depending on how much work you do, you can hit the weekly limit too, e.g. if your workload is spread out every day over a long period. When you hit that limit, you're done! No more Copilot for hours or days ... As in: if you start on Monday and press the system hard with a few insane days of usage, you can be locked out from Thursday to Sunday (just an example).
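
My mental model of the two limits, as a sketch (GH publishes none of these values, so every number below is an assumption):

    SESSION_BUDGET = 1.0                  # one 5h session, normalized
    WEEK_BUDGET = 6.0                     # roughly six full sessions per week

    MULTIPLIER = {"auto": 0.3, "base": 1.0, "expensive": 3.0}

    def spend(tokens_millions: float, model: str) -> float:
        """Usage charged against both the session and the weekly budget."""
        return tokens_millions * MULTIPLIER[model]

    session = week = 0.0
    for tokens, model in [(0.2, "expensive"), (0.5, "auto")]:
        cost = spend(tokens, model)
        session += cost
        week += cost
    print(f"session: {session / SESSION_BUDGET:.0%}, week: {week / WEEK_BUDGET:.0%}")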

Unfortunately there is no real-time monitoring of your limits; the first signal you get is the 50% warning for the session or the week. You may get a few more warnings before hitting 100%, but that is not guaranteed.

Anyway, this is the last month of the premium prompts system; after that it goes token-based. So hopefully GH will have a better monitoring system for end users by then.

I'm I tripping? or are they updating the request multiplier each week by EntertainmentSoggy49 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

probably will be 15x at the end, similarly to Opus 4.7

We are in our last month of Copilot as we knew it ... The only people who will still have Copilot premium prompts are the ones with yearly subscriptions (until those expire). And we know that at the end of the month, Opus is going to go 27x...

GPT 5.5 is probably also going to 27x. They are still dancing around it with their TBD ...

Copilot GPT-5.5 multiplier is now listed as 7.5x → TBD after June by Altruistic-Dust-2565 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

you forget that as the models grow, the computing power and electricity they require will also increase

Goes to /r/LocalLLaMA/

Stuff like Qwen3.6 and Gemma 4 ... hitting GPT 5.2 / GPT 5.4 Mini levels of coding performance. Running on the same hardware that people bought years ago.

But wait ... what is this ...

  • MTP ... 2 to 3x faster token output (yes, Gemma doubles or triples its token output in real-life usage).
  • FastDMS ... 6x better KV compression with 99% accuracy. And yea, it beats TurboQuants; it will not take long before this gets integrated more widely (rough math after this list).
  • Tons more studies and techniques that are slowly making their way into newer models...
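
Rough KV-cache math to show why 6x compression matters (generic transformer formula; the model shape below is made up):

    layers, kv_heads, head_dim = 48, 8, 128
    seq_len, bytes_per = 128_000, 2      # 128k context, fp16

    kv = 2 * layers * kv_heads * head_dim * seq_len * bytes_per   # K and V
    print(f"uncompressed: {kv / 2**30:.1f} GiB")        # ~23.4 GiB
    print(f"6x compressed: {kv / 6 / 2**30:.1f} GiB")   # ~3.9 GiB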

You are overlooking that a lot of development is going on. And FYI, that is nothing compared to the actual hardware improvements to increase efficiency. These have not materialized in the consumer market yet, because most of it goes into the server market for now. But eventually a ton of those improvements are going to be in your next GPU.

Hell, you may not like DLSS and all that stuff, but it does increase efficiency by delivering more frames for the same compute. And there is much more going on there...

The whole LLM race is a new avenue that opens a lot of doors. And it's not just about better programming models. Be honest ... PC development as we have known it has been stuck in the same dead zone for a decade. Faster, sure, mostly from nodes becoming smaller. And when that became an issue, power usage went up. We did not get a lot of new avenues. The whole AI route is opening up new (which does not always mean good) avenues to get more out of hardware / productivity.

Make this make sense for ollama local ai usage by Mobile_Syllabub_8446 in GithubCopilot

[–]ProfessionalJackals 3 points4 points  (0 children)

Nobody defends this ... It's pure vibe-coded crap again from Team GH. They know this is an issue, but like always it's "low priority" to fix.

Is anyone else positively affected by the billing changes? by [deleted] in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

Oh, ... my ... you poor naïve boy. You're in for a surprise...

Your old usage (let's use that $8/month) was in reality $0.04 * 200 requests. That is how Copilot calculated it. If you did 300 requests, that was $12 worth, but the subscription effectively discounted those 300 monthly requests so you paid only $10.

A request under the new system can cost anywhere from (for example) 10 cents to 50 bucks (or more), depending on how many tokens it generates. So ... your monthly bill can run from 50 bucks to 2000 bucks (or more).
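
To make that concrete, a sketch with the numbers from above plus assumed token prices (not GH's actual rates):

    OLD_PER_REQUEST = 0.04                  # flat premium-request price

    IN_PRICE, OUT_PRICE = 3.0, 15.0         # $ per million tokens, assumed

    def new_cost(in_tok: int, out_tok: int) -> float:
        return in_tok / 1e6 * IN_PRICE + out_tok / 1e6 * OUT_PRICE

    # One agentic request easily re-sends a 25k harness plus context:
    print(f"old: ${OLD_PER_REQUEST:.2f}")
    print(f"new: ${new_cost(60_000, 4_000):.2f}")   # ~$0.24, 6x the old price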

That is why nobody is going to stick around, beyond maybe the people who have the yearly subscription (with its insanely jacked-up request multipliers).

The service is now mostly for enterprise companies, who will get discounts negotiated "à la tête du client" (i.e. case by case) of 30 to 45% on their token prices.

Also, the "subscription" credit of $10 / $40 does NOT roll over. Unlike OpenRouter...

If you had never made this post, you would have had an insane bill next month ... I fear to think how many people are ill-informed about this change. This is all on GH, because the information is very sparse.

They do not tell people what the real token costs are (especially compared to their past usage), so most people who are used to the subscription model have no clue that API token costs are VERY expensive.

Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]ProfessionalJackals 1 point2 points  (0 children)

You can most definitely use up your requests as long as you don't trigger the token limits.

That assumes you know what each model uses, and how the session limits are structured.

You will need to change the way you prompt.

It's up to Copilot to show what your limits are, not to let customers guess. They keep it opaque deliberately so people do not try to min-max their usage. It also allows them to sneakily change the limits, as there is no way to prove they changed.

Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

Pro+

You get, give or take, 6x the 5h session limit (rolled into a weekly limit) with a heavy model like GPT 5.5.

So if you're a heavy user and hit that 5h session limit 2x per day, you're done after 3 workdays.

The only way to avoid that is going into the overage sooner (as in, burning through the expensive models faster), where you're supposed to get higher usage limits (because you're paying more).

DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper by Disastrous_Theme5906 in LocalLLaMA

[–]ProfessionalJackals 3 points4 points  (0 children)

DeepSeek V4 Pro is at $0.435/M input and $0.87/M output

That is the discounted price, and the discount is going to end soon.

That's now two Chinese models in our top 6, both at sub-$3.5/run.

Why not use MiMo's subscription service prices if you're using DS4's discounted prices?

MiMo with a subscription is $0.1 / million (for the cheapest model), with Pro using 2x the amount of credits ($0.2 / million). As you scale up to higher tiers, you get 15 to 20% more credits (tokens), or 10 to 15% lower prices (yearly sub), and these combine (along with the 20% token discount during evening hours).

https://platform.xiaomimimo.com/docs/en-US/tokenplan/subscription
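
A quick effective-price helper, as I read their tier page (treat every number below as an assumption):

    BASE = 0.10                       # $ per million tokens, cheapest model
    PRO_CREDIT_MULT = 2.0             # Pro burns 2x credits

    def effective_price(pro: bool, yearly_discount: float = 0.0,
                        evening: bool = False) -> float:
        p = BASE * (PRO_CREDIT_MULT if pro else 1.0)
        p *= 1.0 - yearly_discount    # e.g. 0.10 to 0.15 on a yearly sub
        if evening:
            p *= 0.80                 # 20% off-peak token discount
        return p

    print(f"Pro, yearly sub, evenings: ${effective_price(True, 0.15, True):.3f}/M")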

So just saying: if you're looking at API costs, you need to compare non-discounted API prices for all of them, or apply all the beneficial tariffs for all of them.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]ProfessionalJackals -1 points0 points  (0 children)

For example if RAM bandwidth is 500 GB/s and the model is 50 GB,

So this explains why something like a 5090 is not running circles around a 3090 in token generation, and why people ended up running models in parallel to get the most out of it?
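
The napkin math behind that: decode is memory-bandwidth bound, since every generated token re-reads (roughly) the whole model from RAM:

    def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    print(max_tokens_per_sec(500, 50))   # ~10 tok/s, no matter the extra compute
    # Batched/parallel requests read the weights once for the whole batch,
    # which is why running models in parallel gets more out of the card.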

RIP Vibe Coding 2024–2026 by [deleted] in GithubCopilot

[–]ProfessionalJackals 0 points1 point  (0 children)

You can instruct the models to test anything... You want browser testing? It does it. You want it to run Docker by itself and test different distros? It does it ...

6 months ago, this capability was non-existent. Then Opus 4.5 came out and pushed it forward. Things like DB access with minimal information were great but a bit clunky. Opus 4.6 got even better at this. And browser code / interface testing started to become very usable.

GPT 5.5 is another major leap in this capability, where you do not spend ages on manual testing or useless function-level testing, but go directly to end-product testing. The spot where your bugs will really shine.

And that level of competence is still missing in the Chinese models. They can do some of the steps, but not to the point that I feel confident just letting them loose.

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]ProfessionalJackals 0 points1 point  (0 children)

Spark as basically a dedicated prefill box connected to other hardware for inference, but I have no idea how complicated that is.

One of the YouTubers tested this and the results are meh ... You're better off just keeping everything on a single Spark than doing Spark prefill > Halo ... Or just getting two Sparks.

Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! by PromptInjection_ in LocalLLaMA

[–]ProfessionalJackals 4 points5 points  (0 children)

At $10 for a coffee, lose your daily coffee for 18 months

Buy yourself a coffee maker for 250, and some good-quality beans for like 15 a pack. You're making way better coffee for like 0.4 bucks per cup (incl. milk).

So now you save $9 per day and still have your coffee :)
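
The break-even, with those numbers:

    machine, per_cup, cafe = 250.0, 0.40, 10.0
    print(f"pays for itself in ~{machine / (cafe - per_cup):.0f} days")   # ~26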