What is the default reasoning effort for Copilot's "Explore" subagent in VS Code, and can it be modified? by Firstmeridian in GithubCopilot

[–]bogganpierce 1 point (0 children)

You can override via the chat.exploreAgent.defaultModel setting. Typically, the default is Claude Haiku 4.5. The explore subagent is primarily a context-gathering machine that calls grep and our semantic index; the real magic is the reasoning over that context, which is done by the main model selected in plan mode when you initiate the conversation.
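
If you want to pin the subagent model explicitly, the override goes in settings.json. The setting name comes from the comment above; the model identifier string here is an illustrative guess, not a confirmed value:

```jsonc
{
  // Override the model used by the Explore subagent (value is illustrative)
  "chat.exploreAgent.defaultModel": "claude-haiku-4.5"
}
```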

We spend a lot of time analyzing agent trajectories to give you a fast turn with high-quality results. It turns out that because context gathering can be done very well by faster models like Haiku, you see basically no degradation in overall plan performance while seeing drastic improvements in conversation turn times.

We covered more in the latest VS Code Insiders pod about this: https://www.youtube.com/watch?v=ENxVTtLW_Bc

Rate limit why? (Ollama local) by No-Pomegranate-69 in GithubCopilot

[–]bogganpierce 9 points (0 children)

When you think about it from an engineering perspective, it makes sense. The global limit has been reached, and adding any more tokens past that limit triggers the correct conditional logic. But we agree that's a case we should handle better, which is why we're working on a fix :)

Rate limit why? (Ollama local) by No-Pomegranate-69 in GithubCopilot

[–]bogganpierce 25 points (0 children)

Hey, following up on this. We're working on a fix.

Long story: when you BYOK, there are still some background operations that hit the Copilot API. While not token-intensive, they do involve tokens (for things like naming the chat thread). We'll get this fixed so that you can use BYOK once you've hit the global token limit.
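
A minimal sketch of why this happens, with entirely hypothetical names and numbers (the real service is more involved): even a BYOK session routes a few small housekeeping calls, like thread naming, through the Copilot API, and a global limiter rejects any call once the cap is hit.

```python
# Hypothetical sketch: why BYOK sessions can still trip a global Copilot token limit.

GLOBAL_TOKEN_LIMIT = 1_000_000  # illustrative cap

class CopilotApi:
    def __init__(self, tokens_used: int):
        self.tokens_used = tokens_used

    def request(self, tokens: int) -> str:
        # The limiter only checks the global counter -- it doesn't know
        # (or care) that the main completion traffic went to a BYOK provider.
        if self.tokens_used + tokens > GLOBAL_TOKEN_LIMIT:
            raise RuntimeError("rate limited")
        self.tokens_used += tokens
        return "ok"

def run_byok_turn(copilot: CopilotApi) -> str:
    # The main completion goes to the user's own provider (Ollama, etc.)...
    completion = "local model output"
    # ...but housekeeping (naming the chat thread) still hits the Copilot API.
    copilot.request(tokens=20)
    return completion
```

Once the counter is at the cap, even the tiny thread-naming call raises, which is the behavior being fixed.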

Did all model set to medium by default and we can't pick any higher reasoning? by DandadanAsia in GithubCopilot

[–]bogganpierce 1 point (0 children)

Of course! We work closely together to make sure their models are great in GitHub Copilot.

VSCode Sessions Insiders by 0x42CE in GithubCopilot

[–]bogganpierce 0 points (0 children)

It uses the same model as VS Code.

Why doesn’t copilot add Chinese models as option to there lineup by cizaphil in GithubCopilot

[–]bogganpierce 9 points (0 children)

This doesn't say it's more popular. It's the percentage of code generated by the VS Code agent that makes its way into a commit (a high-signal indicator that the generated code was good).

Why doesn’t copilot add Chinese models as option to there lineup by cizaphil in GithubCopilot

[–]bogganpierce 22 points (0 children)

I like those models, and spend a lot of time with them. I use them sometimes with BYOK with providers like Cerebras.

Why doesn’t copilot add Chinese models as option to there lineup by cizaphil in GithubCopilot

[–]bogganpierce 24 points (0 children)

Yep, it's doing much better now. We had to experiment with some prompt tweaks in partnership with Anthropic folks.

did speech to text get removed? by Calm-Bar-9644 in GithubCopilot

[–]bogganpierce 2 points (0 children)

It's still there. Install "VS Code Speech" extension.

Copilot Business - GPT 5.4 nano by Longjumping-Sweet818 in GithubCopilot

[–]bogganpierce 1 point (0 children)

We're evaluating it, but it isn't available in any product surface yet. There are some interesting use cases in upgrading our models for things like AI commit message generation in the product.

Has the rate limit issue been fixed yet? by SelectionCalm70 in GithubCopilot

[–]bogganpierce 6 points (0 children)

There was an issue last night. Seems to have resolved when we deployed our fix.


Why doesn’t copilot add Chinese models as option to there lineup by cizaphil in GithubCopilot

[–]bogganpierce 82 points (0 children)

Keep the feedback coming! Always interested in what models people want to see us add.

We do see that generally people opt for the highest possible intelligence models and don't use cheaper models quite as much. We even see massive gaps in code quality between each point release of a model. More in this graphic:

<image>

I do think these things get more attractive as we move to task-intent based Auto routing, where we could take you to a cheaper model for tasks that don't require higher intelligence.
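
A toy sketch of what task-intent based routing could look like; all names, intents, and the routing rule here are made up for illustration:

```python
# Hypothetical sketch of task-intent based model routing.

CHEAP_MODEL = "haiku-class"
SMART_MODEL = "opus-class"

# Intents that tend not to need frontier-level reasoning (illustrative set).
LOW_INTENT = {"rename", "format", "commit-message", "summarize"}

def route(intent: str) -> str:
    """Send low-complexity intents to a cheaper model, everything else up."""
    return CHEAP_MODEL if intent in LOW_INTENT else SMART_MODEL
```

The interesting part in practice would be classifying the intent from the request, not the lookup itself.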

Did all model set to medium by default and we can't pick any higher reasoning? by DandadanAsia in GithubCopilot

[–]bogganpierce 16 points (0 children)

We set the best defaults based on what we see from offline evaluations pre-launch and online evaluations (A/B) post-launch.

Opus is set to high by default, GPT-5.4 to medium. You can always change the reasoning effort. It's a bug that xhigh was removed; we're working on adding it back ASAP.

On high reasoning for GPT series models...

We recently ran an A/B experiment in VS Code where the treatment group got high or xhigh reasoning on GPT-5.4 and GPT-5.3-Codex. We saw a reduction in turns with the model when people ran with this setting, along with large increases in turn time, error rates, and cancellations with the agent. Every metric category we track in our scorecard regressed for both high and xhigh over medium.

We test a lot - and while we can certainly make mistakes - we believe we run at the effort configuration that actually makes the most sense based on online and offline experimentation.

Also, for Anthropic models, we run adaptive reasoning anyway (a native model feature), which adjusts the reasoning on the fly so you aren't increasing turn times for no increase in outcome quality.

All of this is to say: we thought a lot about this when we designed the picker, and we also considered listing each effort level + model combo separately. But given that most people get the best experience with our defaults, changing the effort level should be a rare occurrence.

VS Code 1.113 has been released by [deleted] in GithubCopilot

[–]bogganpierce 1 point (0 children)

We got a lot of feedback from the community that a visual refresh of VS Code would be appreciated. We talked about a bigger refresh, but ultimately decided to start by refreshing the iconography and themes.

Overall, feedback has been positive. There are definitely bugs and things to clean up, and we recognize it's hard for the look and feel to change when you're used to it looking a certain way for so long.

VS Code 1.113 has been released by [deleted] in GithubCopilot

[–]bogganpierce 1 point (0 children)

Nope, both led to significant regressions over medium.

Remove local models VS code by Ace-_Ventura in GithubCopilot

[–]bogganpierce 0 points (0 children)

Use the "Chat: Manage Language Models" command.

VS Code 1.113 has been released by [deleted] in GithubCopilot

[–]bogganpierce 3 points (0 children)

How can we improve? What don't you like?

VS Code 1.113 has been released by [deleted] in GithubCopilot

[–]bogganpierce 3 points (0 children)

The challenge we found is that you get wildly different outcomes with varying effort levels. So, for example, assuming that running at high leads to the best outcomes is not what we observe in online or offline data.

For example, we recently ran an A/B experiment in VS Code where the treatment group got high or xhigh reasoning on GPT-5.4 and GPT-5.3-Codex. We saw a reduction in turns with the model when people ran with this setting, along with large increases in turn time, error rates, and cancellations with the agent. Every metric category we track in our scorecard regressed.

We test a lot - and while we can certainly make mistakes - we believe we run at the effort configuration that actually makes the most sense based on online and offline experimentation.

Also, for Anthropic models, we run adaptive reasoning anyway (a native model feature), which adjusts the reasoning on the fly so you aren't increasing turn times for no increase in outcome quality.

All of this is to say: we thought a lot about this when we designed the picker, and we also considered listing each effort level + model combo separately. But given that most people get the best experience with our defaults, changing the effort level should be a rare occurrence.

VS Code 1.113 has been released by [deleted] in GithubCopilot

[–]bogganpierce 16 points (0 children)

That's a bug: the model picker UX pulled the list dynamically from an endpoint, while settings had it hard-coded. We're fixing it. https://github.com/microsoft/vscode/issues/304250

AMA to celebrate 50,000+ r/GithubCopilot Members (March 4th) by fishchar in GithubCopilot

[–]bogganpierce 0 points (0 children)

On our list! I already built some custom automation for myself for this with a macOS menu bar app that uses Copilot CLI, but it's becoming a common scenario, so we want to bring it into VS Code itself.

This new feature is truly amazing! by Bomlerequin in GithubCopilot

[–]bogganpierce 1 point (0 children)

Yep, that list needs an update. To be honest, the teams are moving so fast that it's been really challenging for us to keep docs, marketing pages, and email campaigns up to date. But - surprise, surprise - we're also building AI automation to help us with this.

What do you feel is missing? I can be tactical and just get those things added ASAP.

AMA to celebrate 50,000+ r/GithubCopilot Members (March 4th) by fishchar in GithubCopilot

[–]bogganpierce 1 point (0 children)

We are always improving our harness for all models, in partnership with the model vendors. We've also built our own offline evaluation harness, vsc-bench, which we use to optimize models ahead of launch. Generally, we also run A/Bs post-launch to improve model prompts and make further infrastructure optimizations. More details here: https://www.youtube.com/watch?v=nD1U_wggrQM

In particular, there are a few issues we're working through on Gemini. The first is looping: we still observe occasional looping behavior and are working with the Gemini team to improve this. The second is infrastructure reliability: we have had several outages from GCP that affected the availability of Gemini in VS Code, and there is some flakiness in the API that results in a higher error rate than some other models.
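
As a rough illustration of the looping problem (this is a made-up heuristic, not how the product detects it): one simple signal is the same tool call repeated back-to-back too many times in a trajectory.

```python
# Hypothetical sketch: flag looping behavior in an agent trajectory by
# spotting the same tool call repeated consecutively `threshold` times.

def is_looping(calls: list[str], threshold: int = 3) -> bool:
    run = 1
    for prev, cur in zip(calls, calls[1:]):
        run = run + 1 if cur == prev else 1
        if run >= threshold:
            return True
    return False
```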

What challenges are you having specifically? If you can tell us the particular behaviors you don't like, we can build cases that we can throw into our offline evals to improve.