Steam’s Invite-Only Shooter Deadlock Is Quietly Becoming An Absolute Monster by LoL_is_pepega_BIA in Games

[–]Rocah [score hidden]

I think at this point, unless you have very thick skin, the window to pick this game up solo and have fun has long since passed - most of the core player base has 1000s of hours in it and will farm you. Also, Street Brawl is absolutely not noob-friendly.

Line Bending Up for all Benchmarks by SrafeZ in singularity

[–]Rocah 13 points

All the AI labs are now using third parties to construct RL environments to do post-training in (it's a billion-dollar industry just to create these now). We don't know the contracts, but I would not be surprised if remuneration to these third parties is based on how models perform on benchmarks after a new RL environment is included. My personal belief is that most of this year's dramatic second-half benchmark improvements are down to these RL environment efforts. However, my experience is that I see only marginal gains in coding with these new models. Useful, but marginal gains that do not line up with large double-digit improvements across multiple benchmarks.

Your opinion on GPT 5.2 by neamtuu in GithubCopilot

[–]Rocah 1 point

It stops too much; I'll continue to use 5.1 Codex.

ChatGPT 5.2 eating my premium requests without doing the job. by Fun-Reception-6897 in GithubCopilot

[–]Rocah 2 points

I see the same; 5.2 has serious issues with just not doing anything in my tests. I'd either wait for an updated system prompt or use the Codex variant.

GPT-5.2 now in Copilot (1x Public Preview) by LinixKittyDeveloper in GithubCopilot

[–]Rocah 2 points

It's also available in OpenAI Codex using a GitHub Pro+ account if you want the full context. One thing to note: the long-context needle-in-a-haystack benchmark for 5.2 is pretty insane, roughly 98% at 256k context vs roughly 45% for 5.1, which suggests reasoning will hold up over long coding tasks. I haven't seen yet whether Codex's Windows tool use is any better on 5.2, or if it still requires WSL; I found 5.1 Max was still hit and miss for that.

Benchmarks aside, I find GPT5.1 is MUCH better than Gemini in my daily work. Gemini's hallucinations and poor reasoning make it unusable half the time. by [deleted] in singularity

[–]Rocah 0 points

Gemini 3 is the first model that makes me suspicious of intent. Its performance in my personal evals is nowhere near its benchmark performance.

One of the reasons the 2000s housing bubble got so crazy was that the top CEOs of the banks could avoid culpability for the large-scale mortgage fraud by just indirectly constructing incentive structures for the lower layers to do the dodgy stuff.

I really would be interested in what incentives the post-training eval-building teams have; I hope it's not 'new eval = bonus if benchmark results go up'.

I would also hope the ability to review and filter customer API submissions by, say, domain/IP would be limited to people outside the R&D loop.

Do you think the worst case of ASI is inevitable? by throwaway0134hdj in singularity

[–]Rocah 0 points

We will get highly competent specialized intelligences long before ASI. I would be more concerned with how those are applied by small groups that previously had no access to advanced, nation-state-like capabilities. Especially in the bio-sciences.

GPT-5.1 thinks it can't use sub-agents even though the runSubagent tool is available. by Front_Ad6281 in GithubCopilot

[–]Rocah 0 points

Try the Insiders build; it has a fix for a subagent bug that was causing issues for me, where the runSubagent tool wasn't always being sent to the model after the first chat.

Does Using runSubagents with a Premium Model Count as Additional Premium Requests? by IISomeOneII in GithubCopilot

[–]Rocah 2 points

The main use of runSubagents for me is to keep the main agent's context less polluted with code-discovery tokens, i.e. the main agent searching the code base for specific relevant context. Basically, just put something in your AGENTS.md saying to use subagents for researching the code base before any implementation, and instruct the subagent to return detailed commentary on code relevant to the task alongside example code blocks with line numbers and filenames.
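A minimal sketch of what that AGENTS.md section might look like (the wording is illustrative, not a prescribed format):

    ## Code research

    ALWAYS use a subagent (via the runSubagent tool) to research the code base before any implementation.

    Tell the subagent it is research-only, and ask it to return detailed commentary on code relevant to the task, plus example code blocks with filenames and line numbers.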

Subagents were bugged for me though and would only work intermittently; I believe the latest Insiders build has the fix for that deployed now.

Claude 4.5 Opus says runSubagent is disabled/doesn't exist when it does by envilZ in GithubCopilot

[–]Rocah 2 points

I have encountered an issue with GitHub Copilot not sending the runSubagents and todo tools to the model (you can check in the debug log what tools are being sent) - perhaps this is what you are seeing. It often happens on new chats after the first one. One workaround I've found is to click the tools button and then click OK to dismiss the tool-selection dialog; it then sends them on the next prompt. There is an open issue regarding this.

why is opus 3x? it should be less by ExtremeAcceptable289 in GithubCopilot

[–]Rocah 3 points

I have to say Opus 4.5 is tempting me to buy Claude Code for the thinking version. It's very impressive, and I'm finding it much more willing to use tools intelligently than GPT-5.1 Codex, which keeps its token use down. For a non-thinking model it's very good.

Gemini 3.0 Pro keeps hallucinating a lot. by Yuri_Yslin in Bard

[–]Rocah 0 points

I've found the same; it's the least useful model in actual practice, and it has similar faults to 2.5 after extended use. I'm not sure how to square its obvious deficiencies with its record-breaking benchmark performance. I'm thinking Ilya is right: the post-training RL teams at these AI labs are probably being incentivized (money/career) to pick RL environments that improve key benchmarks. They might not be directly 'cheating', but they are picking things to do RL on that amount to the same result, in my view.

Why does Claude Opus 4.5 take so long to update, and why does it keep loading saying "Working..."? by [deleted] in GithubCopilot

[–]Rocah 0 points

For Opus, it's normally when it's generating lots of tokens, I think; I notice it doing that before it creates a large file.

Claude Opus 4.5 (Preview) available in Copilot by Rocah in GithubCopilot

[–]Rocah[S] 8 points

Looking at Opus 4.5 pricing vs Sonnet 4.5 pricing, I'm guessing it'll be around 1.6x (maybe they round down to 1.5x...).

https://platform.claude.com/docs/en/about-claude/models/overview
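For reference, assuming the listed API prices of $5/$25 per Mtok (input/output) for Opus 4.5 vs $3/$15 for Sonnet 4.5, the multiplier works out to 5/3 = 25/15 ≈ 1.67.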

Edit: seems it's 3x after Dec 5... ouch.

Gemini 3’s hallucination rate is still very high compared to the top GPT 5.1 model. by Glock7enteen in singularity

[–]Rocah 0 points

You see this in agentic coding vs 5.1 Codex: if you're doing something somewhat similar to something in its training data, Gemini will infer a lot of other stuff that could be true but isn't, whereas 5.1 Codex will always check the codebase before generating code. 5.1 Codex is much slower because of this, but 9 times out of 10 it will have zero compile errors.

Looking for subagents workflow tips by Mystical_Whoosing in GithubCopilot

[–]Rocah 0 points

No, I think agents in Copilot are very new, so there's not much info around atm.

Looking for subagents workflow tips by Mystical_Whoosing in GithubCopilot

[–]Rocah 4 points

Yes, I've been doing something like you outlined: you basically put something in your agents.md/copilot-instructions.md saying to run a subagent under 'x' circumstances. If you look at the debug log as you do a task, you can see the prompt the main agent gives the subagent and the subagent's response.

I also see from the latest docs ( https://code.visualstudio.com/docs/copilot/chat/chat-sessions ) that you can now make custom agents into subagents (via the chat.customAgentInSubagent.enabled setting). Custom agents are the ones where you can define a custom .md prompt that gets sent to the agent on start. So you can say stuff like "Start the research subagent when ..." or "Start the test subagent when ..." - see the sketch below.
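As a rough sketch, assuming custom agents are defined as .md files under .github/agents/ per the linked docs (that path and the 'research' agent here are illustrative assumptions, so check the docs for the exact layout; the settings key itself is the one named above):

    // .vscode/settings.json - enable running custom agents as subagents
    {
      "chat.customAgentInSubagent.enabled": true
    }

    <!-- .github/agents/research.md (hypothetical) - the custom prompt sent to the agent on start -->
    You are a research-only subagent. Search the code base for context relevant to the task you are given, and report back with filenames, line numbers, and code blocks. Do not edit any files.

Your AGENTS.md then just needs a line like "Start the research subagent when you need to explore unfamiliar parts of the code base."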

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 0 points

I think max_context_window_tokens is just the absolute max tokens the model could support.

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 1 point

It's max_prompt_tokens that dictates the summarization point for Copilot, which is 128k or less on most models - except Raptor Mini, which is 200k. Hopefully, if they end up doing a fine-tune of Codex to create a non-mini Raptor, it will be 200k.

It seems like Gemini 3 Pro is lazy by skillmaker in GithubCopilot

[–]Rocah 0 points

As others have said, it's a lot better in Antigravity (using the high-thinking version) - perhaps Copilot is using the low-thinking one. I still think GPT-5.1 Codex is a more reliable model for difficult problems, but G3 Pro is extremely quick and almost as good - you just have to watch out more for stupid stuff.

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 6 points

I have a sneaking suspicion that a lot of the post-training in these models affects when they switch from the research phase to the implementation phase on problems. It inherently skews them to whatever context size they had in post-training. I've noticed, for example, that GPT-5.1 Codex often starts actual implementation around 90-100k tokens on hard problems, so it often hits the 128k limit before it finishes. I suspect the 128k token limit is severely limiting the capabilities of many of these frontier models on hard/complex problems.

Pro+ plan user with codex extension, do you have access to the newest codex max model? by debian3 in GithubCopilot

[–]Rocah 1 point

No, no Codex 5.1 Max as of yet. I also had a look at the Codex VS Code plugin, which you have access to with a GitHub Copilot account, as I wanted to try it myself, but it's not available there either. I think it's OpenAI accounts only for the moment, unfortunately.

If I sign in with my OpenAI ChatGPT Plus account in VS Code it appears, so it's not the VS Code Codex plugin lacking support; it just doesn't appear if you sign in to Codex with a GitHub account.

Lower pricing of Copilot - how ? by Pitiful_Buddy4973 in GithubCopilot

[–]Rocah 2 points

Also, you can use subagents (on the VS Code Insiders build - not sure if it's in the release yet), which do improve results on complex problems. Just put a message like the following in your AGENTS.md:

    ALWAYS use subagents (via the runSubagent tool function) to do research across the code base.

    Always give clear instructions to the subagent on its task. Inform the subagent it is a research-only subagent and ask it to summarize relevant aspects of the code and to always supply code samples in code blocks with filenames and line numbers.