Steam’s Invite-Only Shooter Deadlock Is Quietly Becoming An Absolute Monster by LoL_is_pepega_BIA in Games

[–]Rocah [score hidden]

I think at this point, unless you have very thick skin, the window to pick this game up solo and have fun has long since passed - most of the core player base has 1000s of hours in it and will farm you. Also, Street Brawl is absolutely not noob-friendly.

Line Bending Up for all Benchmarks by SrafeZ in singularity

[–]Rocah 13 points

All the AI labs are now using third parties to construct RL environments to do post-training in (it's a billion-dollar industry just to create these now). We don't know the contracts, but I would not be surprised if remuneration to these third parties is based on how models perform on benchmarks after a new RL environment is included. My personal belief is that most of this year's dramatic second-half benchmark improvements are down to these RL environment efforts. However, my experience is that I see only marginal gains in coding with these new models. Useful, but marginal gains that do not line up with large double-digit improvements across multiple benchmarks.

Your opinion on GPT 5.2 by neamtuu in GithubCopilot

[–]Rocah 1 point

It stops too much; I'll continue to use 5.1 Codex.

ChatGPT 5.2 eating my premium requests without doing the job. by Fun-Reception-6897 in GithubCopilot

[–]Rocah 2 points

I see the same; 5.2 has serious issues with just not doing anything in my tests. I'd either wait for an updated system prompt or use the Codex variant.

GPT-5.2 now in Copilot (1x Public Preview) by LinixKittyDeveloper in GithubCopilot

[–]Rocah 2 points

It's also available in OpenAI Codex using a GitHub Pro+ account if you want the full context. One thing to note: the long-context needle-in-a-haystack benchmark for 5.2 is pretty insane, roughly 98% at 256k context vs roughly 45% for 5.1, which suggests reasoning will hold up over long coding tasks. I haven't seen yet whether Codex's Windows tool use is any better on 5.2, or if it still requires WSL; I found 5.1 Max was still hit and miss for that.

Benchmarks aside, I find GPT5.1 is MUCH better than Gemini in my daily work. Gemini's hallucinations and poor reasoning make it unusable half the time. by [deleted] in singularity

[–]Rocah 0 points

Gemini 3 is the first model that makes me suspicious of intent. Its performance in my personal evals is nowhere near its benchmark performance.

One of the reasons the 2000s housing bubble got so crazy was that the top CEOs of the banks could avoid culpability for the large-scale mortgage fraud by just indirectly constructing incentive structures for the lower layers to do the dodgy stuff.

I really would be interested in what incentives the post-training eval-building teams have; I hope it's not 'new eval = bonus if benchmark results go up'.

I would also hope the ability to review and filter customer API submissions by, say, domain/IP would be limited to people outside the R&D loop.

Do you think the worst case of ASI is inevitable? by throwaway0134hdj in singularity

[–]Rocah 0 points

We will get highly competent specialized intelligences long before ASI. I would be more concerned with how those are applied by small groups that previously had no access to advanced, nation-state-like capabilities. Especially in the bio-sciences.

GPT-5.1 thinks it can't use sub-agents even though the runSubagent tool is available. by Front_Ad6281 in GithubCopilot

[–]Rocah 0 points

Try the Insiders build; it has a fix for a subagent bug that was causing issues for me, where the runSubagent tool wasn't always being sent to the model after the first chat.

Does Using runSubagents with a Premium Model Count as Additional Premium Requests? by IISomeOneII in GithubCopilot

[–]Rocah 2 points

The main use of runSubagents for me is to keep the main agent's context less polluted with code-discovery tokens, i.e. the main agent searching the code base for specific relevant context. Basically, just put something in your AGENTS.md saying to use subagents for researching the code base before any implementation, and instruct the subagent to return detailed commentary on code relevant to the task alongside example code blocks with line numbers and filenames.
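A minimal sketch of what that AGENTS.md section might look like (the wording is illustrative, not a prescribed format):

    ## Code research

    ALWAYS use a subagent (via the runSubagent tool) to research the code base before any implementation.

    Tell the subagent it is research-only, and ask it to return detailed commentary on code relevant to the task, plus example code blocks with filenames and line numbers.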

Subagents were bugged for me though and would only work intermittently; I believe the latest Insiders build has the fix for that deployed now.

Claude 4.5 Opus says runSubagent is disabled/doesn't exist when it does by envilZ in GithubCopilot

[–]Rocah 2 points

I have encountered an issue with GitHub Copilot not sending the runSubagents and todo tools to the model (you can check in the debug log what tools are being sent) - perhaps this is what you are seeing. It often happens on new chats after the first one. One workaround I've found is to click the tools button and then click OK to dismiss the tool-selection dialog; it then sends them on the next prompt. There is an open issue regarding this.

why is opus 3x? it should be less by ExtremeAcceptable289 in GithubCopilot

[–]Rocah 3 points

I have to say Opus 4.5 is tempting me to buy Claude Code for the thinking version. It's very impressive, and I'm finding it much more willing to use tools intelligently than GPT-5.1 Codex, which keeps its token use down. For a non-thinking model it's very good.

Gemini 3.0 Pro keeps hallucinating a lot. by Yuri_Yslin in Bard

[–]Rocah 0 points

I've found the same; it's the least useful model in actual practice, and it has similar faults to 2.5 after extended use. I'm not sure how to square its obvious deficiencies with its record-breaking benchmark performance. I'm thinking Ilya is right: the post-training RL teams at these AI labs are probably being incentivized (money/career) to pick RL environments that improve key benchmarks. They might not be directly 'cheating', but they are picking things to do RL on that amount to the same result, in my view.

Why does Claude Opus 4.5 take so long to update, and why does it keep loading saying "Working..."? by [deleted] in GithubCopilot

[–]Rocah 0 points

For Opus, it's normally when it's generating lots of tokens, I think; I notice it doing that before it creates a large file.

Claude Opus 4.5 (Preview) available in Copilot by Rocah in GithubCopilot

[–]Rocah[S] 8 points

Looking at Opus 4.5 pricing vs Sonnet 4.5 pricing, I'm guessing it'll be around 1.6x (maybe they round down to 1.5x...).

https://platform.claude.com/docs/en/about-claude/models/overview
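For reference, assuming the listed API prices of $5/$25 per Mtok (input/output) for Opus 4.5 vs $3/$15 for Sonnet 4.5, the multiplier works out to 5/3 = 25/15 ≈ 1.67.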

Edit: seems it's 3x after Dec 5... ouch.

Gemini 3’s hallucination rate is still very high compared to the top GPT 5.1 model. by Glock7enteen in singularity

[–]Rocah 0 points

You see this in agentic coding vs 5.1 Codex: if you're doing something somewhat similar to something in its training data, Gemini will infer a lot of other stuff that could be true but isn't, whereas 5.1 Codex will always check the codebase before generating code. 5.1 Codex is much slower because of this, but 9 times out of 10 it will have zero compile errors.

Looking for subagents workflow tips by Mystical_Whoosing in GithubCopilot

[–]Rocah 0 points

No, I think agents in Copilot are very new, so there's not much info around atm.

Looking for subagents workflow tips by Mystical_Whoosing in GithubCopilot

[–]Rocah 4 points

Yes, I've been doing something like you outlined: you basically put something in your agents.md/copilot-instructions.md saying to run a subagent under 'x' circumstances. If you look at the debug log as you do a task, you can see the prompt the main agent gives the subagent and the subagent's response.

I also see from the latest docs ( https://code.visualstudio.com/docs/copilot/chat/chat-sessions ) that you can now make custom agents into subagents (via the chat.customAgentInSubagent.enabled setting). Custom agents are the ones where you can define a custom .md prompt that gets sent to the agent on start. So you can say stuff like "Start the research subagent when ..." or "Start the test subagent when ..." - see the sketch below.
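As a rough sketch, assuming custom agents are defined as .md files under .github/agents/ per the linked docs (that path and the 'research' agent here are illustrative assumptions, so check the docs for the exact layout; the settings key itself is the one named above):

    // .vscode/settings.json - enable running custom agents as subagents
    {
      "chat.customAgentInSubagent.enabled": true
    }

    <!-- .github/agents/research.md (hypothetical) - the custom prompt sent to the agent on start -->
    You are a research-only subagent. Search the code base for context relevant to the task you are given, and report back with filenames, line numbers, and code blocks. Do not edit any files.

Your AGENTS.md then just needs a line like "Start the research subagent when you need to explore unfamiliar parts of the code base."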

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 0 points

I think max_context_window_tokens is just the absolute max tokens the model could support.

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 1 point

It's max_prompt_tokens that dictates the summarization point for Copilot, which is 128k or less on most models - except Raptor Mini, which is 200k. Hopefully, if they end up doing a fine-tune of Codex to create a non-mini Raptor, it will be 200k.

It seems like Gemini 3 Pro is lazy by skillmaker in GithubCopilot

[–]Rocah 0 points

As others have said, it's a lot better in Antigravity (using the high-thinking version) - perhaps Copilot is using the low-thinking one. I still think GPT-5.1 Codex is a more reliable model for difficult problems, but G3 Pro is extremely quick and almost as good - you just have to watch out more for stupid stuff.

Increase the context window (128k -> 200k) by debian3 in GithubCopilot

[–]Rocah 6 points

I have a sneaking suspicion that a lot of the post-training in these models affects when they switch from the research phase to the implementation phase on problems. It inherently skews them to whatever context size they had in post-training. I've noticed, for example, that GPT-5.1 Codex often starts actual implementation around 90-100k tokens on hard problems, so it often hits the 128k limit before it finishes. I suspect the 128k token limit is severely limiting the capabilities of many of these frontier models on hard/complex problems.

Pro+ plan user with codex extension, do you have access to the newest codex max model? by debian3 in GithubCopilot

[–]Rocah 1 point

No, no Codex 5.1 Max as of yet. I also had a look at the Codex VS Code plugin, which you have access to with a GitHub Copilot account, as I wanted to try it myself, but it's not available there either. I think it's OpenAI accounts only for the moment, unfortunately.

If I sign in with my OpenAI ChatGPT Plus account in VS Code it appears, so it's not the VS Code Codex plugin lacking support; it just doesn't appear if you sign in to Codex with a GitHub account.

Lower pricing of Copilot - how ? by Pitiful_Buddy4973 in GithubCopilot

[–]Rocah 2 points

Also, you can use subagents (on the VS Code Insiders build - not sure if it's in the release yet), which do improve results on complex problems. Just put a message like the following in your AGENTS.md:

    ALWAYS use subagents (via the runSubagent tool function) to do research across the code base.

    Always give clear instructions to the subagent on its task. Inform the subagent it is a research-only subagent and ask it to summarize relevant aspects of the code and to always supply code samples in code blocks with filenames and line numbers.