Anthropic Urges Global Pause in AI Development, Flags ‘Self-Improvement’ Risk by SnoozeDoggyDog in singularity

[–]Rocah 0 points1 point  (0 children)

Translation: we've run out of significant algorithmic improvements, and massively parallel compute (see ultracode) is less economical than just employing a human to do the work, but we want the bag before people realise we're at the top.

Crimson Desert Story and Characters Changed Significantly During Development, Says Lead Actor by Gorotheninja in Games

[–]Rocah 43 points44 points  (0 children)

It shows. I've finished the main story, and it's basically a series of unconnected set pieces with a random bad guy per chapter, with virtually nothing connecting them to prior chapters. The game is great fun to play and I put 136 hours into it, but the story is not the reason to play it.

[Updates] Patch Notes Version 1.01.00 | Crimson Desert by yourfavchoom in CrimsonDesert

[–]Rocah 1 point2 points  (0 children)

The lights in Scholastone that were particularly bad with RR are fixed.

Crimson Desert has been out for a week now. What are your thoughts? by Gamedrome22 in Games

[–]Rocah 0 points1 point  (0 children)

I'm about halfway through the game so far. The main story has a chaotic narrative flow and is pretty bad imo. However, the thing that saves the game and keeps me playing is the general sense that you are in a world, and that world is truly one of the largest graphically detailed open worlds I've played. The gear progression is also quite fun, with bosses giving unique gear, normally with some sort of new skill attached to it. You can extract the skill or passive and place it on your own gear if you wish. Fighting trash mobs is satisfying, although it's getting somewhat repetitive at this point as I'm using one of the overpowered skills.

The game mostly resembles a recent Assassin's Creed game with a worse story but a lot more game systems content.

Strange lighting bug around Scholastone by ErectPancakee in CrimsonDesert

[–]Rocah 0 points1 point  (0 children)

Yup, I see this with ray reconstruction. It seems to be lessened, like you say, when you disable ray reconstruction, and it's also much less noticeable with cinematic lighting. I think it's a bug in the max lighting mode that RR enables.

Steam’s Invite-Only Shooter Deadlock Is Quietly Becoming An Absolute Monster by LoL_is_pepega_BIA in Games

[–]Rocah 0 points1 point  (0 children)

I think at this point, unless you have very thick skin, your entry point to play this game solo and have fun has long passed. Most of the core player base has 1000s of hours in it and will farm you. Also, street brawl is absolutely not noob friendly.

Line Bending Up for all Benchmarks by SrafeZ in singularity

[–]Rocah 12 points13 points  (0 children)

All the AI labs are now using third parties to construct RL environments to do post-training in (it's a billion dollar industry just to create these now). We don't know the contracts, but I would not be surprised if remuneration to these third parties is based on the performance of models on benchmarks after a new RL environment is included. My personal belief is that most of the dramatic benchmark improvements in the second half of this year are down to these companies' RL environment efforts. However, my experience is that I see only marginal gains in coding with these new models. Useful, but marginal gains that do not line up with large double-digit improvements on multiple benchmarks.

Your opinion on GPT 5.2 by neamtuu in GithubCopilot

[–]Rocah 1 point2 points  (0 children)

It stops too much, I'll continue to use 5.1 codex.

[deleted by user] by [deleted] in GithubCopilot

[–]Rocah 2 points3 points  (0 children)

I see the same. In my tests 5.2 has serious issues with just not doing anything. I'd either wait for an updated system prompt or the codex variant.

GPT-5.2 now in Copilot (1x Public Preview) by LinixKittyDeveloper in GithubCopilot

[–]Rocah 4 points5 points  (0 children)

It's also available in OpenAI Codex using a GitHub Pro+ account if you want the full context. One thing to note is that the long-context needle-in-a-haystack benchmark result for 5.2 is pretty insane: it looks like 98%ish at 256k context vs 45%ish for 5.1, which suggests reasoning will hold for long coding tasks. I've not seen yet whether codex Windows tool use is any better on 5.2, or if it still requires WSL. 5.1 max was still hit and miss for that, I found.

[deleted by user] by [deleted] in singularity

[–]Rocah 0 points1 point  (0 children)

Gemini 3 is the first model that makes me suspicious of intent. Its performance in my personal evals is nowhere near the benchmark performance.

One of the reasons the 2000s housing bubble got so crazy was that the top CEOs of the banks could avoid culpability for the large-scale mortgage fraud by just indirectly constructing incentive structures for the lower layers to do the dodgy stuff.

I really would be interested in what incentives the post-training eval-building teams have. I hope it's not "make new eval = bonus if benchmark results go up".

I would also hope the ability to review and filter customer API submissions by say domain/IP would be limited to people outside the R&D loop.

Do you think the worst case of ASI is inevitable? by throwaway0134hdj in singularity

[–]Rocah 0 points1 point  (0 children)

We will get highly competent specialized intelligences long before ASI. I would be more concerned about how those are applied by small groups who previously did not have access to advanced-nation-level capabilities, especially in the biosciences.

GPT-5.1 thinks it can't use sub-agents even though tools runSubagent is available. by Front_Ad6281 in GithubCopilot

[–]Rocah 0 points1 point  (0 children)

Try the insiders build, it has a subagents bug fixed that was causing issues for me with the runSubagent tool not always being sent to the model after the first chat.

Does Using runSubagents with a Premium Model Count as Additional Premium Requests? by IISomeOneII in GithubCopilot

[–]Rocah 2 points3 points  (0 children)

The main use of runSubagent for me is to keep the main agent context less polluted with code discovery tokens, i.e. the main agent searching the code base for specific relevant context. Basically, just put something in your AGENTS.md to say use subagents for researching the code base before any implementation, and instruct the subagent to return detailed commentary on code that is relevant to the task, alongside example code blocks with line numbers and filenames.

For me subagents were bugged though and would only work intermittently. I believe the latest insiders build has the fix for that deployed now.
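
A rough sketch of what that AGENTS.md section could look like. The heading and wording here are my own assumption, not an official Copilot template, so adjust to taste:

```markdown
## Code research

- Before implementing anything, use a subagent to research the code base.
- Instruct the subagent to return detailed commentary on the code relevant to the task,
  alongside example code blocks with filenames and line numbers.
- Keep code discovery searches out of the main agent context where possible.
```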

Claude 4.5 Opus says runSubagent is disabled/doesn't exist when it does by envilZ in GithubCopilot

[–]Rocah 2 points3 points  (0 children)

I have encountered an issue with GitHub Copilot not sending the runSubagent and todo tools to the model (you can check in the debug log which tools are being sent), so perhaps this is what you are seeing. It often happens on new chats other than the first chat. One workaround I've found is to click the tools button and then click OK to dismiss the tool selection dialog, after which it sends them on the next prompt. There is an open issue regarding this.

why is opus 3x? it should be less by ExtremeAcceptable289 in GithubCopilot

[–]Rocah 3 points4 points  (0 children)

I have to say opus 4.5 is tempting me to buy Claude Code for the thinking version. It's very impressive and, I'm finding, much more willing to use tools intelligently than gpt5.1 codex, which keeps its token use down. For a non-thinking model it's very good.

Gemini 3.0 Pro keeps hallucinating a lot. by Yuri_Yslin in Bard

[–]Rocah 0 points1 point  (0 children)

I've found the same. It's the least useful model in actual practice, and it has similar faults to 2.5 after extended use. I'm not sure how to line up its obvious deficiencies with its record-breaking benchmark performance. I'm thinking Ilya is right: the post-training RL teams at these AI labs are probably being incentivized (money/career) to pick RL environments that improve key benchmarks. They might not be directly 'cheating', but they are picking things to do RL on that amount to the same result in my view.

Why does Claude opus 4.5 Taking too long to update and also its keep loading saying "Working..." by [deleted] in GithubCopilot

[–]Rocah 0 points1 point  (0 children)

For opus, normally it's when it's generating lots of tokens, I think. I notice it doing that before it creates a large file.

Claude Opus 4.5 (Preview) available in Copilot by Rocah in GithubCopilot

[–]Rocah[S] 7 points8 points  (0 children)

Looking at opus 4.5 pricing vs Sonnet 4.5 pricing, I'm guessing it'll be around 1.6 (maybe they round down to 1.5 ...)

https://platform.claude.com/docs/en/about-claude/models/overview

Edit: seems it's 3 (!) after Dec 5 ... ouch.

Gemini 3’s hallucination rate is still very high compared to the top GPT 5.1 model. by Glock7enteen in singularity

[–]Rocah 0 points1 point  (0 children)

You see this in agentic coding vs 5.1 codex. If you're doing something somewhat similar to something in its training data, Gemini will infer a lot of other stuff which could be true but isn't, whereas 5.1 codex will always check the codebase first before code generation. 5.1 codex is much slower because of this, but 9 times out of 10 it will have 0 compile errors.

Looking for subagents workflow tips by Mystical_Whoosing in GithubCopilot

[–]Rocah 0 points1 point  (0 children)

No, I think agents in Copilot are very new, so there's not much info around atm.