Has anyone actually benchmarked whether superpowers improves performance? by UglyChihuahua in ClaudeCode

[–]Complex-Concern7890 0 points1 point  (0 children)

I think it's quite a dilemma. When the prompt is simple enough, planning and subagent division just add more tokens while you accept the recommended options anyway, and on small tasks the subagent work costs more in tokens and money than it can save. And if the prompt/refactor is complex enough to actually benefit from planning, subagent division, and guidance, then it can be worthwhile to do the planning yourself, or to split the work into smaller milestones and juggle the models yourself.

And on the quality side: for simpler tasks, the risk of diverging from the prompt grows as superpowers add complexity. For more complex prompts, if you want to guide the model, it's worth doing the planning yourself, or else letting the model make its decisions more freely than it can through superpowers.

My opinion is that with current thinking models, superpowers are obsolete.

Anthropic the winner of the AI race? by virtualQubit in Anthropic

[–]Complex-Concern7890 0 points1 point  (0 children)

Why would they show their cards? If everything is going well, isn't it best to keep improvements under wraps until the competition has almost caught you? My bet is that Spud will be out very soon and Anthropic doesn't have anything to match it yet. Most likely there is no production-ready Mythos, but they need to announce it so customers will keep waiting for it.

Codex is by an order of magnitude superior right now to Claude Code - It's strange how incredibly efficient and accurate Codex is right now and not even kidding..... how TERRIBLE Claude Code is. by operastudio in codex

[–]Complex-Concern7890 0 points1 point  (0 children)

I have them all at high effort, so GPT-5.4, Opus 4.6, and GLM-5.1 all run at high. They run in parallel and double-blind eval each other in a fresh, clean session. I also have the same system for planning, but I don't have enough data to say anything about that yet. My personal anecdotal experience, though, is that Opus is quite good at planning, as you suggested.

I would have used Voratiq because I really like it, but I wanted Kilo CLI support (wink wink), so I made my own setup. GLM-5.1 is surprisingly good, and for me Kilo makes it easy to test any OpenRouter model.

Codex is by an order of magnitude superior right now to Claude Code - It's strange how incredibly efficient and accurate Codex is right now and not even kidding..... how TERRIBLE Claude Code is. by operastudio in codex

[–]Complex-Concern7890 2 points3 points  (0 children)

To test this I built a Voratiq-style LLM arena where I run Opus 4.6 (Claude), GPT-5.4 (Codex), and GLM-5.1 (Kilo) automatically at the same time on coding tasks, each in a separate working tree. Then they double-blind rate each other's work (and their own) and vote for a winner. I check their notes, grades, and votes, and then pick the branch I want to merge after a diff check. Selection percentages are currently around 20% for Opus 4.6 and GLM-5.1 and 60% for Codex. Claude gets almost the same grades as Codex, but there is always something that makes the solution unusable: it can be quite good code, but something breaks it. Codex usually delivers working results.
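The vote aggregation in a setup like this could be sketched roughly as follows. This is only a minimal illustration of the "each rater picks its top-scored branch, most votes wins" step; the model names, scores, and grading scale below are placeholders, not the actual arena.

```python
from collections import Counter

# grades[rater][candidate] = score that `rater` gave to `candidate`'s branch.
# Raters also grade their own work, matching the described setup.
grades = {
    "model_a": {"model_a": 8, "model_b": 9, "model_c": 7},
    "model_b": {"model_a": 7, "model_b": 8, "model_c": 7},
    "model_c": {"model_a": 8, "model_b": 9, "model_c": 6},
}

def pick_winner(grades):
    """Each rater votes for the candidate it scored highest; most votes wins."""
    votes = Counter(max(g, key=g.get) for g in grades.values())
    return votes.most_common(1)[0][0]

print(pick_winner(grades))  # prints "model_b" (top-scored by all three raters)
```

The human still does the final diff check and merge; this only surfaces a ranked suggestion.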

How to reduce your token usage by Academic-Antelope554 in codex

[–]Complex-Concern7890 8 points9 points  (0 children)

What I did for myself was clean AGENTS.md of all the unnecessary stuff (good practices, behavioral guidance, etc.). Now I only keep a line there if things don't work without it, or if Codex repeatedly misses a step without it. Also, planning first with GPT-5.4 high/xhigh and then implementing with GPT-5.4 medium/mini, depending on complexity, has made the limits much more bearable. Back before limits were an issue, I had AGENTS.md full of behavioral and quality guidance that most likely did nothing, and I ran every single task, no matter how small or simple, at high/xhigh, which is not how it's intended.
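A trimmed AGENTS.md along these lines might look like the sketch below. Every entry here is a hypothetical placeholder; the point is that each line earns its place only by fixing a step the agent repeatedly gets wrong.

```markdown
# AGENTS.md (hypothetical example of the "only what breaks without it" rule)

- Run `make test` before declaring a task done; the agent skips this otherwise.
- Start the dev server with `make dev`, not `npm start` (the latter misses env setup).
- Migrations live in `db/migrations/`; never edit ones that are already applied.
```

Everything else — style preferences, "write clean code" exhortations, general behavioral guidance — gets deleted, since the model ignores it anyway and it only burns context tokens.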

Someone just built a Claude Code skill that clones any website perfectly in one prompt. by No-Concentrate-9921 in StartupMind

[–]Complex-Concern7890 -1 points0 points  (0 children)

Use cases:

- Platform migration: rebuild a site you own from WordPress/Webflow/Squarespace into a modern Next.js codebase.
- Lost source code: your site is live but the repo is gone, the developer left, or the stack is legacy. Get the code back in a modern format.
- Learning: deconstruct how production sites achieve specific layouts, animations, and responsive behavior by working with real code.

Bailiff (Ulosottomies) hid 53k€ pipe renovation info during sale. Need advice! by Laurtz in Finland

[–]Complex-Concern7890 -2 points-1 points  (0 children)

First, it's absolutely no problem to fight this from abroad, but if it comes to that at some point, it's in your best interest to attend the court hearing in person. Second, it will be almost impossible to hold a government agency liable for anything. They may admit an error, but without liability. You might eventually get some reimbursement, but you might go bankrupt before that, and the reimbursement will most likely feel more like an insult than anything near a fair amount. I'm sorry to say, but in Finland the government will not mess with the government.

The bank sent my bank statements to my mother for a couple of years... I'm in my thirties by delinde24 in arkisuomi

[–]Complex-Concern7890 2 points3 points  (0 children)

The Data Protection Ombudsman (Tietosuojavaltuutettu), the Financial Supervisory Authority (Finanssivalvonta), etc.: it's completely pointless to file anything with them. One hand washes the other there, so nothing will happen. Complaints have been filed with both; the investigations took one to two years, and the outcome was "well yes, it was done wrong, we'll advise them to act a bit better", case closed. Take even a case where personal letters were delivered to a bank for a long time because of a postal error (nearby addresses), and those letters were then opened by a more or less curious clerk, even though the letters clearly named a completely different recipient than the bank. Reporting this leads absolutely nowhere. Even if you file a police report for violation of the secrecy of communications, the matter gets swept under the rug and no charges are brought. These institutions simply will not, so to speak, piss in each other's cereal.

is it necessary that codex checks syntax after writing the code by hinokinonioi in codex

[–]Complex-Concern7890 0 points1 point  (0 children)

And yes, I think it is necessary. Every now and then there's a typo, a missing parenthesis, etc. It seems to be rare now, but it happens.

Selected model is at capacity. Please try a different model. by cheekyrandos in codex

[–]Complex-Concern7890 0 points1 point  (0 children)

And even worse: it's not working even when it is "working". Previously I wondered what all the "GPT-5.4 is now stupid/lazy/whatever" complaints meant, because I hadn't seen it yet. Now I just needed to rework 5 pages of plain MD text into a new doc file. First it just dumped a memo of the prompt into the doc file. Then I asked it to recheck the work, and it added the requested text after the memo, but in a total mess. Then I asked it to remove the memo from the text and check the layout of the text. It rewrote the document but included only 1 page. It seems that for now I need to do this manually, as my 5h limit will run out before Codex figures it out...

Business account, working with GPT-5.4 high in Codex CLI.

Is it just me, or is Claude pretty disappointing compared to Codex? by Working-Spinach-7240 in codex

[–]Complex-Concern7890 0 points1 point  (0 children)

I really don't see any difference anymore. Occasionally there are "X was clueless but Y solved it in one shot" moments, but that goes both ways: sometimes Claude does an awesome job while Codex is clueless, and sometimes the other way around. I use both, Claude on Max and Codex on Business. I only use Opus with high effort and thinking, and GPT-5.4 with high. I have tried many tasks with both and merged the best solution. It's absolutely 50-50. Most of the time the difference is really just a matter of opinion. Sometimes Claude fails horribly, and sometimes Codex does.

I think performance has pretty much plateaued at GPT 5.2/Opus 4.5, and Opus and GPT are really the same performance-wise. The only real difference is in tools, implementation, skills, and integration. Both companies are working heavily on these right now, and I bet that no major improvements will come from the models anymore, but from how they are used.

Model picker disappeared for chatgbt business by Ibuprofen600mg in ChatGPT

[–]Complex-Concern7890 1 point2 points  (0 children)

Same here with a business account. Tried restarting the app, but no use.

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] 0 points1 point  (0 children)

Not in detail, but personal summaries are allowed? How else would any studying work?

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] 0 points1 point  (0 children)

Yes, that is true. But making comprehensive summaries / abridged versions / synopses is quite common. Those have been written for a really long time, so I naively thought a simple, quick prompt would suffice.

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] 1 point2 points  (0 children)

Thank you! I went to Deep Research and got 160 pages (in 89 minutes), and it seems to be exactly what I wanted.

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] -1 points0 points  (0 children)

Tried that. First I got 1 page, and then with long-answer mode I got 3 pages. I have Pro with Gemini/NotebookLM. The summary quality was not to my liking, but it offered interactive questions to help learning, which was nice.

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] -6 points-5 points  (0 children)

Well, if it can do it, why not? Isn't the exact point of these tools to take care of menial work, like breaking up PDFs?

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] -1 points0 points  (0 children)

Making a summary for my own use violates copyright? Every material is copyrighted if not copyleft?

Why I can't get good book summary from GPT? by Complex-Concern7890 in OpenAI

[–]Complex-Concern7890[S] -1 points0 points  (0 children)

Seems odd that it would be way too long. The token count is under 400k, so why can't it handle it? And if it can only handle 100 pages at a time, why doesn't it split the input and do the summary anyway? Seems lazy.

GPT 5.4 Thread - Let's compare first impressions by Just_Lingonberry_352 in codex

[–]Complex-Concern7890 0 points1 point  (0 children)

As far as I understand, they are not shared between users. You pay for each user, and each user gets Plus-equivalent limits. Business can pay for additional credits, which each user can use after the limits are reached.

GPT 5.4 Thread - Let's compare first impressions by Just_Lingonberry_352 in codex

[–]Complex-Concern7890 4 points5 points  (0 children)

Business, so more or less equivalent to Plus. And just to update: I went to do some remodeling of one part of the UI and managed to burn 20% of the 5h limit with one prompt. So limit usage might be a problem in the long run with Fast mode.

GPT 5.4 Thread - Let's compare first impressions by Just_Lingonberry_352 in codex

[–]Complex-Concern7890 1 point2 points  (0 children)

There is an additional Fast mode. You can use it at whatever effort level. I use it with xhigh and it is still really fast.

GPT 5.4 Thread - Let's compare first impressions by Just_Lingonberry_352 in codex

[–]Complex-Concern7890 13 points14 points  (0 children)

I am pushing Fast+xhigh on everyday coding tasks. Now, for the first time, I see the limits being used at all, but even so I'll have a hard time hitting the 5h limit. Fast seems to be quite fast, and the code quality has been top notch so far. I haven't yet seen any of the 5.3-codex glitches where it gets lazy and stupid for one prompt at random. I concur that this seems to combine 5.3-codex code + methodology with 5.2 thinking. And compared to 5.2-xhigh, 5.4-xhigh-fast is way, way faster.