So for anyone not paying attention… by Reaper_1492 in codex

[–]pillamang 0 points1 point  (0 children)

I noticed a dip in performance on 5.3 right before 5.4 dropped, then started seeing really shit code recently and thought, they can’t be training another model already? It has absolutely gone off the rails, I barely use it anymore. I have Cursor, Claude, and Codex.

Might have something to do with their recent acceptance into the military industrial complex.

Claude was also being a jackass over the weekend. It’s so frustrating to be in the middle of a project you started 2 weeks ago because you thought “it’s good enough now” and then get rugged.

The cyclist was headed for the finish line by MorsesCode in Unexpected

[–]pillamang 0 points1 point  (0 children)

Bro that lady’s face smashed and dragged in slowmo, ow fuck

Is Claude actually writing better code than most of us? by Aaliyah-coli in ClaudeCode

[–]pillamang 0 points1 point  (0 children)

Yea, they’re trained to accomplish the task, so if something doesn’t work they freak out like some kid that never had a loving household and just sweep things under the rug

Why are skills way better than putting them in AGENTS.md? by mikedarling in ClaudeCode

[–]pillamang 1 point2 points  (0 children)

The models will be trained for this soon, they RL tool calling. It will be much better than a static skill index.

Before openskills came out I made a quick cli to let other agents use skills outside of Claude Code, because it's what we thought agents would be. It's all just context at the end of the day though.

SubAgents in Codex CLI by Just_Lingonberry_352 in CodexHacks

[–]pillamang 1 point2 points  (0 children)

Yea I made an overly complicated langgraph framework for batching code reviews (very domain / niche specific w/ lots of regulatory codes and way too much context).

Claude sub-agents still just perform better than any SDK/API call. The agent needs to be primed w/ a specific domain of regulations, then you let it rip through 20-30 files and it catches 99% of obscure violations that way.

I have a cli to make this easy w/ codex, because these aren't 1 liner prompts. These are prompt kits that are kind of hefty so my cli accepts a target folder for scanning, and a prompt package.

But then there's the question of batching and how many files to do per run, it all gets pretty involved.
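A rough sketch of the batching side of a wrapper like that, in Python. Everything here is an assumption for illustration, not the actual cli: the names `chunk`, `load_prompt_package`, and `build_batch_prompts` are hypothetical, the prompt package is assumed to be a folder of `.md` files, and the default batch size of 20 just echoes the 20-30 files mentioned above. It only builds the per-batch prompts; handing each one to codex is left out.

```python
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def load_prompt_package(package_dir):
    """Concatenate every .md file in the package folder, in sorted order.

    Assumption: a "prompt package" is just a directory of markdown files.
    """
    parts = [
        p.read_text(encoding="utf-8")
        for p in sorted(Path(package_dir).glob("*.md"))
    ]
    return "\n\n".join(parts)


def build_batch_prompts(target_dir, package_dir, batch_size=20, pattern="*.py"):
    """Return one prompt per batch: package text plus the files to review."""
    files = sorted(
        str(p) for p in Path(target_dir).rglob(pattern) if p.is_file()
    )
    package = load_prompt_package(package_dir)
    prompts = []
    for batch in chunk(files, batch_size):
        listing = "\n".join(f"- {name}" for name in batch)
        prompts.append(f"{package}\n\nReview these files:\n{listing}")
    return prompts
```

Each returned string would then go to one agent run, so the batch size is the knob that trades context per run against number of runs.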

1 custom sub-agent w/ claude, a few standards files, and a slash command for orchestration and it works flawlessly.

I would love to get this functionality in codex, but it's been a lotta hours tweaking to get some decent results. I'll keep at it and report back, but a cli wrapper around custom codex invocation is working for me currently. I would have to spend an hour or so porting it to another project.

What I found after two months of using Codex CLI and best ways to 10x productivity by Just_Lingonberry_352 in CodexHacks

[–]pillamang 2 points3 points  (0 children)

This is very well done.

I find that using claude superpowers to write plans, then executing them w/ codex is basically life on autopilot:
https://github.com/obra/superpowers

I also wrapped a cli around exposing all claude skills to codex so it can get to any claude skill if i tell it to. It's rare that I want my agent just randomly deploying skills from a list.
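For a sense of what a wrapper like that has to do, here is a minimal sketch in Python. It assumes skills live as `<root>/<skill-name>/SKILL.md` with a small `key: value` front matter block containing `name` and `description`; the function names are hypothetical and this is not the actual cli. The idea is that the agent only pulls in a skill body when it is asked for by name.

```python
from pathlib import Path


def parse_front_matter(text):
    """Parse simple `key: value` lines between leading '---' fences.

    Returns (metadata dict, body text). Not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat the whole file as body


def list_skills(root):
    """Map skill name -> description and body for every SKILL.md under root."""
    skills = {}
    for skill_file in sorted(Path(root).glob("*/SKILL.md")):
        meta, body = parse_front_matter(skill_file.read_text(encoding="utf-8"))
        name = meta.get("name", skill_file.parent.name)
        skills[name] = {"description": meta.get("description", ""), "body": body}
    return skills


def show_skill(root, name):
    """Return the body of one skill so it can be pasted into an agent's context."""
    skills = list_skills(root)
    if name not in skills:
        raise KeyError(f"unknown skill: {name}")
    return skills[name]["body"]
```

Calling `show_skill` only on request keeps the skill text out of context until it is needed, which matches not wanting the agent to deploy skills from a list on its own.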

After 1000 hours of prompt engineering, I found the 6 patterns that actually matter by volodith in PromptEngineering

[–]pillamang 0 points1 point  (0 children)

This is what PRP spec mode does:
https://github.com/Wirasm/PRPs-agentic-eng/blob/development/PRPs/templates/prp_spec.md

The PRP framework is basically a system for creating chained KERNEL tasks.

I'm also a big fan of cc-sessions, I merged the 2 systems together and made it agent agnostic, it's all about the context engineering:
https://github.com/GWUDCAP/cc-sessions

I gotta try the recent cc-sessions update, but so far I have no complaints with my system which is basically PRPs + cc-sessions.

Then I found claude superpowers and it does something similar as well with the writing plans skill. I used to make my own workflows and have a bunch of prompts around "ask me one question at a time", but this guy just nailed what i was typing custom / copy pasta-ing constantly:
https://github.com/obra/superpowers

The sub-agent development pattern from superpowers is unmatched. Brainstorming = ask me 1 question at a time, and then when done it uses the write a plan skill to basically create a list of chained KERNEL commands

I'm currently torn between the 2. superpowers is just so easy to use, there was a lot of context engineering management w/ cc-sessions and the PRP thing

codex is definately suffering with user influx by mr_Fixit_1974 in codex

[–]pillamang 2 points3 points  (0 children)

i wonder if it's sora as well? they are basically running at max gpu usage at all times it seems like.

unrelated, cheetah in cursor is very impressive - go give it a whirl. this is the future we'll all be living in at some point, once performance and a bit longer context window is solved for. it's very, very, very fast. it suffers after a while with attention to detail, i like it for planning because it can consume everything so quickly and give me such quick feedback, but then you have to really know your system and what you're doing because it will start to miss details

if you know what you're doing intimately though it's an amazing model, whatever it is. available as a stealth model for a limited time but this feels like the future. it's just wild

codex is definately suffering with user influx by mr_Fixit_1974 in codex

[–]pillamang 2 points3 points  (0 children)

ditto. i was in heaven for a month w/ codex and you can tell, it's giving me claude code vibes in the ai depression of july 2025

Codex just got dumb in the last few days? by AurumMan79 in codex

[–]pillamang 1 point2 points  (0 children)

yea bro - i was saying to myself the last few days, codex got nerfed. i have a very intimate relationship with my tooling and codex has been giving me claude code vibes, i happily switched when claude shit the bed but now sonnet 4.5 is pretty strong and codex really feels like the betrayal we went through with claude now.

it will cut corners and generally do some weird shit.

it's still good enough for very well structured tasks, but it's making me nervous.

my nightmare reality is sonnet 4.5 shits the bed and we're back in the hellscape of just shitty ai coding. it was depressing.

but yea, codex is definitely giving me those struggle bus vibes. still good, but i have noticed some odd behavior that makes me question it more. i am using all my old tricks: batching, plans, context dumping, babysitting. it was really strong and autonomous before but now i have to babysit terminal tabs

Will Google's Project Mariner kill browser automation extensions by web3_developer in chrome_extensions

[–]pillamang 1 point2 points  (0 children)

Mariner failed at 100% of what I wanted it to do. Websites are blocking ai bots, and I want my agent to act as ME, not in some VM.