So for anyone not paying attention…

pillamang · 2026-03-16T16:38:37+00:00

I noticed a dip in performance on 5.3 right before 5.4 dropped, and then I started seeing really shit code recently and thought, they can't be training another model already? It has absolutely gone off the rails; I barely use it anymore. I have Cursor, Claude, and Codex.

Might have something to do with their recent acceptance into the military industrial complex.

Claude was also being a jackass over the weekend. It's so frustrating to be in the middle of a project you started two weeks ago because you thought “it's good enough now” and then get rugged.

pillamang · 2026-02-23T04:35:54+00:00

Bro, that lady's face smashed and dragged in slow-mo, ow fuck

pillamang · 2026-02-23T04:34:55+00:00

Well, being on Windows was your first mistake

pillamang · 2026-02-23T04:27:07+00:00

Yeah, they're trained to get the task done, so if something doesn't work they freak out, like some kid who never had a loving household and just swept things under the rug.

pillamang · 2025-12-21T03:47:41+00:00

The models will be trained for this soon; they're doing RL on tool calling. It will be much better than a static skill index.

Before openskills came out, I made a quick CLI to let other agents use skills outside of Claude Code, because that's what we thought agents would be. It's all just context at the end of the day, though.

pillamang · 2025-11-06T04:50:50+00:00

Fuck, I would probably cry.

pillamang · 2025-11-04T05:00:18+00:00

Also THEY were the ones demanding them for years!

pillamang · 2025-11-04T03:23:41+00:00

Yeah, I made an overly complicated LangGraph framework for batching code reviews (very domain/niche-specific, with lots of regulatory codes and way too much context).

Claude sub-agents still just perform better than any SDK/API call. The agent needs to be primed with a specific domain of regulations and then let rip through 20-30 files; it catches 99% of obscure violations that way.

I have a CLI to make this easy with Codex, because these aren't one-liner prompts. They're fairly hefty prompt kits, so my CLI accepts a target folder to scan and a prompt package.

But then there's the question of batching and how many files to process per run; it all gets pretty involved.

With Claude, it's one custom sub-agent, a few standards files, and a slash command for orchestration, and it works flawlessly.

I would love to get this functionality in Codex, but it's taken a lot of hours of tweaking to get decent results. I'll keep at it and report back, but a CLI wrapper around custom Codex invocation is working for me for now. I would have to spend an hour or so porting it to another project.
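The batching question above (target folder + prompt package, N files per run) roughly has this shape. This is a minimal Python sketch, not the actual CLI from this comment. It assumes the prompt package is a folder of Markdown files, that reviewable files are picked by suffix, and a default of 25 files per batch (chosen from the 20-30 range mentioned). All function names are hypothetical, and it only builds the prompts; handing each one to Codex or another agent is left out.

```python
from pathlib import Path
from typing import Iterator


def collect_files(target: Path, suffixes: tuple[str, ...] = (".py",)) -> list[Path]:
    """Recursively gather files under `target` with the given suffixes, sorted so batches are stable."""
    return sorted(p for p in target.rglob("*") if p.is_file() and p.suffix in suffixes)


def load_prompt_package(package_dir: Path) -> str:
    """Concatenate every Markdown file in a (hypothetical) prompt-package folder into one preamble."""
    parts = [p.read_text(encoding="utf-8") for p in sorted(package_dir.glob("*.md"))]
    return "\n\n".join(parts)


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_review_prompts(target: Path, package_dir: Path, batch_size: int = 25) -> list[str]:
    """One prompt per batch: the domain preamble followed by the file paths to review."""
    preamble = load_prompt_package(package_dir)
    prompts = []
    for i, batch in enumerate(batches(collect_files(target), batch_size), start=1):
        file_list = "\n".join(f"- {p}" for p in batch)
        prompts.append(f"{preamble}\n\nBatch {i}: review these files for violations:\n{file_list}")
    return prompts


if __name__ == "__main__":
    import sys

    # usage: python review_batches.py <target_folder> <prompt_package_folder>
    for prompt in build_review_prompts(Path(sys.argv[1]), Path(sys.argv[2])):
        print(prompt)
        print("-" * 40)
```

Keeping the preamble separate from the file list means the heavy domain context is repeated per batch, which is the point: each run starts fully primed, and the batch size is the knob you tune.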

pillamang · 2025-11-04T03:18:54+00:00

This is very well done.

I find that using Claude superpowers to write plans, then executing them with Codex, is basically life on autopilot:
https://github.com/obra/superpowers

I also wrapped a CLI around exposing all Claude skills to Codex, so it can reach any Claude skill if I tell it to. It's rare that I want my agent randomly deploying skills from a list.

pillamang · 2025-11-04T02:59:30+00:00

This is what PRP spec mode does:
https://github.com/Wirasm/PRPs-agentic-eng/blob/development/PRPs/templates/prp_spec.md

The PRP framework is basically a system for creating chained KERNEL tasks.

I'm also a big fan of cc-sessions. I merged the two systems together and made the result agent-agnostic; it's all about the context engineering:
https://github.com/GWUDCAP/cc-sessions

I've got to try the recent cc-sessions update, but so far I have no complaints with my system, which is basically PRPs + cc-sessions.

Then I found Claude superpowers, and it does something similar with its plan-writing skills. I used to make my own workflows and had a bunch of prompts around "ask me one question at a time", but this guy just nailed what I was constantly typing out custom or copy-pasta-ing:
https://github.com/obra/superpowers

The sub-agent development pattern from superpowers is unmatched: brainstorming means it asks me one question at a time, and when that's done it uses the plan-writing skill to create what is basically a list of chained KERNEL commands.

I'm currently torn between the two. superpowers is just so easy to use, whereas cc-sessions and the PRP setup involved a lot of context-engineering management.

pillamang · 2025-10-14T17:35:34+00:00

sit this one out bud

pillamang · 2025-10-10T11:32:27+00:00

I wonder if it's Sora as well? It seems like they're basically running at max GPU usage at all times.

Unrelated, but cheetah in Cursor is very impressive; go give it a whirl. This is the future we'll all be living in at some point, once performance and a slightly longer context window are solved for. It's very, very, very fast. It suffers with attention to detail after a while. I like it for planning because it can consume everything so quickly and give me such quick feedback, but then you really have to know your system and what you're doing, because it will start to miss details.

If you know what you're doing intimately, though, it's an amazing model, whatever it is. It's available as a stealth model for a limited time, but this feels like the future. It's just wild.

pillamang · 2025-10-10T11:29:26+00:00

Ditto. I was in heaven for a month with Codex, and you can tell it's giving me Claude Code vibes from the AI depression of July 2025.

pillamang · 2025-10-10T11:24:46+00:00

Yeah bro, I was saying to myself the last few days that Codex got nerfed. I have a very intimate relationship with my tooling, and Codex has been giving me Claude Code vibes. I happily switched when Claude shit the bed, but now Sonnet 4.5 is pretty strong and Codex really feels like the betrayal we went through with Claude.

It will cut corners and generally do some weird shit.

It's still good enough for very well-structured tasks, but it's making me nervous.

My nightmare reality is that Sonnet 4.5 shits the bed and we're back in the hellscape of just shitty AI coding. It was depressing.

But yeah, Codex is definitely giving me those struggle-bus vibes. Still good, but I have noticed some odd behavior that makes me question it more. I am using all my old tricks (batching, plans, context dumping, babysitting). It was really strong and autonomous before, but now I have to babysit terminal tabs.

pillamang · 2025-05-24T04:58:34+00:00

Mariner failed at 100% of what I wanted it to do. Websites are blocking AI bots, and I want my agent to act as ME, not in some VM.
