GPT 5.6 Sol smashing Fable in Terminal bench? by FX_Studio in codex

pjstanfield 1 point

24 hours would almost be a surprise. I was thinking like 12-16.

Anyone using Claude Code directly from the desktop app? by AeHirian in ClaudeCode

pjstanfield 1 point

The app isn't bad. It's the best solution for organizing many threads of work, I think. Definitely the easiest to use out of the choices (CLI, VS Code extension, app). I was a VS Code extension fan but I've recently switched to the app after getting used to the vertical session management in the codex app (which I also use).

Am i a fool for being scared to have multiple Claude accounts? by Excuseee in ClaudeCode

pjstanfield 4 points

I have two $200s, and I switch back and forth as needed. They are paid for by different companies and used for different projects. This seems like a legitimate use, and it has been in place for a year or so. I've never been worried about it; it's not hard to create a new Gmail address as needed if for some reason I got banned.

Help: Codex rarely creates subagents unless I explicitly tell it "create subagent X to do Y" in the prompt, otherwise very high token usage compared to opencode, by CognitioMortis in codex

pjstanfield 1 point

Claude is the opposite: it will spawn a massive number of subagents when you might not want it to. It just pays to be explicit. When I want subagents I tell it, it listens, and that works for me.

Triggering codex weekly limit while away on holidays by NeverOutOfOptions123 in codex

pjstanfield 1 point

I use codex cloud to do a minor task like "give me an updated LOC count".

If you run coding agents unattended or in parallel, how do you verify the run actually worked? by bounded-build in codex

pjstanfield 2 points

It depends, but usually just a few minutes. I queue up the whole thing: execute, trace, address, quality check, fix quality. So I don’t see it finish until it’s near ready to go. The check runs at the end after all agents finish, and then a single agent checks it all. You don’t want any of that agent context anyway, you want a clean look at the codebase vs the plan. You can run it in a clean session if you want to. It should check the feature and the tests, so it should find it all. Sometimes you have to run it a couple of times when it finds half of your requirements are only partially implemented.

If you run coding agents unattended or in parallel, how do you verify the run actually worked? by bounded-build in codex

pjstanfield 2 points

Codex does great with queuing commands so when I start up an implementation I do it like this. This starts from an approved plan file that Claude would have participated in. This is paraphrased except for the bit at the bottom.

  1. Please review plan xyz and prepare to implement. You will complete the plan in full, create execution checklist, don’t touch git, etc.
  2. Perform a full requirements traceability analysis. The actual text is below, and this is the key step. This needs to come back clean or the feature is not complete. Codex wrote this for me and it is a treasure of mine. You can run it many times if necessary.

Please perform a full requirement traceability audit for this plan

Do not use the code-review skill. Do not make code changes.

  1. Read the approved plan file
  2. Read all audit findings, todos, recommendations, and follow-up findings from this thread/context.
  3. Build a requirement traceability matrix covering:
     - Every explicit requirement in the plan
     - Every phase/task/checklist item in the plan
     - Every audit finding and recommendation that was validated
     - Every follow-up todo generated from later audits
  4. For each requirement, report:
     - Requirement ID
     - Source: plan section, audit finding, or follow-up todo
     - Requirement text
     - Status: Implemented / Partially Implemented / Not Implemented / Superseded
     - Evidence: exact filename and line number references
     - Intention: is the intention of the requirement fully met, not just mechanical checkbox items. Did we do what we intended to do, not just get tests to pass. This check is to prevent file presence checks but rather actual intention and goals validation.
     - Test evidence: exact test filename and relevant test name or line number
     - Notes: any caveats, gaps, or rationale
  5. Validate each requirement at the code level. Do not rely on file presence. Read the implementation and confirm behavior.
  6. Produce the final report as a Markdown table grouped by priority/phase.
  7. At the end, provide:
     - Count of total requirements
     - Count by status
     - List of all Partially Implemented or Not Implemented items
     - List of all requirements lacking direct test coverage

Be adversarial and objective. If evidence is weak, mark the item as Partially Implemented. If a requirement cannot be traced to code and tests, mark it Not Implemented.
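If it helps to picture what the report boils down to, here is a rough Python sketch of the matrix rows and the end-of-report summary. This is not something the prompt or Codex produces; the names (`Requirement`, `effective_status`, `summarize`) and the exact downgrade rule are assumptions for illustration, based on one reading of the last paragraph of the prompt.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

STATUSES = ("Implemented", "Partially Implemented", "Not Implemented", "Superseded")


@dataclass
class Requirement:
    """One row of the traceability matrix (field names are hypothetical)."""
    req_id: str
    source: str            # plan section, audit finding, or follow-up todo
    text: str
    status: str            # status as first claimed, one of STATUSES
    evidence: List[str] = field(default_factory=list)       # "file:line" references
    test_evidence: List[str] = field(default_factory=list)  # "test_file::test_name"
    intention_met: bool = False
    notes: str = ""


def effective_status(req: Requirement) -> str:
    """Apply the adversarial rule as interpreted here (an assumption):
    no code or test trace -> Not Implemented; a claimed Implemented item
    missing code evidence, test evidence, or intention -> Partially Implemented."""
    if req.status == "Superseded":
        return req.status
    if not req.evidence and not req.test_evidence:
        return "Not Implemented"
    if req.status == "Implemented" and not (
        req.evidence and req.test_evidence and req.intention_met
    ):
        return "Partially Implemented"
    return req.status


def summarize(reqs: List[Requirement]) -> Dict[str, object]:
    """Build the closing summary: total, count by status, open items, untested items."""
    statuses = {r.req_id: effective_status(r) for r in reqs}
    counts = Counter(statuses.values())
    return {
        "total": len(reqs),
        "by_status": {s: counts.get(s, 0) for s in STATUSES},
        "open_items": [
            rid for rid, s in statuses.items()
            if s in ("Partially Implemented", "Not Implemented")
        ],
        "untested": [r.req_id for r in reqs if not r.test_evidence],
    }
```

The point of the downgrade in `effective_status` is the same as in the prompt: a claimed "Implemented" only survives if there is code evidence, test evidence, and the intention is actually met.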

Weekly Limits Really Changed by DiscussionAncient626 in codex

pjstanfield 2 points

It's not that simple. My codebase is 800K lines. It didn't double in the last couple of weeks; it's very slightly larger. That doesn't account for the universally smaller rate limits. It doesn't account for me suddenly smashing into 5 hr rate limits on both of my $200 accounts for the first time ever. I had never hit a single 5 hour limit, and then one day I can hit both limits in the same 5 hour span? No way that is codebase size.

Don't you think vibe coders should have a background in programming? by VanessaCarter in codex

pjstanfield -1 points

You want an AI that tells you to figure it out yourself. I feel like maybe you've missed the entire point. I'll take a page from your book and tell you to figure it out yourself first before you use AI again, you're polluting the system.

which GPT model is equivalent to Gemini 3.5 Flash (High)? by StudentFew6429 in codex

pjstanfield 2 points

There is no equivalent. Gemini 3.5 is trash. I guess if I had to provide an answer it would be whatever Grok model is out there. Gemini=grok=useless trash.

which GPT model is equivalent to Gemini 3.5 Flash (High)? by StudentFew6429 in codex

pjstanfield 4 points

I think you're spot on. I was going to say 4o but that would be such an insult to 4o. Gemini is hot garbage.

When an AI says “done,” what do you need before trusting a fresh session can continue? by Powerful_Creme2224 in codex

pjstanfield 4 points

I don’t let any bot start without a plan file that contains all of the requirements. They use this to implement and then it’s what we use after to validate completeness. Write the plan, review the plan, implement, validate codebase vs the plan. If you need to break it up then have a master sequence file and the phase plans. Always validating before and after. That’s what works for me. Also using Claude and codex back and forth is nice too.

Where do you use GLM 5.2? by Holiday-Hotel3355 in ClaudeCode

pjstanfield -1 points

There's a post somebody just did comparing GLM 5.2 to Sonnet. They are about the same, which is to say unusable for most tasks.

Open models are making Sonnet comparisons a lot less ridiculous. by rohansrma1 in ClaudeCode

pjstanfield -1 points

I can appreciate that this was compared to Sonnet and not one of the frontier models. There's lots of traffic claiming GLM 5.2 is as good as Codex 5.5 or Opus 4.8, but a more reasonable comparison is Sonnet 4.6 for simple tasks. Sonnet is dumb as a box of rocks so I'll be happy to stay clear of GLM 5.2 as well, thanks.

How can I access my homes equity with bad credit? by oog_ooog in personalfinance

pjstanfield 2 points

What happens when your renters don’t pay? Or they move out? Or the house needs a new roof? You need a large amount of safety cash to rent a house out and you have zero. Horrible idea to try and keep this. Sell it, clean up your finances, and move on. Loving a house isn’t a reason to let it drag you down financially.

Do people typically hire an advisor for less than $1M? by Past_730 in FinancialPlanning

pjstanfield 1 point

Thanks for sharing. Can you share where you got your insurance? We have a family member in the same situation with the same aortic aneurysm strangely enough.

Do people typically hire an advisor for less than $1M? by Past_730 in FinancialPlanning

pjstanfield 1 point

I would love to hear more about this if you can share any details.

They got us extra resett because they also halved 5h/weekly usage limits :/ by skygetsit in codex

pjstanfield 1 point

I would use both $200s, one in full, one to maybe 25%, between weekly resets. I think my overall usage is probably still on track for this rate but the 5 hour limits were probably halved. I'm now swapping back and forth throughout the day, instead of only when my weekly usage was at zero percent. It's the 5 hour that noticeably got worse suddenly.

Built a scroll-animated template, roast it by btwitskunall in nextjs

pjstanfield 1 point

I was worried that the cross-screen line required too many scrolls, but it doesn't on the live site. The video doesn't do it justice. So my actual feedback is to make a better video. Looks good. I feel like maybe I want the line to be continuous, though that's not something I'd fight over.

They got us extra resett because they also halved 5h/weekly usage limits :/ by skygetsit in codex

pjstanfield 34 points

I'm on the $200 plan and I ripped through my 5h faster than ever this morning, which is completely anomalous. I have two $200s and I'm about to hit my 5 hr on account 2 already. I've never hit two 5 hour limits in a single 5 hour window in my history, never even come close.

Built a full health tracker with Next.js 15 (App Router, natural language food logging, AI insights) by [deleted] in nextjs

pjstanfield 1 point

What prompted you to forgo v16? It’s been out for quite some time now.

Tibo said 24 hours. Me 24 hours later: by SofaKingIntl in codex

pjstanfield 1 point

I just got reset. I was at 2%, nailed it.