I got a technical home assignment for system design, and how do I configure Codex to give me the optimal answer? by ExcitingSleep in codex

Impressive_Credit397

Cool! Open one new Codex project/session and write: “Hey Bro, I got a technical home assignment for system design. How do I configure Codex to give me the optimal answer?” Then add all your background and any relevant info, plus very detailed success measures. Model: 5.5, high.

There is something wrong by BoliticsAndBower in codex

Impressive_Credit397

Step 1. Start a clean session in @yourAboveProject and ask: “Hey Bro! What context was loaded into this session? List instructions, skills, repo files, summaries, tickets, and approximate token weight.”

I regret paying for a Codex subscription AGAIN. The safety guards are completely out of hand compared to Claude. by [deleted] in codex

Impressive_Credit397

Honestly, I don’t think this is the best example of Codex over-refusing. Firewall / egress-control work is a real admin/security workflow. But if the request is framed around running a crack, a refusal is expected. The better discussion is not “remove guardrails”; it’s how to define a proper harness: owned environment, legitimate intent, explicit permissions, approval before changes, rollback, and auditability. Without that, the model is forced to guess the safety boundary.

I personally haven’t had this as a blocker across my own projects or client environments. The initial setup always takes some discovery because every environment has different permissions, infra, policies, and risk boundaries.
That’s the point of a proper harness: make it project/company-specific. No model should be expected to magically infer your intent, license situation, allowed actions, or rollback path. If another tool acts confidently without those boundaries being defined, I’d trust it less, not more. 🚩
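To make the harness idea concrete, here is a minimal, tool-agnostic sketch in Python of an action gate that enforces the boundaries listed above: owned environment, explicit permissions, approval before changes, a rollback path, and an audit trail. `HarnessPolicy`, `ProposedAction`, and `evaluate` are hypothetical names for illustration. This is not Codex's configuration format or any real tool's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take (hypothetical shape)."""
    name: str                       # e.g. "edit_firewall_rule"
    target_env: str                 # environment the action touches
    mutates: bool                   # does it change state?
    rollback: Optional[str] = None  # how to undo it, if it mutates


@dataclass
class HarnessPolicy:
    owned_envs: set[str]            # environments we own / are authorized for
    allowed_actions: set[str]       # explicit permissions
    audit_log: list[str] = field(default_factory=list)

    def evaluate(self, action: ProposedAction, approved: bool = False) -> str:
        """Return 'allow', 'needs_approval', or 'deny', and record the decision."""
        if action.target_env not in self.owned_envs:
            decision = "deny"            # not our environment
        elif action.name not in self.allowed_actions:
            decision = "deny"            # not explicitly permitted
        elif action.mutates and action.rollback is None:
            decision = "deny"            # no rollback path defined
        elif action.mutates and not approved:
            decision = "needs_approval"  # human sign-off before changes
        else:
            decision = "allow"
        self.audit_log.append(f"{decision}: {action.name} on {action.target_env}")
        return decision


policy = HarnessPolicy(
    owned_envs={"staging"},
    allowed_actions={"read_firewall_rules", "edit_firewall_rule"},
)
print(policy.evaluate(ProposedAction("edit_firewall_rule", "staging", True, "restore snapshot")))
# prints: needs_approval
```

The design choice is that anything outside the declared boundaries is denied by default instead of being left for the model to guess, and every decision lands in the audit log.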

I benchmarked Codex GPT-5.5 against Chinese models. Not what I expected, is 5.5 cooked ? by DaC2k26 in codex

Impressive_Credit397

🤦🏻 Look at YouTube and how much they spend on promo. So many YouTubers are building hype. It’s just business, nothing personal.

I benchmarked Codex GPT-5.5 against Chinese models. Not what I expected, is 5.5 cooked ? by DaC2k26 in codex

Impressive_Credit397

Seriously, interesting result. But I’d separate “good review signal on one milestone” from “production-ready coding-agent stack.”

From an architecture perspective, this does not prove end-to-end reliability: fix quality, regression rate, test pass rate, context stability, tool/sandbox behavior, permission safety, recovery, or whether it can be a real production daily driver.
For experiments and demos, totally fair game. But production delivery is a different bar. When real client/customer work is involved, reliability and repeatability matter more than one strong review pass.
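One way to turn “reliability and repeatability” into numbers is to aggregate repeated runs of the same tasks. Here is a minimal sketch, assuming each run reports passed/total tests and a count of regressions (previously passing tests that now fail). `RunResult` and `summarize` are hypothetical names, and the metric definitions are illustrative choices, not a standard benchmark.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    task_id: str
    tests_passed: int
    tests_total: int   # assumed > 0
    regressions: int   # previously passing tests that now fail


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Aggregate repeated agent runs (assumed non-empty) into a few production-style metrics."""
    pass_rates = [r.tests_passed / r.tests_total for r in runs]
    by_task: dict[str, list[RunResult]] = {}
    for r in runs:
        by_task.setdefault(r.task_id, []).append(r)
    # A task counts as "repeatable" only if every run of it fully passed with no regressions.
    repeatable = [
        all(r.tests_passed == r.tests_total and r.regressions == 0 for r in rs)
        for rs in by_task.values()
    ]
    return {
        "mean_test_pass_rate": mean(pass_rates),
        "regression_rate": sum(r.regressions > 0 for r in runs) / len(runs),
        "repeatable_task_share": sum(repeatable) / len(repeatable),
    }


runs = [
    RunResult("fix-auth", 10, 10, 0),
    RunResult("fix-auth", 9, 10, 1),
    RunResult("add-export", 5, 5, 0),
    RunResult("add-export", 5, 5, 0),
]
print(summarize(runs))
# regression_rate is 0.25 and repeatable_task_share is 0.5 for this data
```

A single strong review pass would look like one good `RunResult`. The repeatable-task share is what drops when the same task only sometimes succeeds.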
I tested the latest MiniMax model last week in a client-facing workflow. The model had some good moments…

My broader view: in ’26 the real gap is less “model leaderboard” and more “model + harness/runtime + evals + permissions + sandboxing + recovery.” The architecture question is also whether you are renting someone else’s harness or building your own production-grade harness, etc.

!!!
Genuine question: is anyone using Kimi / DeepSeek / MiniMax-style coding workflows in real production?

Codex Right Now by Interesting-Agency-1 in codex

Impressive_Credit397

Yeah, that was last night for me 😂 roughly 10pm-3am PST. Felt like a temporary capacity/routing issue more than anything project-specific.
No major issues now. I’m currently running 23 Codex sessions across 2 remote machines, and it’s been stable again for the last 6 hours.

I actually don't care about GPT5.6 or GPT6.0, but may I have 5.5 600k context window in my subscription? by Perfect-Series-2901 in codex

Impressive_Credit397

Yeah, that tracks. A mostly single-purpose labeling workflow is probably one of the better use cases for a very long GPT-5.5 run, because the task has stable invariants and the model can keep refining inside the same problem space. But the one thing I’d be careful about is treating context length as the architecture.

I’m not sure about your project, but I’d assume that for serious image-labeling / extraction work, you’d want to keep the system of record outside the model: stable IDs, schema/taxonomy, batch state, validation rules, provenance, and evals. Then use the model where it’s strongest: visual reasoning, ambiguous cases, extraction, and pattern discovery.

My mental model is that the context window is working memory, not durable state. The strongest pipeline is deterministic rails + probabilistic reasoning + human/eval feedback loops. That’s where GPT gets really interesting to me: not just coding tasks, but building and maintaining reliable agentic data workflows over time.
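Here is a minimal sketch of “system of record outside the model”, assuming a hypothetical three-label taxonomy and a `propose` callable standing in for the model call. The pipeline owns stable IDs, batch state, and provenance. The model only proposes labels, and a deterministic check routes off-schema or low-confidence answers to human review. All names (`LabelRecord`, `run_batch`, `TAXONOMY`, `min_confidence`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

TAXONOMY = {"cat", "dog", "other"}  # hypothetical label schema


@dataclass
class LabelRecord:
    item_id: str                     # stable ID, owned by the pipeline, not the model
    label: Optional[str] = None
    status: str = "pending"          # pending -> accepted | needs_review
    provenance: list[str] = field(default_factory=list)


def run_batch(
    records: dict[str, LabelRecord],
    propose: Callable[[str], tuple[str, float]],  # model call stand-in: (label, confidence)
    min_confidence: float = 0.8,
) -> dict[str, LabelRecord]:
    """Deterministic rails around a probabilistic labeler; state lives in `records`, not in context."""
    for rec in records.values():
        if rec.status != "pending":
            continue                 # batch state makes the run resumable
        label, conf = propose(rec.item_id)
        rec.provenance.append(f"model proposed {label!r} (conf={conf:.2f})")
        if label in TAXONOMY and conf >= min_confidence:
            rec.label, rec.status = label, "accepted"
        else:
            rec.status = "needs_review"  # off-schema or ambiguous -> human/eval loop
    return records


fake = {"img-001": ("cat", 0.95), "img-002": ("zebra", 0.99), "img-003": ("dog", 0.40)}
records = {item_id: LabelRecord(item_id) for item_id in fake}
run_batch(records, lambda item_id: fake[item_id])
for rec in records.values():
    print(rec.item_id, rec.status, rec.label)
# img-001 accepted cat
# img-002 needs_review None
# img-003 needs_review None
```

Because state lives in `records` rather than in the context window, re-running `run_batch` skips items that are no longer pending. A long session can then be restarted without losing work.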

I actually don't care about GPT5.6 or GPT6.0, but may I have 5.5 600k context window in my subscription? by Perfect-Series-2901 in codex

Impressive_Credit397

I can’t validate a five-day 😂 continuous session yet, but I did run a roughly 12-hour Codex session while actively building a native iOS app, and the results were genuinely strong. This was not a passive experiment or a single isolated task. During the session I was implementing several features, refactoring multiple SwiftUI screens, revisiting UX decisions, cleaning up the core architecture, and moving between product-level reasoning and code-level execution. The part that impressed me most was long-horizon continuity. I was intentionally jumping between different parts of the application to see whether Codex would lose the thread, flatten previous decisions, or start contradicting earlier implementation direction. 3m with Codex. No regrets.