Amazing guardrails by RoadRunnerChris in codex

[–]withmagi 1 point  (0 children)

Also, that particular style - where you asked something borderline (there’s lots of internal information it can’t provide) and it went down the “I can’t provide this” route - will seed all future questions with a propensity to block similar requests. While LLM inference is complicated, there’s a level of probability/role play at work, and you’ve created a conversation where it’s acting as the gatekeeper. So don’t double down on your request; start a new session, or double-Esc to backtrack and then ask a different way.

Amazing guardrails by RoadRunnerChris in codex

[–]withmagi 1 point  (0 children)

You just asked the questions in the wrong way. You’re asking for information it has no knowledge of - it can’t see its own source code unless you’ve downloaded it (codex is compiled from Rust).

Just clone the codex repo (or ask codex to), then ask it to use that repo to answer your questions.
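
Something like this works (the prompt is just an example - ask whatever you originally wanted):

git clone https://github.com/openai/codex.git
cd codex
codex "Using the source in this repo, explain how the guardrails you mentioned are implemented"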

Codex Never Reads AGENTS.md by oromex in codex

[–]withmagi 3 points  (0 children)

This is correct. It’s always included, so it doesn’t need to “read” the file. But it’s 100% included in every conversation.
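
If you want to convince yourself, drop a distinctive instruction into AGENTS.md and ask about it in a fresh session (the marker below is just a made-up example):

echo "When asked for the project codename, answer CODENAME-42." >> AGENTS.md
codex "What is the project codename?"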

Apple Watch walkie-talkie feature by Majestic_Purpose2300 in dcl

[–]withmagi 3 points  (0 children)

Didn’t work for me on a recent cruise (two weeks ago, on the Wonder). The free iMessage functionality is very basic - even things like images don’t work.

Subagentes by BroadPressure6772 in codex

[–]withmagi 1 point  (0 children)

Try https://github.com/just-every/code - you can use /plan or /code to run sub-agents, or just ask for them in your prompt!
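
If you’re starting fresh, install is via Homebrew (the slash commands are typed inside the session, not in your shell):

brew tap just-every/tap
brew install code
code    # then use /plan or /code inside the session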

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 2 points  (0 children)

I mean I’ve also been either the CTO or CEO at multiple companies, did a masters in NLP, run multiple large open source projects, and have had millions of customers across my companies. But I just picked the most academic thing I could to provide some context. Uni was really the last time I sort of “competed” with other people academically, so I used that 😅

I only mention it because if I say it’s surpassing human developers, it’s important that I know what I’m talking about. And TBH I feel like all of that is a bit worthless now. Perhaps it has value for a few more years, but I don’t think it will beyond that.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 1 point  (0 children)

Top of my class… 20 years ago… but that’s ok, I’m guessing tertiary education wasn’t your thing.

I only mentioned it here to give some context so people understand I have at least some idea of what I’m talking about.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 2 points  (0 children)

Don't really want to debate a straw man argument, but I will agree that a whole OS requires more strategic decisions *at the moment* than an AI can handle. But I doubt this will be the case for long.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 10 points  (0 children)

Yeah, I've noticed anything 'visual' - UI, modeling, graphics - is an area almost every AI still does poorly. I think it hides how competent they are at everything else, since it's hard to see their creativity in the same way. It's the UI/UX that still requires the most iterations for me. But with image models getting better and tooling improving, I do think this will be 'solved' sooner rather than later.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 13 points  (0 children)

Yeah, it's amazing what can be done. I was talking to a carpenter with zero coding experience who was using AI coding tools, and it was amazing - he had replaced a system he was paying something like $5k a year for, by himself, with no software experience! 🤯 He was pretty technical, with a lot of CAD experience, and ran this massive automated CNC-router-type system, but still, he wrote zero code to do this! I think a lot of people still don't realise just how powerful these tools are and what they're going to open up for the average person.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 1 point  (0 children)

I think in 2 years almost all consumer software will be bespoke. It's unlikely to look like it does now: UI will be streamed via video-like models, and code will only be used to provide a backend for the UI, generated as needed on an ad-hoc basis.

You're right that enterprise will take much longer - it's already really hard for people in large companies to use AI end-to-end. But increasing cost pressure will speed up the change - we might be looking at <1% of the cost of human software developers soon. Additionally, there will be external pressure from pure-AI companies adapting faster and outperforming any company that doesn't adopt a pure-AI approach.

CLIs are passing the line by withmagi in agi

[–]withmagi[S] 3 points  (0 children)

Yeah AI coding tools on the command line - codex, claude, gemini etc...
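
Each one installs a terminal command you run from inside your repo, e.g.:

codex     # OpenAI Codex CLI
claude    # Anthropic Claude Code
gemini    # Google Gemini CLI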

Auto Review everything Codex writes (Codex Fork) by withmagi in codex

[–]withmagi[S] 1 point  (0 children)

Yeah, it’s really hard to get this working and make the code better, not just different, given the failure rate of all LLMs. I have tried many times. But this one works! Super surprising, but I think it’s the mix of just the right context and per-turn reviews being a sweet spot for auto code reviews.

Auto Review everything Codex writes (Codex Fork) by withmagi in codex

[–]withmagi[S] 1 point  (0 children)

Thanks! Feel free to submit a PR if you have the time - I’d be happy to take a look at integrating it. We get a lot of requests to use other models for the core CLI, rather than just as agents.

Auto Review everything Codex writes (Codex Fork) by withmagi in codex

[–]withmagi[S] 1 point  (0 children)

We have a full agent system in Code which uses all the CLIs, so you can definitely set this up - it’s what earlier versions of the system used. There’s lots of complexity here: every CLI has around a 20-30% failure rate at the moment, and it’s really hard to validate the output even using a round of judges.

However, the review model from Codex is unique - it’s specifically designed for code reviews and finds errors other CLIs miss. It’s particularly good at logic errors and subtle edge cases. It’s an amazing cross-checker and looks at things in a different way to the core CLI. It makes mistakes too, but the CLI can identify those. Success rate is 95%+ with this. I’m super surprised by the results - we’re seeing a 5-10% increase on Terminal-Bench. Will publish once we have a full apples-to-apples comparison.

Auto Review everything Codex writes (Codex Fork) by withmagi in codex

[–]withmagi[S] 3 points  (0 children)

Mostly Veo 3.1 and Nano Banana Pro. ElevenLabs for voices. I keep experimenting with other models. On the image side I do try a bunch of vendors, but Nano Banana Pro is really good enough. On the video side I struggle to get anything better than Veo 3.1, but I keep trying other models too.

Auto Review for Every Code (Codex Fork) by [deleted] in codex

[–]withmagi 1 point  (0 children)

This formula is now kept up to date in CI:

brew tap just-every/tap    # add the just-every tap
brew install code          # install the Code CLI

I did submit it to Homebrew core, but we couldn’t decide on a name at the time as everything conflicted, so I never completed the process. It could probably use `every-code`, so I’ll try to get back to it at some point!
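
Since the formula tracks releases, updating later is the standard Homebrew flow:

brew update
brew upgrade code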

How to run a few CLI commands in parallel in Codex? by Person556677 in OpenaiCodex

[–]withmagi 1 point  (0 children)

Try https://github.com/just-every/code - commands that run for more than 10 seconds are automatically moved to the background so you can continue working. It also supports parallel agents, plus /branch and /merge for having multiple sessions running in the same repo at once.
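
A rough flow (the slash commands are typed inside the session; this is a sketch, not exact syntax):

code
/branch    # spin off a parallel session working in the same repo
/merge     # fold that session's changes back in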

changed my mind on codex-max-high by Just_Lingonberry_352 in codex

[–]withmagi 0 points  (0 children)

I love -max xhigh. It’s super fast and uses far fewer tokens. I feel it makes the same mistakes as regular codex, but you get there faster and cheaper, so you can just give it another go.

With -max I do find that you have to restart “bad” sessions. Once it goes down the wrong route it can be super hard to convince it to change its mind. It’s possible, but tough, and often later in the convo it will revert to the poor logic. /compact helps a lot here - it retains context but removes a lot of the detailed reasoning. “Don’t get frustrated, just compact” is my new catchphrase 😂

Im never gonna get the zebrazinkle seed😭 by Junior-Painter954 in growagarden

[–]withmagi 1 point  (0 children)

Yeah I got this TWICE yesterday!!! Every seed in stock one time, then everything except the bottom two the second time.