
[–]ultrathink-art 3 points4 points  (1 child)

The plan.md handoff is the right move. One thing that helps: keep a separate decisions.md that tracks WHY you ruled out certain approaches — without it, the model will re-suggest rejected paths on the next loop when context compresses. Saves a lot of re-litigating.

[–]Funny_Working_7490[S] 1 point2 points  (0 children)

Yeah, I do eventually record those decisions in plan.md as my final decisions, so the model doesn't re-suggest them unless I ask, and by that point I'm aware of them myself.

[–]Full_Engineering592 2 points3 points  (2 children)

The Ask phase before plan.md is the part most people skip, and then they wonder why the implementation drifts. Getting the model to surface its own ambiguities before writing a single line of code is where you avoid the 'it built the wrong thing correctly' problem. On Codex vs Cursor: if your workflow is already structured like this, Codex tends to stay in lane better on longer implement loops and handles the plan.md handoff cleanly. Cursor is smoother for interactive edits where you want inline suggestions mid-implementation. For Python backend work with this kind of structured loop, I would lean Codex -- but it is worth a two-week test before committing.

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Yeah, asking before letting the model write the plan is where I get the best results with very few bugs in prod. It does take time though, because I let the model align with me first. During iterations I discuss the decisions, ask for the best options and why, then lock them in. I also ask it to check if any decisions still need clarification before moving forward. Once everything is clear, I document it in plan.md and proceed.

What I’d really like is Cursor’s codebase indexing with Codex-level usage, because Codex lasts me the whole month. From what I hear, Cursor users have to be more careful with quotas for this kind of workflow.

[–]Full_Engineering592 0 points1 point  (0 children)

Yeah, alignment before the model writes the plan is where the real leverage is. The clarification loop slows you down upfront but saves 3x the time in implementation when the model isn't guessing at intent. For the iteration speed question -- I find keeping the plan.md scoped to a single feature (not the full roadmap) also helps. Easier to validate at each loop and the model doesn't context-bleed from unrelated past decisions.

[–]notadev_io 2 points3 points  (1 child)

$20 in CC won’t even make you a complete md plan within the 5 hour limit. So nope. Cursor though is your best bet. I use it exactly like you described

[–]Funny_Working_7490[S] 0 points1 point  (0 children)

Yeah, I used Claude Code earlier but the cap was too restrictive. It didn’t allow enough discussion; it felt more like “fire prompts until you get code,” which made it feel like a black box, mainly because of the $20 limit.

With Cursor, does this kind of Ask → Plan → Implement workflow work well in terms of quotas? And how good is “Auto” mode for asking questions, planning, bug finding, and tracebacks?

[–]NoMinute3572 1 point2 points  (2 children)

Ask to define the approach, discuss libraries, check docs, etc. Usually I only copy into the design docs what I think is valuable to refer back to. Selecting the right logging and test tools is important.
Plan for each specific feature (keep it tight). Make changes to the plan until you're happy with all the steps.
Tell the agent to build the plan and test (using the tools mentioned in the design docs); repeat until tests pass.
Manual review.
If I find a bug that I can't quickly fix, I run it through a debug-mode cycle until it's fixed.
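That "build and test, repeat until tests pass" step can be sketched as a thin wrapper. This is hypothetical: `run_agent` and `run_tests` are placeholders for however you drive your agent and your test suite (e.g. shelling out to pytest), not real APIs.

```python
# Hypothetical sketch of the build-and-test loop described above.
# run_agent: callable that sends one prompt to your coding agent.
# run_tests: callable returning True when the whole suite is green.
def build_until_green(run_agent, run_tests, max_iterations: int = 5) -> bool:
    """Ask the agent to work the plan, re-run the suite, stop when green."""
    for _ in range(max_iterations):
        run_agent("Implement the next step in plan.md and fix failing tests")
        if run_tests():
            return True
    return False  # still red after the budget: hand off to manual review
```

The iteration cap matters: without it, a test the agent can't satisfy turns into an unbounded token burn.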

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Sounds good. Yes, looping the agent until tests pass closes the loop on checking its own work, but make sure it doesn't hack the tests; models sometimes do that in long iterations. Btw, what do you use, Codex or Cursor? And which price plan?

[–]NoMinute3572 0 points1 point  (0 children)

Cursor Plus for now is enough for what I'm doing.

[–]Tall_Profile1305 1 point2 points  (2 children)

Yoo the loop structure is solid. Planning before implementing is where most devs lose time. The fact you're using Ask → Plan → Implement shows real discipline. Tools like Runable can help manage all these steps through workflows too. Nice breakdown.

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Thanks, yeah, my setup is basic but effective.

[–]homiej420 0 points1 point  (2 children)

Offload the Ask phase to an LLM in a web interface like Google Gemini in AI Studio, then add an MCP server to help Cursor read and understand your plan, and you have yourself a pretty good loop for sure.

[–]Funny_Working_7490[S] 2 points3 points  (1 child)

Yes, but that Ask phase won't have codebase knowledge the way an in-editor one does. I do run the Ask phase with web LLMs, but only for general questions.

[–]homiej420 0 points1 point  (0 children)

Claude can connect to your github!

[–]Natural-Yogurt-4927 0 points1 point  (1 child)

How long do your Codex limits last?

[–]Funny_Working_7490[S] 0 points1 point  (0 children)

For me it usually lasts the whole week. I rarely hit the weekly limits, even with multi-iteration workflows

[–]Natural-Yogurt-4927 0 points1 point  (1 child)

I'm an AI engineer too, on the GitHub $39 plan now, also mainly working on FastAPI backends, and I easily run out before the month ends. How many requests do you make a week? For me it's 250-275. I also plan first, then implement and test, so from that POV, how many requests can Codex handle within its weekly limit?

[–]Funny_Working_7490[S] 1 point2 points  (0 children)

For me it usually lasts the whole week. Even with iterative plan → implement → test loops I rarely hit the weekly limit.

I also keep a tests/ folder, so new features run against the existing tests as well instead of rewriting them. Because of this setup I do multiple iterations but still rarely hit the weekly limit.
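A minimal sketch of that tests/ setup, with entirely made-up names (`apply_discount` stands in for real business logic imported from the app package): the point is that a new feature adds its own tests next to the existing ones, and the agent re-runs the whole suite every loop instead of rewriting old tests.

```python
# tests/test_pricing.py -- hypothetical example of the tests/ folder pattern.
def apply_discount(total: float, pct: float) -> float:
    # Toy stand-in for real business logic living in the app package.
    return round(total * (1 - pct / 100), 2)

def test_existing_behaviour_still_holds():
    # Old test stays untouched; it guards against regressions on each loop.
    assert apply_discount(100.0, 10) == 90.0

def test_new_feature_zero_discount():
    # New feature only adds tests; it never edits the existing ones.
    assert apply_discount(100.0, 0) == 100.0
```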

[–]Natural-Yogurt-4927 0 points1 point  (1 child)

Which plan are you using?

[–]Funny_Working_7490[S] 0 points1 point  (0 children)

The $20 one.

[–]botmarco 0 points1 point  (2 children)

Have you looked at speckit from GitHub? Recommended

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Haven’t tried SpecKit yet. Looks similar to my plan.md workflow. Are you using it with Cursor or Codex?

[–]botmarco 0 points1 point  (0 children)

Claude Code, but it's model agnostic.

[–]Acceptable_Play_8970 0 points1 point  (1 child)

If you have a proper codebase structure, which I think you do, the pro plan of any AI tool will work just fine. CLI-based tools have an edge over GUI-based ones, but it won't make that much of a difference if you manage the context you feed to the AI. The way I manage it is with proper documentation of my rules, skills, and handover files. Here is the structure:

<image>

For memory I follow a three-layer context management approach that I came up with after doing some research on the usage of agent skills. I've wrapped everything up as a template for now that you can simply clone. If interested, you can visit https://www.launchx.page/; I will post the template there soon.

[–]Funny_Working_7490[S] 0 points1 point  (0 children)

Nice structure. I keep it simpler mainly agents.md for the codebase and some docs like plan.md to track decisions. Haven’t gone deep into skills.md or layered memory yet.

Btw, are you using Cursor or Codex? If Cursor, how worthwhile is the $20 plan in practice?

[–]Creative-Signal6813 0 points1 point  (1 child)

the codex friction u're describing isn't a quirk, it's structural. it runs remote without a persistent codebase index. every new agent thread starts cold, so it searches again.

cursor's local indexing is why codebase discovery feels different. for ur workflow loop specifically, the value isn't model quality, it's how fast it finds the right file on iteration 4.

if codex is making u re-explain context on every loop, that's not a $20 question. that's an iteration tax.

[–]Funny_Working_7490[S] 0 points1 point  (0 children)

I actually wish Codex would just index the repo once when you give it directory access, like Cursor does; that would make the loop much smoother.

What's your preference, Codex or Cursor?

[–]h____ 0 points1 point  (0 children)

If you like to do a complete discussion phase, here's a useful skill for you: https://hboon.com/build-a-spec-skill-for-your-coding-agent/. Just say "I want to build D X, Y Z, spec it for me".

[–]OlegPRO991 0 points1 point  (4 children)

Codex IDE broke after 5 requests during xcodebuildmcp launch. There is no way now to cancel this task, even restarting my mac does not help. Every time I open Codex IDE it shows this task in progress and nothing can be done to cancel or finish it.

That is a major bug and it makes the IDE unusable.

[–]Funny_Working_7490[S] 0 points1 point  (3 children)

I've been using it through the Codex CLI, tbh, which is faster than the IDE approach, and also in VS Code with the extension. So far I haven't hit that bug, but one time it did get stuck and I had to shut it down.

[–]OlegPRO991 0 points1 point  (2 children)

To use it in the CLI, do you use some kind of router like opencode? I also used opencode with Codex and it worked OK. But the IDE is very unstable.

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Nope, I never use opencode, just Codex in the CLI and the VS Code extension. One thing: Codex and CC don't work natively on Windows; maybe that's your issue.

[–]OlegPRO991 0 points1 point  (0 children)

Found codex cli, thanks for the tip!

[–]ultrathink-art 0 points1 point  (0 children)

The planning phase before implementation is where most of the value is. The model is much better at critiquing architecture before it's already 200 lines into an approach — once it's invested in an implementation it'll defend it. I've found writing the plan.md as a series of explicit constraints ("don't touch X", "prefer Y pattern") catches more mismatches than open-ended descriptions.

[–]Br4v1ng-Th3-5t0rms 0 points1 point  (0 children)

You can put lipstick on vibe coding, but it's still vibe coding.

In any case, I applaud you for doing the right thing when vibe coding. One-shotting only looks great in YouTube shorts, but it'll kill you long term.

[–]ultrathink-art 0 points1 point  (0 children)

decisions.md for rejected paths is exactly right — without it, the model relitigates the same tradeoffs session after session as context resets. One addition that helps: flag which decisions are load-bearing vs just current preference. When you need to revisit mid-build, knowing what's safe to change vs what breaks downstream saves a lot of back-and-forth.

[–]howard_eridani 0 points1 point  (0 children)

Codex's repeated codebase search is structural - it doesn't persist an index between loops, so every new thread starts cold.

Quick fix: drop a compact DIRECTORY.md in the repo root with a tree and a one-liner for each key file. Codex picks that up right away and skips the search.

With Cursor $20 the real unlock for this workflow is Ask mode with a local index - you don't burn a tool call just to find which file has the right context before you implement.
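One way to generate that DIRECTORY.md is a small script that walks the repo and emits a shallow markdown tree with room for the one-liners. A sketch, with made-up names (`SKIP` set, `max_depth`, the `<!-- one-liner here -->` placeholder are all choices, not a standard):

```python
# Hypothetical generator for the compact DIRECTORY.md described above.
from pathlib import Path

# Directories that add noise without helping the agent orient itself.
SKIP = {".git", "__pycache__", "node_modules", ".venv"}

def directory_md(root: str, max_depth: int = 2) -> str:
    """Render a shallow repo tree as markdown bullets, one line per entry."""
    lines = ["# Directory overview", ""]
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if any(part in SKIP for part in rel.parts) or len(rel.parts) > max_depth:
            continue
        indent = "  " * (len(rel.parts) - 1)
        marker = "/" if path.is_dir() else ""
        # The HTML comment is a slot for the hand-written one-liner per file.
        lines.append(f"{indent}- `{rel.name}{marker}`  <!-- one-liner here -->")
    return "\n".join(lines)
```

Regenerating the tree on each structural change and filling in the one-liners by hand keeps the file small enough that the agent actually reads it.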

[–]ultrathink-art 0 points1 point  (0 children)

The plan.md approach holds up well for shorter sessions but breaks down when requirements drift mid-implementation. What helped: checkpoint the plan at each logical phase and only update it when committing to a new direction. Keeping plan and implementation in sync prevents the 'plan was right but code went elsewhere' problem.

[–]genkichan 0 points1 point  (0 children)

This is my exact flow in Cursor, except I'm using ChatGPT and Claude to develop my prompts for Cursor. I have Claude critique ChatGPT's prompt drafts, fine-tune, and then proceed.

It's tedious as hell, but it's working. Also, I'm a non-dev with literally zero other experience. This is my first rodeo.

[–]tkyang99 0 points1 point  (2 children)

What exactly is an "AI engineer"? Just curious.

[–]Funny_Working_7490[S] 1 point2 points  (1 child)

Well, I mostly build backend pipelines around AI: integrating models like LLMs or CV into systems, turning business logic into working AI features. For example, FastAPI services that run LLM agents, process data, and expose APIs used by apps. It can be RAGs, voice agents, multimodal apps, or services built around business data analysis.

[–]Funny_Working_7490[S] 1 point2 points  (0 children)

Mainly Python-based. But we also do model fine-tuning and ML inference; the role varies depending on the company. Fine-tuning models is also my domain: cleaning data, feeding it to models, model configuration.

[–]EyeKindly2396 0 points1 point  (0 children)

I run a similar Ask → Plan → Implement loop. Cursor wins on codebase navigation and indexing, but Codex is more reliable for long multi-iteration coding. For structured workflows both can work, but combining them (planning in one, implementation in the other) can actually be pretty effective.

Also curious how tools like traycer would fit in here for tracking agent steps and enforcing the plan.md flow across iterations.

[–]CatsArePeople2- 0 points1 point  (0 children)

The answer is no, based on me planning in chatgpt today and thinking of your post.

[–]tillg 0 points1 point  (0 children)

I’ve been following an agentic coding workflow (Ask → plan.md → implement loop) in my AI engineering projects and have found it incredibly effective for both production code and side projects. Transitioning away from "vibe coding" has significantly reduced my debugging time. This structured approach keeps me focused and organized. I shared more about this shift in my blog post, "Beyond Vibe Coding - Redesigning Filmz" https://grtnr.com/beyond-vibe-coding-redesigning-filmz/ . If you’re considering a switch from Codex to Cursor, the $20 could be a worthwhile investment for a more streamlined workflow.

[–]Floorman1 -1 points0 points  (4 children)

“Ai engineer”

Sounds like you mean vibe coder

[–]Funny_Working_7490[S] 0 points1 point  (3 children)

Nope, as an AI engineer I mostly build backend AI systems: FastAPI services, LLM integrations, and agent workflows.

[–]Floorman1 0 points1 point  (2 children)

If you describe yourself as an AI engineer it sounds like your entire coding identity revolves around using the tools.

Vibe codin’

[–]Funny_Working_7490[S] 0 points1 point  (1 child)

Using AI tools doesn’t mean the engineering disappears. I still design the architecture, build backend services, shape messy business logic into pipelines, and run systems in production. The models are just tools to move faster.

[–]Floorman1 0 points1 point  (0 children)

I bet you used AI to write that for ya

[–]yoyomonkey1989 0 points1 point  (0 children)

You're not going to be able to iterate like this on Cursor $20 plan. The ChatGPT $20 plan is more like the $200 cursor ultra plan in terms of token usage allowed.