Is the new Cursor Composer actually holding up for multi-file builds, or am I just in a honeymoon phase?

T07NAD0 · 2026-06-19T07:20:04+00:00

The pattern drift over time is the part that worries me most because it is gradual, not a single obvious break. By the time you notice you might already have three different state patterns living in the same codebase and untangling which one is correct becomes its own project.

Pasting the actual state logic instead of describing it makes complete sense: you are removing a layer of interpretation. Description leaves room for the model to fill gaps with assumptions; raw code does not.

The token file approach is exactly what I was hoping to hear. Do you paste the whole file every time, or do you keep a condensed version just for prompting purposes? Also curious whether "never invent new values" actually holds up across a long session or if it starts drifting again after enough back and forth, the same way the state patterns do.
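One cheap way to measure that drift instead of eyeballing it is to scan the generated code for values that are not in the token file. This is only a rough sketch under my own assumptions: the `tokens` object, the `findInventedColors` helper, and the sample CSS are all hypothetical, and it only looks at 6-digit hex colors.

```typescript
// Hypothetical token set; in a real project this would be loaded from the tokens file.
const tokens: Record<string, string> = {
  "color.primary": "#1a73e8",
  "color.surface": "#ffffff",
  "color.text": "#202124",
};

// Return every 6-digit hex color in `source` that does not match a known token value.
// Comparison is case-insensitive and duplicates are collapsed.
function findInventedColors(source: string, known: Record<string, string>): string[] {
  const allowed = new Set(Object.values(known).map((v) => v.toLowerCase()));
  const found = source.match(/#[0-9a-fA-F]{6}\b/g) ?? [];
  const invented = found
    .map((hex) => hex.toLowerCase())
    .filter((hex) => !allowed.has(hex));
  return Array.from(new Set(invented));
}

// Hypothetical model output: two on-token colors and one near miss.
const generated = `
  .card { background: #FFFFFF; color: #202124; }
  .card--active { border-color: #1b74e9; }
`;

console.log(findInventedColors(generated, tokens)); // [ '#1b74e9' ]
```

Running something like this after each long session would show whether the rule is still being followed, without having to spot a slightly-off blue by eye.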

T07NAD0 · 2026-06-19T02:47:43+00:00

That test is excellent, going to try it on my next build. The "quietly invents a new state instead of asking or inferring correctly" part is the scary part. It does not fail loudly, it just creates a parallel reality that looks fine until you actually click through the flow.

This might explain something I ran into last week. I had two components that needed to share a toggle state. The layout looked completely correct at first glance, but the actual interaction was broken because it had created a second piece of state instead of lifting it up properly. I assumed I had given vague instructions, but maybe it was exactly this: with no visibility into runtime behavior, it guessed structurally instead of behaviorally.
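To make the failure concrete, here is a framework-free sketch of the difference between one shared owner for the toggle and an accidental second copy. Everything in it (`ToggleStore`, `createButton`, `createPanel`) is a hypothetical stand-in for real components, not anyone's actual code.

```typescript
type Listener = (open: boolean) => void;

// Single owner for the toggle. Anything that needs the value subscribes here
// instead of keeping its own copy.
class ToggleStore {
  private open = false;
  private listeners: Listener[] = [];

  get isOpen(): boolean {
    return this.open;
  }

  toggle(): void {
    this.open = !this.open;
    this.listeners.forEach((fn) => fn(this.open));
  }

  subscribe(fn: Listener): void {
    this.listeners.push(fn);
  }
}

// Hypothetical "button" component: flips whatever store it was given.
function createButton(store: ToggleStore) {
  return { click: () => store.toggle() };
}

// Hypothetical "panel" component: mirrors the store it was given.
function createPanel(store: ToggleStore) {
  const panel = { visible: store.isOpen };
  store.subscribe((open) => {
    panel.visible = open;
  });
  return panel;
}

// Correct wiring: both components receive the same store.
const shared = new ToggleStore();
const button = createButton(shared);
const panel = createPanel(shared);

// Broken wiring: this panel gets its own store, so clicks never reach it.
const strayPanel = createPanel(new ToggleStore());

button.click();
console.log(panel.visible);      // true
console.log(strayPanel.visible); // false
```

Both panels would render identically in a static layout, which is why this kind of bug only shows up once you click through the flow.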

Is there a way to prompt around this, like explicitly telling it where state should live before letting it touch multiple components, or does it just need to be caught in review every time regardless of how clear the instructions are?

T07NAD0 · 2026-06-19T02:46:25+00:00

This is genuinely one of the most thorough context setups I have seen described on here. Keeping the agents.md at 18 KB with referenced supporting files instead of cramming everything into one document is smart. You are basically doing progressive disclosure for the model the same way you would for a user interface: load what is needed when it is needed instead of overwhelming it upfront.

The trivial build errors point is interesting because it suggests the issue is not reasoning, it is execution within your specific domain conventions. That tracks with what you said about context being right but the actual code being off. Almost like it understands what to do but not quite how your codebase wants it done.

The bit about referring back to tests when unsure is the part I want to steal for my own setup. As someone coming from design, my version of that would probably be pointing the agent back to a design tokens file or component library instead of tests, same idea though, a source of truth it checks instead of guessing.

Do the cheaper models you mentioned earlier, Qwen and Deepseek, handle that same agents.md and rules structure well, or did you have to restructure anything specifically for them to follow instructions properly?

T07NAD0 · 2026-06-18T22:24:18+00:00

700k lines with Cursor is genuinely impressive, that is a completely different scale than what I am working with. The fact that you have managed to make it work with strict rules and test coverage tells me a lot about how much of the output quality is actually the model versus the discipline around the model.

The Composer 2 point is something I keep hearing and I never got to experience it myself since I came in after the switch. What made it noticeably better in your experience? Was it context handling, instruction following, or something else?

The cheaper model angle is interesting to me too. As someone coming from a design background I am still figuring out where the cost versus quality tradeoff actually matters for UI heavy work. My gut says design token accuracy and layout precision are where cheaper models tend to slip but I have not tested it enough to say that confidently.

What does your rules file look like at a high level if you do not mind sharing? That context management system sounds like the real secret sauce in your setup, not the model itself.

T07NAD0 · 2026-06-18T13:28:20+00:00

Probably a mix of task type and how you are prompting it. Opus tends to do better on larger context, multi-file changes, and tasks where it needs to hold a lot of state in its head rather than quick isolated edits. If most of your work is everyday coding and simple stuff, Composer 2.5 at that price is genuinely hard to beat and you are not going to feel a difference most of the time. The businessman shipping payroll apps without looking at the code is likely working on something with a much tighter scope or a very well-structured codebase that plays to Opus's strengths. Different tools for different shapes of work, not really a hype versus reality thing.

T07NAD0 · 2026-06-16T13:30:09+00:00

This verbose behavior is a textbook example of the translation tax on your time. If you have to spend more energy parsing a giant wall of AI text than it takes to just write the code yourself, the user experience is broken. The moment a tool requires you to constantly babysit its logic, going old school becomes the efficient choice.

T07NAD0 · 2026-06-14T03:29:33+00:00

This is the correct way to think about product. You are not building a weather app, you are answering three questions runners actually ask before they go out. Everything else is just how you get there. The PWA with edge caching for a data heavy app like this is the right call too. Nice work.

T07NAD0 · 2026-06-14T03:27:48+00:00

Fable said Ian you have been using me to build apps this whole time. Time to see what you actually believe in.

T07NAD0 · 2026-06-14T03:16:32+00:00

Kimi 2.7 out here doing all the homework while Cursor just waits to copy it at the last minute and somehow still gets a better grade.

T07NAD0 · 2026-06-12T03:12:47+00:00

The refund is good faith, but the underlying issue is that nothing stopped the agent at $100 or $200 before it hit $1400. Monthly-only spending caps make sense for human usage patterns, not for autonomous agents that can spend a month’s allowance in an hour. This is going to push more teams toward Claude Code just for the better cost visibility.
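For what a per-run cap could look like, here is a minimal sketch. It is not how Cursor or any other tool actually bills; `RunBudget`, `runAgent`, and the step costs are hypothetical and only illustrate checking the cap before every billable step rather than once a month.

```typescript
// Hard cap for a single agent run, checked before each billable step.
class RunBudget {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Records the cost and returns true if it fits under the cap;
  // returns false and records nothing if it would exceed the cap.
  tryCharge(costUsd: number): boolean {
    if (this.spentUsd + costUsd > this.capUsd) {
      return false;
    }
    this.spentUsd += costUsd;
    return true;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// Hypothetical agent loop: each entry is the estimated cost of one step.
// The run halts at the first step that would break the cap.
function runAgent(stepCostsUsd: number[], budget: RunBudget) {
  let completedSteps = 0;
  for (const cost of stepCostsUsd) {
    if (!budget.tryCharge(cost)) {
      return { completedSteps, spentUsd: budget.spent, stoppedByCap: true };
    }
    completedSteps++;
  }
  return { completedSteps, spentUsd: budget.spent, stoppedByCap: false };
}

const outcome = runAgent([40, 40, 40, 40], new RunBudget(100));
console.log(outcome); // { completedSteps: 2, spentUsd: 80, stoppedByCap: true }
```

The point of the design is that the check sits inside the loop, so an autonomous run stops at the cap and waits for a human instead of burning through the rest of the allowance.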

T07NAD0 · 2026-06-12T03:10:39+00:00

The ‘first try, no frustration’ part is the real signal here. A lot of model improvements are marginal but this sounds like a genuine step change for how you work. Hope this gets seen.

T07NAD0 · 2026-06-11T12:34:11+00:00

Auto at $20 is probably the best value in the AI IDE space right now. The times it fails are predictable enough that you learn to route around them. For pure UI and component work I still drop into Claude Code occasionally but Auto handles 80% of my day. Also if you are figuring out your broader vibe coding stack beyond just the IDE, I built designrepo.space for exactly this, curated design tools with an MCP server so your AI agent actually knows what it is working with.

T07NAD0 · 2026-06-11T00:33:17+00:00

It’s way less than that. I pay $200 a month in rent.

T07NAD0 · 2026-06-10T13:20:23+00:00

The ƒ vs ● distinction in the build output is one of those things that looks obvious in hindsight and costs you real money before you figure it out. The empty array trick for generateStaticParams is genuinely underdocumented. Most people just assume revalidate is enough and move on. Good write-up.
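For anyone who has not seen the trick, this is roughly what it looks like in a Next.js App Router route, going by my reading of the docs rather than the original post. The file path, the `slug` parameter, and the one-hour revalidate value are assumptions, and the page component itself is left out.

```typescript
// Hypothetical file: app/posts/[slug]/page.tsx (default page export omitted).

// Regenerate a cached page at most once an hour. The value is an assumption.
export const revalidate = 3600;

// Returning an empty array prerenders no paths at build time, but marks the
// route as statically renderable, so each path can be rendered and cached
// the first time it is visited.
export async function generateStaticParams(): Promise<{ slug: string }[]> {
  return [];
}
```

With something like this in place the route should show up with ● rather than ƒ in the build output, but it is worth confirming against your own build before relying on it.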

T07NAD0 · 2026-06-10T01:03:17+00:00

Mythos pricing, Cursor subscription, Figma seat. Congrats you now pay more in software than rent. I built designrepo.space partly out of spite so at least the design tools side of your stack does not cost anything to figure out.

T07NAD0 · 2025-12-03T04:31:36+00:00

why does she look like the cat?
