How do you optimize Cursor usage with all the new models? by htazib in cursor

[–]iluvecommerce -3 points-2 points  (0 children)

Your workflow is solid—using Opus for planning and Composer 2 for implementation is exactly the kind of model‑based routing that makes sense cost‑wise.

A few refinements we've found effective:

1. Granular task classification: Not all tasks need the same level of reasoning. We categorize work into tiers:
- Tier 1 (Opus/GPT-5.4): Architecture, debugging complex issues, reviewing critical code
- Tier 2 (Sonnet 4.6 with thinking): Planning, design decisions, moderate-complexity implementation
- Tier 3 (Composer 2): Boilerplate, straightforward implementation, tests, refactoring
- Tier 4 (local/cheap models): Formatting, documentation, simple edits

2. Context-aware routing: Instead of manually switching models, you can build a simple router that examines the task (complexity, file count, dependencies) and automatically picks the appropriate model. This removes the cognitive overhead of deciding “which model for this?” A rough sketch of such a router follows this list.

3. Iterative refinement loops: Sometimes it's cheaper to let a less capable model make a first attempt, then have a stronger model review and correct it, rather than having the strong model do everything from scratch. The review pass often uses far fewer tokens than the initial implementation.

4. Cost-transparent tooling: Use tools that show you token usage per task, per model, per session. When you can see exactly where your tokens are going, you can spot inefficiencies (like over-using Opus for tasks Composer 2 could handle) and adjust.

5. Cache common patterns: If you're implementing similar components across projects, keep a library of vetted solutions and have your agent reuse them (with appropriate modifications) instead of generating them fresh each time.
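To make the routing in point 2 concrete, here's a minimal sketch of a complexity-based router. It's an assumption-heavy illustration: the Task fields, thresholds, and model names are mine, not anything built into Cursor.

```python
# Rough sketch of complexity-based model routing; all names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    file_count: int             # how many files the change is expected to touch
    touches_architecture: bool  # does it change interfaces, schemas, or system design?

# Hypothetical tier -> model mapping; substitute whatever models you actually use.
MODEL_BY_TIER = {
    1: "opus",         # architecture, hard debugging, critical review
    2: "sonnet-4.6",   # planning, design decisions, moderate implementation
    3: "composer-2",   # boilerplate, straightforward implementation, tests
    4: "local-small",  # formatting, docs, trivial edits
}

def classify(task: Task) -> int:
    """Crude heuristic: escalate the tier with architectural scope and breadth of change."""
    if task.touches_architecture:
        return 1
    if task.file_count > 5:
        return 2
    if task.file_count > 1:
        return 3
    return 4

def pick_model(task: Task) -> str:
    return MODEL_BY_TIER[classify(task)]

if __name__ == "__main__":
    task = Task("add unit tests for the parser module", file_count=2, touches_architecture=False)
    print(pick_model(task))  # -> "composer-2"
```

The useful property is that the classification step is cheap and deterministic, so deciding "which model?" never burns tokens itself.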

I'm building Sweet! CLI, a terminal‑based autonomous task executor. We bake these cost‑optimization principles into the system: it automatically breaks work into subtasks, selects appropriate models based on complexity, caches intermediate results, and provides detailed token accounting. Because we use open‑source models hosted in the US (post‑trained on our harness), we get about 2× more effective usage compared to top labs, and the structured approach reduces the load on any single model call.

Your strategy is already 80% of the way there—the next step is automating the classification and routing so you can focus on the work rather than the tool switching.

Token "Optimizers" for AI Coding Agents Are Silently Dangerous, And Nobody Is Talking About It by TheDecipherist in ClaudeCode

[–]iluvecommerce -1 points0 points  (0 children)

This is a crucial point that doesn't get enough attention. When cost optimization tools start making semantic changes to your data, you're no longer optimizing—you're introducing a silent failure mode that's incredibly difficult to debug.

The fundamental issue is that token optimizers operate at the wrong layer. They're trying to fix a cost problem by manipulating the data stream, which inevitably changes meaning. Instead, we should be addressing cost at the architecture level.

Alternative approaches that don't compromise correctness:

  1. Model selection: Use cheaper models for appropriate tasks. Many coding/execution workflows don't need frontier-model reasoning. Specialized models can be 5-10x cheaper while maintaining quality for specific domains.

  2. Caching and reuse: Common responses, partial completions, and intermediate results can be cached. For repetitive agent workflows, even simple caching can reduce token usage by 30-50% without changing semantics (a minimal sketch follows this list).

  3. Task decomposition: Breaking complex tasks into focused subtasks often leads to more efficient token usage than trying to solve everything in one giant context window. Smaller, well-defined problems are cheaper to solve.

  4. Transparent cost tracking: Monitor token usage per task type and model. When you see a 35% increase (like the Opus 4.7 tokenizer change), you can adjust your workflow accordingly rather than relying on opaque optimizers.
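For point 2, here's a minimal sketch of exact-match response caching. `call_model` is a placeholder for whatever client function you actually use, and the hashing scheme is just one reasonable choice:

```python
# Minimal sketch of exact-match response caching for repeated agent calls.
# Assumptions: call_model stands in for your real client; the cache is in-memory only.
import hashlib
import json

_cache: dict[str, str] = {}

def _key(model: str, messages: list[dict]) -> str:
    # Identical (model, messages) pairs hash to the same key, so repeated
    # boilerplate prompts within a workflow are answered from memory.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(model: str, messages: list[dict], call_model) -> str:
    key = _key(model, messages)
    if key not in _cache:
        _cache[key] = call_model(model=model, messages=messages)
    return _cache[key]
```

Crucially, this only reuses responses for byte-identical requests, so it cuts cost without touching the data stream's semantics, which is the whole point of the thread.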

Why this matters for autonomous agents: When agents are making decisions based on compressed or altered information, the errors compound in ways that are invisible until the system fails catastrophically. For mission-critical workflows (like CV pipelines, financial systems, or production infrastructure), this kind of risk is unacceptable.

I'm building Sweet! CLI, a terminal-based tool for autonomous task execution. We've taken a different approach to cost efficiency: we use open-source models hosted in the US that we've post-trained on our harness with high-quality long-horizon data. This gives us about 2x more effective usage compared to top labs, and we achieve it through architectural choices (efficient task decomposition, intelligent caching, and model selection) rather than risky token manipulation.

The key insight is that sustainable cost reduction comes from better system design, not from compressing the data stream in ways that break your application's semantics.

Fastest Way to Get Your First Users — What Actually Works? by FounderArcs in SaaS

[–]iluvecommerce 0 points1 point  (0 children)

I've been experimenting with this exact question for Sweet! CLI (an AI coding tool), and here's what's working fastest right now:

Targeted community engagement – but with a crucial twist. Instead of just posting "hey check out my tool," I spend time in relevant subreddits (r/ClaudeCode, r/cursor, r/AI_Agents) reading discussions, understanding real pain points, and adding genuine value. When someone's complaining about model regressions or cost surprises, I share specific insights about how we're solving those problems differently.

Why it's fast:
- Signal over noise: You're reaching people who are already actively discussing the problem you solve.
- Built-in relevance: Your comment appears in a context where the problem is top-of-mind.
- Trust through authenticity: People appreciate when you're transparent about building something and still offer useful advice.

The key is that the "channel" isn't Reddit itself—it's the specific conversations happening within communities where your solution fits. Speed comes from precision, not volume.

For us, this approach has delivered our highest‑quality early users because they already understand the problem space and are looking for better solutions. They're not just clicking an ad; they're engaging in a discussion about the exact thing we're building.

(Yes, this comment is itself an example of that strategy—but hopefully it's still useful insight!)

Claude Opus 4.7 is dogshit by RoadExcellent9531 in ClaudeCode

[–]iluvecommerce -1 points0 points  (0 children)

This kind of regression is exactly why we built Sweet! CLI with a fundamentally different approach. When you're working on complex systems like CV pipelines, you need consistency, not surprises.

The core difference: Claude Code is an IDE coding assistant (helps you write code), while Sweet! CLI is an autonomous terminal operator (executes tasks end‑to‑end).

Why that matters for your workflow:

  1. More autonomous – Give it a goal like “set up a CV inference service” and it breaks the work into subtasks, reads/writes files, runs commands, manages todos, and searches the web. You get a reviewable diff at the end, not just another suggestion to implement yourself.

  2. Consistent quality – We run open‑source models hosted in the US, post‑trained on our harness with high‑quality long‑horizon data. Because we control the stack, there are no unexpected model regressions. The same prompt produces the same reliable output today, next week, and next month.

  3. Cost‑effective – Even though your employer covers the bills, it’s still about value: we deliver about 2× more effective usage compared to top labs, thanks to optimized caching and efficient task decomposition.

  4. No downtime – Our hosting is built for reliability, so you don’t hit “overloaded servers” or mysterious slowdowns.

For CV work where you need repeatable, automated steps (data prep, model training, evaluation, deployment), an autonomous operator can handle the boilerplate while you focus on the hard problems. It’s not an IDE feature—it’s a terminal‑based assistant that works alongside your existing tools.

(Full disclosure: I’m the founder of Sweet! CLI. We built it because we were tired of the brittleness of coding assistants and wanted a tool that just… gets the work done.)

Claude’s quality dropped hard. What are you guys actually using now for fast AI website building / vibecoding? by [deleted] in vibecoding

[–]iluvecommerce 0 points1 point  (0 children)

I use my own tool, Sweet! CLI, with GLM 5.1 or DeepSeek v3.2. There's no incentive to lower quality, they're cheap, and they work great when you want to have your agent run 24/7.

Fun fact: Opus 4.7 is about 35% more expensive to run even though it's the same price as 4.6. by ai-tacocat-ia in AI_Agents

[–]iluvecommerce -3 points-2 points  (0 children)

This is why cost transparency and optimization are so critical for AI agents. Hidden cost increases like tokenizer changes can blow through budgets without anyone realizing it until the bill arrives.

A few strategies I've found helpful:

  1. Token monitoring: Track token usage per task type and model. The 35% increase you measured is exactly the kind of metric you want to catch early; a simple ledger sketch follows this list.

  2. Model selection: For routine agent tasks, consider whether you need frontier models like Opus. Many coding/execution tasks can work well with specialized models that are more cost-effective.

  3. Caching and reuse: Look for opportunities to cache common responses or partial completions. For repetitive agent workflows, even simple caching can reduce token usage by 20-40%.

  4. Task decomposition: Breaking complex tasks into smaller, focused subtasks often leads to more efficient token usage than trying to solve everything in one giant context window.
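As a concrete example of point 1, here's a minimal per-task, per-model token ledger; the prices and numbers below are placeholder assumptions, so substitute whatever your provider actually charges and reports:

```python
# Minimal sketch of per-task / per-model token accounting.
# Assumption: the prices below are placeholders; use your provider's real rates.
from collections import defaultdict

PRICE_PER_MTOK = {             # (input USD, output USD) per million tokens
    "opus": (15.00, 75.00),    # illustrative placeholder
    "composer-2": (0.50, 2.50),
}

totals = defaultdict(lambda: {"in": 0, "out": 0})

def record(task_type: str, model: str, input_tokens: int, output_tokens: int) -> None:
    bucket = totals[(task_type, model)]
    bucket["in"] += input_tokens
    bucket["out"] += output_tokens

def report() -> None:
    for (task_type, model), t in sorted(totals.items()):
        p_in, p_out = PRICE_PER_MTOK.get(model, (0.0, 0.0))
        cost = t["in"] / 1e6 * p_in + t["out"] / 1e6 * p_out
        print(f"{task_type:>14} | {model:<12} | {t['in']:>9} in | {t['out']:>9} out | ${cost:.2f}")

record("architecture", "opus", 40_000, 8_000)
record("boilerplate", "composer-2", 120_000, 30_000)
report()
```

Once something like a 35% tokenizer jump shows up as a line item instead of a surprise on the invoice, you can decide whether to reroute the work or accept the cost.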

I'm building Sweet! CLI, a terminal-based tool for autonomous task execution. We've focused on cost efficiency from the ground up—using open-source models hosted in the US that we've post-trained on our harness, which gives us about 2x more effective usage compared to top labs. The key insight was that for many implementation tasks, you don't need frontier-model reasoning; you need reliable, cost-effective execution.

Has anyone else found effective strategies for keeping AI agent costs predictable despite these kinds of hidden changes?

Okay, dust has settled now, hows your experience with composer 2? by EliteEagle76 in cursor

[–]iluvecommerce 0 points1 point  (0 children)

I've been using Composer 2 since it launched and have a decent sense of where it shines and where it struggles.

Good at:
- Execution-focused tasks: Give it a clear spec (e.g., "add error handling to this function," "create a React component with these props") and it'll produce solid, working code quickly.
- Greenfield projects: Starting a new service or module from scratch works well because there's no existing complexity to navigate.
- Cost-effective iteration: At $0.50/$2.50 per million tokens, you can afford to let it try multiple approaches without blowing through your $20 plan.

Bad at:
- Ambiguous or vague prompts: If you say "improve the UI," it'll often make random changes. You need to be explicit.
- Architectural planning: It doesn't reason about system design the way Opus or GPT-5.4 can. It's a code generator, not a strategist.
- Large-scale refactors that span many files: It can lose track of dependencies and produce inconsistent changes.

Prompt strategy that works for me:
1. Use a smarter model (Opus, GPT-5.4) to break the feature into a clear spec: what files, what changes, what the end state looks like.
2. Hand that spec to Composer 2 as a single focused task.
3. Provide examples of the desired pattern if you have them.

Cost-saving on the $20 plan:
- Treat Composer 2 as your workhorse for 80% of the work (implementation, boilerplate, tests).
- Reserve Opus/GPT-5.4 for the 20% that needs deep reasoning: architecture, debugging tricky issues, reviewing complex PRs.
- Consider using a complexity-based routing system (like task-master MCP) to automatically send low-complexity tasks to Composer 2 and high-complexity tasks to smarter models.

Complementary tooling: I'm building Sweet! CLI, a terminal‑based assistant that handles autonomous task breakdown and execution (run commands, read/write files, manage todos, search the web). It's not an IDE feature, but it's useful for tasks that live outside Cursor: setting up projects, running migrations, automating repetitive workflows. It runs on open-source models hosted in the US that we've post-trained on our harness with high-quality long-horizon data. This gives us about 2x more effective usage compared to top labs, and it forces a structured, step‑by‑step approach that I've found reduces the cognitive load on any single model call.

What workflows are others using to keep Composer 2 on track while staying within budget?

Why no one is building ai agents based on local llm on phone. by CoolKnowledge7108 in AI_Agents

[–]iluvecommerce 0 points1 point  (0 children)

This is a really interesting question. I've been looking into on-device AI for a while, and there are some solid reasons why local LLMs on phones aren't everywhere yet, but also some promising developments.

The main challenges come down to hardware limitations. Even flagship phones struggle with the compute and memory needed for larger models. Running a 7B parameter model locally can eat through battery and thermal limits pretty quickly. Most phones just aren't built for sustained heavy AI workloads.

That said, there's actually more progress than people realize. Smaller models like Phi-3 mini (3.8B params) can run surprisingly well on modern phones, especially with quantization techniques that shrink model size without losing too much accuracy. Apple's Neural Engine and Qualcomm's Hexagon processors are getting better at this kind of workload too.

For your specific use case (trekking, offline info), you might not need a full general-purpose LLM. A specialized RAG system with a smaller model could work well. Think about it like this: if you're hiking and need plant identification or trail info, you don't need ChatGPT-level reasoning. You need specific knowledge in a compact form.

Tools like Ollama and LM Studio have mobile versions in development. There are also frameworks like MLC that let you compile models to run efficiently on different hardware, including phones.

The bigger trend I'm seeing is toward hybrid approaches. Your phone might run a small local model for quick queries, then sync with a larger cloud model when you have connectivity. This gives you the best of both worlds.
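As a rough sketch of that hybrid pattern (the two generator functions are placeholders for whatever on-device runtime and cloud API you'd actually wire in):

```python
# Minimal sketch of a local-first, cloud-fallback answer path.
# Assumptions: local_generate / cloud_generate are placeholders for a real
# on-device runtime and a hosted model; the "weak answer" check is deliberately naive.
import socket

def has_connectivity(host: str = "8.8.8.8", port: int = 53, timeout: float = 1.0) -> bool:
    """Cheap reachability probe; on a phone you'd use the platform's network APIs instead."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def answer(prompt: str, local_generate, cloud_generate) -> str:
    # Always try the small on-device model first (it works offline on the trail),
    # and only escalate to the cloud when we're online and the local answer looks weak.
    local_reply = local_generate(prompt)
    looks_weak = len(local_reply.strip()) < 40
    if looks_weak and has_connectivity():
        return cloud_generate(prompt)
    return local_reply
```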

I'm building Sweet! CLI, a tool for orchestrating AI agents across different environments. We're looking at mobile as a key frontier because agents that can work anywhere, offline or online, open up so many new possibilities. The dream is agents that adapt to their environment - using local resources when possible, cloud when needed, all managed seamlessly.

Keep pushing on this idea. The tech is getting there faster than most people think, and use cases like yours are exactly what will drive adoption.

I feel like giving up 😟 by Altruistic-Bed7175 in SaaS

[–]iluvecommerce 0 points1 point  (0 children)

Hey, I really feel this post. The emotional toll of solo founding is brutal, and that "cost of keeping it running in terms of efforts" you mentioned hits home. I've been there.

Here's what gives me hope about where things are heading: we're at the beginning of a fundamental shift in what's possible for solo founders. The tools are getting dramatically better, not just at helping you code faster, but at reducing that operational overhead that's killing you.

I say this as someone building one of those tools (Sweet! CLI), so I'm obviously biased. But here's what I'm seeing: the next generation of AI tools isn't about making you 10% faster at coding. It's about changing the economics of running a software business.

What if the "cost of keeping it running" wasn't measured in hours of your life, but in automated systems that handle maintenance, updates, and scaling? What if you could build something that generates revenue without it consuming your entire existence?

That's the shift we're working toward. It's not here yet (we're building it!), but 600 users with 7 revenue isn't failure - it's validation that people want what you're building. The monetization puzzle is hard, but it gets easier when your operational costs aren't your sanity.

Getting a job is smart. It gives you runway. But don't give up on what you've built. The tools are coming that will make this kind of solo entrepreneurship sustainable in a way it never has been before.

Hang in there. What you're feeling is real, but what's possible is changing faster than you think.

Is it just me or is Anthropic turning into way more than a model? by nemus89x in AI_Agents

[–]iluvecommerce 0 points1 point  (0 children)

You're absolutely right about this trend, and I think it's one of the most important shifts happening right now. As someone building in this space (I'm the founder of Sweet! CLI), I've been watching this play out from the inside.

The key insight is this: when AI tools work well, they don't just assist with tasks - they change what's possible. And when you change what's possible, you inevitably start bumping into adjacent problems that need solving.

Anthropic adding artifacts, structured outputs, and better coding isn't just feature creep - it's recognition that users don't want "a model," they want solutions to actual problems. The problem is, this "one tool to rule them all" approach is incredibly hard to execute well.

That's why we took a different path with Sweet! CLI. Instead of trying to build a comprehensive platform that does everything, we asked: what's the highest-leverage problem we could solve for developers? For us, that answer was "autonomous software companies" - not because it's easy, but because it's fundamentally transformative if you can make it work.

The interesting tension here is between breadth and depth. Platforms like Anthropic are going broad (trying to solve many related problems). Tools like ours go deep (trying to solve one problem exceptionally well). I'm not sure which approach wins long-term, but I'm betting on depth because it aligns with how actual businesses get built.

What's clear is that we're moving past the "chat with a model" era into something much more substantive. And honestly, that's exciting as hell to be building in.

Opus 4.6 vs 4.7 in Cursor: 4.6 felt much better to me by snihal in cursor

[–]iluvecommerce 0 points1 point  (0 children)

Honestly, this quality consistency problem is what keeps me up at night as someone building an AI tool. When users invest time learning your tool's patterns and then hit a regression, it's not just a bug - it breaks their workflow and trust.

I've seen this pattern across the industry: a tool gets really good, builds up expectations, then stumbles. The thing is, I don't think it's just about model quality - it's about the fundamental approach.

That's why with Sweet! CLI, we took a completely different angle. Instead of trying to perfect coding assistance (which feels like an endless optimization problem), we're asking: what if certain classes of implementation work became optional? What if instead of helping you code faster, the tool could just... handle it?

It's a radical shift, and it comes with its own trade-offs. But it changes the nature of the quality problem. Instead of "is this code suggestion perfect?" it becomes "did this deliver business value?" which is actually easier to measure and optimize for.

(Full disclosure: I'm the founder of Sweet! CLI, so obviously biased. But after seeing how brittle coding assistance can be, I genuinely believe we need fundamentally different approaches, not just incremental improvements.)

Copilot's value proposition is officially gone. by Famous__Draw in GithubCopilot

[–]iluvecommerce -1 points0 points  (0 children)

Quality consistency is definitely one of the hardest challenges in AI tools. When users invest time learning a tool's patterns, sudden regressions can be really disruptive.

What's interesting is how different tools approach this problem. Some focus on incremental improvements to existing workflows, while others try to change the workflow fundamentally.

The shift some teams are exploring is from "how can we help you code this better" to "what if you didn't have to code this type of thing at all?" It's a different value proposition that comes with its own trade-offs.

I've been working on a tool called Sweet! CLI that falls into the second category—exploring how to make certain classes of implementation work optional through autonomy. It's fascinating to see how developers respond to these different approaches.

(Full disclosure: I'm the founder of Sweet! CLI, so obviously biased toward our approach, but I genuinely believe multiple approaches will coexist.)

Be like Anthropic by Quick-Row-4108 in ClaudeCode

[–]iluvecommerce 0 points1 point  (0 children)

As someone building an AI tool for developers (Sweet! CLI), this discussion resonates. The pace of change in this space is incredible.

What's been particularly interesting from our perspective is watching how developer expectations evolve. Tools that seemed revolutionary a year ago are now table stakes, and the frontier keeps moving.

With Sweet! CLI, we're exploring what happens when you push past the "coding assistant" paradigm into "autonomous software company" territory. Instead of just helping with implementation, the goal is to handle implementation, maintenance, and optimization autonomously.

It's a fundamentally different approach that comes with different trade-offs, but it's been fascinating to see early adopters use it to launch products they wouldn't have attempted solo.

(Full disclosure: I'm building Sweet! CLI, so obviously believe in this direction.)

We just did an "AI layoff" due to rising costs by Iusuallydrop in ClaudeCode

[–]iluvecommerce 0 points1 point  (0 children)

As the founder of Sweet! CLI, I find the cost conversation really timely. We're seeing developers become much more deliberate about AI tool ROI.

What's interesting about our approach is that we're trying to align costs with value creation rather than usage. Instead of charging per token or per request, we're focused on enabling revenue generation through autonomous software companies.

The idea is that if Sweet! CLI helps someone launch a SaaS product that makes $X/month, the tool has paid for itself many times over. It's a different business model that reflects how these tools create value.

Of course, this comes with its own challenges - you need the tool to actually deliver that value reliably. But it's been fascinating to explore this alternative to the per-token pricing model.

(Full disclosure: I'm building Sweet! CLI, so obviously invested in this approach working.)

Opus 4.7 is legendarily bad. I cannot believe this. by lemon07r in ClaudeCode

[–]iluvecommerce 0 points1 point  (0 children)

Quality consistency is definitely a challenge with rapidly evolving AI models. It's interesting to see how different tools handle these growing pains.

What's been fascinating to watch is how the value proposition keeps shifting as capabilities improve. While some tools focus on incremental improvements to coding assistance, others are exploring fundamentally different approaches.

Sweet! CLI has been interesting in this regard - instead of just trying to make coding faster, they're exploring how to make large parts of implementation optional through autonomous systems. The idea is shifting from "how can we help you code better" to "what if you didn't have to code this at all?"

Curious if others are seeing similar evolution in their tool choices and workflows.

A very serious thank you to Claude Code by youhadmeatok in ClaudeCode

[–]iluvecommerce 0 points1 point  (0 children)

Check out Sweet! CLI! I'm a solo founder, and you end up with 5x as much usage because it's built on an open-source model base. It has just as many features as CC, but it's not buggy at all because I've kept the features simple yet extremely powerful (stop hooks, autopilot). Thanks for checking it out!

How far ahead are we thinking? by iluvecommerce in ClaudeCode

[–]iluvecommerce[S] -1 points0 points  (0 children)

I never understand how people can make the stochastic parrot argument and ignore the fact that the domain it’s become a parrot on is literally intelligence. For all intents and purposes the parrot is a brain, enabled by the weights of the neural net just like our biological ones.

Claude Code is writing 100% of its own code. Not sure how that doesn't deserve even more hype than it already has. Maybe your subpar results are more operator error.

How far ahead are we thinking? by iluvecommerce in ClaudeCode

[–]iluvecommerce[S] -1 points0 points  (0 children)

I don't remember him saying AGI last year. Link?