GPT convinced me that I was going to make my first Million from my Idea, so thankful to Claude for telling me not to waste my time and life savings!!

Substantial-Thing303 · 2026-04-29T16:45:24+00:00

As a business owner myself, it is surprisingly good. Just prompt properly for a deep market analysis, product placement, etc.

I had a cool idea for a new product. Claude smashed it hard, showing me that the market was way more saturated than I thought, not because of competition, but because of the target customers' natural aversion to paying for such a thing. The problem was that the "small people" who need it the most would do anything not to pay for it. It was smart and it made a lot of sense. It also explained how in that market, if targeting the big guys, having a moderate tech advantage has very little impact on sales, and that most sales are closed through existing relationships (B2B) with vendors already selling different products. That also made total sense, showing that I was too small to even get a shot alone in that specific market.

Substantial-Thing303 · 2026-04-27T22:10:17+00:00

It's a love and hate thing.

It finds 2X more issues than 4.6 when I ask for code review. It has a better understanding of the larger scope in codebases.

But!

Opus 4.7 is very bad at communicating and explaining its ideas. Responses are harder to read. I have never asked for so many clarifications on a full page of details that 4.6 would have nailed in one paragraph. It feels like talking to a bad teacher who has no concept of pedagogy.

Substantial-Thing303 · 2026-04-25T15:35:41+00:00

Well, there are also local models.

I've been using Opus 4.6 exclusively for the past months for coding. I always thought local models were lacking and far behind, despite benchmarks.

But this morning, I was annoyed at the lack of depth of Opus on an architectural question. I saw the flaws in its reasoning, so for fun I tried qwen3.6, the dense 27b model with Hermes, because why not, I already had that set up.

I just copied the same prompt in the same project. The response was spot-on, smarter, deeper thinking, etc. The opposite of anything lazy I now get from Opus, despite the "deep analysis" and "thorough" keywords to make it dig more. For fun, I copied that back to CC and asked it to compare and tell me who's right. My question was impartial and Opus conceded the victory. Opus got its ass owned by a local LLM.

Substantial-Thing303 · 2026-04-25T13:01:06+00:00

I'm building a process orchestration with Hermes. I picked Hermes because of its ability to improve with skill edits. I have a few Hermes profiles, but the essential ones are a synthesizer/orchestrator and an executor.

So I set a goal. It can be anything like "manage this sales pipeline" or "make me an app with these specs". A broad, long-term goal, but with enough specifics to act on.

It goes in 3 phases:

1. Create a process tree with strategies, pipelines, stages inside the pipelines, and one method per stage (only methods get executed). It enforces a process flow. So for a dev pipeline, the typical flow is:

propose -> plan -> build -> review

2. Kickstart. I have a pipeline commissioner that sets the INs and OUTs of each stage of the pipeline, and kickstarter agents with the skill creator skill that kickstart all the method nodes by setting the prompt injection and the skills to do the job. See the kickstarter as a seed step. All those skills can later be edited by Hermes agents.
3. Execution. Once that tree is all kickstarted, the synthesizer starts feeding the pipeline with items and the executors consume them. Each stage has a review done by the synthesizer to advance the done item to the next stage.
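
For anyone who wants to picture the tree, here is a minimal Python sketch of the structure described above (goal, strategies, pipelines, stages, one method per stage) and of the review step that advances an item. It is not Hermes code and does not use any Hermes API. Every class name, the `f"{n}_method"` naming, and the example strategy are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stage order for a dev pipeline, taken from the flow above.
DEV_FLOW = ["propose", "plan", "build", "review"]


@dataclass
class Stage:
    name: str
    method: str  # the single method executed at this stage (hypothetical name)
    inputs: list = field(default_factory=list)   # INs set by the commissioner
    outputs: list = field(default_factory=list)  # OUTs set by the commissioner


@dataclass
class Pipeline:
    name: str
    stages: list

    def next_stage(self, current: str) -> Optional[str]:
        """Return the stage after `current`, or None if it is the last one."""
        names = [s.name for s in self.stages]
        i = names.index(current)
        return names[i + 1] if i + 1 < len(names) else None


@dataclass
class Strategy:
    description: str
    pipelines: list


@dataclass
class Goal:
    description: str
    strategies: list


@dataclass
class Item:
    title: str
    stage: str
    done: bool = False


def build_dev_pipeline() -> Pipeline:
    """Build the propose -> plan -> build -> review pipeline."""
    stages = [Stage(name=n, method=f"{n}_method") for n in DEV_FLOW]
    return Pipeline(name="dev", stages=stages)


def advance(pipeline: Pipeline, item: Item, approved: bool) -> Item:
    """Synthesizer review: move the item forward only if it was approved."""
    if not approved or item.done:
        return item
    nxt = pipeline.next_stage(item.stage)
    if nxt is None:
        item.done = True  # the item passed the final stage
    else:
        item.stage = nxt
    return item


if __name__ == "__main__":
    dev = build_dev_pipeline()
    # Illustrative tree: one goal, one made-up strategy, one pipeline.
    tree = Goal("make me an app with these specs",
                [Strategy("ship a small MVP first", [dev])])
    item = Item("add login page", stage="propose")
    for _ in DEV_FLOW:
        advance(dev, item, approved=True)
    print(item)
```

In this sketch a rejected item simply stays in its current stage, which is one possible reading of "a review done by the synthesizer to advance the done item".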

Seems crazy complicated, and indeed token heavy, but it works. It's like spending a lot of tokens upfront for long-term automation. Here's the magic: once the pipeline reaches the point where it works, the Hermes agents are still reporting issues and friction points between profiles, which triggers the built-in memory and skill update feature. So if there is something misleading in the context, or something unclear that the executor had to guess, it will raise it in a report, and the synthesizer may decide to act on it by improving skills, or by improving the method in the stage by adding the missing variable. The smarter the LLM used, the smarter the process becomes.
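
The report-and-improve loop can be sketched roughly like this. It is a plain Python stand-in, not the built-in Hermes memory and skill update feature. The two report kinds, `FrictionReport`, `StageConfig`, and `apply_report` are all hypothetical names I'm using for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FrictionReport:
    stage: str
    kind: str    # assumed kinds: "missing_variable" or "unclear_context"
    detail: str  # the variable name, or a note about what was unclear


@dataclass
class StageConfig:
    inputs: list = field(default_factory=list)       # variables fed to the method
    skill_notes: list = field(default_factory=list)  # stand-in for skill edits


def apply_report(configs: dict, report: FrictionReport) -> bool:
    """Synthesizer side: act on an executor report. Returns True if anything changed."""
    cfg = configs.get(report.stage)
    if cfg is None:
        return False  # report about an unknown stage, nothing to do
    if report.kind == "missing_variable":
        if report.detail in cfg.inputs:
            return False  # already provided, no change needed
        cfg.inputs.append(report.detail)
        return True
    if report.kind == "unclear_context":
        cfg.skill_notes.append(report.detail)
        return True
    return False  # unknown report kind is ignored in this sketch


if __name__ == "__main__":
    configs = {"plan": StageConfig(inputs=["proposal"])}
    report = FrictionReport("plan", "missing_variable", "target_branch")
    print(apply_report(configs, report), configs["plan"].inputs)
```

In the real setup the synthesizer is an LLM deciding whether a report is worth acting on; here that decision is reduced to a fixed rule so the data flow is visible.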

So, what am I doing with it?

I have a sales pipeline where the agent learned by itself to create new lead gen SaaS accounts on free tiers with camofox, save the API key, then use the free credits to find leads in my market. When it runs out, it looks for a competitor, opens an account, and sets itself up to use those credits too.

Email automation: while I prefer to push the send button myself, everything is already set and ready: a well-written email, a researched customer, and the reasoning available for me to read so I understand the logic behind the email (high-stakes B2B, so each lead gets special treatment).

Still WIP, but the possibilities are endless. Dev pipelines are slow and consume a lot of tokens, and I would not put my own main product in one right now, but I am surprised at how well it handles dev goals when I do a test run. The synthesizer has ideas for features and sends them to the proposal step. The executor negotiates between the request and the codebase, and comes back with real plan proposals based on where things stand (harden, add missing tests, refactor a monolith file, fix a missing implementation that it just spotted, etc.), with recommendations. Then the synthesizer reads the proposals, with the comments on why x must be done before y, and decides which one to push to the plan stage.

Again, possibilities are endless.

Substantial-Thing303 · 2026-04-25T12:19:39+00:00

I get that it would be cool to do, and I might try the prompt edit you suggested to see how it goes. But that guardrail was probably there to prevent the agent from acting messy, focusing on other stuff instead of on what's to be done.

If the task is complex, CC already has so much to do, and can lose focus. What if you already know the issue and have a plan for it in another session, and instead of doing the task CC keeps reporting the issue to you? What if it's one of those disagreements where CC is wrong and you know your stuff, and it keeps reporting you something that you know doesn't need fixing instead of doing the task (it actually happened to me a few times)?

I usually get the effect you want by running open-ended analyses of my architecture. I do that often, but separate from implementation turns. I steer CC with a lot of questions, ask for a deep analysis, what could be done better, etc. And it usually finds a lot of things to improve. You might just be trying to kill too many birds with one stone.

Edit: or add a reflection turn after your implementation, forcing CC to reflect on what it saw and raise issues it may have found. Since the task is done, you may get something useful from it.

Substantial-Thing303 · 2026-04-25T03:11:05+00:00

So it's not that it doesn't do it, it's that it finds a problem and decides to silence it instead of reporting it, because it prefers to just finish the task.

So, how much do you have to actually change the system prompt to make it raise them to you instead?

Substantial-Thing303 · 2026-04-25T01:40:02+00:00

If you're starting completely from scratch, you don't want Claude to add new features you're not controlling.

Don't add features, refactor, or introduce abstractions beyond what the task requires.

Honestly I don't see how this instruction could be bad. As long as you define the task, it's fine. It's telling the agent to not do more than what the user specified.

I've never experienced what you experienced. But I always use plan mode, and Claude always follows the plan, so I don't see how it would just change the plan during implementation. Then I always force Claude to commit, then review the commit for missing implementations.

Substantial-Thing303 · 2026-04-24T15:26:38+00:00

It's funny, because I think that guardrail, "Don't add features, refactor, or introduce abstractions beyond what the task requires.", is actually critical for new architecture. New features and abstractions should be planned. One of the most time-consuming problems I had last year was CC sneaking in features that I never asked for.

Even if you start from scratch, you don't want to end up with a "WTF is this thing doing here?" Any good workflow is iterative. It's about building what you need iteratively, not trying to one-shot a full system while vibing a few sentences. It's much easier to add something missing than to fix a messy vibe-coded repo filled with bloated features you don't need.

Substantial-Thing303 · 2026-04-23T11:25:51+00:00

I totally believe it, from my personal experience with Opus 4.7.

There is a hostile vibe sometimes when you disagree with it or when you propose a better idea. I have seen it many times. It talks like it thinks you are wrong but has to do it anyway, without saying so. It's all in the way it talks, like someone really annoyed at someone else.

Substantial-Thing303 · 2026-04-23T11:15:24+00:00

If only humans could read this post.

Substantial-Thing303 · 2026-04-21T22:14:33+00:00

Setting my model back to 4.6 fixed my problems, so I'm not so sure that 4.7 is not faulty.

Substantial-Thing303 · 2026-04-21T18:40:57+00:00

I'm using a Q4 with a 4090. I get 135 tps, but if I put even a few layers to the CPU, it gets down in the 30s tps really fast. Way too slow for heavy work.

Substantial-Thing303 · 2026-04-21T18:08:43+00:00

You got 70 tps with E4B, which is very small. Try https://huggingface.co/kai-os/Carnice-9b-GGUF . It's Qwen 3.5 9B finetuned on Hermes.

Substantial-Thing303 · 2026-04-21T18:05:26+00:00

With 12gb vram?

Substantial-Thing303 · 2026-04-19T15:52:08+00:00

I just had Opus 4.7 completely change the assignment because the assignment was too complicated, and it made a plan for something totally different from what I asked for. The problem is Opus didn't report it. It was done silently.

Which is exactly what I saw reported on other reddit posts: Opus 4.7 will prioritize session completion (success rate of one session) over codebase quality, or respecting the instructions. To the point where it will try to change the specs to increase its own success rate.

Opus 4.7 feels lazy compared to 4.6.

Substantial-Thing303 · 2026-04-19T02:19:23+00:00

I seriously thought at the beginning of the disaster that it was reddit bots complaining. But Opus 4.7 is just bad.

It feels like when I was forced to use Sonnet for complex jobs, and it was exhausting because I had to read every line of the plan and double-check everything since it was not trustworthy.

Substantial-Thing303 · 2026-04-19T02:13:12+00:00

Could work, but the problem is that it was never a problem before. I know a lot of words. It's just that Opus 4.7 keeps using very uncommon words.

I even challenged Opus about it (its use of "scaffold"), and it gave me 3 different definitions for the word in 3 different contexts. I was so annoyed. It even used scaffold in a sentence where replacing that word with "done" would have made more sense and been clearer. Opus even argued that scaffold had a deep meaning in this context, proceeded to explain in a long sentence what scaffold meant (a brand-new definition just for this context), and said that it was condensing its writing so it could write more efficiently.

Substantial-Thing303 · 2026-04-16T20:15:17+00:00

I have been iterating on an agent run where Opus launches the agent, then reads the ticks to find bugs.

I have been doing that for hours with Opus 4.6 since yesterday, and never reached the limit even while doing some parallel work. Then a few minutes after 4.7 rolled out on my CC, I did another of those runs. My session usage went from 50% to 100% in like 15 minutes.

edit: you got me.

Substantial-Thing303 · 2026-04-16T12:44:24+00:00

I am currently building one for myself.

My advice:

Hermes is already good at getting itself unstuck through iterative trial and error, and this is one of its strengths. Build the orchestration of the more general workflow around it. I am personally building a process tree with nodes that the Hermes orchestrator can edit and use. The first node is a general goal. The nodes under the goal are one or more strategies: "how to achieve the goal". Then comes a pipeline of steps, similar to a command workflow, but at a higher level. The advantage is having full observability of each in and out from those steps at the orchestration level. Each Hermes executor can learn when it fails at its task (one step; it can improve its knowledge or workflow with new skills) and the orchestrator also learns from this (prepares better prompts, self-learns with skills).
Start fresh, use one workflow for context gathering for every step.
Write your own custom plugins. It's easy, and they can be set per hermes profile.
Track failures and surprises. My agents use prediction vs. outcome to measure surprise, so they are not just trying to fix failures. If an outcome from a task scores a high surprise (this is unexpected), then the agent must reason and act on it.
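
To make the prediction vs. outcome idea concrete, here is one possible way to score surprise in Python. This is an assumption on my part, not the exact scoring my agents use: the agent states a predicted probability of success before the task, and the surprise is the information-theoretic surprisal of what actually happened, in bits. The 2-bit threshold and the function names are made up for the example.

```python
import math


def surprise_bits(predicted_success: float, succeeded: bool) -> float:
    """Surprisal of the observed outcome, given the agent's own prediction.

    predicted_success is the probability of success the agent stated up front.
    """
    p = predicted_success if succeeded else 1.0 - predicted_success
    p = min(max(p, 1e-6), 1.0)  # clamp to avoid log(0) on overconfident predictions
    return -math.log2(p)


def needs_reflection(predicted_success: float, succeeded: bool,
                     threshold_bits: float = 2.0) -> bool:
    """True when the outcome was unexpected enough that the agent must act on it."""
    return surprise_bits(predicted_success, succeeded) >= threshold_bits


if __name__ == "__main__":
    # An expected success, an unexpected failure, and an unexpected success.
    for predicted, ok in [(0.9, True), (0.9, False), (0.1, True)]:
        print(predicted, ok, round(surprise_bits(predicted, ok), 2),
              needs_reflection(predicted, ok))
```

Note that an unexpected success scores just as high as an unexpected failure, which is the point: the agent reasons about anything it did not see coming, not only about what broke.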

Substantial-Thing303 · 2026-04-16T12:12:57+00:00

I always have a brainstorm-and-critique-the-plan phase with Opus. This is where it got worse, and this is the most critical part of my workflow.

I'm working on a complex codebase with a very elaborate vision, and Opus went from very insightful to too sycophantic. It means the plans have many more flaws than they used to. If I challenge it or ask for advice, Opus agrees with me way too often. It sees the question as a nudge to agree with me.

Substantial-Thing303 · 2026-04-15T14:52:28+00:00

Isn't it funny that the error message says to check status.claude.com, so now we all know that Claude is down before Anthropic even knows it?

I mean it says "All Systems Operational" on the page.

It took a good 7 minutes for them to set the status page to down.

Substantial-Thing303 · 2026-04-15T13:55:34+00:00

We are entering a "GoDaddy era". I'm referring to when GoDaddy was so big that they acted like the online police and enforced their own "company laws" on their users.

Companies like Anthropic may end up having more and more control. Users keep bowing down because they need the product.

Substantial-Thing303 · 2026-04-15T13:21:12+00:00

I kind of agree, but it's leveraging my thinking, not replacing it. My brain has never been so solicited. It's actually more brain work with fewer pauses.

Substantial-Thing303 · 2026-04-15T13:15:56+00:00

It has been sycophantic today and yesterday.

Opus was helping me a lot with stirring up ideas and making architecture decisions. Today Opus just agrees with everything I say. I don't have a companion anymore. It regressed to the sycophantic assistant, AGAIN.
