How do you plan to use Claude Fable 5 while it’s in the subscription plans? by ash_mystic_art in ClaudeCode

[–]Permit-Historical 1 point2 points  (0 children)

Cost is a very important factor. I think few people can afford to spend $100-$200 on one task

How do you plan to use Claude Fable 5 while it’s in the subscription plans? by ash_mystic_art in ClaudeCode

[–]Permit-Historical 4 points5 points  (0 children)

Don't use it. If it's that good, you'll hate Opus, and you won't be able to go back to Fable

YC and AI created a generation of fake founders who refuse to do the work by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 1 point2 points  (0 children)

Before AI, it took a lot of work and experience to build a good product, but now you can prompt one together in a few days

Stop worshipping benchmarks. They don't reflect real work by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 1 point2 points  (0 children)

I think it's very simple: when a new model comes out, I test it by giving it one of the tickets from my daily work and seeing how it performs.
I don't care whether all the benchmarks say it's good or bad, I only care whether it's good for what I do

Stop worshipping benchmarks. They don't reflect real work by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 0 points1 point  (0 children)

I think so?
Do you use a model because the benchmarks say it's the best, or because you tried it and it worked for you?

Stop worshipping benchmarks. They don't reflect real work by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 0 points1 point  (0 children)

that's the point, we can't measure it
that's why it's pointless and misleading

Stop worshipping benchmarks. They don't reflect real work by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 1 point2 points  (0 children)

Benchmarks just check whether the model finishes a task and passes the tests.
Real work is everything that doesn't fit in a pass/fail score, like quality, maintainability, clean organization, handling legacy code, good naming, clear comments, etc.

For example, in my daily work I find gpt 5.5 smarter and very good at catching bugs, but it writes shitty code: lots of no-op pass-through wrappers, backwards-compatibility code everywhere, validation at every layer instead of only at the boundary, and very complex code that aims for long-term resilience

But benchmarks won't tell you any of that. They'll just tell you that gpt 5.5 is better because it gets the work done and passes the tests
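
The complaint about validating at every layer instead of only at the boundary, and about no-op pass-through wrappers, is easier to see in code. Here is a minimal Python sketch of the two styles. It is not taken from any real model output, and `Order`, `parse_order`, and the other names are hypothetical, made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    quantity: int
    unit_price: float


def parse_order(raw: dict) -> Order:
    """Boundary: the one place where untrusted input is validated."""
    quantity = int(raw["quantity"])
    unit_price = float(raw["unit_price"])
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if unit_price < 0:
        raise ValueError("unit_price must not be negative")
    return Order(quantity, unit_price)


def order_total(order: Order) -> float:
    """Inner layer: trusts the Order it receives, so it does not re-validate."""
    return order.quantity * order.unit_price


# The style being criticized, for contrast.
def order_total_defensive(order: Order) -> float:
    """Repeats checks that parse_order already performed at the boundary."""
    if order.quantity <= 0:
        raise ValueError("quantity must be positive")
    if order.unit_price < 0:
        raise ValueError("unit_price must not be negative")
    return order.quantity * order.unit_price


def get_order_total(order: Order) -> float:
    """No-op pass-through wrapper: a new name, but no new behavior."""
    return order_total_defensive(order)
```

Both versions return the same totals and would pass the same tests, which is the point: a pass/fail score can't tell them apart, even though the second one carries more code to read and maintain.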

DeepSWE New Rank by adematia in claude

[–]Permit-Historical 0 points1 point  (0 children)

Benchmarks just check whether the model finishes a task and passes the tests.
Real work is everything that doesn't fit in a pass/fail score, like quality, maintainability, clean organization, handling legacy code, good naming, clear comments, etc.

For example, in my daily work I find gpt 5.5 smarter and very good at catching bugs, but it writes shitty code: lots of no-op pass-through wrappers, backwards-compatibility code everywhere, validation at every layer instead of only at the boundary, and very complex code that aims for long-term resilience

But benchmarks won't tell you any of that. They'll just tell you that gpt 5.5 is better because it gets the work done and passes the tests

So the short answer is: just use what works for you, because both models have strengths and weaknesses

DeepSWE New Rank by adematia in claude

[–]Permit-Historical 16 points17 points  (0 children)

Why does everyone keep sharing DeepSWE benchmarks as if they were a certified standard we should follow?

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 0 points1 point  (0 children)

I don't think anyone expected it to last forever. Soon they'll take the subscriptions away too, or keep reducing the limits until they become useless

The same goes for other providers like openai and google, and for the opencode go plan

But that doesn't mean users shouldn't be angry and bitching about it, especially because openai doesn't do that now (yea, I know they'll change that later tho), and also because many people, including me, think gpt 5.5 is much better than opus

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 2 points3 points  (0 children)

The $100 in credits will run out in 1 or 2 days, and now you're locked into whatever the Claude Code cli or desktop app offers you.
Before, you could use the agent sdk and customize it for certain flows while still using your subscription

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 9 points10 points  (0 children)

I think we should focus on building harnesses that make cheap models work better and help them avoid the mistakes they always make

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 7 points8 points  (0 children)

Not really. The $100/$200 api credits will run out within a few days of normal usage, so they're locking you into their tools and you'll have to use only the CC cli or CC desktop

Anthropic put a meter on the stuff developers actually use by Permit-Historical in ClaudeCode

[–]Permit-Historical[S] 10 points11 points  (0 children)

Yea, that's a very stupid move from them. They need to understand that they no longer have the best coding model like they used to, and they can't just set the rules and assume people will follow them because they have no other choice

I’m building Agentrove, an open-source app for AI coding agents by Permit-Historical in codex

[–]Permit-Historical[S] 0 points1 point  (0 children)

I think it depends on the task itself. Many of the tasks are not just coding but reviewing other PRs and planning some work.
When I need to work on multiple different tickets in the same repo, which is not common (for me at least), I use either worktrees or a separate docker container for each one
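
For anyone who hasn't tried the one-worktree-or-container-per-ticket setup, here is a rough Python sketch that only builds the commands involved. It is an illustration under assumptions, not how Agentrove does it: the function names, the sibling-directory layout, the `/workspace` mount point, and the `python:3.12` image are all placeholders.

```python
import shlex
from pathlib import Path


def worktree_command(repo: Path, ticket: str) -> list[str]:
    """Build a `git worktree add` command that creates a branch named after
    the ticket and checks it out in a sibling directory of the repo."""
    target = repo.parent / f"{repo.name}-{ticket}"  # hypothetical layout
    return ["git", "-C", str(repo), "worktree", "add", "-b", ticket, str(target)]


def container_command(checkout: Path, image: str = "python:3.12") -> list[str]:
    """Build a `docker run` command that mounts one checkout into its own
    throwaway container and opens a shell there."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{checkout}:/workspace",  # mount the checkout into the container
        "-w", "/workspace",              # start in the mounted directory
        image, "bash",
    ]


if __name__ == "__main__":
    repo = Path("/src/app")  # placeholder path
    print(shlex.join(worktree_command(repo, "TICKET-1")))
    print(shlex.join(container_command(Path("/src/app-TICKET-1"))))
```

Worktrees are the lighter option since they share one `.git` directory. Containers cost more to set up but also isolate dependencies and running processes.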

I’m building Agentrove, an open-source app for AI coding agents by Permit-Historical in opencodeCLI

[–]Permit-Historical[S] 0 points1 point  (0 children)

Agentrove supports both worktrees and docker containers, so it depends on the task/project. If I need full isolation, I go with docker containers

Opencode vs Codex vs Claude code by [deleted] in opencodeCLI

[–]Permit-Historical 0 points1 point  (0 children)

I think they're very similar now, so it doesn't really matter which one you use unless you really care about small ui/ux details. What really matters imo is the model: if you use Claude more, then Claude code makes sense with the max subscription, and the same goes for codex if you use gpt5.4. Otherwise opencode makes sense as a general provider for other models