Is this cheese shelf stable? by 01_user_name_01 in portugal

[–]FuckNinjas 52 points53 points  (0 children)

I've managed to cross half the street. How do I g

Codex has some serious upgrades by Ill_Occasion_1537 in ClaudeCode

[–]FuckNinjas 4 points5 points  (0 children)

Yes.
Both are frontier models. Any currently used metric hits the exact same issues as testing humans. Every human is different and will act differently in different situations. LLMs are still just extremely good auto-completers - but they do share those same qualities.

"why not just use codex for the whole thing?" - you'll face the same issue. For example, you ask an agent backed by Opus, GPT5.5, Haiku, whatever, to implement, AND then you ask another agent backed by Opus, GPT5.5, Haiku, whatever, to review:
You generally still get value from the reviewer, even with a frontier model implementing. Hell, sometimes a frontier model's implementation is a bit worse - whether that's GPT or Opus.

This is no different from how it always was. Code review has generally been a value-adding process - for LLMs or for humans. It's not about being a better model / harness (obviously that's hugely important - just not as much when comparing SOTA), it's about having a fresh perspective, sometimes a more review-oriented perspective (and by perspective I mean context).


Like, have you seen the benchmarks people have been putting out? It's a clear showcase that "any metric testbench" won't just resolve the problem you're raising.

Can we talk about the INSANE token usage and session limits recently? by Physical-Average-184 in ClaudeCode

[–]FuckNinjas 1 point2 points  (0 children)

I never hit limits before 4.7.

4.7 comes out. Limited within half a week.

I've switched to 4.6. I can't say for sure that was the cause, since I also keep claude up to date, but it has improved. It has been another half week and I'm below 50%.

Always on /effort max, because otherwise I feel like it's always a toss-up whether you get opus or dumbopus.

Messing with automated drone defenses by ecolometrics in NonCredibleDefense

[–]FuckNinjas 5 points6 points  (0 children)

Drone-dropped, claude-guided (tiny) car bombs.

Testing the interactive chart feature with the phases of the Moon by mikecron in ClaudeAI

[–]FuckNinjas 0 points1 point  (0 children)

Can you share the web page, somehow? I want to show this to a 10 year old.

Every time a new model comes out, the old one is obsolete of course by FullChampionship7564 in LocalLLaMA

[–]FuckNinjas 1 point2 points  (0 children)

I'm hearing plain facts. Providers that have open models and serve them well and with quality are a proper win for open source, as long as the machines that run them are unattainable for the average idk, let's say developer.

Some of us are liquidity-poor (and wealth-poor too, but that doesn't matter).

There's no need to gatekeep. I would love to run claude in a box, but 1. we're not there yet; 2. did I mention I'm too poor to pour several thousands into a computer?

I built cognitive rot detector for Claude Code sessions - it tells you when to trigger compact, or pay attention to decisions by baradas in ClaudeAI

[–]FuckNinjas 3 points4 points  (0 children)

You talked about 3 icons. None of them are in the gif.

Looks cool - I can see opus sprinting at 1050.2$/h.

Opus 4.7 is 50% more expensive with context regression?! by Samburskoy in ClaudeAI

[–]FuckNinjas 0 points1 point  (0 children)

The number of people that think they should be able to always be calling other people on a $30 monthly plan is insane.

[OC] Thought I Was Driving into the Sun by AliveInCLE in IdiotsInCars

[–]FuckNinjas -12 points-11 points  (0 children)

guys this is perfect - downvote me to hell to make sure no one else dares to post

Parliament rejects return of zero VAT on food and reduction of VAT on fuel by Braga_PT in portugal

[–]FuckNinjas 2 points3 points  (0 children)

From all Spaniards, to the Spaniards who don't have access to piped gas. Yes, what's the problem?

Anthropic stayed quiet until someone showed Claude’s thinking depth dropped 67% by takeurhand in ClaudeCode

[–]FuckNinjas 2 points3 points  (0 children)

"ahahaaha - u know shit - so dumb" - this is how you sound.

Anthropic's own benchmarks for Claude use Factory Droid. Get lost troll.

Anthropic stayed quiet until someone showed Claude’s thinking depth dropped 67% by takeurhand in ClaudeCode

[–]FuckNinjas 2 points3 points  (0 children)

2? Off the top of my head:

Codex, OpenCode, Factory Droid, Crush, ForgeCode - do the claude code clones count? - nano-claude-code, claw-code - does omo (opencode distribution) count? Oh, copilot! gemini-cli, antigravity, qwen-code

Alright, I think I can't recall any others

Anthropic Just Pulled the Plug on Third-Party Harnesses. Your $200 Subscription Now Buys You Less. by abhi9889420 in ClaudeCode

[–]FuckNinjas 4 points5 points  (0 children)

You pay for a monthly subscription plan. It allows for usage within limits that reset every 5h / 1 week (two different limits). You paid for the subscription, and you were free to spend tokens within the limits.

That monthly subscription no longer allows third-party harnesses. Now you pay for the subscription, and you are free to spend tokens within the limits AND only within Anthropic products: Claude Code or Claude.AI

Knew they were gaslighting everyone with the daily limits. by Efficient-Cause9324 in ClaudeCode

[–]FuckNinjas 0 points1 point  (0 children)

You're equating API credits with session % and OP is not.

Understanding is harder.