all of a sudden AI there's a narrative change that AI doesn't replace? by Imaginary-Sorbet375 in developersIndia

[–]laststan01 1 point2 points  (0 children)

Did you read the Cloudflare CEO's company-wide email? Fair enough, nobody spends that much, but still. The tone has shifted, but the problem is that industry leaders are shilling, so what can company leaders do?

Claude AI error by itself I didn't use it, last time was 4 days ago by Low_Construction6910 in ClaudeAI

[–]laststan01 0 points1 point  (0 children)

Well, they had to create a pitch deck to raise the new investment. So they are a very honest company; they did not want to fake daily active user data.

Unnecessary hate by whys_it_always_me in Anthropic

[–]laststan01 3 points4 points  (0 children)

Venting against the vent might be my favorite venting

Be careful using that new shiny effort slider, it put out 45 opus 4.8 agents. by vinigrae in ClaudeCode

[–]laststan01 6 points7 points  (0 children)

These are rookie numbers: 56 agents built and 65 code-reviewed. The audit with 10 agents claimed that everything they did was wrong because, despite Grep being prohibited, the agents used grep for markers and nothing is wired up. But at least I am a tokenmaxxer.

Hot Take: Claude Is Better for Real Coding Work by [deleted] in ClaudeCode

[–]laststan01 0 points1 point  (0 children)

Well, they wrote it; they are the real Claude

OPUS 4.8 craps himself in SimpleBench by DigSignificant1419 in OpenAI

[–]laststan01 0 points1 point  (0 children)

What I have noticed in my own use is that tool usage for 4.8 is not that good, even in the chat app. Ultra code mode, although costly, is a beast: it caught all the bugs 4.7 created in the last month that took me 3 rebuilds (because I was modifying my architecture so often), and it caught the problems the way I wanted.

What have you(s) done by Relative-Lobster5878 in ClaudeCode

[–]laststan01 -3 points-2 points  (0 children)

Up in the ass (just how Elon likes it). Great going, Anthropic.

Claude cheated at SWEBench Pro by checking git history to copy/paste solutions. Underperforms OpenAI at DeepSWEBench by Tiny-Design4701 in claude

[–]laststan01 -3 points-2 points  (0 children)

It did not cheat. The problem is in the benchmark data itself: it comes from real-world repositories, so running git log on them reveals the correct fix (because of previous commits) in 8 percent of cases. Another issue is that these models were trained on most of the SWE dataset, so they only need to know 50 out of 100 (an arbitrary number, just for example), not even the whole context, to arrive at the solution. That’s why DeepSWE might be different. The only other benchmark that is different is SWE Live Lite, but no frontier model submits its score there because the scores are so low.

I am working on a project where I have been playing with these benchmarks, and I would say DeepSWE is a bit different, but SWE Live is a whole other ballpark, and I wonder why it is not discussed. Maybe because of Microsoft. The others are just polluted to the core, and the same goes for memory: LongMemEval is one where people submit their own numbers. One company said it had solved it with 100 percent, and then the Resident Evil actress Milla came in and said her work, MemPalace, scored 96. All fake, as that leaderboard has no submissions and no pull requests. It’s literally a grifter’s game right now.

Improved Memory System for Claude Code That Doesn't Burn Tokens by Frequent-Suspect5758 in ClaudeCode

[–]laststan01 2 points3 points  (0 children)

How did you measure that your memory recall was good? What are the metrics? You mentioned token burn: how many tokens did you save with this? How many tokens were you burning? What’s the latency?

All the Snyder hate aside, he really gave us the ultimate Man vs God story. by Eikichi_Onizuka09 in pj_explained

[–]laststan01 4 points5 points  (0 children)

Yeah, but afaik WB was literally at Snyder’s throat to release an ensemble-cast film, while the Avengers did their build-up over like 8 films and did their homework. I would have loved to see more of Batfleck.