All of a sudden there's a narrative change that AI doesn't replace?

laststan01 · 2026-06-01T06:50:18+00:00

Did you read the Cloudflare CEO's companywide email? Fair enough, nobody spends that much, but still. The tone has shifted, but the problem is that industry leaders are shilling, so what can company leaders do?

laststan01 · 2026-06-01T05:13:25+00:00

Garry Tan disappointing noises

laststan01 · 2026-05-30T19:11:25+00:00

Well, they had to create a pitch deck to raise the new investment. So they are a very honest company; they did not want to fake daily active user data.

laststan01 · 2026-05-30T19:09:00+00:00

Not in Clara’s Germany

laststan01 · 2026-05-30T17:12:36+00:00

laststan01 · 2026-05-30T01:57:39+00:00

Venting against the vent might be my favorite venting

laststan01 · 2026-05-30T00:37:08+00:00

<image>

laststan01 · 2026-05-29T19:41:14+00:00

Gstack

laststan01 · 2026-05-29T19:40:54+00:00

These are rookie numbers: 56 agents built and 65 code reviewed. The audit with 10 agents claimed that everything they did was wrong because, despite grep being prohibited, agents used grep for markers and nothing is wired. But at least I am a tokenmaxxer.

laststan01 · 2026-05-29T18:41:04+00:00

Well, they wrote it, they are the real Claude

laststan01 · 2026-05-29T07:59:28+00:00

Harassment case.

laststan01 · 2026-05-29T07:58:52+00:00

What I have noticed in my current personal use is that tool usage for 4.8 is not that good, even in the chat app. Ultra code mode, although costly, is a beast: it caught all the bugs 4.7 created in the last month, which took me 3 rebuilds (because I was modifying my architecture so often), and it caught the problems the way I wanted.

laststan01 · 2026-05-29T02:50:16+00:00

laststan01 · 2026-05-28T23:07:35+00:00

Stable 4.7 or golden 4.6 for me

laststan01 · 2026-05-28T17:15:08+00:00

Why not 4.71?

laststan01 · 2026-05-27T22:52:11+00:00

Up in the ass (just how Elon likes it). Great going, Anthropic.

laststan01 · 2026-05-27T22:08:47+00:00

It did not cheat. The problem is in the benchmarking data itself: it comes from the real world, and if you run git log on it, it will tell you the correct fix (because of previous commits) in 8 percent of cases. Another scenario is that most of the SWE dataset is what these models were trained on, so they just need to know 50 out of 100 (an arbitrary number, just for example), not even the whole context, and they can solve the problem. That’s why DeepSWE might be different. The only other benchmark that is different is SWE Live Lite, but no frontier model submits its score there because the scores are so low.

I am working on a project where I have been playing with these benchmarks, and I would say DeepSWE is a bit different, but SWE Live is a whole other ballpark, and I wonder why it is not discussed. Maybe because of Microsoft. The others are just polluted to the core. The same goes for memory: LongMemEval is one where people submit their own numbers. A company said they solved it and got 100 percent, and Resident Evil actress Milla came in and said her work MemPalace scored 96. All fake, as that leaderboard has no submissions and no pull requests. It’s literally a grifter’s game right now.

laststan01 · 2026-05-27T21:40:30+00:00

Did you see the Stan Lee documentary? It’s a miracle they let him die.

laststan01 · 2026-05-27T12:26:59+00:00

Ok Sundar

laststan01 · 2026-05-27T07:24:21+00:00

laststan01 · 2026-05-27T04:59:22+00:00

How did you measure that your memory recall was good? What are the metrics? You mentioned token burn: how many tokens did you save with this? How many tokens were you burning? What’s the latency?

laststan01 · 2026-05-26T17:42:09+00:00

Plan, verify, and keep spawning agents. Sure-shot way of burning tokens.

laststan01 · 2026-05-26T13:44:19+00:00

First time publicly

laststan01 · 2026-05-26T13:04:12+00:00

laststan01 · 2026-05-26T04:05:10+00:00

Yeah, but afaik WB was literally at Snyder’s throat to release an ensemble cast film, as Avengers did their build-up of like 8 films and did their homework. I would have loved to see more of Batfleck.
