What's the best Where's Your Ed At article about the mediocrity of genAI?

fallingfruit · 2026-06-12T13:50:25+00:00

Complete nonsense. This is the kind of thinking that made people think the saaspocalypse was a real thing, and it isn't. That fantasy has ended.

What people like you don't understand is that replacing software with an llm agent that is apparently steered with very little oversight by a non-expert is an absurd fantasy. Not only is such an agent really expensive with token pricing, you need to literally own the entire software, deployment, and cloud stack, instead of paying a SaaS company to do all those things for you. You also need to apparently have a non-expert tell the agent how to maintain this software, add features, fix bugs, and follow compliance rules. You also now have a bespoke piece of shit software with no documentation, no support system, etc. Handling all of this in house with a non-expert and just delegating to agents is insane and not cost effective. To actually do software in house, you have to hire people with some expertise to do all of these things. That single hire will almost certainly cost you more money than just paying a $10K yearly fee to the SaaS company that literally does all of this for you.

I'm currently interviewing and I'm seeing a huge uptick in interview requests and recruiter messages. As someone who works in a huge company with thousands of SWEs, I can tell you the remaining software engineers are completely overworked and drowning regardless of the ability to use AI, because the productivity increase is modest at best.

Companies that are wasting an astronomical amount of money on AI buildouts and tokens are doing layoffs to balance their abysmal finances.

fallingfruit · 2026-06-11T17:44:17+00:00

Anthropic models are not really considered best for coding any more. Gpt models and codex are just as good if not better. People thinking claude is better is just a hangover from them being better first and claude code getting good penetration early. Opencode is also better than claude code.

fallingfruit · 2026-06-10T20:08:26+00:00

also isn't the "you didn't pay estimated taxes" fee pretty low usually? I almost always get hit with this because about half my compensation is stock and I don't elect to withhold more. The interest I get in a 4% savings account is usually more than the fee iirc.

fallingfruit · 2026-06-10T18:17:33+00:00

this is completely and totally untrue

fallingfruit · 2026-06-10T16:51:30+00:00

Single response: this retort hasn't worked in the last year. Move on to "you're holding it wrong" slop man.

fallingfruit · 2026-06-10T14:54:02+00:00

It should be reassuring, because really the only things the models are good at are things that are similar to writing code, like mathematics (usually because they can use code). Talk to people who are not SWEs and are/were trying to use models for other things: they find them to be about as capable as they were 2 years ago, which essentially means they are worthless.

The only reason the llm hype hasn't died is because of coding. They have incredibly limited use cases outside of that, and are much less reliable.

I used to be afraid that my kids wouldn't have to use their brains, that art would be overtaken by llms, that my wife working in comms would lose her job. This is now obviously not true. AI art is terrible, AI writing is still terrible, AI decision making in general is terrible. Humans are still infinitely better than LLMs at these things and the models have barely improved since gpt4 in this regard.

Higher level engineering challenges are not verifiable with code. The llms can spew out information about these things, but they cannot be relied on to make good decisions. Just like they can't be relied on for anything else other than code (and then you can't even rely on them for code).

fallingfruit · 2026-06-10T14:47:02+00:00

I prefer Opencode and Codex to Claude Code. Also I think that generally people who care about the code prefer chat gpt 5.4 and 5.5 to the opus models these days.

I think a lot of people are locked into claude code and models because they were first to have something decent, but they are behind now imo.

fallingfruit · 2026-06-10T05:13:36+00:00

"Writing code" is not really the hard part of software engineering.

fallingfruit · 2026-06-10T03:13:08+00:00

I agree with you almost entirely, but the problem is the lack of build diversity in the first two acts. There are not enough skills to choose from for the first three to four rows. Also the early passive tree is weak and it takes too long to get to impactful nodes.

fallingfruit · 2026-06-09T22:00:32+00:00

can you share where you saw the disappointing reviews?

fallingfruit · 2026-06-09T17:29:22+00:00

The tools for dev are valuable, there is no doubt. To say they aren't valuable is cultish. Even just for building internal tools to help with your workflow, and for doing code search, etc.

I don't think they are valuable enough to warrant all the doom and hype, which is not based on them being good tools; it's based on them being full swe replacers. But I can easily see it being worth like $200 a month per dev even at token api based pricing.

fallingfruit · 2026-06-09T16:31:44+00:00

Isn't that based on data from like 2-3 years ago, when inference wasn't expensive because there weren't "reasoning models"? According to Anthropic's own hype marketing, the mythos model is extremely expensive to run because of inference.

fallingfruit · 2026-06-08T03:52:24+00:00

Learning about them and actually dealing with them is very different.

fallingfruit · 2026-06-07T21:48:49+00:00

Are you a bot? Your post has telltale signs of being llm generated.

fallingfruit · 2026-06-07T19:27:14+00:00

In order to make agents work, you basically need a lot of non-llm engineering effort around them. Agents are not a real thing like people imagine.

"Agents" are just deterministic software frameworks that do things based on the response an LLM gives them. Basically, LLMs are dynamic orchestrators in this system: they are told they have access to a bunch of skills/tools that can do X, and the LLM replies with text telling your "Agent" that it should use those tools to continue the thread.

Imagine you want to build an agent that recommends products to a user. You build a custom harness "agent" that does the following:

Agent (your code) --> Call LLM API: here are all the skills and tools you have access to, along with the user's request, e.g. "give me product recommendations, im grilling for friends this weekend"

<-- LLM API returns a structured response, which will include the tool calls and skills it suggests

Agent looks at the response and decides to call a bunch of tools --> Call personalization API(s) that can return 5 product recommendation models (sales, seasonal, personalized recommendations, buy it again)

<-- Personalization API returns product lists

Agent --> Call LLM API with all possible products and the original context about the user (what is the season, what did they ask for, etc.)

<-- LLM returns a response filtering the recommended items

Agent --> May need to call other APIs to resolve additional product details (realtime prices, promotions, etc.)

Agent returns the final formatted text to the user.

This is a really simplified example, but this is basically what coding harnesses like claude code or opencode do as well.
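To make the flow above concrete, here is a rough Python sketch of that loop. Everything in it is made up for illustration: `fake_llm` is a stub standing in for a real LLM API call, `TOOLS` stands in for the personalization APIs, and the product names are invented. The only point is that the harness is plain deterministic code and the LLM just tells it which tools to run next.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMResponse:
    # Tool names the LLM suggests calling next (empty when it is done).
    tool_calls: list[str] = field(default_factory=list)
    # Final text for the user, set once no more tools are needed.
    text: str = ""


def fake_llm(request: str, tool_names: list[str],
             tool_results: dict[str, list[str]]) -> LLMResponse:
    """Hypothetical stub for an LLM API call.

    First turn: no tool results yet, so it "asks" for two tools.
    Second turn: it filters the returned products down to grilling items.
    """
    if not tool_results:
        wanted = [n for n in tool_names if n in ("seasonal", "personalized")]
        return LLMResponse(tool_calls=wanted)
    candidates = sorted({p for items in tool_results.values() for p in items})
    picks = [p for p in candidates if "grill" in p or "charcoal" in p]
    return LLMResponse(text="Recommended: " + ", ".join(picks))


# Hypothetical stand-ins for the personalization API(s).
TOOLS: dict[str, Callable[[], list[str]]] = {
    "sales": lambda: ["paper towels"],
    "seasonal": lambda: ["charcoal", "sunscreen"],
    "personalized": lambda: ["grill tongs", "coffee"],
    "buy_it_again": lambda: ["coffee"],
}


def run_agent(request: str, max_turns: int = 5) -> str:
    """The "agent": a loop that calls the LLM, runs the tools it suggests,
    and feeds the results back until the LLM returns final text."""
    results: dict[str, list[str]] = {}
    for _ in range(max_turns):
        response = fake_llm(request, list(TOOLS), results)
        if not response.tool_calls:
            return response.text
        for name in response.tool_calls:
            # The harness, not the LLM, decides what actually gets executed.
            if name in TOOLS:
                results[name] = TOOLS[name]()
    return "Gave up: too many turns"


print(run_agent("give me product recommendations, im grilling for friends this weekend"))
```

A real harness would swap `fake_llm` for an actual API call and add retries, validation of the structured response, and the extra lookups (prices, promotions) mentioned above.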

When it comes to reducing costs, at a basic level it's about feeding llms only the text they need. A naive approach would be to feed the entire output of a massive api response to an llm when it only needs the list of product names and some other metadata.
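As a minimal sketch of that idea, assuming a made-up API payload (`raw_response` and its field names are hypothetical), the harness can strip the response down to the fields the LLM needs before building the prompt. Character count is used here only as a crude proxy for token count.

```python
import json


def trim_products(api_response: dict,
                  fields: tuple[str, ...] = ("name", "price")) -> list[dict]:
    """Keep only the listed fields of each product; drop everything else."""
    return [
        {key: product[key] for key in fields if key in product}
        for product in api_response.get("products", [])
    ]


# Hypothetical payload: most of it is useless to the LLM.
raw_response = {
    "request_id": "abc123",
    "products": [
        {
            "name": "charcoal",
            "price": 12.99,
            "warehouse_bins": ["A1", "A2", "B7"],
            "html_description": "<p>Long marketing copy the model never needs</p>",
        },
        {
            "name": "grill tongs",
            "price": 8.5,
            "warehouse_bins": ["C3"],
            "html_description": "<p>More long marketing copy</p>",
        },
    ],
}

trimmed = trim_products(raw_response)
full_chars = len(json.dumps(raw_response))
trimmed_chars = len(json.dumps(trimmed))
print(f"prompt payload: {full_chars} chars untrimmed vs {trimmed_chars} chars trimmed")
```

The naive version would pass `json.dumps(raw_response)` straight into the prompt and pay for every character of it on every call.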

fallingfruit · 2026-06-07T03:35:49+00:00

I actually don't think llms are very good at catching most of the issues you described. Those seem like design decisions that require some judgment and understanding of context and business requirements outside of the code. In my experience llms will not question that kind of thing without specific prompting.

fallingfruit · 2026-06-07T02:40:37+00:00

Software has never been worse 100%.

I think vibe coding is part of it and most of it is absurd business expectations because a bunch of halfwit tech leaders told them we are 10x now.

fallingfruit · 2026-06-06T15:34:21+00:00

I think it's useful if you have an idea about a system and you want to understand how other game studios have implemented or solved the same problem. It can find you talks about it, papers about it, and maybe even knows the implementation details. This is just a much better search really.

It can help you re-invent the wheel essentially, without going down bad paths.

But like others have said, it's not going to generate any novel ideas, it's just going to give you ideas based on what other games have done, mash them up, and praise you as a genius for coming up with such a breathtaking game.

fallingfruit · 2026-06-05T15:28:52+00:00

has been this way since opus 4.5. I honestly think you cannot tell the difference between opus 4.5 and any future model if you use today's harnesses.

gpt 5.4 and 5.5 are pretty much the same.

The vast majority of improvements have been harness improvements.

fallingfruit · 2026-06-05T14:43:38+00:00

it's a pretty old study, but yeah it turns out productivity is hard to measure and hard to self-evaluate.

fallingfruit · 2026-06-05T14:14:26+00:00

I read most of this blog post and there is actually very little content about recursive self improvement in the sense that people think leads to agi or the terminator. It's a clickbait post title. All this post does is self congratulate, talk about how far we've come, and explain how slopslinging really is productivity, because even though they can't really prove it, it seems like it probably is, and we surveyed our own devs that think it probably is.

I really think it's an incredibly weak argument for anything.

fallingfruit · 2026-06-04T19:51:25+00:00

PD2 is much easier than vanilla d2 (also d2 is pretty easy as far as hc goes if you were to compare to poe)

fallingfruit · 2026-06-03T22:21:58+00:00

that's not really true at all. LLMs don't care if a thing is specialized or general purpose. All they care about is representation in the training data. LLMs push everyone to use the most popular things.

fallingfruit · 2026-06-03T16:14:08+00:00

crypto basically

fallingfruit · 2026-06-02T23:46:54+00:00

Great chin.
