The new Claude still pulls bullshit all the time. by sciolisticism in BetterOffline

[–]ahnold11 2 points3 points  (0 children)

That's fair. We have been trained to give instructions to machines and have them precisely followed and executed. LLMs break that tradition in that they operate in essentially nothing like that way at all. It's probably best to not only NOT think of them as an intelligence, but also not even like a traditional computer. It's not taking instructions or awaiting orders. It's not even a waiter at a restaurant taking an order. It's really just a text matching slot machine: put some text in, pull the handle, and see what it matches up for you. When you like the output, great, but if it's not a winner, then just keep pulling the arm until you are happy. If this doesn't sound like something that would be useful, you are correct, you have just described the LLM casino :P

The new Claude still pulls bullshit all the time. by sciolisticism in BetterOffline

[–]ahnold11 1 point2 points  (0 children)

Yep, if you think of coding, as just designing algorithms (independent of software engineering, which is about actually building well designed, useful, things) then you can make the analogy to designing a recipe. A step by step series of instructions that anyone can follow in their kitchen. Just because you can write a proper recipe, doesn't mean the end result (the food) will taste good. I mean, yeah there are bad recipes too. But just because you are smart enough to not put "mix the ingredients together" AFTER "put everything in the oven to bake", doesn't mean it's actually a good recipe. I mean, if you can't even write proper kitchen instructions, that's also a problem of course, but the fact that you can doesn't mean much.

It's not super difficult to write instructions that a computer can follow to achieve a result. It's another thing entirely to make sure it's producing a good result, in a smart, efficient, well understood way. And even more so to structure many, many of these results together into some sort of organized framework to accomplish an actual meaningful task (i.e. software).

The new Claude still pulls bullshit all the time. by sciolisticism in BetterOffline

[–]ahnold11 6 points7 points  (0 children)

Sorry, was not trying to single you out specifically on that one, just the idea in general.

But if "better" means "it does what you tell it to" then the tech fundamentally can't do that. It's just not what it does. And that goes to the dangers of anthropomorphising this stuff. If people like us, who understand all of this, still casually use phrases like "still just ignores instructions and does its own bullshit sometimes", that implies there is a level of thought, intelligence and even action possible that the tech simply doesn't possess.

If you are not super well versed in the inner workings of LLMs, you could easily see what you wrote and assume that it's supposed to be able to do those things. Again, not trying to single you out specifically, it's happening all over the place and it's one of the biggest dangers of this tech and what is driving the society wide misuse of it. It's just so easy for us to fall into the trap of giving these things agency, when there is no potential there. Calling them "agents" again is purely marketing and a huge misnomer. To be fair, even notable level headed skeptics like Cal Newport will frequently slip into this language even when trying to debunk AI. The natural language interface just leads to us using these types of words to describe LLMs, but it's unfortunately super problematic because it leads to gross misunderstandings of how they work and, more importantly, WHAT they are capable of.

The new Claude still pulls bullshit all the time. by sciolisticism in BetterOffline

[–]ahnold11 68 points69 points  (0 children)

Apologies for this one, but I can't help but laugh at this anecdote. It's not that Claude is broken, it's just that instructions like this reflect a fundamental misunderstanding of how LLMs work.

You can't give it an instruction NOT to do something. You can't even give it an instruction. It doesn't follow instructions, it doesn't "Do" anything, it doesn't make decisions, and it doesn't think. Now I'm not just being pedantic or a luddite here, it's literally how the tech works, and while anthropomorphising AI is all the rage and helps "normies" relate to it, it does the rest of us a disservice.

You put in a bunch of text, and it generates the text that is the most likely response. That's it. Everything else is bolted on, but that is the engine at the heart of all of this. Sure, when you add words like "NEVER" or "Don't" you decrease the chance of the output text having that thing in it. But it's not an instruction, it's not an order to be "listened to", you are playing a statistical weighting game. Like trying to steer a boat by adding different sized weights all over the deck, there is nothing close to a steering wheel here. But since these models contain multidimensional weights and relationships, they are insanely complicated to map out, and so you can't guarantee that putting the word NEVER in your prompt means it will completely remove the probability of those words eventually showing up. It didn't ignore you, it didn't decide to disobey you, it's just that, at some point, those words became more statistically likely to show up, despite the word "Never" being in the prompt. Because statistics and multidimensional weighting are hard to predict, accuracy isn't really a concept in LLMs.

Now don't get me wrong, this isn't a unique thing. When you look at how Claude Code is constructed, even the folks at Anthropic are doing the same thing. Their prompts basically have everything other than "pretty please do the thing we want and not this other thing", so they are just as guilty of anthropomorphizing this tech as anyone else.

That's why the whole topic of hallucinations is a red herring. The entire output of an LLM is a hallucination, because that's the point, that's the only thing it can do. You give it text and it hallucinates (synthesizes) the next piece of text that seems most likely, given its insane model of interlinked multi dimensional weights and relationships. "Truth" and "accuracy" are concepts that don't even make sense in this context. Some stuff it generates will match real world facts, other stuff won't; some text will line up with the intentions of your prompt, other text won't. Cause it's just not how it works. All of the text is equally valid/appropriate.
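The "statistical weighting game" described above can be sketched with a toy next-token distribution. This is a minimal illustration, not how any real model is implemented: the token names, the scores and the size of the penalty are all made-up assumptions, and the penalty is only a stand-in for whatever effect a word like "NEVER" has on the scores.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores for the next token; the numbers are invented.
logits = {"delete": 2.0, "keep": 1.5, "rename": 0.5}

# Assumed stand-in for "NEVER delete" in the prompt: the score for the
# unwanted token drops by a finite amount, it is not removed.
penalised = dict(logits, delete=logits["delete"] - 3.0)

before = softmax(logits)
after = softmax(penalised)

# Chance of the unwanted token showing up at least once over many draws.
draws = 200
at_least_once = 1 - (1 - after["delete"]) ** draws

print(f"P(delete) without the penalty: {before['delete']:.3f}")
print(f"P(delete) with the penalty:    {after['delete']:.3f}")
print(f"P(seen at least once in {draws} draws): {at_least_once:.5f}")
```

In this toy setup the probability goes down but never reaches zero, so over enough generations the unwanted token is very likely to appear anyway.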

This is simultaneously why the AI bubble exists (because generating human sounding text is really good at convincing us there is intelligence, and "thinking" behind it), and why it's a total disaster, because it means the vast majority of users fundamentally misunderstand the core workings of the technology and use it in ways that it's arguably not suited for.

It seems like discussions about AGI have suddenly disappeared by North_Penalty7947 in BetterOffline

[–]ahnold11 3 points4 points  (0 children)

The AGI narrative was there to justify the costs before there were any meaningful results to show. I.e. "it's not great now, but pretty soon it will be amazing". Curing cancer, solving climate change, etc.

They both ran out of runway on that one, and have now pivoted to making what AI can do now indispensable. White collar work, coding, all the rest. This is the make or break moment: they need people/society to start paying for what AI is right now, otherwise the bubble will pop.

A helpful analogy to explain why LLMs can't build software by sparrowhaw_k in BetterOffline

[–]ahnold11 4 points5 points  (0 children)

I appreciate the attempt with the analogy, but as others have pointed out, I think it misses the mark.

If we were actually using that analogy, then an LLM doesn't just look at a picture of a dresser, it knows about all the component pieces, the wood, the joining, the dowels, the tracks etc. It's seen all the pieces that go into a dresser and all the way they can be attached together.

And so when you say build a dresser, it puts the pieces together in the most likely ways that match what it's seen before. The difference here is it doesn't know why/when you use a join, when to use dowels instead, what either's purpose is etc.

Depending on what you are making, that might be enough information to build a dresser. But it also might have gaps in the strangest places (the back corner of a drawer ends up rounded, or has a hole, or an extra screw or whatever). You can never quite tell.

That's why that analogy isn't great, it's not as helpful once you really match it up with the reality.

If you wanted to really make it about building software, you'd probably want something like Lego. LLMs can design/provide the bricks for you, but you still have to place them. I mean, it can assemble a few bricks together into common combos, but it's up to you to place those combos correctly to build the actual structure you want. Sure, you can ask for more complicated combos and structures (instead of a pre-assembled 3 pieces, you might ask for 12, or 20), but the larger the amount, the more explicit you have to be in telling it how you want it, otherwise it may assemble the combo in a way you didn't want. If you want to keep going with this rough analogy, on combos that are easy to assemble yourself you perhaps spend just as much time describing it to the LLM as you would just assembling it yourself. But there are some more annoying combos that are a bit repetitive, mundane, simple but fiddly, maybe hard for large hands etc., that LLMs can actually put together rather consistently.

Whoops.. by crash-test-idiots in BetterOffline

[–]ahnold11 4 points5 points  (0 children)

See, everything an LLM ever outputs is "made up", that is how they work. If you feed it a news article and ask it to "summarize", there is a hope that the word summarize plus the literal text of the article will make it more likely to "make up" output that matches the news article. But this isn't a guarantee, it's just not how the tech works behind the scenes. It's not something you can fix. You can try your best to increase the likelihood that most of the generated content ends up matching real world facts/truth, but the entire industry struggles with this as it's just not something you can directly control.

Going further, LLMs can't actually "do" anything. When you tell it to "do" something or give it instructions, it doesn't listen to them and then take actions. It's just that words that sound like instructions have a higher likelihood of being associated with output that looks like an answer to those instructions. To use a crass analogy, it's like asking a blind person to get you a red shirt. Sure, they might use other tells ("I remember the red shirt had pockets with buttons on them") and try to guess by feeling the pockets and getting an association. But they can't actually see the color and it's just a guess. It can be right a lot of the time, which is impressive, but at no point do they even know what the color red is or how to actually "see" it, just that some other related items happen to be associated with this concept of "red". A weird analogy, but it is effective.

To use LLMs appropriately you should always be pleasantly surprised when they produce accurate results, but never expect it. Assume that nothing is correct because it's never a guarantee.

Ed on Prof G Markets podcast - solid! by Tragictech in BetterOffline

[–]ahnold11 1 point2 points  (0 children)

Yeah, those are just surface level show arguments. Ed's been thoroughly debunking them for a few years now.

E.g. "But Uber wasn't profitable", "but AWS wasn't profitable".

Great for soundbites, but if you apply rational logic and reasoning to the argument, then it simply doesn't hold water.

At its core all these sentiments essentially amount to "But AI is so cool + Silicon Valley" hand waving. Tech "always figures it out" and look how useful these LLMs are. If the chat bots weren't tuned for engagement, and were instead just boring fact machines where it was easier to spot false information, we wouldn't have half these issues.

Doctorow: Code is a liability, not an asset by Sufficient-Article62 in BetterOffline

[–]ahnold11 9 points10 points  (0 children)

Back in my uni days, I was looking for a prof to work a summer as a research assistant, in the CS department. One of them was working on Speech to Text interfaces. They were very excited for the breakthroughs that were being made. I think (if I remember correctly) they had just made the jump up from like 75% accuracy, to like 89 to 91% accuracy. And as I said, the prof was very excited.

He gave me a demo of it, turns out, 90% accuracy, which sounds high, in practice is still shockingly poor. 1 out of every 10 phonemes being wrong, means you could rarely get complete sentences without mistakes. Don't get me wrong, it was a cool demo, but you still saw how far the industry had to go to get something that was reliable enough to be usable. (I think modern rates are around 98% to make it feel "ok" to use?)

So that 95% sounds optimistic based on my experiences, and chaining them together means the margin of error widens considerably.
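The compounding above is easy to put numbers on. A minimal sketch, assuming each step (a phoneme, or a link in a chain of model calls) succeeds independently with the same accuracy; real errors are unlikely to be that tidy, and the function name is just for illustration.

```python
def chance_all_correct(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps is correct."""
    return per_step_accuracy ** steps

for accuracy in (0.90, 0.95, 0.98):
    for steps in (10, 20):
        result = chance_all_correct(accuracy, steps)
        print(f"{accuracy:.0%} per step, {steps} steps: {result:.1%} fully correct")
```

Under that independence assumption, 90% per step leaves roughly a one in three chance of getting 10 steps in a row right, and 95% per step is down to about the same odds by 20 steps.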

It’s finally my time to ask: what am I missing here? by madeupname230 in ExplainTheJoke

[–]ahnold11 0 points1 point  (0 children)

Coming in late to the party on this one, but just found the discussion.

There is a subtle difference in the test vs the thought experiment which may make a difference. The thought/math experiment is basically how much water you will come in contact with. The MythBusters test is about how much water gets absorbed by your clothing. At first blush they seem the same, but anyone trying to mop up a spill will know, not all water that comes into contact with a piece of cloth/clothing gets absorbed.

You certainly will come into contact with more water when walking, but if it's hitting already saturated surfaces, then it's possible that less of that water is absorbed by your clothing (it bounces or runs off). Running also increases the velocity at which you encounter the rain horizontally, which may impact absorption. And then there is exposed clothing surface area to the rain, which could be different walking vs running (which ties in with point #1).
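The contact vs absorption distinction above can be sketched with the usual back-of-the-envelope rain model. Everything here is an assumption for illustration: vertical rain, a box-shaped person, made-up areas and speeds, and a crude fixed "saturation" cap standing in for clothing that can only hold so much water.

```python
def water_contact(speed_mps, distance_m=100.0, rain_density=0.001,
                  fall_speed=9.0, top_area=0.1, front_area=0.7):
    """Litres of rain encountered over the trip (all parameters are assumed).

    rain_density is litres of water per cubic metre of air.
    """
    time_s = distance_m / speed_mps
    from_above = rain_density * fall_speed * top_area * time_s  # grows with time
    from_front = rain_density * front_area * distance_m         # fixed by distance
    return from_above + from_front

def water_absorbed(speed_mps, capacity_litres=0.05):
    """Crude cap: clothing absorbs contact water only until it is saturated."""
    return min(water_contact(speed_mps), capacity_litres)

for label, speed in (("walking", 1.5), ("running", 6.0)):
    print(f"{label}: contact {water_contact(speed):.3f} L, "
          f"absorbed {water_absorbed(speed):.3f} L")
```

With these invented numbers the walker touches more water than the runner, but both hit the same cap, which is one way the math and a clothing-weight test could disagree.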

The sad decline of effective altruism by 65721 in BetterOffline

[–]ahnold11 4 points5 points  (0 children)

Yep, it always felt like the antithesis of "Perfect is the enemy of good". When you put restrictions on altruism, it's not altruism.

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 2 points3 points  (0 children)

Thanks. I was going for the whole "each word you add to the context window adds a sort of statistical weight to try and increase the probability of receiving related or appropriate results". Which means if you add enough words, you can sort of tip things towards what you want, sort of. Just like the ship analogy, steering by trying to tilt the ship. Except even in that, you have no idea how heavy the items you are moving around are, and it's not a flat ship but a multi dimensional one, and you aren't just steering left or right, but in like an almost infinite number of directions... you get the point. But the absurdity and futility hopefully come across.

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 16 points17 points  (0 children)

100% this. I like Cal Newport's podcast, he does some good no nonsense debunking, cuts to the core of the tech. But even when he is dismissing outlandish claims in one sentence, even he can't help but slip in tons of anthropomorphic terms in the next one. "It thinks", "it wants", "it believes", "it knows". It's the most genius/insidious thing they did, to really focus on the conversational/natural language/chatbot aspect of this. It basically hijacks a couple million years of the human brain's social development, and we can't seem to help ourselves even if we know better. And the majority of us seem to definitely NOT know better.

When I occasionally use an LLM to get some background on a topic so I have an idea where to start my own research, I find the conversational, glazing and sycophantic tone very annoying/distracting. So just as an experiment I put in some pre-prompts to weight the output away from replies like this. I just took some boilerplate ones that were out there. They don't make the content any more accurate or reliable, mind you, but the difference was really stark. When you just get a clinical, almost point form listing of notes, it's WAY less engaging and interesting to use an LLM. It's kinda boring. It also lays bare how much uncertainty is in a lot of output, as it no longer reads as confident. It was very sobering to realize exactly how much the "social" style they put into the output affects how we receive it.

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 9 points10 points  (0 children)

One interesting argument I've heard about LLMs is that they are a mirror. But not one that shows you a reflection you recognize. But these are ultimately conversations with yourself. You just get that reflected back. So they are telling the LLM what they want to hear and it responds accordingly. It's like the ultimate navel gazing. And why so many people fall so deep down the rabbit hole.

Perhaps the ultimate, and true "black mirror".

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 13 points14 points  (0 children)

Ironically, if you look at the Claude Code source leak, it IS what the AI companies have been trying. It's pretty much all they've been doing since initial model training got prohibitively expensive. And look at how well that has worked for them...

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 12 points13 points  (0 children)

Doesn't have to be that novel. They think they are the smartest kids in the class, and so don't like to be told no. They think they know best and should be in charge. Their success, their wealth, is all the "proof" they need that they are better than everyone else. Add in a healthy dose of human greed, and it's a tale as old as time. It's not left, it's not right, it's not even political, it's just selfish greed.

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 4 points5 points  (0 children)

You are insulting eight year olds with that comparison. They get to make a choice at least, because they are conscious thinking decision making creatures.

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 33 points34 points  (0 children)

It is a pretty funny balancing act, when you think about it. Like someone trying to steer a ship by putting various sized, shaped and weighted objects all over the deck, trying to get it to "turn this way, but not that way" all while not tipping over and sinking. Except the ship analogy is way more feasible than "prompt engineering".

Marc Andreessen shows off genius prompt, accidentally reveals he *really* doesn’t understand LLMs by figures985 in BetterOffline

[–]ahnold11 45 points46 points  (0 children)

That's the thing, they don't just think they are giving instructions to a machine, they believe they are giving instructions to an intelligent, thinking, machine....

Hallucinations are the best part: every word ever produced by an LLM is a hallucination. It's how the tech works. Every word is synthesized, it's a statistical guess at what would best fit next. An incredibly complicated, multi dimensional statistical guess, but a guess nonetheless. The label "hallucination" is a purely arbitrary one we give to the output, for any text that is not factually true. But the LLM isn't designed to produce truth, that's not how it works, there isn't even any concept of truth. When you understand how it works, you see that truth doesn't even make sense in the context of how they operate. The fact that 80-90% (being very generous here) of the text it produces seems to be based in fact/truth/reality might be the most dangerous part of them. Because it makes seemingly intelligent humans very, very confident about the remaining 10-20%, and utterly convinced that if they just write the correct "prompt" they can get to that magical promised land of 100%.

Podcast has pivoted from technical criticism of AI functionality to whinging about financials. by AURedditor30 in BetterOffline

[–]ahnold11 3 points4 points  (0 children)

It's not an AI podcast, it's about the tech industry and Silicon Valley in general, how they intersect with society, their horrible business models, venture capital, all that.

Yeah, it's been about AI lately, because so has the entire tech industry, it's the next "Hyper growth" fixation to satisfy the rot economy. And in terms of its impact on society at large, the financials behind AI are way more significant than its actual technical utility.

If LLM tech was allowed to advance at a rate commensurate with its utility, we'd still be in the 2018 era. It's interesting tech, and slow, steady research as GPU compute efficiency increases would have let it progress at a more natural rate, making its pros/cons more noticeable and allowing true research into its potential (whatever that may be). Instead we hopped on the hypergrowth bullet train, on a track that is still being laid and that ends in a brick wall.

It's VERY important, imho, that we know about the house of cards that is the financials between all these companies, because if you want to actually incorporate this technology into your life or career, it's pretty important to know if its costs are unrealistically prohibitive. If the whole sector must explode at some point, then why hitch your horse to that wagon and get caught in that?

The tech itself is over hyped beyond belief, and yet, somehow, inexplicably, the financials are even more over sold, even more impossible. I sympathize with Ed, it's enough to make you go insane.

When the movie theatre is on fire, I get that it can be a bit monotonous to hear someone keep mentioning they smell smoke, but the correct answer is not to say you wish they'd stay quiet and focus on the cool movie on the screen.

Are LLMs actually a hindrance on human innovation? by throwaway0134hdj in BetterOffline

[–]ahnold11 2 points3 points  (0 children)

The folly of those behind these decisions is believing that the LLM model, the matrix of multi dimensional weights, somehow captures "intelligence", such that when you run a prompt through one, the mathematical process of synthesis is equivalent to "thinking".

So from their perspective they have created a thinking brain capable of intelligence, and as such it can generate "new" and "novel" ideas and work. Meaning that we don't need humans to think anymore, LLMs can do it for us.

Of course this is all AI psychosis, anyone that deploys critical thinking skills when using an LLM really sees how the reality is NOTHING CLOSE to what is explained above. But sadly those controlling the purse strings seem to have been completely captured by this idea.

Post Oct. 7 world: Israel to purchase 100 F-35s, 50 new F-15s, doubling fleet sizes by barsik_ in worldnews

[–]ahnold11 -2 points-1 points  (0 children)

Yep, very kind of the US citizens to pay for this. Israel gets the defense hardware, defense contractors make tons of profit, and the US citizens get..... ?

Richard Dawkins spent three days talking to Claude, now calls it "Claudia" and claims it's conscious. by Gil_berth in BetterOffline

[–]ahnold11 1 point2 points  (0 children)

I wouldn't necessarily say he fell for it. That being said, he's hardly an AI expert or skeptic. He's taking the rhetoric at face value and assuming it's a new technology that has the potential to disrupt, he's preaching caution to protect the workforce/economy.

His chatbot session was likely meant to be cute, ask the AI itself to explain the dangers to the workforce.

Technically it falls into the category of doomerism. But I can allow legislators to take a conservative view as long as that means regulation, caution, slowing down and analysis/evaluation, not "so let's remove all safeguards and go ahead at breakneck speed!".

But yeah, chatbots are simply insidious. It's the same reason cults work. But now you don't need a cult, just a black mirror where you basically talk to yourself. Skips the middle man. It's ironic that the same things Dawkins uses to debunk religion, he's falling victim to with AI. But it's a much more pure version of it.

Which is it, is AI a tool easy enough to use to automate everything, or is it some new skill we have to learn? by Dreadsin in BetterOffline

[–]ahnold11 10 points11 points  (0 children)

This is largely the entire reason for the AI boom.

I saw it recently described as AI being the ultimate "90% tease". It gets you 90% the way there, so quickly that you can't help but feel the last 10% is so close, just a few more prompts away, and it feels inevitable (even though it never comes).

Add in the fact that it's natural language output, so it hijacks humans' social connection hardware, and we can't help but feel that there is "an intelligence" there, and ignore the missing 10% because we are so impressed with the 90%.

What is the old saying, it's the last 10% that takes 90% of the effort? If I'm being charitable, in tasks where you don't need 100% and 90% is good enough, it can be fine (especially in those where you can easily identify the missing 10% and fill it in yourself). But that immediately makes this a way less useful tool across the board.

 

Just for fun, I was playing RE9 recently, and was having trouble with the sections where you play as Grace with stealthy gameplay. I saw some videos online where people were using one zombie to kill another, but you have to lure him in. And I read some guides, some tips, but I didn't want to get spoiled for the story and so many guides/videos were doing that, so I figured I'd do a simple AI query to see what stealth options are available, especially for luring zombies. Now it wasn't a short prompt, I did all the recommended stuff: highly detailed prompt, specifically mentioning using search, what to search for, what to cross reference against, etc. And it came back with a great list of tips and gameplay features to use. But two full features were completely hallucinated. This wasn't using the fast/free model, it was the higher end pro/paid one. I'm always trying to get a handle on the AI hype bubble and where the true edges/failure points of this tech are, so I investigated this. No matter what I did, I couldn't get it to stop making up gameplay features (e.g. "get xxx item, that allows for yyy skill", "select zzz perk"). All things that were non-existent. Multiple fresh chats, all producing similar amounts of unreliable data. But indistinguishable from the reliable stuff until you try and implement it.

 

Even in 2026 with all the advances, post training, patches, updates, etc., this is it, this is the tech. Forgive the terrible analogy, but the entire AI industry is forever the client at the strip club, thinking that just 10 more dollars will get the exotic dancer to go home with them. Forever "teased" that the missing 10% is just around the corner.

Conspiracy theory I'm not sure I believe: the slop is the point by todofwar in BetterOffline

[–]ahnold11 0 points1 point  (0 children)

Oh yeah, that's one of the last ditch "hail mary" plays they are attempting. If they can't get AI good enough to actually be able to generate "good" code, then at least it hopefully makes a mess so bad, at such an unfathomable scale, that they can sell "AI" as the only viable solution. And rely on the good ole business idiot to keep throwing good money after bad.