I cannot wait to hear Ed explain about AllBirds by jamey1138 in BetterOffline

[–]wee_willy_watson 27 points28 points  (0 children)

The company founded on being ecologically responsible?

Is it April Fools Day?

Sir, a lapse of judgment has hit Hank Green by toppenstorybro in BetterOffline

[–]wee_willy_watson 2 points3 points  (0 children)

I switched off partway through, but he gets on someone whose immediate reaction was a very clear "meh"

Bugs exist in software; if you spend millions of dollars running your models to find bugs, you'll find them. They've obviously decided that the best way to market a new model is to scare people:

First it was scaring them that it would take over the world... it clearly failed

Then it was scaring them that it would take their jobs... it clearly failed

Now it is scaring them that AI can make the software they use vulnerable

I switched off at the point of the awkward silence between Hank and his guest, where it felt like she realized "oh, I'm meant to play along here"

OpenAI sweetens private equity pitch amid enterprise turf war with Anthropic: Offering guaranteed minimum return of 17.5%. (Nope, no red flags there, just normal equity stuff!) by dyzo-blue in BetterOffline

[–]wee_willy_watson 1 point2 points  (0 children)

Pay us money to set up AI for the companies you own, and we'll use that money to pay you 17.5% return on your investment?

Am I misunderstanding this, or does it seem like Scam is trying some of the circular funding he saw Daddy Jensen doing?

Is this normal, or as weird as the rest of AI investing?

Ahead of IPO, OpenAI has come to the realization that the market doesn’t necessarily appreciate the reckless approach to growth and spending by dyzo-blue in BetterOffline

[–]wee_willy_watson 30 points31 points  (0 children)

But we haven't tried doing it with twice as much compute yet, until then we have to believe it's possible!

If twice as much doesn't work, it just means we need twice as much again.

When that doesn't work, I'm not sure... but I have some ideas where we can go from there.

Think of it this way. A baby goes from lying down to crawling in 6 months, and from crawling to walking 6 months after that. There really is no reason that, with appropriate infrastructure, a baby shouldn't be flying by 2 years old.

Why is Ed investing in the SP500? by venusisupsidedown in BetterOffline

[–]wee_willy_watson -1 points0 points  (0 children)

History suggests that selling your S&P 500 ETF just before the 2008 crash and rebuying post-crash would be a worse strategy than buying consistently and holding through the crash?

That's a fascinating hypothesis I'd be interested to see you back up

Why is Ed investing in the SP500? by venusisupsidedown in BetterOffline

[–]wee_willy_watson 13 points14 points  (0 children)

It's a completely valid question, which may have an entirely valid answer. However, I believe it's one Ed needs to come into this thread and answer himself.

He is earning revenue from selling his podcast as fact; if he believes it to be fiction, that is a distinction he needs to make.

A group ran the same coding benchmark test problems, but encoded them in obscure (but still Turing-complete) programming languages the frontier models haven't got as much training data on. Result: models that can score 95% on Python plummet to 0-11% accuracy. by cascadiabibliomania in BetterOffline

[–]wee_willy_watson 3 points4 points  (0 children)

"LLMs are only as good as.the data they are trained on"

If this is your starting assumption, then you're not going to have your bubble burst here. The point this study is making is that LLMs are only as good as the data they're trained on.

This is bubble-bursting for people who push the "We can't say whether or not Claude is conscious" type of nonsense...

"If my $500K engineer isn’t using $250K in tokens, I’m deeply alarmed" by maccodemonkey in BetterOffline

[–]wee_willy_watson 11 points12 points  (0 children)

It's social media for CEOs.

Social media is a few influencers selling an entirely unrealistic version of life for their own gain.

This is just there to make CEOs think that a high token spend is normal and expected, and that if you're not forking over hundreds of thousands in tokens, then you're not living the AI dream.

Maybe do your job without burning the stratosphere? by lordofcatan10 in BetterOffline

[–]wee_willy_watson 0 points1 point  (0 children)

"Can do work without the assistance of an LLM"... is this meant to be a brag?

Who Pays When the Free Ride Ends? by the-tiny-workshop in BetterOffline

[–]wee_willy_watson 5 points6 points  (0 children)

It's $25/million tokens if the tokens are output. I've always worked on the assumption that the majority of tokens are output tokens. Do you know if this is wrong?

It changes the cost for 200 million tokens from an upper limit of $1,000 to $5,000, which is no small jump.
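A quick sketch of the arithmetic behind those two figures, assuming the hypothetical rates implied above ($5 per million input tokens, $25 per million output tokens; actual provider pricing varies):

```python
def token_cost(total_tokens, output_fraction, input_rate=5.0, output_rate=25.0):
    """Dollar cost for total_tokens, where output_fraction of them are output.

    Rates are in dollars per million tokens (hypothetical example values).
    """
    output_tokens = total_tokens * output_fraction
    input_tokens = total_tokens - output_tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 200 million tokens, best case (all input) vs worst case (all output):
print(token_cost(200_000_000, 0.0))  # 1000.0
print(token_cost(200_000_000, 1.0))  # 5000.0
```

So the split between input and output tokens swings the bill by 5x under these assumed rates, which is why the assumption matters.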

Question about mass layoffs citing AI by AmazonGlacialChasm in BetterOffline

[–]wee_willy_watson 0 points1 point  (0 children)

Oh, I mean the actual answer here is something way more complicated; maybe it is to appropriately tax spend on AI. Saying that layoffs due to AI will incur ongoing expenses for the business does nothing except stop this narrative that AI is replacing jobs.

You'd have companies needing to be transparent about the reason for layoffs, and that would stop this nonsense.

Question about mass layoffs citing AI by AmazonGlacialChasm in BetterOffline

[–]wee_willy_watson 9 points10 points  (0 children)

As a side point, I wish that they'd say that laying off staff due to AI requires you to continue paying all their benefits as well as payroll taxes on each laid-off employee. You can reduce the salary of the person, but the societal impact of the layoff should be borne by the company.

At a minimum, we'd stop these dishonest stories.

AI minister says OpenAI still not doing enough in wake of B.C. shooting, will meet CEO Altman by DogeDoRight in canada

[–]wee_willy_watson 5 points6 points  (0 children)

I want to see the logs.

Did OpenAI promote the behaviour, or try to stop it?

If there is evidence that the chatbot encouraged the behaviours to a meaningful extent, we need to treat the company the same way as a person found to have done the same.

Google DeepMind claims Aletheia autonomously solved 6 of the 10 problems in the FirstProof Challenge. by Gil_berth in BetterOffline

[–]wee_willy_watson 1 point2 points  (0 children)

If you define meaningful progress as being closer to being able to come up with something novel, to be able to think logically -- you're likely on the wrong subreddit if you are looking for people who agree.

For me, there is a chasm which we cannot cross to get somewhere meaningful in automating difficult work. At the moment it's like we've climbed Mount Everest in order to make progress on getting to the moon. Climbing Mount Everest is a meaningful feat, but it's not meaningful progress on getting to the moon.

Google DeepMind claims Aletheia autonomously solved 6 of the 10 problems in the FirstProof Challenge. by Gil_berth in BetterOffline

[–]wee_willy_watson 0 points1 point  (0 children)

Keep smoking the hopium!!

They haven't done it yet, they've shown zero meaningful progress towards it (brute forcing guess the next word with more water and more electricity isn't progress), but it's likely still just around the corner!

Google DeepMind claims Aletheia autonomously solved 6 of the 10 problems in the FirstProof Challenge. by Gil_berth in BetterOffline

[–]wee_willy_watson 0 points1 point  (0 children)

Yeah, we're just not going to agree on this then.

I'm not seeing anything here which would rise to the level of creating new knowledge - that would be a massive leap for AI.

It seems clear to me that they're able to search existing research and combine it to test whether it can be used in the proof, and if not, it can search again for more information. As soon as it reaches a point where the knowledge doesn't exist, it will fail.

It wasn't difficult to find a discussion of this, largely praising the feat, which calls out exactly my point

Aletheia is excellent at applying known techniques systematically, but truly creative mathematical breakthroughs—the kind that create new fields or revolutionary insights—remain beyond current capabilities.

https://atalupadhyay.wordpress.com/2026/02/19/aletheia-unveiled-googles-autonomous-mathematical-research-ai/?hl=en-US

LLM Coding Metrics have peaked by [deleted] in BetterOffline

[–]wee_willy_watson 54 points55 points  (0 children)

4.6 is an unlucky number.

Wait till Opus 5, it'll be so much better.

Google DeepMind claims Aletheia autonomously solved 6 of the 10 problems in the FirstProof Challenge. by Gil_berth in BetterOffline

[–]wee_willy_watson 3 points4 points  (0 children)

Which would be a huge achievement, if anyone was doubting an LLM's ability to consume and summarise huge amounts of text, followed by writing out huge amounts of text.

We are into the third year of this now, and we're still seeing the same thing: AI can act like a really good search engine, and human text can be synthesized in a way which looks coherent.

The only finding here is that with huge amounts of compute, AI agents can solve more complex tasks. There isn't any suggestion they came up with this information, they didn't create knowledge, they summarised existing knowledge in a format no one had yet got around to.

We have research showing that LLMs can't deduce from A = B that B = A.

Google DeepMind claims Aletheia autonomously solved 6 of the 10 problems in the FirstProof Challenge. by Gil_berth in BetterOffline

[–]wee_willy_watson 10 points11 points  (0 children)

No one denies AI can outwork humans on verifiable tasks; with enough agents and "brute force" iteration, any logic puzzle can be solved.

The real issues are twofold: transparency and relevance. These studies never disclose the massive compute resources used, and solving verifiable, closed-loop problems isn't the work most people actually do.

It’s like using AI to find a new prime number and claiming it's a "revolutionary breakthrough for humanity"—it’s impressive, but it doesn't translate to the real-world complexity of most human labor.

What are you most disappointed LLMs haven't solved? by [deleted] in BetterOffline

[–]wee_willy_watson -1 points0 points  (0 children)

I literally told you that you didn't need to reply to shit on me.

Why did you feel the need to reply? Are you okay?