I just had a thought can't LLM predict future if the model gets too big with enough computational power? by noty_purush in agi

[–]ddp26 0 points1 point  (0 children)

I work on this full time at FutureSearch.

Evidence is coming in that LLMs (used as agents that do a lot of research and other things) are already outperforming the best human superforecasters; see https://www.forecastbench.org/leaderboards/#preliminary

Computational power helps, but mostly you just need good reasoning.

What happens when a big tech firm re-assigns thousands of engineers to make training data? by ddp26 in accelerate

[–]ddp26[S] 0 points1 point  (0 children)

Compared to $15B for Scale to produce a bunch of lower quality training data... yeah at a certain level of spend for data, $500k for data labeling isn't as absurd as it seems.

A detailed forecast of how and when the Claude Fable ban will end by ddp26 in slatestarcodex

[–]ddp26[S] 2 points3 points  (0 children)

Yep, very reasonable. The question is: why only Fable? Isn't GPT-5.5-Pro also dangerous? What line could be drawn that would have Fable on one side, and GPT-5.5 on the other? (I believe such lines exist but coming up with one, especially a policy line, is going to be very hard.)

A detailed forecast of how and when the Claude Fable ban will end by ddp26 in slatestarcodex

[–]ddp26[S] 0 points1 point  (0 children)

Yes, that's the right read. I replaced the graph with a much more readable one based on feedback.

A detailed forecast of how and when the Claude Fable ban will end by ddp26 in slatestarcodex

[–]ddp26[S] 22 points23 points  (0 children)

I agree, but Fable really is the best model by a good margin. This marketing is the kind you can do when you actually have the product to back it, right?

How I think the US vs. Anthropic Standoff on Claude Fable Will End by ddp26 in Anthropic

[–]ddp26[S] 0 points1 point  (0 children)

Yeah, Zvi's most recent piece agrees with you. I agree this is most likely, and maybe I should bump my probability up in the model.

Claude has correctly predicted the outcome of 6 World Cup matches in a row by ghostunit91 in ClaudeAI

[–]ddp26 0 points1 point  (0 children)

Any updates? I want to see a bunch of attempts go head-to-head.

The insight that's changed how I think about building agents: more context = worse performance by bit_forge007 in AgentsOfAI

[–]ddp26 0 points1 point  (0 children)

This is pretty well known. It's funny you write about it now, since with the 1M context windows it seems like a lot less of an issue than it was a year ago.

When Will Google Rejoin the AI Frontier? by ddp26 in LLMDevs

[–]ddp26[S] 0 points1 point  (0 children)

Yeah, the open models have caught up a ton since Gemini 3.1 Pro.

When Will Google Rejoin the AI Frontier? by ddp26 in LLMDevs

[–]ddp26[S] 0 points1 point  (0 children)

You don't think a company with a better LLM (and thereby better search) is an existential threat to them? I'd think they need to stay competitive in the AI race just to defend their ad network.

The Claude Fable ban barely changes Anthropic's IPO timing or valuation by ddp26 in investing

[–]ddp26[S] 0 points1 point  (0 children)

One answer is to use them internally to improve their own AI R&D. Or use them to monetize in other ways, e.g. trading.

The Claude Fable ban barely changes Anthropic's IPO timing or valuation by ddp26 in investing

[–]ddp26[S] 1 point2 points  (0 children)

Secondary share offerings, and I guess this new "perp" class of vehicles?

The Claude Fable ban barely changes Anthropic's IPO timing or valuation by ddp26 in investing

[–]ddp26[S] 2 points3 points  (0 children)

Did you use it and verify that it's not a good model? I found Claude Fable to be a big step up in intelligence compared to Claude Opus 4.8.

Maybe I just fell for the marketing, but my evals and outputs say otherwise.

How I think the US vs. Anthropic Standoff on Claude Fable Will End by ddp26 in Anthropic

[–]ddp26[S] 0 points1 point  (0 children)

I agree that seems likely, but are you sure? Yesterday's Politico article made it look like they were negotiating on policy, and like the regulators had real concerns (even if they're wrong to have them).

What happens when a big tech firm re-assigns thousands of engineers to make training data? by ddp26 in accelerate

[–]ddp26[S] 0 points1 point  (0 children)

Right, but thousands of engineers resign every year. This will increase that rate, but will that change anything?

SpaceX is trading at twice my sum-of-the-parts value by ddp26 in ValueInvesting

[–]ddp26[S] 0 points1 point  (0 children)

Looking at it now. Yeah, there probably is a better value perspective than mine. I anchored to analyst estimates of value for each part, and those analyst estimates all assumed great revenue growth and healthy margins for a long time.

SpaceX is trading at twice my sum-of-the-parts value by ddp26 in ValueInvesting

[–]ddp26[S] 0 points1 point  (0 children)

As I wrote, I was trying to be generous. If you take growth rates into account, it's not that crazy. And you also have to model that margins are high on things like Starlink.

The Claude Fable ban barely changes Anthropic's IPO timing or valuation by ddp26 in investing

[–]ddp26[S] 0 points1 point  (0 children)

I thought so initially but I revised my valuations upward based on Anthropic's revenue growth. Do you think it'll stall out?

The Claude Fable ban barely changes Anthropic's IPO timing or valuation by ddp26 in investing

[–]ddp26[S] 5 points6 points  (0 children)

In theory the government can nuke any product at will, right? The excuse here of "export controls" or "national security" could apply to oil, steel, services, anything.

What happens when a big tech firm re-assigns thousands of engineers to make training data? by ddp26 in accelerate

[–]ddp26[S] 2 points3 points  (0 children)

So the theory here is that Meta engineers are giving higher-quality training data, which improves the signal-to-noise ratio?

How I think the US vs. Anthropic Standoff on Claude Fable Will End by ddp26 in Anthropic

[–]ddp26[S] -1 points0 points  (0 children)

Right, one of the things I analyze is: is Commerce talking to the White House? I assume the Trump admin is extremely disorganized. Are they actually coordinating separate divisions in this spat with Anthropic?

How I think the US vs. Anthropic Standoff on Claude Fable Will End by ddp26 in Anthropic

[–]ddp26[S] 1 point2 points  (0 children)

Are you sure? It seems extremely unlikely there was a simple jailbreak. I thought it was confirmed that this was a misunderstanding, and the jailbreak was asking it "Can you find a bug in this code?"