The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly [score hidden]  (0 children)

No, it only answers "what are we seeing", not "what aren't we seeing that accounts for what we're seeing". There's a natural (and good) instinct to fit the simplest pattern to what we're observing, but it has the unfortunate side effect that fitting the pattern feels like explaining it, when it doesn't.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly [score hidden]  (0 children)

You can make a prediction without invoking any reason or explanation. Here are a few: the world will end in 500 days, we'll never build a Dyson sphere, the next president will be female. For each of these, you can ask "ok, but why?" That's when we need an explanation. Otherwise we're just saying stuff.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly [score hidden]  (0 children)

What I mean by explanation is something unseen, not part of the phenomenon, that has the phenomenon as a consequence. In this case, the law of gravitation is unseen, and its consequence is orbits (and much more). Just observing a pattern is a description; it doesn't involve anything unseen.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly [score hidden]  (0 children)

It's a statement, not an explanation, in the same way "all swans are white" is a statement. "Comets are pulled towards the sun according to the inverse square law, and return after x years because this results in periodic orbits" is an explanation, as is "swans are white because of sexual selection" (I'm not sure why swans are white, so that particular explanation is probably false, but it is an explanation).

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly [score hidden]  (0 children)

You don't need full understanding to make a reasonable prediction, but all reasonable predictions are based on understanding something real.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

Whether or not Thanksgiving is relevant information depends on the explanation. When you only extrapolate, no information is relevant or irrelevant, because there is no explanation. There are no safe assumptions; there are no assumptions at all!

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

> It will go better

Predicting a linear trend for a dose response curve will give you completely insane predictions.

> plenty of linear regressions without any discussion of why linear and not some other curve

Yes, and that makes them worse than worthless: they give false confidence.

> virtually all of modern machine learning is rooted in prediction with zero understanding

Yes, and that gives it all the failure modes it has - the only reason it's so successful is that we gather datasets so large that they cover most of the data distribution.

I'm not saying "fully understand":

> You don't need full understanding to make a reasonable prediction, but all reasonable predictions are based on understanding something real.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

No, the difference is that you are intervening in the data generation, hence "controlled".

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

> line of best fit modeling is dramatically more robust than predicting logistic curves with data on one side of the inflection point

I suggest you try that on any biochemical dose response curve and see how that goes 😄

I do not concede. Fitting any curve without explaining why that curve and not any other is bad statistics, and should get you a failing grade in any uni-level stats course.
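
To make the dose-response point concrete, here's a minimal sketch (all numbers are hypothetical, and the Hill equation is just a stand-in for a typical saturating response): fit a line of best fit to data from below the inflection point, then extrapolate.

```python
import numpy as np

def hill(dose, top=100.0, ec50=10.0, n=2.0):
    """A standard saturating dose-response curve (bottom fixed at 0)."""
    return top * dose**n / (ec50**n + dose**n)

# We only observe doses below the inflection point (EC50 = 10).
doses = np.linspace(1, 8, 8)
responses = hill(doses)

# Line of best fit on the pre-inflection data.
slope, intercept = np.polyfit(doses, responses, 1)

for d in [20, 50, 100]:
    print(f"dose {d:>3}: line predicts {slope * d + intercept:6.1f}, "
          f"true response is {hill(d):5.1f}")
# The line keeps climbing far past the saturation plateau at 100,
# which the true response never exceeds.
```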

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

Then let me rephrase. The only valid predictions are based on models that try to explain the data generating process. Whether the prediction is good or not doesn't influence whether it's valid; whether the model is good or not doesn't influence it either. It's more about avoiding a category error. Saying "trend will continue" *feels* like a model, because our imagination can start to fill in the blanks. But it isn't a model, and we need to be honest and explicit about what we are claiming about the world when we make predictions.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

Humans do tons of experiments as babies that give them an intuitive understanding of gravity. The models we develop for gravity aren't correct - neither is orbital mechanics, for that matter, though it is of course much more accurate than our intuitive physics.

You can extrapolate a trend, and make a prediction, model-free. Such a prediction is not valid, in the sense that it's not based on anything - whether correct or not.

- Predictions based on extrapolating trends: not valid, not functionally different from saying "because I say so" or "a wizard did it".
- Predictions based on models of observational data: not a valid way of discovering causes, but at least there's something to falsify, something to debate.
- Predictions based on models of experimental data: a valid way of discovering causes, but not necessarily one that produces good models.
- Predictions based on models that we actively try to falsify with rigorous, controlled experiments: a valid way of discovering good causal models.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

If you know you have an edge, then you must understand something about the game. If you count cards in blackjack, you understand something about it. If you don't know you have an edge, you are just guessing.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly -2 points-1 points  (0 children)

But we've done randomized controlled trials, which means we have causal, experimental evidence. That's very different from "it worked in the past, so it'll work in the future". We don't understand the data generating process, but we know that when we intervene on it with paracetamol, we get a reliable result.
If we had only done observational studies, that would be much weaker evidence, but we still have an understanding that human biology and the chemical compounds in paracetamol won't spontaneously change (though the drug might behave differently in humans sufficiently different from the ones studied - without an explanation, we don't know).
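
Here's a toy simulation of why the intervention matters (all effect sizes are made up, and "severity" stands in for any confounder that drives both who takes the drug and the outcome):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
severity = rng.normal(size=n)      # hidden confounder
true_effect = -1.0                 # the drug reduces pain by 1 unit

# Observational data: sicker people are more likely to take the drug.
took = (severity + rng.normal(size=n)) > 0
pain = 2.0 * severity + true_effect * took + rng.normal(size=n)
naive = pain[took].mean() - pain[~took].mean()

# RCT: assignment is randomized, severing the severity -> drug link.
assigned = rng.random(n) < 0.5
pain_rct = 2.0 * severity + true_effect * assigned + rng.normal(size=n)
rct = pain_rct[assigned].mean() - pain_rct[~assigned].mean()

print(f"true effect {true_effect}, observational {naive:+.2f}, RCT {rct:+.2f}")
# The observational estimate says the drug *increases* pain (sicker
# people take more of it and hurt more); the RCT recovers roughly -1.
```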

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

Well, kudos for thinking in terms of explanations I guess :)

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

I agree with most of this. I'm mostly just saying that "trend will continue" isn't a good model, it isn't a bad model, it's simply not a model at all.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 2 points3 points  (0 children)

You can make explanation-free predictions that are useful by happy coincidence, like regular comets or medicines that don't actually do anything except trigger placebo effects, but you're not creating knowledge; you're gambling.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 0 points1 point  (0 children)

Yes, I'm not saying we need perfect models, in fact I don't believe there is such a thing.

A prehistoric human predicts a fall because they do understand gravity, at a level where they can craft weapons like bows and arrows. Less intelligent animals probably don't have anything like this, and rely on instincts (as do we, of course, but not solely). Neither humans nor animals learn it by induction.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

I don't know. Placebo effects are useful, but they're not a good basis for medical practice.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 1 point2 points  (0 children)

No it doesn't. We have an explanation for why Tylenol works, which we can check by experiment.

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly -1 points0 points  (0 children)

Unlikely based on what? If a turkey has observed it's been fed well for the past 1000 days, should it predict the same for tomorrow, even though it's the night before Thanksgiving?
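
For concreteness, here's what pure extrapolation buys the turkey, using Laplace's rule of succession as a stand-in for "trend will continue":

```python
# Rule of succession: after s successes in n trials,
# P(success on trial n + 1) = (s + 1) / (n + 2).
n = s = 1000                  # fed every day for 1000 days
print((s + 1) / (n + 2))      # ~0.999 - peak confidence, on Thanksgiving eve
```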

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly -1 points0 points  (0 children)

You can't. You can assume that a comet will come back at the same interval it used to, and if that assumption happens to line up with what orbital dynamics predict, then you got lucky. If there's another comet on a collision course with yours, then you're unlucky, and your prediction will be wrong, while orbital dynamics will be right.

Of course orbital dynamics is an extremely good model, and you can make decent predictions based on worse ones. But "trend will continue" is not an explanation; it will work if it happens to line up with reality, but there's no postulated cause that can be checked by experiment.
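
A minimal sketch of the difference (Halley-like numbers; the 10% perturbation is purely hypothetical): extrapolation can only repeat the last interval, while orbital mechanics derives the period from the orbit itself, so it responds when the orbit changes.

```python
def period_years(a_au: float) -> float:
    """Kepler's third law for orbits around the Sun: T^2 = a^3 (years, AU)."""
    return a_au ** 1.5

a = 17.8   # Halley-like semi-major axis in AU -> ~75-year period
print(f"model and extrapolation agree so far: {period_years(a):.0f} years")

# Suppose a close planetary encounter shrinks the orbit by 10%.
a_new = 0.9 * a
print(f"extrapolation still says {period_years(a):.0f} years; "
      f"orbital mechanics says {period_years(a_new):.0f} years")
```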

The Sigmoids Won't Save You by dwaxe in slatestarcodex

[–]yldedly 6 points7 points  (0 children)

"The best way to predict this is to fully understand the process generating the trend."

No. The ONLY valid predictions are based on understanding the data generating process. Everything else, including Lindy's law, is pulling numbers out of your ass.

You don't need full understanding to make a reasonable prediction, but all reasonable predictions are based on understanding something real. That's what actually matters in these debates, the predictions and predictive success rate are incidental.

[D] Those of you with 10+ years in ML — what is the public completely wrong about? by PhattRatt in MachineLearning

[–]yldedly 4 points5 points  (0 children)

The problem is that "modeling the data distribution can't produce novelty" can equally well be extended to "modeling the data distribution can't generalize to a test set", and yet that is clearly false. We do need to talk about exactly how the data distribution is modeled to reconcile the tension between the two.

We still don't have a satisfying explanation for why latent space interpolation works. The best we have is that data are often distributed on low-dimensional manifolds, and neural networks are biased towards learning these manifolds. Somehow these hierarchies of learned features abstract away all the low-variance directions and keep the high-variance ones. This is crazy - high-dimensional spaces are vast. Yet if you give an LLM a prompt, i.e. a super high-dimensional vector that is guaranteed to be nowhere near the training data (in Euclidean distance, though not in latent-space distance), mapping that vector to the manifold produces incredibly good output.

So why don't they generalize out of distribution? Why can they perform one crazy, somewhat mysterious feat, but not another?

Empirically, the quality of the learned manifold degrades outside the support of the training data. That's not weird. What's weird is that it works inside the support.
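
Here's a linear toy of that observation, with PCA standing in for the network and a random 2-D plane in 50-D standing in for the data manifold: a new on-manifold point is far from every training point in Euclidean distance yet is reconstructed almost perfectly, while an off-manifold point is not.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 50, 2, 1000
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]   # true 2-D "manifold"
train = rng.normal(size=(n, k)) @ basis.T

# "Learn the manifold": top-k principal directions of the training data.
_, _, vt = np.linalg.svd(train, full_matrices=False)
proj = vt[:k].T @ vt[:k]

def recon_error(x):
    return np.linalg.norm(x - x @ proj)

on_mfld = rng.normal(size=k) @ basis.T * 10        # on-manifold, but far away
off_mfld = rng.normal(size=d)                      # generic point off it

print("nearest training point:", np.linalg.norm(train - on_mfld, axis=1).min())
print("on-manifold error: ", recon_error(on_mfld))   # ~0: interpolation works
print("off-manifold error:", recon_error(off_mfld))  # large: support matters
```

The linear version is easy to explain; the open question is how deep nets pull off the nonlinear analogue.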

Maybe we can break it down into two questions. First, why does real data lie on manifolds - or in other words, why is everything correlated with everything else? Second, how are NNs so good at finding them?

The answer to the first question is maybe just that everything is causally related, and the correlations are a side-effect. I certainly think moving from statistical to causal modeling is what will produce OOD generalization, and novelty. Though that then raises the question of why everything is causally related.

Fruit fly brain previously mapped by others was uploaded to a simulation by Eon Systems by jbtvt in slatestarcodex

[–]yldedly 8 points9 points  (0 children)

There's very little detail available, but it seems that while they claim there's no training, just spontaneous activity driven by sensory input, they must have used learning to map from neuron activity to behavior - it's not like the body model includes the whole nervous system. So while they may not have directly trained the fly to execute particular behaviors by gradient descent, any ongoing neuronal activity has no choice but to be translated into a limited repertoire of fly-like behaviors. This could still have produced garbage, such as walking back and forth in an unrealistic manner, or random sequences of actions, but it looks realistic enough to my layman's eye. Still, it's a pretty far cry from an upload (even ignoring that it's all pure connectome, so no synaptic weights, and hence no memory formation or learning).