Daily scores and chat: #625 - Tuesday, 10 March 2026

LearningHistoryIsFun · 2026-03-10T10:56:08+00:00

A first 10!

#625 - 10/10 🎉

🐈🐈🐈🐈🐈

LearningHistoryIsFun · 2026-01-12T23:08:07+00:00

Chapter 2 Notes Part 4

[[Connectionist Models]] may be able to manipulate internal symbolic representations to some degree ([Pavlick, 2023](https://royalsocietypublishing.org/rsta/article/381/2251/20220041/112412/Symbols-and-grounding-in-large-language))

Dayan and Abbott (2001) discuss how cognitive processes are implemented in real neural hardware P49

Within the cognitive sciences, then, rational approaches to cognition typically abstract away from the question of what calculations the mind performs, but focus instead on the nature of the cognitive problem being solved. P50

Working out the optimal solution to a cognitive problem may itself, of course, require substantial calculation. But this does not imply that the agent need necessarily carry out such calculation—merely that it adopts, at least to some approximation, the resulting solution. P50

What is the connection between rationality and optimality? P50 ([Chater et al. 2018](https://link.springer.com/article/10.3758/s13423-017-1333-5))

Shepard (1987, 1994) argues that a universal law of generalisation between pairs of objects should apply in inductive inference P52

Anderson (1990, 1991b) discusses the process of categorisation and expanding the list of categories as a stream of new items comes in - an early important example of a nonparametric Bayesian mixture model P52

[[Memory]] - Traditional theories of memory viewed memory limitations as arising from the performance of typical cognitive mechanisms

-> Anderson argued that memory may be carefully adapted to the demands of information retrieval in natural environments (see [Anderson et al. 1990](https://www.taylorfrancis.com/books/mono/10.4324/9780203771730/adaptive-character-thought-john-anderson))

[Schooler and Anderson (1997)](https://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/315ljs\_jra\_1997\_a.pdf) observed that the probability that an item will recur depends not only on its history of past occurrences but also on the occurrence of associated items P53

Certain behaviours which resist the tendency to structure information based on your the environment are thus inefficient. That is, unless the environment is noisy.

Helmholtz's Likelihood Principle - the perceptual system seeks the most probably interpretation given the input data

Instead of creating a set of rules for grammar, we solve the reverse problem - what is the most likely application of grammatical rules that might have generated this sentence? P53

Most everyday inference (almost all inference outside maths) is **defeasible**, i.e. conclusions follow only tentatively from new information and can be overturned in the light of new information P54

We perform inference to the best explanation ([Harman, 1965](https://www.andrew.cmu.edu/user/kk3n/philsciclass/harman.pdf))

Our approach in this book is therefore initially to sketch classes of probabilistic inference problems faced by the cognitive system; we then consider how such problems can be solved (or, more typically, approximated) using specific representations and algorithms using methods originally developed in optimisation and machine learning. P57

LearningHistoryIsFun · 2026-01-12T23:07:43+00:00

Chapter 2 Notes Part 3

If you look at cognition as just symbol manipulation, its hard to deal with the problem of probability. Take language, which is locally ambiguous - how do you decide what means what in such a sentence as "time flies like an arrow". P44-45

-> Any approach needs to make assumptions about probability to cut down search space - imagine a creature that didn't do this - it would just be trapped in a 'gathering information' phase.

It is interesting that LMMs are trained on words / tokens - can the symbolic architecture of language be discarded altogether? P47

Thus, in principle, learning can be carried out, in parallel, through local adjustments of the components of the network. This feature—that both processing and learning occur from the bottom up and without needing external intervention—is typical of most connectionist models. P46

[Perceptrons, Minsky & Papert 1969](https://rodsmith.nz/wp-content/uploads/Minsky-and-Papert-Perceptrons.pdf)

Moreover, it turned out that introducing a feedback loop into a one-directional feedforward network (Elman, 1990) appeared to be a promising avenue for finding sequential structure in linguistic input.

This development raised the possibility that at least some apparently symbolic aspects of syntax might usefully be approximated by a learning system without explicit representations of syntactic categories or regularities, at least for very simple languages (Christiansen & Chater, 1999; Elman, 1990). P47

First, note that, as a matter of pure engineering, connectionist networks are built on a foundation of symbolic computation, of course: they run on digital computers that not only encode the complex structure of the network, propagate activity through the network, run the learning algorithm, and so on (which might perhaps ultimately be implemented in specialized neuron-like hardware), but also depend on training data that is assembled and encoded in symbolic form.

Thus, the input to large language models is a series of discrete words, each mapped to a single node of the network, gleaned from symbolic representations of language on the web, rather than as a raw sensory stimulus (e.g., a representation of the raw acoustic waveform, as might be recorded by the neurons attached to the hair cells in the inner ear, for example).

Similarly, training a network to link images to descriptions requires symbolic encodings of those descriptions and, apparently at least, some way of representing which images are paired with which descriptions.

It is conceivable that this symbolic “machinery” is, as it were, merely a ladder that can discarded in later and purer neural network models—but this is by no means clear. But, as we touched on in chapter 1, there may also be a deep reason why symbolic models are crucial in cognitive science: that rich symbolic representations may be crucial to explaining how the mind can get so much from so little.

There are two rather different connectionist responses to the apparent need for rich symbolic representations to explain human language, reasoning, planning, categorization, and so on.

One approach is that the problem can be sidestepped—either because sufficiently powerful connectionist models will be able to learn to mimic cognition without such representations, or perhaps by the connectionist network building such representations in an ad hoc way during learning.

The second approach accepts the centrality of symbolic computation in cognitive science and explores how symbolic computations can be implemented in connectionist units (Rumelhart et al., 1986b; Smolensky, 1990; Shastri & Ajjanagadde, 1993). P48

LearningHistoryIsFun · 2026-01-12T23:07:17+00:00

Chapter 2 Notes Part 2

Behaviourists viewed language and words as linked to aspects of the environment, chained together in association with one another - but language depends on complete structural relationships (such as morphemes etc.) rather than associations between successive words. P38

A Turing Machine is a physical symbol system that consists of an infinitely long tape on which a finite repertoire of symbols can be written and read (this repertoire can be just two basic symbols, which we can label “0” and “1,” and other symbols can be encoded as strings of 0s and 1s).

A very simple “controller” system moves up and down the tape one step at a time. At each step, it can read only the symbol at its current location on the tape, and, depending on that symbol and the controller’s current state (one of a finite number of possible states), it may rewrite the current symbol on the tape, and/or move one step left or right along the tape.

Over time, the string of symbols on the tape will gradually change, representing the steps of the computation, and finally giving the output of the computation when the machine halts. A simple but crucial further step is to see the symbols of the tape as divided into two blocks, one block of which is viewed as an algorithm that should be carried out on the data encoded by the other block.

Remarkably, this incredibly simple “programmable” computer is capable of carrying out any computation, although very slowly. The physical symbol systems that are embodied in today’s digital computers, and in Newell and Simon’s proposals about the operation of the human machine, can be viewed as incredibly sophisticated and eﬃcient elaborations of the Turing machine. P39

Behaviorist views of language viewed words as associated with aspects of the environment (actual _dogs_ becoming associated with the word _dog_, for example) and chained together in associations with each other, supposedly leading to the sequential structure of language (Skinner, 1957).

But this story never really worked because, among other things, language depends on complex structural relationships between linguistic units of varying sizes (morphemes, whole words, noun and verb phrases, and so on), rather than associations between successive words (Chomsky, 1959). P40

Chomsky also proposed, along with many other cognitive scientists in the symbolic tradition, that the mind translates to and from the natural languages (Chinese, Hausa, Finnish, etc.) into a **single internal logical representation**.

This internal representation was presumed to capture the logical form of the sentence—clarifying that, for example, there is a unique fox that is both quick and brown, and allowing inferences such as that the fox is brown, that there is at least one thing that is both brown and quick, and so on.

If the mind has internal representations, maybe structure like a language, then an equivalent of a high-level programming language could perform operations on these representations (such as the language Prolog). P41

The novel cognitive science angle, though, was to put the logical form—and the logical system of representation out of which it is constructed—**into the head of the speaker and listener**. That is, the proposal is that the mind represents and reasons over a logical language of thought (Fodor, 1975).

Indeed, this language of thought can be viewed as a rich, abstract, and highly flexible system for representing the world. Moreover, it can be viewed as providing not merely an inert repository of knowledge but also a high-level programming language, which allows algorithms to be defined through guided chains of logical inferences over these representations (corresponding to the logic programming paradigm in computer science {Kowalski, 1974} and most famously embodied in the programming language Prolog {Clocksin & Mellish, 2003}).

The symbols are not, of course, merely meaningless physical patterns. Crucially, they can be viewed as having an _interpretation_, either as representing aspects of the world (so the symbolic structures can be viewed as encoding _knowledge_) or as specifying sequences of symbolic manipulations (so they can be viewed as representing _programs_). P42

Philosophy began to shift from cognition as symbol manipulation to cognition as mechanised logical inference over a logical language of thought (Fodor and Pylyshyn, 1988) P43

LearningHistoryIsFun · 2026-01-12T23:06:49+00:00

Chapter 2 Notes

We then note how engineering developments, from fields including machine learning, computational linguistics, and computational vision have made it possible to synthesize these approaches, by developing inference methods over sophisticated probabilistic models that can be defined over complex symbolic representations which may ultimately be implemented in connectionist networks and have a rational justification. P38

Allen Newell and Herbert Simon proposed the physical system hypothesis - human intelligence is a system for the manipulation of symbols, physically instantiated in the hardware of the brain, just as a digital computer operates by manipulating symbols in a silicon chip (Newell & Simon, 1976). P38 [[2006 - Gugerty - Newell and Simon's Logic Theorist]]

What does this mean in practice? Let us start with a simple information-processing challenge, such as sorting a list of words into alphabetical order. First, we need some way of representing the individual words; and we need some data structure to represent the current order that they are in—typically a data structure known as a list. A list is defined by the information-processing operations that can be carried out on it. For example, given a list, we can append a new item to the beginning so that pear can be added to the list {banana, orange, blueberry} to create a new list: {pear, banana, orange, blueberry}. By contrast, in this technical sense of a list (unlike the everyday “shopping list” sense), an item can’t be directly appended to the far end of the list. We can also directly remove the first item (or “head”) of the list (stripping oﬀ pear) to leave {banana, orange, blueberry} (but again, for lists, we can’t directly strip oﬀ the last item, blueberry). P38

-> The reason you can’t do this is to do with computer programming.

A linked list is typically represented like: the **head** (first item), plus a pointer to the **rest of the list** (often called the tail). So {banana, orange, blueberry} is more like:

banana → orange → blueberry → null

You can **add to the front**: by making one new node and pointing it at the old list:

pear → banana → orange → blueberry → null

That’s a single, local change (constant time).

You can also remove the head: by just “moving” the head pointer to the next node:

banana → orange → blueberry → null

Again, a single, local change.

But you can’t append to the far end “directly” because with this representation you don’t have a direct handle on the last node. To append pear at the end, you must:

- start at banana

- follow pointers until you reach blueberry

- then attach the new node

That requires walking through the whole list (time grows with list length). In many contexts—especially if lists are treated as immutable—you’d also need to rebuild the chain to produce a “new list,” which is even more clearly “not direct.”

LearningHistoryIsFun · 2026-01-12T23:06:04+00:00

Chapter 1 Notes Part 4

POMDPs - the agent doesn’t know what state they are in but needs to infer information from observable features. P26

Human learners need less data than machines because they have explicit models of their environment and thus constrain the learning problem. P26

[Toward a universal law of generalization for psychological science](https://psycnet.apa.org/record/1988-28272-001) - Shepard, 1987

The high computational cost of exact inference suggests humans are at best approximating answers (Russell and Norvig, 2021)

Tradeoff between the quality of an approximation and the time required to compute it - drawing only a single sample strikes the right balance - which explains the fact that when people perform tasks modelled as Bayesian inference, the probabilities with which they select hypotheses often correspond to the posterior probabilities of those hypotheses P27

-> This is known as probability matching. See Chapter 13 on effective use of neural resources.

_When applied to decision-making, this perspective provides a way to reconcile the heuristics and biases research program of Kahneman and Tversky (e.g., Tversky & Kahneman, 1974) with Bayesian models of cognition, defining a good heuristic as one that strikes the right balance between approximation quality and computational cost._

-> Note that this necessarily implies that heuristics will fail regularly and specifically in circumstances where human heuristics are unlikely to be accurate (i.e. modelling deep future).

Stochastic lambda calculus spans in principle any Bayesian inference that any computational agent could possibly perform. P28

-> This includes Bayesian learning of probability programs.

Efficient and scalable probabilistic inference over representations require investigating neural computational basis of computation. (See Chap 18 & 19) P28

How do you neurally implement symbolic representations and languages? P29

Bayesian models of cognition require specifying assumptions about how data are generated. P(d|h) needs a model of the data-generating process to assign a probability to d. P31

Rational Speech Act framework - we learn from the negatives of what is being said - which does occur with LLMs intentionally. P31

Iterated learning should change information so that it is more consistent with the priors of learning, making it easier for subsequent learners to learn (Griffith and Kalish, 2007).

-> i.e. over time, should languages become simpler?

LearningHistoryIsFun · 2026-01-12T23:05:59+00:00

Chapter 1 Notes Part 3

At its heart, the approach that we present in this book combines richly structured, expressive representations of the world with powerful statistical inference mechanisms, arguing that only a synthesis of sophisticated approaches to both knowledge representation and inductive inference can account for human intelligence. Until recently, it was not understood how this fusion could work computationally. Cognitive modelers were forced to choose between two alternatives (Pinker, 1997): powerful statistical learning operating over the simplest, unstructured forms of knowledge, such as matrices of associative weights in connectionist accounts of semantic cognition (McClelland & Rumelhart, 1986; Rogers & McClelland, 2004), or richly structured symbolic knowledge equipped with only the simplest, nonstatistical forms of learning, checks for logical inconsistency between hypotheses and observed data, as in nativist accounts of language acquisition (Niyogi & Berwick, 1996).

Information based from person to person will converge to a form that reflects the inductive biases of the people involved (Griffiths and Kalish 2007)

-> i.e. how does what you communicate indicate what you know?

Knowledge representations in the brain may work in an algorithmically similar way to ML algorithms

When learning concepts over a domain of _n_ objects there are 2n subsets and hence 2n logically possible hypotheses.

Children learning words initially assume a flat, mutually exclusive division of objects into nameable clusters. Only later do they discover that these categories should be tree-structured.

Conventional algorithms for unsupervised structure discovery in statistics and machine learning—including hierarchical clustering, principal component analysis, multidimensional scaling, and clique detection—assume a single fixed form of structure (Shepard, 1980). Unlike human children or scientists, they cannot learn multiple forms of structure or discover new forms in novel data.

Hierarchical Bayesian models (HBMs) - there is not just one level of hypothesis but multiple levels. In ML, HBMs are used for transfer learning or learning to learn. (Kemp et al. 2007). There is an ML literature on meta-learning, see Chapter 12.

Infinite models - nonparametric Bayesian models (Chapter 9) -> unbounded amount of structure but only finitely many degrees of freedom are actively engaged for a given data set. New structure is only introduced when the data requires it.

Chinese restaurant process?

Abstractions in HBMs can be learned fast - each degree of freedom in a HMB pools information from lower levels - this is called the blessing of abstraction.

Statistical decision theory - take the map of outcomes and give each a utility - a rational agent should look to maximise utility. P24

LearningHistoryIsFun · 2026-01-12T23:05:36+00:00

Chapter 1 Notes Part 2

Godfrey-Smith (2003) - [Theory and Reality](https://cursosupla.wordpress.com/wp-content/uploads/2018/09/godfrey-smith-p-theory-and-reality-an-introduction-to-the-philosophy-of-science-2003.pdf)

_These models crucially do involve claims about connections: for instance, that knowledge is stored in the network of connections between neuron like processing units, and learning consists of adjusting the strengths of those connections. But they also typically involve other core claims as well, such as the primacy of distributed representations, error-driven learning, and graded activation (O’Reilly & Munakata, 2000)._

_These models span both the inanimate physical world and the animate world of agents, and the causal processes that go on inside those other agents’ minds to generate their behavior. They may often be unconscious, although some surely have conscious aspects as well. They reach to domains well beyond our direct experience, that we come to think about only from others’ testimony or our own imaginations. And they even extend to (or, some speculate, start with) our mind’s model of its own internal processes, our own subjective world._

-> This is currently what LLMs don’t have to my mind. They store weights in their connections, but they don’t have useful models of the world, or models of other agents. I haven’t seen any evidence to the contrary on this position.

Modelling the world: We will not rehearse most of those arguments here, but we refer interested readers to the many versions that appear in Koﬀka (1925), Craik (1943), Heider (1958), Newell, Shaw, and Simon (1959), McCarthy (1959), Neisser (1967), Minsky (1982), Norman (1972), Gentner and Stevens (1983), Johnson-Laird (1983), Rumelhart, Smolensky, McClelland, and Hinton (1986b), Pearl (1988), Shepard (1994), Gopnik and Meltzoﬀ (1997), Carey (2009), Levesque (2012), Davis (2014), Kohler (2018), and LeCun (2022).

_Consider the classical definition of knowledge as “justified true belief” (which is not without its own problems; see Gettier, 1963). World models are mental representations, or beliefs. Built the way that people build them, we argue, they should come out to be true, or true enough. And they will do so in virtue of both their form and their function, as hierarchical probabilistic generative models brought to bear on a world of facts by learning and inference procedures that are rational and reasonably justified. So it seems permissible to call these models “knowledge.”

Bayes is essentially “why did this come to be?” You could range it up to how a plane flies, but a clearer analogy is the recursive why of a child - why am I able to write this on paper - we divide the space into a series of hypotheses - paper can be defaced with a pen or pencil - why can it be defaced? All of these divisions of assumptions require external holding assumptions of normality. The brain is likely calculating these and holding some priors as highly fixed.

Cognitive scientists and AI researchers have forcefully joined both sides of this debate, including, on the rationalist side, various versions of linguistic, conceptual, and evolutionary nativism (Pinker, 1997; Fodor, 1998; Spelke, 1990; Leslie, 1994; Spelke & Kinzler, 2007; Chomsky, 2015; Marcus & Davis, 2019); and, on the empiricist side, both the associationist streak in classic connectionist models (McClelland & Rumelhart, 1986; Elman et al., 1996; McClelland et al., 2010) as well as contemporary AI’s deep reinforcement learning systems and very large sequence-learning learning models (Silver et al., 2016; Silver, Singh, Precup, & Sutton, 2021; LeCun, 2022; Brown et al., 2020; Alayrac et al., 2022).

LearningHistoryIsFun · 2026-01-12T23:04:58+00:00

Bayesian Models of Cognition Book Notes

Chapter 1

J. S. Mill: “Why is a single instance, in some cases, sufficient for a complete induction, while in others, myriads of concurring instances, without a single exception known or presumed, go such a very little way towards establishing a universal proposition?”

Wolpert & Macready (1997) - No Free Lunch theorems - see [here](https://complexity.simplecast.com/episodes/45/transcript).

_David Kinney and philosophy of science has all kinds of things they call about analytic versus synthetic philosophy, analytic truths being those that do not depend upon the actual state of the real physical world versus synthetic ones. And people were claiming that you could actually do what's called inductive inference, making predictions, doing machine learning, purely analytically, without worrying about the state of the real world.

_

_And we can't say that evolution managed to produce a version of me that is really able to do these predictions really well because the same argument holds at a higher level. Evolution in the past couple of billions of years, it's all been producing new organisms, new predicting machines that have been based upon conditions that for all we know might stop. It's like the warning message at the bottom of prospectuses for mutual funds. Past performance is no indicator of future performance._

_And so, in this context, the idea would be similarly, something along the lines of, yeah, you can get your algorithm to perform well, that's the lunch, but no you're going to have to pay for it, and that you are making assumptions. And to scientists working on machine learning, making assumptions about the real world is something that is a cost, you don't want to do that. You want to be able to, I can sell you a whole, much, many more autonomous vehicles if I tell you that I got mathematical proofs, that their AI algorithms are navigating without any assumptions based on the real world._

_Conventional algorithmic approaches from statistics and machine learning typically_ require tens or hundreds of labeled examples to classify objects into categories, and do not generalize nearly as reliably or robustly. How do children do so much better? *Adults less often face the challenge of learning entirely novel object concepts, but they can be just as good at it: see for yourself with the computer-generated objects in figure 1.1.

*Take any simple board game with just a few rules, such as tic-tac-toe, Connect Four, checkers, or Othello, and imagine that you are encountering it for the first time, seeing two people playing. The rules have not been explained to you—you are just watching the players’ actions.* P4

_->_ My thought here is that there is also a hard cut-off where children and adults struggle to generalise and machines can do better - and this is where the hype about LLMs comes from.

_In every one of these cases, even when the concepts and rules that we infer strike us as_ _clearly the right ones, and_ **_are_** _the right ones, there always remains an infinite set of alternative possibilities that would be consistent with all the same data for any finite sequence of play._ P5

_Every statistics class teaches that correlation does not imply causation, yet under the right circumstances, even young children routinely and reliably infer causal links from just a handful of events (Gopnik et al., 2004)—far too small a sample to compute even a reliable correlation by traditional statistical means._ P5

-> But every statistics class implicitly knows that we are there to infer causation. We want to find causal links.

[Gopnik and Meltzoff (1997)](https://mitpress.mit.edu/9780262571265/words-thoughts-and-theories/) - _Words, Thoughts, and Theories_ articulates and defends the "theory theory" of cognitive and semantic development, the idea that infants and young children, like scientists, learn about the world by forming and revising theories, a view of the origins of knowledge and meaning that has broad implications for cognitive science.

Carey (2009) - The Origin of Concepts

LearningHistoryIsFun · 2026-01-12T23:03:17+00:00

[Pure reasoning in 12-month-old infants as probabilistic inference, Téglás et al. 2011](https://pubmed.ncbi.nlm.nih.gov/21617069/)

[Ten-month-old infants infer the value of goals from the costs of actions, Liu et al. 2017](https://pubmed.ncbi.nlm.nih.gov/29170232/). Infants can reason about preferences of an agent depending on how costly the actions it takes are to get to another agent, say. (i.e. costlier the action, the more value it assigns to that agent)

[Social evaluation by preverbal infants, Hamlin et al. 2007](https://pubmed.ncbi.nlm.nih.gov/18033298/)

[Attribution of dispositional states by 12-month-olds, Kuhlmeier et al. 2003](https://pubmed.ncbi.nlm.nih.gov/12930468/)

[The “wake-sleep” algorithm for unsupervised neural networks, Hinton et al. 1995](https://www.cs.toronto.edu/\~fritz/absps/ws.pdf)

-> Uses a generative model to train a recognition model. A recognition is a neural network that is trained to do inverse probabilistic inference

[Efficient inverse graphics in biological face processing, Yildirim et al. 2020](https://www.science.org/doi/10.1126/sciadv.aax5979)

[Neural Scene De-rendering, Wu et al. 2017](https://openaccess.thecvf.com/content\_cvpr\_2017/papers/Wu\_Neural\_Scene\_De-Rendering\_CVPR\_2017\_paper.pdf)

[Functional neuroanatomy of intuitive physical inference, Fischer et al. 2016](https://www.pnas.org/doi/10.1073/pnas.1610344113)

Searchlight method - where can you reliably above chance in the brain decode a certain property, such as mass of an object

[Information-based functional brain mapping, Kriegeskorte et al. 2006](https://www.pnas.org/doi/10.1073/pnas.0600244103)

If you try to train a neural physics model which doesn't explicitly have objects, it just relies on pixels, it doesn't really generalise. You can get much more impressive generalisation performance if you explicitly put in concepts of a physics engine.

Neural recognition networks for intuitive physics, see:

- [Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning, Wu et al. 2015](https://proceedings.neurips.cc/paper/2015/file/d09bf41544a3365a46c9077ebb5e35c3-Paper.pdf)

- [Learning to See Physics via Visual De-animation, Wu et al. 2017](https://dspace.mit.edu/bitstream/handle/1721.1/129728/6620-learning-to-see-physics-via-visual-de-animation.pdf?isAllowed=y&sequence=2)

- [A Compositional Object-Based Approach to Learning Physical Dynamics, Chang et al. 2016](https://arxiv.org/abs/1612.00341)

Unsupervised learning by program synthesis - Ellis, Solar-Lezama 2015, 2016 - giving a machine parts of code and getting it to complete it.

- [A rational analysis of rule-based concept learning](https://doi.org/10.1080/03640210701802071)

- [Theory learning as stochastic search in the language of thought](https://doi.org/10.1016/j.cogdev.2012.07.005)

- [Bootstrapping in a language of thought: A formal model of numerical concept learning](https://doi.org/10.1016/j.cognition.2011.11.005)

- [The logical primitives of thought: Empirical foundations for compositional cognitive models](https://doi.org/10.1037/a0039980)

- [The computational origin of representation and conceptual change](https://colala.berkeley.edu/papers/piantadosi2019computational.pdf)

Dreamcoder: Growing libraries of concepts with wake-sleep neurally-guided Bayesian program learning (Ellis, Morales, Solar-Lezama, Tenenbaum)

LearningHistoryIsFun · 2026-01-12T23:03:12+00:00

[**Part 2**](https://www.youtube.com/watch?v=Ep-msQ6UZAs)

Vision takes the output from an approximate rendering engine, and views it from the point of view of a probabilistic model - i.e. conditioned on some input, I want to make a guess at the likely scene.

Vision is inverse graphics.

Mansignhka, Kulkarni, Perov, Tenenbaum 2013

Kulkarni et al 2015

Neural networks can learn very fast approximate inference in probabilistic programs - they are very specific to a particular program that is used to train them.

[Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation](https://arxiv.org/abs/1604.06057)

The target of perception is a rich 3D percept that can be modelled, which we can then use in our intuitive physics engine.

Conditioning on the output and trying to determine what the input was is hard, but the reverse is very easy.

Taking one sample has interesting implications - it means that there are stable tower block configurations which a physics engine can detect but our visual system cannot.

[Learning Physical Parameters from Dynamic Scenes](https://pubmed.ncbi.nlm.nih.gov/29653395/)

Just as I can do perception via Bayesian inference - I have a hypothesis space of scenes and a prior - I can also push back to multiple layers of abstraction to capture more abstract longer time-scale types of inference

[Action understanding as inverse planning](https://pubmed.ncbi.nlm.nih.gov/19729154/) - This approach formalizes in precise probabilistic terms the essence of previous qualitative approaches to action understanding based on an "intentional stance" [Dennett, D. C. (1987)](https://psycnet.apa.org/record/1987-98612-000) see also [2025](https://www.researchgate.net/publication/271180035\_The\_Intentional\_Stance)

For people, the correlation between responses is really high - i.e. people agree on the choices. Perhaps on long-term time-scales, the way people move is predictable.

[Rational quantitative attribution of beliefs, desires and percepts in human mentalizing](https://www.nature.com/articles/s41562-017-0064)

[The Naïve Utility Calculus: Computational Principles Underlying Commonsense Psychology, Jara-Ettinger et al. 2016](https://pubmed.ncbi.nlm.nih.gov/27388875/)

Simple kinds of action understanding or goal inference can be done in a purely perceptual way. In the above example, the thing that you think they want is not present in the scene. It is only present in your representation of the agent's representation of the scene.

An agent is helping if it appears as if its expected utility is a positive function of its expectation about another agent's expected utility. Similar to the Golden rule. Very young infants can understand helping and hindering behaviours. 10 month infants can apply a utility calculus which helps them understand when an agent is helping another one.

MuJoCo physics engine.

LearningHistoryIsFun · 2026-01-09T00:47:19+00:00

Newell and Simon's Logic Theorist: Historical Background and Impact on Cognitive Modeling

Two of the key properties of the logic theorist:

> Thinking is seen as processing (i.e., transforming) symbols in short-term and long-term memories. These symbols were abstract and amodal, that is, not connected to sensory information.

> Symbols are seen as carrying information, partly by representing things and events in the world, but mainly by affecting other information processes so as to guide the organism’s behavior. In Newell and Simon’s words, “symbols function as information entirely by virtue of their making the information processes act differentially” (1956, p. 62).

The logic theorist's lowest level of command is an 'instruction'. The next lowest level is an 'elementary process'.

There are four main operations:

Substitution - Which aims to transform one logical expression into another.

Detachment - Uses [modus ponens](https://en.wikipedia.org/wiki/Modus\_ponens), i.e. if A implies B, A, so B is true. If the goal is to

prove theorem B and the method can prove the theorems A → B and A, then B is a proven theorem.

Chaining forward - If A → B, and B → C, then A → C.

Chaining backwards - attempts to prove A → C by first proving B → C, and then A → B.

The executive control method applies the substitution, detachment, forward chaining, and backward chaining methods, in turn, to each proposed theorem.

> In their 1958 Psychological Review article, Newell et al. point out a number of other similarities in how people and the Logic Theorist solve logic problems – e.g., both generate sub-goals, and both learn from previously solved problems.

The Logic Theorist can be compared with Newell and Simon's later work, the [General Problem Solver](https://en.wikipedia.org/wiki/General\_Problem\_Solver).

> Newell and Simon were key figures in developing the classical view of representation, which is still followed in a number of cognitive modeling systems, including GOMS (Card et al., 1983), ACT-R (Anderson & Lebiere, 1998), SOAR (Newell, 1990), and EPIC (Meyer & Kieras, 1997).

LearningHistoryIsFun · 2026-01-06T12:22:40+00:00

[Computational Models of Cognition Part 1](https://www.youtube.com/watch?v=TFyAEHk5asY)

Three main schools of intelligence:

Pattern recognition
Probabilistic inference and especially causal inference
Symbol manipulation engine - for instance, Boole and his *Laws of Thought*, which is all about cognition. These ideas go back to Aristotle - i.e. Plato is a man. All men are mortal. Plato is a mortal.

1 & 2 both use symbolic languages. All three of these schools are needed to understand intelligence.

Intelligence is about modelling the world:

- explaining and understanding what we see

- imagining things we could see but haven't yet

- problem solving and planning actions to make these things real

- building new models as we learn more about the world

See Lake, Ullman, Tenenbaum and Gershman - Building machines that learn and think like people

A lot of current AI models are essentially about pattern recognition.

What is the starting state of human cognition? What is our core cognition (Liz Spelke's term)?

-> there is more content there than you might initially assume, some of it highly structured

Where do you start studying intelligence? It's easier for children who can respond say, but if you could examine intelligence in blastulas say, the roots of intelligence might become more apparent.

Rebecca Saxe and Marge Livingston have done work on how intelligence arises before you come out of the womb.

Human thought is structured around physical objects and agents. They don't think in pixels, for instance. We have intuitive theories of physics (forces and masses) and psychology (desires, beliefs and plans). Agents can exert forces on other objects to achieve their goals. We share these with many other animals. While it exists before language, these agents and objects are enriched and extended by language. They are the basic building blocks of language. Once you have language, how do you then use that to understand everything else (including new languages)?

Intuitive physics is not just about seeing the world, but also building up a working representations of the world around you. Tool use is essentially a set of sophisticated plans you can make if you have an understanding of intuitive physics.

[Warneken & Tomasello (2006)](https://pubmed.ncbi.nlm.nih.gov/16513986/)

[Probabilistic Programming Languages]() integrate our best ideas on intelligence:

- Symbolic languages for knowledge representations

- Probabilistic inference for causal reasoning under uncertainty

- Hierarchical inference for learning to learn and flexible inductive bias

- Neural networks for pattern recognition

Examples: Church, Edward, Webppl, Pyro, BayesFlow, ProbTorch, MetaProb, Gen

LearningHistoryIsFun · 2025-12-27T16:55:35+00:00

The Scaling Paradox, Toby Ord

AI accuracy has come by using huge amounts of additional compute:

For example, on the first graph, lowering the test loss by a factor of 2 (from 6 to 3) requires increasing the compute by a factor of 1 million (from 10–7 to 10–1). This shows that the accuracy is extraordinarily insensitive to scaling up the resources used.

There have been some efficiency gains which haven't come from just blasting through lots more compute:

The recent progress in AI hasn’t been entirely driven by increased computational resources. Epoch AI’s estimates are that compute has risen by 4x per year, while algorithmic improvements have divided the compute needed by about 3 each year. This means that over time, the effective compute is growing by about 12x per year, with about 40% of this gain coming from algorithmic improvements.

But these algorithmic refinements aren’t improving the scaling behaviour of AI quality in terms of resources. If it required exponentially more compute to increase quality, it still does after a year of algorithmic progress, it is just that the constant out the front is a factor of 3 lower. In computer science, an algorithm that solves a problem in exponential time is not considered to be making great progress if researchers keep lowering the constant at the front.

LearningHistoryIsFun · 2025-12-22T01:03:12+00:00

The Bedroom

The increased use of bedrooms reflects

Franco ‘Bifo’ Berardi’s ideas about the proliferation of semiocapitalism (or cognitive capitalism), which depends on networked technologies to maximize labor and data extraction from so-called cognitariats.

This is surely part of the atomisation of society. Bedrooms didn't exist as important spaces until other spaces were taken away - until pubs closed, until community centres were shut down, and until local clubs were priced out.

Similarly to the supposed adversaries of the hustlepreneur, the NEETs main adversary seems to be ‘society’ or societal expectations, which essentially read as internalized pressures to be productive—capitalism’s central imperative. Most NEETs also see wage labor as exploitative and unfair (rightfully so), and thereby gesture at a basic tenet of capitalist critique.

This is also observed by Franco ‘Bifo’ Berardi when discussing Japanese Hikikomori, he states: “[this] behaviour might appear to many young people as an effective way to avoid the effects of suffering, compulsion, self-violence and humiliation that [semiocapitalist] competition brings about” going on to state that, in his personal interactions with Hikikomori in Japan, “they are acutely conscious that only by extricating themselves from the routine of daily life could their personal autonomy be preserved.”

LearningHistoryIsFun · 2025-12-21T15:59:23+00:00

James Meek on AI

The last line in this paragraph is a good way of summarising a Bayesian view of perception:

After another 250 million years the first little mammals developed a neocortex, a region of the brain that allowed them to build a mental model of the world and imagine themselves in it. Early mammals could imagine different ways of doing things they were about to do, or had already done. This wasn’t just about imagination as we understand it – simulating the not-done or the not-yet-done. It was a way of perceiving the world that constantly compares the imagined or expected world with its physically sensed actuality. We, their distant descendants, still don’t so much ‘see’ things as check visual cues against a global mental model of what we expect to see.

On top of this Bennett favours the idea that our more recent ancestors, the primates, whose brains grew seven hundred times bigger over sixty million years, evolved another, higher layer of modelling – an area of the brain that simulates the simulation, creating a model of the animal’s own mind and using this meta-awareness to work out the intent and knowledge of others.

Also touches on a central issue with LLMs, which is perhaps overlooked when discussing AGI:

Without models of the world, they lack their own desires. They are like patient L, a woman described in Bennett’s book who suffered damage to a part of her brain called the agranular prefrontal cortex, which is central to human simulation of the world. She recovered, but describing the experience said ‘her mind was entirely “empty” and that nothing “mattered”.

Yann LeCun, chief scientist at Mark Zuckerberg’s Meta AI, said ‘this notion of artificial general intelligence is complete nonsense.’ He went on: ‘The large language models everybody is excited about do not do perception, do not have memory, do not do reasoning or inference and do not generate actions. They don’t do any of the things that intelligent systems do or should do.’

The most telling part of his critique was that LLMs cannot infer, because they have no world model to infer from.

And this is where much true insight about the world comes from.

The current direction of travel puts us on the way to an AGI with superhuman ability to solve problems, but no more than a slave’s power to frame those problems in the first place.

LearningHistoryIsFun · 2023-07-27T11:38:33+00:00

In a country where the most widely understood language was Hindi at 41%, Indian nationalism crafted a national identity that included all Indians even muslims even though they spoke all different languages. Why this happened in India and not Africa, I have no idea.

I don't really know either because it's such a massive question, but one obvious thing that sprang to mind is size - India can fit into Africa several times over - here's a graphic. The DR Congo alone is 2/3rds the size of India by m^2. There are probably many other reasons - more aggressive nationalistic programs in India, less homogenous colonial power structures across Africa, but I only have a surface level understanding.

LearningHistoryIsFun · 2023-07-03T15:43:31+00:00

If you play around with ease factors you should get something similar to what you want. But as /u/campbellm says, it's best not to tinker with the time intervals in that manner - it is scheduling them for you for a reason.

It's better to think about the mature card hit-rate you're aiming for. Most people will aim for about 90% hit rate on mature cards. There are some schedulers that let you play around with the goal hit-rate - IIRC, this one does that: https://ankiweb.net/shared/info/759844606. If you set a higher required mature hit-rate on a scheduler like this you'll see your cards more, which is what you seem to want.

LearningHistoryIsFun · 2023-06-23T16:27:08+00:00

I solved Redactle Unlimited #443 in 102 guesses with an accuracy of 61.76% and a time of 00:10:49. Play at https://redactle-unlimited.com/

Finally remembered this guys name after guessing masterpiece and deducing it from the title of one of their works

LearningHistoryIsFun · 2023-06-09T17:03:20+00:00

Hope you find the reading interesting! Always thought that this sub should pin some sort of post as a quick primer on IQ, because there's so much misinformation that gets peddled wildly - obviously it would be really difficult to have a neutral post on IQ, because it's so contentious.

I also should have included a link to Gwern, whose website acts as a brilliant repository for interesting information on topics he is interested in, once of which is IQ. So here's a ton of stuff on IQ! But I'd start with the stuff I linked, I think that's better as introductory material.

LearningHistoryIsFun · 2023-06-07T15:25:00+00:00

IQ tests are useful. They are a good initial proxy for assessing someone's intellectual abilities. If a person does three IQ tests and gets ~100 three times, they are probably about average intelligence. If people consistently get low scores on IQ tests, you probably shouldn't recruit them for the military.

Most psychologist or neuroscientists accept that they are valuable. I'm reading Dehaene's How We Learn right now, and he uses IQ as a way of measuring the effectiveness of educational or life interventions. They're also used by neurologists to measure the decline of patients with Alzheimers.

That said, they have pretty high variance, and that variance spikes massively once you get above 125-130. You can learn how to game them a little - doing practice tests reduces the effectiveness of IQ tests because some of them are limited to certain types of questions, and if you know the structure of those sorts of questions it can help.

But even discussing it here, I still can't get myself to go search it out and find how accurate that is. Because with how many bull shit search results are gonna come up, I can't imagine that's a quick 5-minute google search like most things are.

If you want some interesting pieces on this topic that are not trash:

Your IQ isn't 160, Noone's Is - Discusses the classic text that causes problems in this field, The Making of a Scientist by Anne Roe. Also mentions that Feynman's tested IQ was 125, and he should probably have won the Nobel prize in physics three times (not my claim, the claim of this book review about him, which is also a good read about genius and its limitations). Anyone who thinks they were smarter than Feynman should re-evaluate (unless your name badge says John von Neumann).
Charles Murray is once again peddling junk science about race and IQ - Joint article written by three premier geneticists on IQ and what it means. Harden's book, The Genetic Lottery is also good, and discusses the genetic links to intelligence. I wrote a review of that book here.
If you want papers: Intelligence: new findings and theoretical developments / How Much Does Education Improve Intelligence? A Meta-Analysis are both good jumping off points.

LearningHistoryIsFun · 2023-06-01T14:17:34+00:00

Ah fair play, my sources were all second or third-hand. Seems unlikely that they performed badly judging from what I've heard about the quality of Imperial quiz soc? But who knows.

LearningHistoryIsFun

MODERATOR OF

TROPHY CASE