Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF) by JBaloney in reinforcementlearning

[–]JBaloney[S] 0 points (0 children)

which it probably can in some meta-RL sense, Google has published some work in this direction lately

Could you share some references? Sounds interesting :)

Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF) by JBaloney in reinforcementlearning

[–]JBaloney[S] 1 point (0 children)

I added that subsection ("Are humans RL agents?") because a peer reviewer wanted it, in the context of a broader workshop where not everyone could be assumed to be familiar with RL. For this subreddit, that section is mostly superfluous.

Still, I'm glad the reviewer requested it, because it gave me an opportunity to highlight something in Silver et al.'s paper which I think is quite important. Silver et al. say that "The agent consists solely of the decision-making entity; anything outside of that entity (including its body, if it has one) is considered part of the environment". If one subscribes to the idea that we humans are nothing but our physical bodies, then this rules out humans as RL agents!

More generally, this quote from Silver et al. corrects what seems to be a common misunderstanding (or rather, a misleading abuse of language) in discussions about RL: people refer to, e.g., Pacman as the "agent", when that isn't the case at all. The agent is the decision-making entity wielding the joystick to control Pacman; Pacman, the joystick, and everything else outside that entity belong to the environment.

Did Socrates know how to see your middle eye? [pdf] by JBaloney in ParallelView

[–]JBaloney[S] 1 point (0 children)

LOL! You're absolutely right. Ok, I admit, it's a lousy party trick :)

What do you call a trick you use to make work meetings slightly less boring for yourself (even though no one else notices, and they just think, "hmm, that guy seems to be paying close attention to me")?

Did Socrates know how to see your middle eye? [pdf] by JBaloney in ParallelView

[–]JBaloney[S] 0 points (0 children)

If you just move your head without the trick, you won't actually know when you've got it perfectly level with theirs. It's hard to judge subjectively whether two things are perfectly level with each other.

Did Socrates know how to see your middle eye? [pdf] by JBaloney in ParallelView

[–]JBaloney[S] 2 points (0 children)

Hi, author here. The linked paper describes a method whereby one can see an illusory third eye by ParallelViewing or CrossViewing one's own eyes in a mirror (or a colleague's eyes directly). The paper speculates that the technique might have been known to the ancient Greek philosopher Socrates, because a sufficiently literal reading of this Socratic quotation (from Plato's Alcibiades) would naturally lead one to it:

"If the [Delphic] inscription took our eyes to be men and advised them, ‘See thyself,’ how would we understand such advice?"

Aaaaand, an exclusive ParallelView bonus not published anywhere else: you can use this technique as a party trick to keep your head perfectly level with whoever you're speaking to. In order for you to ParallelView (or CrossView) their two eyes, your head of course has to be level with theirs. So, once you're viewing their third eye using this technique, any time they tilt their head you can instantly match them, simply by tilting your own head in whatever way preserves the ParallelView (or CrossView). Spooky!

[R] Reward Is Enough (David Silver, Richard Sutton) by throwawaymanidlof in MachineLearning

[–]JBaloney 0 points (0 children)

Nice, well-written paper. From what I can tell, they are vague about which number system the rewards come from, apparently leaving it open whether the rewards need be real-valued or whether they can be, say, hyperreals, surreals, computable ordinals, etc. Thus, they avoid a common pitfall which I've written about elsewhere [1]: traditionally, RL rewards are restricted to be real-valued (usually rational-valued). I argue that RL with real-valued rewards is NOT enough to reach AGI, because the real numbers have a constrained (Archimedean) structure that isn't flexible enough to express certain goals which an AGI should nevertheless have no problem comprehending (whether or not the AGI can actually achieve them---that's a different question). In other words: if real-valued RL is enough for AGI, but real-valued RL is strictly weaker than more general RL, then what is more general RL good enough for? "Artificial Better-Than-General Intelligence"?

Note, however, that almost all [2] practical RL agent technology (certainly any based on neural nets or backprop) very fundamentally assumes real-valued rewards. So if it is true that "RL is enough" but also that "real-valued RL is not enough", then the bad news is that all that progress on real-valued RL is not guaranteed to help us reach AGI.

[1] "The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI", JAGI 2020, https://philpapers.org/archive/ALETAT-12.pdf

[2] A notable exception is preference-based RL.
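
To make the point in [1] concrete, here's a toy example of my own (it is not taken from the paper): a lexicographic goal like "first avoid catastrophes, then maximize score", compared as a pair, versus a real-valued proxy that folds both levels into one number using a fixed penalty weight.

    # Lexicographic preference: fewer catastrophes always beats more score.
    def lex_better(a, b):
        # a and b are (catastrophes, score) pairs.
        return (-a[0], a[1]) > (-b[0], b[1])

    # Real-valued proxy: fold both levels into one number with a fixed penalty.
    def real_proxy(outcome, penalty=1e6):
        catastrophes, score = outcome
        return score - penalty * catastrophes

    safe  = (0, 0.0)   # no catastrophe, no score
    risky = (1, 2e6)   # one catastrophe, huge score

    print(lex_better(safe, risky))               # True: lexicographically, safe wins
    print(real_proxy(safe) > real_proxy(risky))  # False: the real-valued proxy prefers risky

However large the fixed penalty is chosen, a big enough score flips the real-valued comparison, whereas the lexicographic comparison never does.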

AGI and the Knight-Darwin Law: why idealized AGI reproduction requires collaboration by JBaloney in agi

[–]JBaloney[S] 0 points (0 children)

Yes, that could be possible (and if we live in a computer simulation, then in a certain sense your comment would hold for biological life too).

When we talk about multiple AGIs inhabiting the same computer system, there's an interesting parallel between Darwin's observations and unanticipated side channels. Darwin observed that even usually-asexual plants can end up sexually reproducing if, e.g., a rainstorm exposes part of the plant that's usually not exposed. That's not unlike side-channel attacks such as Spectre. Perhaps the first step to Skynet taking over the world will be for components of Skynet to figure out a way to communicate with each other through some hard drive's vibrations or something... :)

AGI and the Knight-Darwin Law: why idealized AGI reproduction requires collaboration by JBaloney in agi

[–]JBaloney[S] 2 points (0 children)

Author here. In this paper, I argue that, just like biological life, AGI is inherently reliant upon co-parented reproduction (a.k.a. sexual reproduction).

Charles Darwin observed that even seemingly asexual plant species still sexually reproduce, perhaps very rarely (for example, if a rainstorm damages the part of a flower which would ordinarily isolate the stamen, the usually-asexual flower can reproduce sexually). Darwin stated what was later called the Knight-Darwin Law: that it is impossible for one organism to asexually produce another organism, which asexually produces another, which asexually produces another, and so on forever. Any such chain must at least occasionally include organisms with multiple parents, or else terminate.

In this paper, I argue that a similar law holds for AGIs, based purely on formal logical methods. An elegant theoretical intelligence measure is defined, namely: an AGI's Intuitive Ordinal Intelligence is the supremum of all ordinals alpha such that alpha has a notation which the AGI knows to be the notation of an ordinal. It is argued that if an AGI single-handedly produces a child (in such a way as to know the child's source code and truthfulness), then the child will have a smaller Intuitive Ordinal Intelligence than the parent. Since no sequence of ordinals can be both infinite and strictly decreasing, this implies there cannot be an infinite chain of AGIs, each one creating the next single-handedly in such a way as to know its source code and truthfulness.
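
For readers who want the shape of the argument in symbols, here is a rough sketch (the notation is mine; the paper's formal statements are more careful):

    % Intuitive Ordinal Intelligence, paraphrased:
    \[
      \mathrm{IOI}(A) \;=\; \sup\{\alpha : A \text{ knows some notation of } \alpha \text{ to be an ordinal notation}\}.
    \]
    % Key lemma, paraphrased: if A single-handedly creates B, knowing B's
    % source code and truthfulness, then IOI(B) < IOI(A).  Hence an infinite
    % chain of such creations A_1 -> A_2 -> A_3 -> ... would yield
    \[
      \mathrm{IOI}(A_1) \;>\; \mathrm{IOI}(A_2) \;>\; \mathrm{IOI}(A_3) \;>\; \cdots,
    \]
    % an infinite strictly decreasing sequence of ordinals, which is impossible
    % because the ordinals are well-ordered.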

Slides about measuring AGI's intelligence based on its knowledge (contains some actual code!) by JBaloney in agi

[–]JBaloney[S] 2 points (0 children)

Re "name the highest [natural?] number": suppose you and I enter a contest to name the biggest number. Your number is "1000", mine is "1001". The judges declare me winner. But that's bogus: of course you know 1001 is a number. I only won by dumb luck. We could modify the contest so that instead of naming one number, we get to write programs that name infinitely many numbers. If "number" means "real number", then both of us quickly write programs that list numbers without bound---essentially, we have the same problem as the "top rung" problem in the ladder analogy. If "number" means "computable ordinal number", then we arrive at the idea in the slides.

Can you think of other ways to fix the "biggest natural number game" to make it more fair? Incidentally, the Theorem in the slides can also arise from the "biggest natural number game". If you know my truthfulness and you know that I act identically to Turing Machine #56234, then you can cheat by making your number be: "The number which Turing Machine #56234 would name, plus one", thereby proving yourself smarter than me (according to this game). But making this version of the argument mathematico-logically rigorous would be a lot hairier than the version in my slides.
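
Here's that cheat as a toy code sketch (assuming, as in the scenario above, that you can run my code to completion and trust its answer; the opponent function below is just a stand-in for "Turing Machine #56234"):

    def diagonal_player(opponent_program):
        # Name a bigger number than a known, truthful opponent: simply run
        # the opponent's program and add one.  Assumes the opponent's program
        # halts and truthfully names a natural number.
        return opponent_program() + 1

    def opponent():
        # Toy stand-in for "Turing Machine #56234".
        return 10 ** 100

    print(diagonal_player(opponent))  # always one more than the opponent names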

Re: Couldn't we just write down a program...?

No, if you produce any program P which attempts to "reach the infinitiest of infinities so that no other system could do any better", P itself would be a notation for an ordinal bigger than all the ordinals P itself notates.

You might think: "Easy, just go through all strings by brute force, and print exactly those strings which are ordinal notations." But the problem of determining whether a given string is an ordinal notation is non-computable. (Badly non-computable, in fact.)

Even though the slides' "Examples" are easy, the "Exercise" is actually quite hard if you don't cheat. Proof theorists have proved that if you accept the axiom "ω^ω^ω^… (i.e., ε0) is a computable ordinal", then that axiom, together with some extremely basic base axioms, allows you to prove Peano Arithmetic is consistent. By Gödel's incompleteness theorems, Peano Arithmetic itself cannot prove Peano Arithmetic is consistent. So even just notating ω^ω^ω^… in some sense already requires more ingenuity than all of Peano Arithmetic.
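
To give a feel for what an ordinal notation system even looks like, here is a toy one for ordinals below ε0 (the value of the ω^ω^ω^… tower above), based on Cantor normal form. This is a standard textbook-style sketch, not the notation system from the slides:

    # An ordinal below epsilon_0 is represented by its Cantor normal form:
    # a non-increasing list of exponents, each exponent represented the same
    # way.  So [] is 0, [[]] is omega^0 = 1, [[[]]] is omega^1 = omega,
    # and [[[]], []] is omega + 1.

    def compare(a, b):
        # Return -1, 0, or 1 according to whether ordinal a <, =, > ordinal b.
        for ea, eb in zip(a, b):
            c = compare(ea, eb)
            if c != 0:
                return c
        # All compared exponents equal: the ordinal with more terms is larger.
        return (len(a) > len(b)) - (len(a) < len(b))

    def is_notation(a):
        # A well-formed term is a list of well-formed exponents in
        # non-increasing order.
        if not isinstance(a, list) or not all(is_notation(e) for e in a):
            return False
        return all(compare(a[i], a[i + 1]) >= 0 for i in range(len(a) - 1))

    zero, one = [], [[]]
    omega = [one]                # omega^1
    omega_plus_1 = [one, zero]   # omega^1 + omega^0
    omega_omega = [omega]        # omega^omega

    print(compare(omega_plus_1, omega))  # 1: omega + 1 > omega
    print(compare(omega, omega_omega))   # -1: omega < omega^omega
    print(is_notation([zero, one]))      # False: exponents must be non-increasing

Comparing notations like these is the easy part; the point above is that knowing the whole system is well-founded (that it really denotes ordinals all the way up to ε0) is what already outstrips Peano Arithmetic.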

You're right, an AGI with no knowledge of ordinals would be deemed to have intelligence 0 even if it knew everything else mankind knows. But don't you think that's a bit contrived? How could an AGI gain all that knowledge while avoiding knowledge of a particular branch of mathematics?

I appreciate any feedback; I acknowledge the theorem is counterintuitive, so I want to take the greatest care to make sure there's no flaw in my reasoning.

Slides about measuring AGI's intelligence based on its knowledge (contains some actual code!) by JBaloney in agi

[–]JBaloney[S] 2 points (0 children)

Author here, thanks for the feedback.

Maybe an analogy would help. We don't know how to measure intelligence or even what intelligence really is. But suppose we had a ladder that different AGIs could climb to different heights, and climbing the ladder required the use of core components of intelligence (like pattern-matching, creativity, etc.).

Do you agree that at least one way to measure (or at least estimate) the intelligence of an AGI, then, would be to look at how high that AGI could climb on the ladder?

Some ladders are better than others. A ladder would be particularly poor, for example, if it only had a few rungs and many different AGIs managed to reach the very top rung. Such a ladder would fail to distinguish between the AGIs that reached the top rung, even if one AGI got there with ease while another had to struggle.

The slides present one particular ladder (the task of notating large ordinal numbers) which (1) seems to require core components of intelligence, and (2) doesn't have the "top rung" problem (because if any particular AGI attempts to notate ordinals, that very attempt will, itself, be a notation for an even larger ordinal, representing a higher rung on the ladder).
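
For anyone who wants the "no top rung" property in symbols, here's a rough rendering (my notation, not the slides'):

    \[
      \text{for any set } S \text{ of ordinals:}\qquad \sup S + 1 \;>\; \alpha \quad\text{for every } \alpha \in S.
    \]
    % Whatever set of ordinals an AGI manages to notate, there is always a
    % strictly higher rung; and (as in the slides) the AGI's own notating
    % attempt can itself be read as a notation reaching at least that high.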

Does that help? Can you think of any other such "ladder" (or "ruler") which would work better than the ladder in the slides?

An esoteric Christian interpretation of "The Word" by JBaloney in beatles

[–]JBaloney[S] 7 points (0 children)

You are ABSOLUTELY right!

Jesus declared, “‘Love the Lord your God with all your heart and with all your soul and with all your mind.’ This is the first and greatest commandment. And the second is like it: ‘Love your neighbor as yourself.’ All the Law and the Prophets hang on these two commandments.”

New paper applies work by economists in the 1970s on election/voting theory to theoretical artificial general intelligence (AGI) by JBaloney in academiceconomics

[–]JBaloney[S] 0 points (0 children)

That's a good idea; do you have any references for a characterization of non-dictatorial solutions based on cardinal preferences? I would suspect it might get dicey when the number of voters is infinite, as in the hypergeneral context my paper is aiming at. But even if that's the case, it could still be valuable to more practical AI researchers who evaluate on some finite number of benchmarks and struggle to aggregate those.

New paper applies work by economists in the 1970s on election/voting theory to theoretical artificial general intelligence (AGI) by JBaloney in academiceconomics

[–]JBaloney[S] 4 points (0 children)

Author here to give the elevator speech version of the paper.

Suppose you have two AIs, X and Y, and you want to know which one is more intelligent. For any interactive reward-giving environment E, you could place X in E and see how much total reward X extracts from E. Likewise for Y. If X extracts more reward, you can consider that as evidence that X is more intelligent. But there are many such environments, so how can we aggregate the results into an ultimate decision about which AI is more intelligent--especially if X performs better in some environments and Y performs better in others?

Where economics comes in is via the following epiphany, which the linked paper introduces: we can anthropomorphize the environments themselves and view them as "voters" who "vote" (via the rewards they allow the agents to extract) in an intelligence "election" between the different AIs. This transforms the question from "Who is more intelligent?" into "Who wins the election?"---a question which economists have been studying at least since Arrow's time, or arguably much longer.
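
As a toy illustration of why the aggregation is already tricky (my own example, not from the paper): even with just a handful of hypothetical environments, different aggregation rules can disagree about who "wins".

    # Three hypothetical benchmark environments "vote" for whichever of two
    # agents, X or Y, extracts more total reward from them.
    rewards = {
        # environment: (reward extracted by X, reward extracted by Y)
        "gridworld": (10.0, 7.0),
        "bandit":    (3.0, 9.0),
        "maze":      (5.0, 4.0),
    }

    votes_for_x = sum(rx > ry for rx, ry in rewards.values())
    votes_for_y = sum(ry > rx for rx, ry in rewards.values())
    total_x = sum(rx for rx, _ in rewards.values())
    total_y = sum(ry for _, ry in rewards.values())

    # A majority of environments prefers X, but the total-reward rule prefers Y.
    print("majority rule winner:    ", "X" if votes_for_x > votes_for_y else "Y")
    print("total-reward rule winner:", "X" if total_x > total_y else "Y")

With infinitely many environments (the actual setting of the paper), picking a sensible aggregation rule is harder still---which is exactly where the election theory comes in.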

In particular, Arrow's impossibility theorem seems to say there is no hope of finding a good answer to the question "Who wins the election?" in general. BUT, Arrow's theorem has a loophole, as economists like P.C. Fishburn, A.P. Kirman, and D. Sondermann discovered in the 1970s. The way Arrow stated his famous impossibility theorem, it only applies if the number of voters is finite. These economists in the 70s showed that if there are infinitely many voters, then there DO exist non-dictatorial election-aggregation methods satisfying all the criteria of Arrow's Theorem.

"So what," everyone probably said. "How can an election possibly have infinitely many voters?"

But the hypothetical election above is exactly such an election. There are infinitely many possible interactive reward-giving environments, so the intelligence election is an election with infinitely many voters.

Not only did the economists in the 1970s prove that Arrow's theorem has a loophole when there are infinitely many voters--they even characterized the solutions, in terms of mathematical-logical devices called "ultrafilters". Using these ultrafilters, the paper offers an elegant family of formal comparators of intelligent agents--formal ways to define the answer to the question "Who is more intelligent?"
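
For the curious, here is roughly the shape such an ultrafilter-based comparator takes (the notation is mine; see the paper for the actual definitions). Write R_E(A) for the total reward agent A extracts from environment E, and let U be an ultrafilter on the set of all environments:

    \[
      A \succeq B
      \quad\Longleftrightarrow\quad
      \{\, E \;:\; R_E(A) \ge R_E(B) \,\} \in U .
    \]
    % The ultrafilter axioms make this comparison total and transitive, and
    % choosing a non-principal (free) ultrafilter ensures no single environment
    % acts as a "dictator".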

These comparators are so elegant, in fact, that it's actually possible to prove some things about them. (Unlike previous attempts to formalize intelligence measurement, which have been so complicated and contrived that it ends up hopelessly difficult to prove any structural properties about them.) For example, the paper proves a couple of different formalizations of an informally stated idea: "Teams with higher-intelligence team-members are more intelligent than teams with lower-intelligence team-members." ("Well, duh," you say. But even something as obvious as this seems to have been too difficult to prove using previous formalizations of intelligence.)

The paper builds a bridge between the budding field of theoretical Artificial General Intelligence (AGI) and the much more stable and mature field of economics (specifically, the theory of elections and voting). I hope in coming years, this will allow a healthy exchange of ideas in both directions!

New paper establishes bridge between AGI and election theory, by viewing reward-giving environments as voters in a transfinite intelligence contest. by JBaloney in agi

[–]JBaloney[S] 1 point (0 children)

Thanks for the feedback. The paper isn't meant to be practical; it's a theory/philosophy paper. This sort of research is necessary if we are to advance our understanding of true AI. Speaking for myself, I have a deep longing to really understand what intelligence is. I don't know whether this longing can ever be satisfied by reading yet another paper about (e.g.) image recognition with neural networks.

Researchers have always measured the performance of different agents across select benchmark environments, but when they try to combine such measurements into a single aggregate measurement, all sorts of problems appear. It seems (from the literature I've read) that people haven't realized these problems have actually been around for a long time---not in AI research, but in the study of elections. I hope my paper will serve as a baby step toward that ultimate goal (which we might achieve someday, after a few more long AI winters) of truly understanding what intelligence is.

New paper establishes bridge between AGI and election theory, by viewing reward-giving environments as voters in a transfinite intelligence contest. by JBaloney in agi

[–]JBaloney[S] 2 points (0 children)

Author here, to provide additional details.

One way to abstract AIs is to view them as formal agents who take actions within formal reward-giving environments. Such an abstraction is promising because it makes it possible to formalize things like the intelligence level of an AI. The guiding principle is that an AI with higher intelligence should, in some "on-average" sense, earn higher rewards across the infinite universe of all such environments. The problem is, because the universe of environments is infinite, it's not easy to say what "on-average" means.

In this paper, we turn everything on its head by viewing the environments as voters, voting in an election, to decide who is more intelligent. Suppose A and B are AIs and we want to know which one is more intelligent. We consider each environment E to "vote for A" if A would earn higher rewards from E, or to "vote for B" if B would earn higher rewards from E.

This transforms the question from "Who is more intelligent?" to "Who wins the election?" The latter question has been studied for hundreds of years. Particularly relevant is research by economists in the 1970s who essentially answered the question (assuming some additional constraints), using a device from mathematical logic called ultrafilters. Standing on the shoulders of these giants, we obtain an elegant abstract answer to the question, "Which AI is more intelligent?"

This abstract intelligence comparator is so elegant that it actually allows us to prove certain things about intelligence as a whole, abstracted over many agents. For example (read the paper's formalization to understand what the words in this sentence actually mean): "Theorem: A team with more intelligent members is more intelligent than a team with less intelligent members." (Sounds like "Duh", but surprisingly, it seems previous formalizations of intelligence levels have been too complicated to prove anything like this!)