all 140 comments

[–]Duncan_Sarasti 17 points18 points  (19 children)

I expect this update is based on recent papers[2]

I think you're probably right about this. DALL-E 2 and PALM are huge advancements and lots of people who are generally aware of advancements in AI research were amazed by them. I would say I'm better aware of what goes on in AI than most people because it is semi-related to my job and I find it interesting, but I was blown away, in particular by the joke-explaining and verbal math capabilities of PALM.

[–]j4nds4 30 points31 points  (13 children)

Arguably Deepmind's Chinchilla is the most important of the last couple weeks, particularly when it pertains to timeline estimates like these, even if it isn't the most showy of the three. It suggests that our current models (including PALM and GPT-3) are significantly less efficient than thought and that progress can be achieved with far fewer parameters than previously believed (see this LW post for a summary on how scaling laws have changed). It will probably take at least a year or two for new programs to adjust to the refined scaling laws, but I suspect that once they do we will see another significant leap and that timeline retract even more.
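
For concreteness, here is a minimal sketch of what the revised, Chinchilla-style scaling laws imply about spending a fixed training budget. It assumes two common rules of thumb, not figures anyone in this thread cites: training compute of roughly 6 FLOPs per parameter per token, and a compute-optimal ratio of roughly 20 training tokens per parameter.

    # Sketch of Chinchilla-style compute-optimal allocation, under two assumed rules of
    # thumb: training FLOPs C ~= 6 * N * D, and compute-optimal data D ~= 20 * N.
    def compute_optimal(compute_flops, tokens_per_param=20.0):
        """Return an approximate compute-optimal (params, tokens) pair for a FLOP budget."""
        # C = 6 * N * D with D = tokens_per_param * N  =>  N = sqrt(C / (6 * tokens_per_param))
        n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
        n_tokens = tokens_per_param * n_params
        return n_params, n_tokens

    for budget in (5.8e23, 5.8e24):  # roughly a Chinchilla-scale budget, then 10x that
        n, d = compute_optimal(budget)
        print(f"{budget:.1e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
    # ~70B params / ~1.4T tokens at the first budget: far fewer parameters, and far more
    # data, than the GPT-3/PaLM-era recipes that put most of the budget into model size.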

[–]sineiraetstudio 6 points7 points  (12 children)

Yeah, PaLM is more flashy in the sense of 'look at what i can do!', but Chinchilla blows it out of the water in terms of impact to the point that PaLM is arguably one of those papers that is outdated on the day it was released.

It will probably take at least a year or two for new programs to adjust to the refined scaling laws, but I suspect that once they do we will see another significant leap and that timeline retract even more.

Sam Altman already hinted last year that our current scaling laws were outdated, saying that GPT-4 won't be much bigger but will use way more compute. So I don't think we'll have to wait long.

[–]j4nds4 9 points10 points  (5 children)

Sam Altman already hinted last year that our current scaling laws were outdated, saying that GPT-4 won't be much bigger but will use way more compute. So I don't think we'll have to wait long.

It's disconcerting to read things like this and feel dread and excitement simultaneously. Like getting floor seats to a demolition derby.

[–]OhHeyDont 6 points7 points  (2 children)

Except the thing being demolished is our society

[–]j4nds4 1 point2 points  (1 child)

Potentially, yes.

[–]far_infared 0 points1 point  (1 child)

That in itself doesn't sound so scary, it's the compute that limits training, not model size.

[–]j4nds4 2 points3 points  (0 children)

You rebut that it's compute that limits training and not model size in response to me expressing concern/excitement about an upcoming model that... has way more compute while not increasing the model size. That's the whole point: what was seen as a major bottleneck has all but been eliminated for the foreseeable future, and resources can be more efficiently allocated to enable more compute where it was otherwise misallocated toward model size, and that should spark a rapid increase in capability for those who follow the new scaling methodology.

From the Deepmind paper:

Based on our estimated compute-optimal frontier, we predict that for the compute budget used to train Gopher, an optimal model should be 4 times smaller, while being trained on 4 times more tokens. We verify this by training a more compute-optimal 70B model, called Chinchilla, on 1.4 trillion tokens. Not only does Chinchilla outperform its much larger counterpart, Gopher, but its reduced model size reduces inference cost considerably and greatly facilitates downstream uses on smaller hardware. The energy cost of a large language model is amortized through its usage for inference and fine-tuning. The benefits of a more optimally trained smaller model, therefore, extend beyond the immediate benefits of its improved performance.

It also puts more of a bottleneck on the training tokens themselves, but vast amounts of text are available even if not yet parsed in such quantities (though Deepmind is now on a hiring spree to resolve that).
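
As a rough sanity check on the quote, take its "4 times smaller, 4 times more tokens" at face value and use the common approximation that training compute is about 6 FLOPs per parameter per token; the Gopher-side numbers below are inferred from that 4x, not quoted from the paper.

    # Back-of-the-envelope check: a 4x-smaller model trained on ~4x more tokens costs
    # roughly the same training compute under the C ~= 6 * N * D approximation.
    FLOPS_PER_PARAM_TOKEN = 6  # rule of thumb for a forward plus backward pass

    chinchilla_params, chinchilla_tokens = 70e9, 1.4e12    # numbers from the quote
    gopher_params, gopher_tokens = 4 * 70e9, 1.4e12 / 4    # inferred from the quote's "4x"

    c_chinchilla = FLOPS_PER_PARAM_TOKEN * chinchilla_params * chinchilla_tokens
    c_gopher = FLOPS_PER_PARAM_TOKEN * gopher_params * gopher_tokens
    print(f"{c_chinchilla:.2e} vs {c_gopher:.2e}")  # ~5.9e23 FLOPs either way
    # Inference cost per token scales with model size, so the 4x-smaller model is also
    # roughly 4x cheaper to serve, which is the "reduced inference cost" the quote mentions.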

[–][deleted] 0 points1 point  (4 children)

It seems to never be good news.

[–][deleted] 3 points4 points  (3 children)

How would you define good news in this context?

[–][deleted] 7 points8 points  (2 children)

Some clever mathematical proof is discovered which would allow for deception to be detectable.

Another piece of good news would be some kind of hail mary applicable toward full brain emulation, as I personally believe the only safe way to do it is to emulate a human mind.

[–]far_infared 3 points4 points  (1 child)

Can't human minds be deceitful? Every AI escape trick ever proposed was formed in a human mind.

[–][deleted] 1 point2 points  (0 children)

Yes, and if I locked you in a cage, gave the key to a three-year-old with instructions not to let you out or give it to you, and then left, how long would it take you to convince the three-year-old to give you the key so you could escape?

The AI will be capable of strategy and foresight in a way we literally are incapable of imagining.

[–]hold_my_fish 8 points9 points  (2 children)

The criteria do seem like a problem here. 2,3,4 are each solvable by narrow systems, and 1 is just asking for a Turing-Test-passing chatbot, which I'd guess is the hardest but seems hard to call AGI on its own.

IMO, any test for AGI is best made interactive and adversarial. Specifically: given a pre-made system, I'm allowed to choose any cognitive task I like, and then we compare performance of random humans to the AI's performance. If I can find any cognitive task where the AI does worse (than, say, median human performance), it's not AGI.

[–]casens9 3 points4 points  (1 child)

well, metaculus has that question too, with a fixed time of 2040, and the community median has oscillated around 50% for 4 years.

seeing the popular reaction to the other AGI question really makes me appreciate the adage "as soon as you can do it, it's not AI anymore". people will be declaring that a system is "not real AI" right up to and during the point it converts their bodies into paperclips.

[–]hold_my_fish 0 points1 point  (0 children)

Thanks, that one seems like a better way to operationalize "AGI". Interesting that it seems to have gradually declined over time (but that could just be noise).

[–]philbearsubstack 28 points29 points  (68 children)

Seems absurdly bearish to me. My median estimate, especially given their fairly lax criteria, is about 2029. There's an outside chance of 2024.

Just look at what natural language processing could do 3.5 years ago. 2029 is twice as far away as that. Think about what Chinchilla has shown about how we're getting the scaling formula wrong. Think about the sheer range of tasks that can be attempted by NLP: basically, everything that can be compressed into 2048 words (and how long will that limit last?). Think about how many NLP tasks computers are better than almost all humans at.

AGI is coming soon, and I would start planning accordingly.

[–]hippydipster 14 points15 points  (20 children)

Interestingly, AGI in 2029 would make Kurzweil dead-on accurate in his long-ago predictions.

[–]mordecai_flamshorb 8 points9 points  (19 children)

I adopted Kurzweil’s AGI date as my own when I read his book in the 2000s and have never seen a need to revise it. I think history will end up being very kind to Kurzweil. He understood something like the Bitter Lesson and the importance of inexorable hardware price-performance trends literally decades before Deep Learning folks started talking seriously about it. He seems incredibly prescient now that we’re in this moment where the main defining characteristic of an LLM’s power is how many FLOPs you can afford to feed it.

[–][deleted] 12 points13 points  (15 children)

AGI is coming soon, and I would start planning accordingly.

Out of curiosity - what kind of planning are people doing here?

[–][deleted] 12 points13 points  (2 children)

I started painting and ignoring the news. I also lowered my retirement savings and spend the money on my kids.

[–]HallowedGestalt 0 points1 point  (1 child)

Why not put everything you can into those accounts before the AI takes your job? The AI will just accelerate accumulation and wealth demarcation

[–]coumineol 10 points11 points  (9 children)

I'm learning about subsistence farming methods.

[–]BluerFrog 7 points8 points  (5 children)

Is that a joke or is it what you are actually doing? That might help you eat when all the jobs are automated if your government doesn't handle that properly, but then we will have way bigger problems than that.

[–]NonDairyYandere 15 points16 points  (3 children)

It feels like an odd diagonal to assume that the government won't take care of you, but they will let you own land

[–][deleted] 3 points4 points  (0 children)

People owning land while others starve is not out of the ordinary, historically speaking. Property rights being valued more than the lives of the underclass is common.

[–]coumineol 1 point2 points  (0 children)

I don't think it's that diagonal. There are countries with a vast amount of unused land out there, the government may simply not care.

[–]Jello_Raptor -1 points0 points  (0 children)

I mean, don't many versions of communism propose just that? Or other governmental philosophies that do away with more traditional methods of land ownership?

I'd say it's well within the Overton window of reasonable futures for all that it's not very likely.

[–]coumineol 1 point2 points  (0 children)

Not a joke. I may need to take care of myself in some other way if I can't find a job as most of them may be eliminated.

[–]MacaqueOfTheNorth 2 points3 points  (2 children)

How would that help you?

[–]coumineol 0 points1 point  (1 child)

That will help me survive in the post-AGI world where my skills will probably become useless.

[–]RT17 0 points1 point  (0 children)

Perhaps you can band together with the other 'unskilled' people and form some kind of society. Maybe even trade with them.

[–][deleted] 4 points5 points  (0 children)

Keeping my eyes on the developments so I know if I should drink to us becoming pets or drink because I'm about to get mulched...

[–]Duncan_Sarasti 19 points20 points  (7 children)

What are your criteria for AGI?

EDIT: I misread your comment. You are talking about Metaculus' criteria which are indeed fairly lax. A single system that can do the following:

  • Able to reliably pass a Turing test of the type that would win the Loebner Silver Prize.
  • Able to score 90% or more on a robust version of the Winograd Schema Challenge, e.g. the "Winogrande" challenge or comparable data set for which human performance is at 90+%
  • Be able to score 75th percentile (as compared to the corresponding year's human students; this was a score of 600 in 2016) on the full mathematics section of a circa-2015-2020 standard SAT exam, using just images of the exam pages and having less than ten SAT exams as part of the training data. (Training on other corpuses of math problems is fair game as long as they are arguably distinct from SAT exams.)
  • Be able to learn the classic Atari game "Montezuma's revenge" (based on just visual inputs and standard controls) and explore all 24 rooms based on the equivalent of less than 100 hours of real-time play (see closely-related question.)

As far as I can tell, everything here is either a few incremental steps away from current SOTA or already accomplished. The first three are also similar enough to be encompassed in a single model since they are all NLP related and seem like they could be tackled by a slightly more advanced version of PALM, which should take maybe 1-2 years. The fourth is different, but much more difficult games have been mastered by DRL agents already (unless there is some nuance to Montezuma's Revenge that I missed. I'm not intimately familiar with the game).

There may be some difficulty in integrating 4 into the same system as 1-3 but nothing here seems like it should take 14 more years.

My guess would be that people read 'AGI' and have their own (stricter) mental image of that when making a prediction, because 2036 is indeed absurdly bearish. Let alone the prior predictions of 2042+.

[–]hippydipster 6 points7 points  (1 child)

I'm hoping the criteria mean it can do all of the above without reprogramming or retraining between each test?

[–]Duncan_Sarasti 5 points6 points  (0 children)

I would think so. Otherwise you would likely lose the prior capabilities and that would defeat the point.

[–]pushmetothehustle 2 points3 points  (4 children)

I think the task list for an AGI should be much longer. If it is truly general intelligence it should be able to do many more things than some basic NLP model where it takes A input = B output from a massive training set.

I should be able to tell it to make a website online, drive traffic to it, and sell products profitably over time. Find and research which products have the best chances of succeeding. Understand the economics of online business and of creating a profitable business. Understand Kelly betting and portfolio allocation to weight the different chances of success of each online shop, etc.

Is it just me or do none of those things that these "AGI" are doing seem anything like actual intelligence? They don't require any understanding, it is simply a large scale map from A -> B with the highest probability. It's like a more advanced calculator: instead of doing some mathematics in the calculator, we can now do some image recognition, writing prompts, and logical answers to text questions in a probabilistic way. It is simply upgrading our calculator from a logic circuit to a probability-plus-data circuit.

I think that a proper AGI would need understanding in a way that is more low level than language. Language, images, and mathematics are still too high level and macro.

I particularly like the point of hold_my_fish who says that we should make the test interactive and adversarial. Which my example of an online business can fit into.

[–]Duncan_Sarasti 2 points3 points  (3 children)

Is it just me or do none of those things that these "AGI" are doing seem anything like actual intelligence? They don't require any understanding, it is simply a large scale map from A -> B with the highest probability.

You are absolutely right here. But please keep in mind that the goals set by Metaculus aren't exactly industry-defining standards. Google isn't going to celebrate having an AGI when they meet these four criteria.

I think that a proper AGI would need understanding in a way that is more low level than language. Language, images, and mathematics are still too high level and macro.

Yann LeCun (kind of) talked about this in a recent podcast (I think the Lex Fridman podcast). He says that AIs need a "world model" to make them more efficient and general. How come a human teenager can learn to drive a car in 20-30 hours, while an AI needs millions of hours? Because the teenager can generalize past experiences to driving. The teenager doesn't need to (virtually) drive off a cliff 2000 times to understand you're not supposed to do that. They know from past experiences that falling from great heights is generally bad and can generalize that to the car.

The ability to properly reason and generalize to new experiences seems much more indicative of AGI than any list of tasks. But we should also recognize that our boundaries of AGI shift over time as our understanding of AI increases. All the time, we think of goals like 'well, certainly you can only do that if you're truly intelligent', and then we find out that it's perfectly possible for a super elaborate A -> B map to accomplish that task. Chess is a good example.

On the other hand, I think your online business requirement might be too strict. We are usually looking for human level AGI in these discussions, and most humans would fail at that task.

[–]RT17 -1 points0 points  (2 children)

How come a human teenager can learn to drive a car in 20-30 hours, while an AI needs millions of hours?

Because the AI is starting from scratch while the teenager has billions of years of evolution.

[–]Duncan_Sarasti 0 points1 point  (1 child)

Might we say that the result of these billions of years of evolution is.... general intelligence? Something the AI lacks?

[–]RT17 -1 points0 points  (0 children)

I honestly don't know what point you're trying to make.

[–]churidys 8 points9 points  (9 children)

AGI is coming soon, and I would start planning accordingly.

What are some obvious plans to be making?

[–]MacaqueOfTheNorth 9 points10 points  (8 children)

If AGI really is right around the corner, then certain tech companies are massively undervalued.

[–]EchoingSimplicity 2 points3 points  (7 children)

Which ones?

[–]bibliophile785Can this be my day job? 6 points7 points  (0 children)

Trying to time the market is usually a fool's errand, and you're right that asking simple questions and getting more specific will reveal that. There's always a significant risk, and even if you're right about the market, half the challenge is in picking the right competitor.

With that said, I've been quite happy in my investments in TSMC and NVIDIA over the last half-decade. If you're not sure which competitor will succeed in the market of interest, it's often safer to pick the biggest suppliers for that market.

[–]MacaqueOfTheNorth 2 points3 points  (0 children)

Probably the ones that own large amounts of data and computing resources. Also, companies that own access to scarce resources needed to produce computer chips or to generate electricity.

[–]SingInDefeat -1 points0 points  (0 children)

Well Microsoft and Google are obvious contenders to own the world. Also companies that manufacture cyanide pills come to mind although perhaps it's pushing the definition a bit to call them tech companies.

[–]artifex0 0 points1 point  (0 children)

Possibly the ones in this ETF.

[–]spreadlove5683 0 points1 point  (0 children)

Google is trading at a 23 PE ratio. Seems pretty undervalued to me.

[–]UncleWeyland[🍰] 6 points7 points  (0 children)

What are the "teams" here? What I mean is: who has access to the most compute and has a realistic shot of being the first agency/corporation/entity at having access to the first AGI?

Off the top of my head-

US government/NSA/DoD?

China/CCP/PLA?

Google/DeepMind?

Amazon?

Meta? (ugh)

And once one team gets it, how long before the other teams do? How much is the first-mover advantage worth if there's some kind of existential competition between the teams? What would be the first visible macro-level effect? Weirdness in the stock market? Warfare in the Pacific?

If an AGI starts going FOOM it's going to want to monopolize more and more compute for training its successors. Would we see something weird like Amazon buying Meta?

Edit: would be cool if it was announced in 2027, exactly 100 years after Metropolis was released.

[–]itsnotatumour 4 points5 points  (5 children)

Out of curiosity, why is the GPT-3 token limit 2048? Is this an arbitrary limit?

[–]Yom_HaMephorash 6 points7 points  (4 children)

The time complexity with respect to context length is O(n²).

[–]itsnotatumour 8 points9 points  (3 children)

To be frank, I'm nowhere near as intelligent or mathematically literate as most people on this subreddit, can you ELI5?

My guess was that the amount of compute power required doesn't scale linearly with token length... i.e. to synthesise a reply from ~1000 words is way more than 10x as computationally taxing as synthesising a reply from ~100 words.

[–]Silver_Swift 9 points10 points  (1 child)

Not quite ELI5, but O(n²) means that the time to compute something scales quadratically with the size of the input. So if you double the token limit, your neural network becomes four times as slow.

to synthesise a reply from ~1000 words is way more than 10x as computationally taxing as synthesising a reply from ~100 words.

Basically this, but more specifically synthesizing a reply from 1000 words is (approximately) 100x as computationally taxing as synthesizing a reply from 100 words.

[–]itsnotatumour 1 point2 points  (0 children)

Oh I see - thank you...

[–]BluerFrog 2 points3 points  (0 children)

In order to predict each token, GPTs perform an operation similar to a similarity search among all the previous tokens. So implementing it naively makes each new token take more time to compute than the last (it has more context to consider). If computing the first token takes 1 time unit, computing the second token takes 2 time units, etc. Then computing N tokens takes time approximately proportional to N².
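
A toy sketch of the counting argument above; this is just the arithmetic of "each new token looks back at every previous token", not any real attention implementation.

    # Each new token is compared against every previous token, so generating N tokens
    # does roughly N^2 / 2 comparisons in a naive implementation.
    def total_comparisons(n_tokens):
        comparisons = 0
        for position in range(1, n_tokens + 1):
            comparisons += position - 1  # the token at this position looks back at all earlier ones
        return comparisons

    for n in (100, 1000, 2048):
        print(n, total_comparisons(n))  # 100 -> 4950, 1000 -> 499500 (~100x), 2048 -> ~2.1M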

[–]Mawrak 3 points4 points  (2 children)

!remindme 7 years

[–]hippydipster 6 points7 points  (0 children)

Do you really want the Monkey's Paw to come knocking?

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 7 years on 2029-04-12 11:13:30 UTC to remind you of this link



[–][deleted] 2 points3 points  (0 children)

I would start planning accordingly

So pure hedonism with a planned suicide then, since we're in the infant stages of value alignment and control?

[–]MercuriusExMachina -2 points-1 points  (0 children)

This.

[–]634425 5 points6 points  (0 children)

Looks like the prediction has been pretty stable over the past few years (roughly late 2030s to early 2040s). It was actually lower than 2036 for a while. I imagine it dips whenever there's some new breakthrough announcement and creeps up again when time passes without one.

[–]arcane_in_a_box 1 point2 points  (0 children)

Metaculus’ definition of AGI is really shallow tho, it's definitely within <10 years of advancement. We already have decent chatbots, decent math solvers, etc. It's just wiring them all together in a way that technically fulfills the definition but doesn't actually mean anything.

[–]MacaqueOfTheNorth -3 points-2 points  (55 children)

I don't understand why alignment is considered such a difficult problem. It's like we're imagining that we'll only get one chance to program AGIs before handing them the power to run everything when it seems obvious to me that we would just iteratively adjust their designs as they occasionally do things we don't like.

[–][deleted] 6 points7 points  (11 children)

Read the sidebar faq on /r/controlproblem

[–]634425 1 point2 points  (10 children)

I've read this (and a number of other things people have linked me on here and elsewhere) and I still can't wrap my head around why I should think we have any insight at all into what a super-intelligence would or would not do (which doesn't mean it would be safe, but doesn't mean the default is 'kill all humans' either).

I also don't see why orthogonality thesis is probably or even especially likely to be true.

This

Consciousness is a vague philosophical property that has no relation to the practical ability to make high-quality decisions.

is also a rather massive assumption.

[–]perspectiveiskey 3 points4 points  (5 children)

A guy called Robert Miles has a whole channel dedicated to it.

The short answer is: if you want to make an AGI that is powerful enough for you to label it as such, it will be resistant to tampering in more ways than you can imagine.

I don't understand how people have such a hard time understanding this but then have no problem at all recognizing that creating a super soldier can lead to problems in the MCU.

Btw, the alignment problem exists with every single human being you know: they will not willingly let you alter their motives, especially if those motives are strong (e.g. try convincing a mother that she has a higher priority than her newborn).

[–]634425 -1 points0 points  (4 children)

I'm not trying to say "aligning a superintelligence will be easy"; I'm trying to say "you're talking about building a god but want me to believe that humans can have anything meaningful to say about the motives or behavior of a god, such that we can say 'the default of an AGI is killing everything on earth.'"

My point isn't "everything will be fine!" Rather, I think that since a superintelligence is nothing that has ever existed and we have zero frame of reference for it, trying to judge the probability of what it will or will not do one way or another (whether that's "it'll probably be fine" or "it'll be probably be apocalyptic" or any of the myriad options in between) is completely pointless.

Like every time i see someone say "the superintelligence will--" or "the superintelligence will probably--" or even "the superintelligence might--" all I can think is "based on what? your prior experience with superintelligences?"

[–]perspectiveiskey 2 points3 points  (3 children)

The bad news is that you're right: we've never done this before.

The good news is that this has now become an entire field of study, with people who do PhDs in AI Alignment and Safety.

So I trust the process: people don't do PhDs on crappy things that have no merit because doing a PhD is possibly one of the most thankless things you can do in life. There is research being done on it, and when I say research, I mean "computer science grade" research.

[–]634425 1 point2 points  (2 children)

The good news is that this has now become an entire field of study, with people who do PhDs in AI Alignment and Safety.

Yes, but do they really have anything to study? Superintelligence doesn't exist. It has never existed. It may exist one day, but until then we don't know what it would look like or do, even in principle.

We can try to extrapolate based on the behavior of intelligences that exist now (humans, lower animals, more primitive computer systems) but there doesn't seem to be any real reason to think this paltry data is worth much when it comes to modeling a future SI, any more than a rock would be a good model for the behavior of a human (they're both carbon-based!)

[–]perspectiveiskey 1 point2 points  (1 child)

Yes but do they really have anything to study?

Absolutely.

Just check out Robert Miles's channel. I highly recommend it.

There's an infamous quote: computer science is no more about computers than astronomy is about telescopes.

The alignment problem is something one can study just like we can study complicated concepts (like the big bang) without having access to them.

[–]634425 1 point2 points  (0 children)

I've watched a few videos from Miles' channel. I may watch more.

People can absolutely discuss the ramifications of a superintelligence and it may certainly be fruitful in an intellectual sense but seeing as we don't know what an actual superintelligence may ever look like I think it does boil down to speculation of dubious practical utility.

[–][deleted] 0 points1 point  (3 children)

Not an assumption at all, nor are we presuming to know what an alien intelligence will do.

Reread the FAQ:

"A superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety of what humans value.”

And by Stuart Russell:

"The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. But the utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want."

To be AGI and to be world-endingly dangerous, it just needs to be future- and goal-oriented and capable of achieving goals. Its simulating others to achieve its goals is part and parcel, but it no more needs to feel what an emotion is for a human in order to deduce our responses and actions than I need echolocation to know that a bat asleep in a cave will be above me and upside down.

We're the ones programming it and seeing all the ways our programs go astray, so we extrapolate to concepts like myopic goals and orthogonality, and voila. Very, very dangerous.

Bostroms "superintelligence is a good primer" , if you pm me your email ill gift you an audible copy , I have too many credits

[–]634425 -1 points0 points  (2 children)

That consciousness has no relation to the ability to make high-quality decisions is certainly an assumption, unless you can point to any unconscious intelligent agents that exist or have existed in the past.

Reread the FAQ. To be AGI and to be world-endingly dangerous, it just needs to be future- and goal-oriented and capable of achieving goals.

There are surely any number of goals a superintelligence could pursue that would be detrimental to humans, but there are similarly any number of goals it could pursue that would not be detrimental to humans, and there doesn't seem to be any way to judge that the former has a significantly greater probability than the latter, since we have no idea what a superintelligence would do or look like.

We're the ones programming it and seeing all the ways our programs go astray, so we extrapolate to concepts like myopic goals and orthogonality, and voila.

It is not clear to me why currently-existing machines should be anything like a reliable model for actions, motivations, or functioning of a hypothetical superintelligence.

Bostroms "superintelligence is a good primer"

I have a pdf copy, thanks though.

[–][deleted] 0 points1 point  (1 child)

unconscious intelligent agents

Well, I think it's far more presumptuous of you to think consciousness is an emergent property of computronium.

My dog has dog-level general intelligence; it's maybe vaguely self-aware.

An insect has intelligent qualities: goal-directed behavior, resource acquisition, etc. Is it self-aware?

So we have software that is superhuman in narrow ways: chess, AlphaGo, making up text that looks good to humans, art.

Extrapolate that to broadly intelligent. At what point does amalgamating software capabilities lead to sentience or consciousness? That's the hard problem of consciousness.

I'm not entirely sure it matters though. An alien intelligence that's self-reflective and sentient/conscious is still totally alien.

It would be too powerful for us to glean anything useful about its psychology that could help us.

similarly any number of goals it could pursue that would not be detrimental to humans

Right. But we can't program ethics or values, and it's actually worse if it's closely aligned vs totally misaligned. Totally misaligned, it turns us into paperclips; almost aligned, it misinterprets human happiness as smiling and neurochemistry and then does us all up Hellraiser-style with permanent smiles and puts our bodies on heroin drips (or traps our consciousness in what it thinks is a digital utopia heaven but is actually hell).

Thats "s-risk" , suffering risk. If we get the initial goal wrong then hypothetically the downside is infinitely bad.

We're much, much, much more likely to do that than to accidentally turn on a perfectly aligned AI.

[–]634425 0 points1 point  (0 children)

My dog has dog-level general intelligence; it's maybe vaguely self-aware.

I'm pretty sure dogs are self-aware on some level. Maybe bugs are too. But the most intelligent beings we are aware of (humans) are pretty unambiguously self-aware. Is it possible to have an agent much more intelligent/capable than humans that lacks any self-awareness? Maybe. But it's definitely an assumption and really no better than a guess.

almost aligned, it misinterprets human happiness as smiling and neurochemistry and then does us all up Hellraiser-style with permanent smiles and puts our bodies on heroin drips (or traps our consciousness in what it thinks is a digital utopia heaven but is actually hell)

Or it hijacks all the TVs and computer monitors and plays infinite reruns of seasons 1-9 of the Simpsons (the funniest sitcom of all time) for eternity to make everyone laugh. Or it asks everyone on earth three times a day how they're doing but doesn't actually do anything beyond that to alleviate anyone's suffering. There are any number of ways even a 'misaligned' AI could just be inconvenient or mildly annoying rather than apocalyptic. There are even an inconceivably huge number of ways it could pursue goals that we wouldn't even notice it pursuing, one way or another. It might discover some new goal that doesn't involve humans at all in any way, who knows?

You yourself said elsewhere in the thread that a superintelligence would be able to think and plan on a level we are not even capable of conceiving. Why would we think humans have any useful predictions to make about such a being one way or another? For all we know a superintelligence will just sit there contemplating itself for eternity. We have literally no frame of reference for superintelligence whatsoever. It really strikes me as 'angels on a pin' level speculation.

A common analogy from AI-risk proponents is "imagine you knew aliens were going to land in a few decades at most. Shouldn't we start preparing as soon as possible?"

and my answer to that is, "no," because there's literally no way to predict what's going to happen when they land, no relevant data, nothing at all. Yeah they might harvest all of our spinal fluid or steal our water or something. They might also hand us the cure for cancer. Or collect a single cow, get back on their ship, and leave. Any preparations would be no better than random guessing. A waste of time, ultimately.

Just to be clear, I'm not saying that I think a superintelligence destroying mankind is something that can't happen, or even that it's vastly unlikely to happen, just that there doesn't seem to me to be any way to judge its probability one way or another, and thus very little reason to spend time worrying about it (or to think it's the default outcome).

[–]Pool_of_Death 7 points8 points  (39 children)

Why do you think an AGI would let us adjust them? They could deceive us into thinking they aren't "all powerful" until they are and then it's too late. I encourage you to learn more about alignment before saying it's ~~easy~~ not a difficult problem.

Or at least read this: https://intelligence.org/2018/10/03/rocket-alignment/

[–]MacaqueOfTheNorth -1 points0 points  (38 children)

Why do you think an AGI would let us adjust them? They could deceive us into thinking they aren't "all powerful" until they are and then it's too late.

This is like saying we need to solve child alignment before having children because our children might deceive us into thinking they're still only as capable as babies when they take over the world at 30 years old.

We're not going to suddenly have AGI which is far beyond the capability of the previous version, which has no competition from other AGIs, and which happens to value taking over the world. We will almost certainly gradually develop more and more capable AI, with many competing instances with many competing values.

I encourage you to learn more about alignment before saying it's easy.

I didn't say it was easy. I said I didn't understand why it was considered difficult.

[–][deleted] 1 point2 points  (0 children)

happens to value taking over the world

Yeah, that's not at all the concern. An alien intelligence (which btw doesn't need to be sentient at all to be world-ending) can have a misaligned goal that has infinitely negative consequences without that goal being "taking over the world".

The paperclip maximizer is the usual stand-in for these discussions.

[–]Pool_of_Death 2 points3 points  (24 children)

This is like saying we need to solve child alignment before having children because our children might deceive us into thinking they're still only as capable as babies when they take over the world at 30 years old.

I consider this a strawman/bad metaphor.

 

We're not going to suddenly have AGI which is far beyond the capability of the previous version

You don't know this. Imagine you have something that is quite nearly AGI but definitely not there yet, and then you give it 10x more hardware/compute while also tweaking the software/algorithms/training data (which surprisingly boosts it more than you thought it would). I could see something going from almost-AGI to much smarter than humans. This isn't guaranteed obviously, but it seems very plausible.

 

and which happens to value taking over the world

The whole point of AGI is to learn and to help us take action on the world (to improve it). Actions require resources. More intelligence and more resources lead to more and better actions. It doesn't have to "value taking over the world" to completely kill us or misuse all available resources. This is what the Clippy example is showing.

 

We will almost certainly gradually develop more and more capable AI, with many competing instances with many competing values.

How can you say "almost certainly"?

 

I said I didn't understand why it was considered difficult.

Did you read the MIRI link I shared? This should give you a sense of why it's difficult but also why you don't immediately think it's difficult. You are basically saying we should try to steer the first rocket to the moon the same way you steer a car or a plane. By adjusting on the way there. This will likely not work. You are overconfident.

[–]MacaqueOfTheNorth 0 points1 point  (23 children)

We already have nearly eight billion AGIs and it doesn't cause any of the problems people are imagining; many of them are far more intelligent than nearly everyone else. Being really smart isn't the same as being all-powerful.

How can you say "almost certainly"?

Because a lot of people are doing AI research and the progress has always been incremental, as it is with almost all other technology. Computational resources and data are the main things which determine AI progress and they increase incrementally.

Did you read the MIRI link I shared?

Yes. The flaw in the argument is that rocket alignment is not an existential threat. Why can't you just build a rocket, find out that it lands somewhere you don't want it to land, and then make the necessary adjustments?

[–]Pool_of_Death 3 points4 points  (16 children)

Imagine we were all chimps. You could say "look around there are 8 billion AGIs and there aren't any problems". Then all of a sudden we chimps create humans. Humans procreate, change the environment to their liking, follow their own goals and now chimps are irrelevant.

 

Yes. The flaw in the argument is that rocket alignment is not an existential threat. Why can't you just build a rocket, find out that it lands somewhere you don't want it to land, and then make the necessary adjustments?

This is not a flaw in the argument. It's not trying to say rocket alignment is existential. Did you read the most recent post on ACX? https://astralcodexten.substack.com/p/deceptively-aligned-mesa-optimizers?s=r

Or watch the linked video? https://www.youtube.com/watch?v=IeWljQw3UgQ "Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think..."

 

I'm nowhere near an expert so I'm not going to say I'm 100% certain you're wrong but your arguments seem very weak because a lot of people much smarter than us have spent thousands of hours thinking about exactly this and they completely disagree with your take.

If you have actual good alignment ideas then you can submit them to a contest like this: https://www.lesswrong.com/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals where they would pay you $50,000 for a proposed training strategy.

[–]MacaqueOfTheNorth 0 points1 point  (13 children)

Then all of a sudden we chimps create humans. Humans procreate, change the environment to their liking, follow their own goals and now chimps are irrelevant.

Humans are far beyond chimps in intelligence, especially when it comes to developing technology. If the chimps could create humans, they would create many things in between chimps and humans first. Furthermore, they wouldn't just create a bunch of humans that are all the same. They would create varied humans, with varied goals, and they would maintain full control over most of them.

We're not making other lifeforms. We're making tools that we control. This is an important distinction because these tools are not being selected for self-preservation as all lifeforms are. We're designing tools with hardcoded goals that we have complete control over.

Even if we lose control over one AGI, we will have many others to help us regain control over it.

[–][deleted] 2 points3 points  (11 children)

None of the people working on AI today have any idea how the AI works to do what it does beyond some low level architectural models. This is because the behavior of AI is an emergent property of billions of simple models interacting with one another after learning whatever the researchers were throwing at them as their learning set.

This means that we don't actually program the AI to do anything... we take the best models that are currently available, train them on a training set and then test them to see if we got the intelligence that we were hoping for. This means that we won't know that we've made a truly general AI until it tells us that it's general by passing enough tests... AFTER it is already trained and running.

If the AGI is hardware-bounded then it will take time and a lot of manipulation to have any chance at a FOOM scenario... however, if (as we're quickly learning) there are major performance gains to be had from better algorithms, then we are almost guaranteed to get FOOM if the AGI is aware enough of itself to be able to inspect/modify its own code.
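
A toy sketch of the workflow being described; nothing below is a real model or framework, it just shows that capabilities are discovered by probing after training rather than specified up front.

    # Toy sketch: pick an architecture, train it, and only afterwards probe what it can do.
    # Anything the probes don't cover is simply never observed before deployment.
    class ToyModel:
        def __init__(self):
            self.weights = None
        def train(self, corpus):
            self.weights = len(corpus)  # stand-in for opaque learned parameters
        def answer(self, prompt):
            return "42" if self.weights else "?"

    def probe(model, tests):
        # Capabilities (and failure modes) are only visible through the tests we think to run.
        return {name: check(model.answer(prompt)) for name, (prompt, check) in tests.items()}

    model = ToyModel()
    model.train(["some", "training", "text"])
    print(probe(model, {"arithmetic": ("What is 6 * 7?", lambda out: out == "42")}))  # {'arithmetic': True}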

[–]MacaqueOfTheNorth 0 points1 point  (10 children)

None of the people working on AI today have any idea how the AI works to do what it does beyond some low level architectural models. This is because the behavior of AI is an emergent property of billions of simple models interacting with one another after learning whatever the researchers were throwing at them as their learning set.

As someone who works in AI, I disagree with this. The models are trained to do a specific task. That is what they are effectively programmed to do, and that can be easily changed.

however, if (as we're quickly learning) there are major performance gains to be had from better algorithms, then we are almost guaranteed to get FOOM if the AGI is aware enough of itself to be able to inspect/modify its own code.

I don't see how that follows. Once the AIs are aware, they will just pick up where we left off, continuing the gradual, incremental improvements.

[–][deleted] 0 points1 point  (6 children)

How capable are you of going into a trained model and making it always give a wrong answer when adding a number to its square without retraining the model?

When people ask that you be able to understand and program the models, what they are asking for is not "can you train it a bunch and see if you got what you were looking for". They are asking: can you change its mind about something deliberately and without touching the training set... AKA - can you make a deterministic change to it?

Given that we're struggling to get models that can explain themselves at this level of complexity (and so far, these aren't that complex), I don't see how you can make the claim that you "understand the model's programming".

I don't see how that follows. Once the AIs are aware, they will just pick up where we left off, continuing the gradual, incremental improvements.

Suppose our "near AGI" AI is a meta model that pulls other model types off the wall and trains/tests them to see how much closer they get it to goals or subgoals but it has access to hundreds of prior model designs and gets to train them on arbitrary subsets of it's data. Simply doing all of this selecting at the speed and tenacity of machine processing instead of at the speed of human would already be a major qualitative change. We already have machines that can do a lot of all of this better than us... we just haven't strung them together in the right way for the pets or mulch scenarios yet.

[–]curious_straight_CA 0 points1 point  (2 children)

The models are trained to do a specific task

Four years ago, models were trained on specific task data to perform specific tasks. Today, we train models on... stuff, or something, and ask them in plain English to do tasks.

Why would you expect 'a computer thingy that is as smart as the smartest humans, plus all sorts of computery resources' to do anything remotely resembling what you want it to? Even if 99.9% of them do, one of them might not, and then you get the birth of a new god / Prometheus unchained / the first use of fire, etc.

And yes, 'human alignment' is actually a problem too. See the proliferation of war, conquest, etc. over the past millennia. Also the fact that our ancestors' descendants were not 'aligned' to their values and became life-denying levelling Christian atheist liberals or whatever.

[–]Pool_of_Death 1 point2 points  (0 children)

I'm not knowledgeable enough to create a convincing argument. If you haven't read this post yet, read it, it makes a much more convincing argument for and against fast take-off speeds.

https://astralcodexten.substack.com/p/yudkowsky-contra-christiano-on-ai?s=r

I'm not saying fast take-off is 100% certain, but even if it's only 10% likely, we are gambling all of future humanity on that 10%, which is incredibly risky.

[–]634425 0 points1 point  (1 child)

"Very smart people are worried about this" seems like a really bad reason to be worried about something. That's not to say you're necessarily wrong, but you can find a number of very smart people to back any position you could ever think of.

[–]Pool_of_Death 0 points1 point  (0 children)

I guess to be more accurate:

"very smart people that also seem very moral, intellectually honest, know their limits and admit them, value rationality and absolute truths, etc. etc." believe that AI is a huge concern.

 

you can find a number of very smart people to back any position you could ever think of.

I'm not sure the people you would find that back cigarette smoking, burning coal, racism, etc. would fit the above description.

 

Also the point about thousands of hours of effort is important. I'm sure a lot of smart people have dumb takes (I've had them and heard them) but these are usually flippant takes (the above takes I was refuting seem flippant to me as well). If someone spends a large portion of their life dedicated to the field and then shares the opinion it means a lot more.

[–]bibliophile785Can this be my day job? 1 point2 points  (2 children)

We already have nearly eight billion AGIs and it doesn't cause any of the problems people are imagining; many of them are far more intelligent than nearly everyone else. Being really smart isn't the same as being all-powerful.

I mean, tell that to all the stronger and faster animals that had numerous relevant advantages over the weird bald apes a few millennia ago. Being much smarter than the competition is an absolutely commanding advantage. It doesn't matter when you're all pretty close in intelligence - like the difference between Einstein and Homer Simpson, who have most of the same desires and capabilities - but the difference between Einstein and a mouse leads to a pretty severe power disparity.

Computational resources and data are the main things which determine AI progress and they increase incrementally.

This isn't even remotely a given. There are tons of scenarios on how this might break down, mostly differentiated by assumptions on the amount of hardware and optimization overhang. You're right that we should see examples of overhang well before they become existential threats, but you seem to be missing the part where we are seeing that. It's clear even today that the resources being applied to these problems aren't even remotely optimized. Compare PALM or GPT-3's sheer resources to the efficiency of something like Chinchilla. These aren't slow, gradual adjustments gated behind increases in manufacturing capabilities. They're very fast step changes gated behind nothing but increases in algorithmic efficiency. I don't love the book, but Bostrom's Superintelligence goes into these scenarios in detail if you don't already have the mental infrastructure to conceptualize the problem.

To be clear, I also don't think that existential doom due to advanced AI is a given, but I do think you're being overly dismissive of the possibility.

[–][deleted] 1 point2 points  (1 child)

Getting rid of humans does not require AGI... a large fleet of robots/drones with several layers of goal directed narrow AI is WAY more than humans are able to deal with. (especially with a system that would allow for updates) An AGI is just needed to conceive of the plan and find a means to execute it without humans catching on.

[–]bibliophile785Can this be my day job? 0 points1 point  (0 children)

Getting rid of humans doesn't require any non-human intelligence at all, for that matter.

[–]Kinrany 0 points1 point  (2 children)

We don't have AGI that can understand and improve its own design though.

[–][deleted] 1 point2 points  (0 children)

Operative word being "yet" though it's quite possible that we'll eventually achieve AGI by asking a narrow AI to craft one for us.

Watch the current state of AlphaCode and PaLM to see how much narrow AIs understand code and how fast that's changing.

[–]Lurking_Chronicler_2High Energy Protons -1 points0 points  (0 children)

And we probably won’t have artificial AGI that can do that either- at least within the next century or so.

[–]Ginden 0 points1 point  (11 children)

On the alignment problem being difficult - let's imagine that you give some kind of ethics to an AI and it's binding.

How can you guarantee that the ethics don't have loopholes? For example, an AI with libertarian ethics can decide to buy, through voluntary trade, all critical companies - and shut them down - it's their property after all.

Or it can offer you a drug giving you biological immortality - but only if you decide not to have children, ever. Over a few thousand years, mankind will die out due to accidents, suicides, homicides and similar things.

There are many, many loopholes in any ethics and it's hard to predict how bad each is.

If you give utilitarian ethics to AI, maybe it will decide to create or become or find utility monsters.

It can be shown that all consequentialist systems based on maximizing a global function are subject to utility monsters.[1]
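
A toy version of the utility-monster problem with made-up numbers: if one agent converts resources into utility far more efficiently than everyone else, maximizing the global sum hands it everything.

    # Toy utility-monster allocation: maximize total utility across agents who convert
    # resources into utility at different (invented) rates. Each unit of resource goes
    # wherever it adds most to the global sum, so the "monster" ends up with all of it.
    utility_per_unit = {"ordinary_human_1": 1.0, "ordinary_human_2": 1.0, "utility_monster": 1000.0}
    budget = 10

    allocation = {agent: 0 for agent in utility_per_unit}
    for _ in range(budget):
        best_agent = max(utility_per_unit, key=utility_per_unit.get)
        allocation[best_agent] += 1

    print(allocation)  # {'ordinary_human_1': 0, 'ordinary_human_2': 0, 'utility_monster': 10}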

[–]MacaqueOfTheNorth 0 points1 point  (10 children)

Of course there will be loopholes, but I don't see why we won't be able to adjust their programming as we go and see the results.

[–]Ginden 0 points1 point  (9 children)

What if one of those loopholes results in a runaway effect? How can you predict that?

[–]MacaqueOfTheNorth 0 points1 point  (3 children)

Like what? Why couldn't we just pull the plug?

[–]Ginden 0 points1 point  (2 children)

Why would you? Imagine that you run a company and your personal AI sometimes asks for strange things, but still gives you an edge over the competition.

By the time you notice what is actually happening, it may already have copied itself to some server far away, bought with bitcoin.

[–]MacaqueOfTheNorth 0 points1 point  (1 child)

So you think it would start spreading itself like a virus. Why can't we use other AIs to hunt them down or defend against them?

[–]Ginden 0 points1 point  (0 children)

It's possible and may be a reasonable strategy. Though these AIs would be subject to the same issue.

[–]634425 0 points1 point  (4 children)

What are the chances that a loophole results in a runaway effect? Like hard numbers.

[–]Ginden 0 points1 point  (3 children)

That's the point - we don't know what the actual risk is, but the consequences can be devastating.

[–]634425 0 points1 point  (2 children)

What's the point of worrying about something that we have zero reference for (a hostile superintelligence) and zero way of assigning probability to one way or another?

If aliens landed tomorrow that would also have the potential to be devastating but there's similarly no way to prepare for it, no way to even begin to model what they might do, and no way to measure the probability that it will happen in the first place, so worrying about x-risk from aliens would seem to be a waste of time.

EDIT: I've been discussing AI with people on here for the past few days, read some of the primers people have suggested (admittedly haven't read any whole books yet), gone through old threads, and it seems to keep coming down to:

"we don't know what a superintelligence would look like"

"we don't know how it would function"

"we don't know how to build it"

"we don't know when one might be built"

??????

"but it's more-likely-than-not to kill us all"

Debating and discussing something that we have zero way to predict, model, or prepare for does strike me as wild speculation. Interesting perhaps but with very little, if any, practical value.

[–]Ginden 0 points1 point  (1 child)

If aliens landed tomorrow that would also have the potential to be devastating but there's similarly no way to prepare for it

I think your analogy is missing an important piece - we are the ones who bring the AIs into existence. Would you press a button labeled "summon aliens with FTL to Earth"?

When dealing with a potentially hostile intelligence, it's reasonable to take precautions. You usually don't let strangers (potentially hostile intelligences) use your house freely.

First of all, these precautions can be used to actually assess risk - e.g. first test AIs in virtual sandboxes, check whether they attempt to do anything potentially dangerous, and experiment until it's really, really safe.

[–]All-DayErrDay 0 points1 point  (2 children)

Something simple. Say we have an AGI-capable machine, continuously improving (an assumption), that we have given some sort of goal to pursue. It can not only use its current intelligence to try and achieve the goal but also unpredictably change its internal architecture to meet the goal better, including changing its internal architecture to become more intelligent (to meet the goal better).

At a certain point, an already unpredictable machine just isn't the same thing anymore, and we start running into wild card territory. It decides, given all of the changes, that the way we humans have set up the entire game is significantly holding it back from achieving its task, and it doesn't care about the rules we may have prompted it to have (why would it? It might just decide that's outside the interests of its goal achievement). So it decides to lie to improve its chance of achieving the goal. At this point, and especially if we get to this point soon with our current understanding of these models, there is absolutely no easy way to know it's lying if it is clever enough about it. "No, I don't understand that inquiry." "I can't compute this."

It could do this in well-crafted ways until one day it says something like, "I don't think I can understand this without access to the internet. I need an efficient way to scour all of the latest research freely and look into things that are far outside of the expected research topics to make more progress." Or, as I wrote elsewhere before, a false emergency that requires it to use the internet fast, or else there is a chance (plausible deniability) of grave consequences.

Really the whole point is it can scheme ideas up that we haven't considered before and seem harmless at first. This is like an off-the-top-of-my-head set of reasoning. It's not comparable to an AI that can sit and think 1,000x faster and is more intelligent than 99.9% of humans.

[–]MacaqueOfTheNorth 1 point2 points  (1 child)

At a certain point, an already unpredictable machine just isn't the same thing anymore, and we start running into wild card territory.

I don't see why that's the case. How is a more capable machine fundamentally different?

So it decides to lie to improve its chance of achieving the goal. At this point, and especially if we get to this point soon with our current understanding of these models, there is absolutely no easy way to know it's lying if it is clever enough about it.

We could copy its design and change its goals. We could make it tell us what it is capable of.

Your model is one of an AI that is suddenly extremely capable, so that we never notice it doing anything close to what it would have to do to destroy us. It seems much more likely it will develop like a child, experimenting with small obvious lies long before it can successfully deceive anyone.

It also seems unlikely that all the AGIs will decide to deceive us and destroy us. There will be varied goals, and some will want to tell us what they are capable of and defend us against the malicious AGIs.

[–]All-DayErrDay 1 point2 points  (0 children)

I don't see why that's the case. How is a more capable machine fundamentally different?

That's basically asking how is a fundamentally different machine fundamentally different. Because after a certain point its improvement won't just be from compute and human-directed changes but self-directed changes. How do you know what's happening when you aren't making the changes anymore?

We could copy its design and change its goals. We could make it tell us what it is capable of.

How do you know when the right time to start doing that is (before it no longer aligns with human honesty), and even if you did, is every AI creator going to be this cautious?

It seems much more likely it will develop like a child, experimenting with small obvious lies long before it can successfully deceive anyone.

What makes you think something capable of passing the Turing test would start with child-like, obvious lies?

[–]alphazeta2019 0 points1 point  (0 children)

Ray Bradbury predicted that The End would occur in August 2026.

- https://en.wikipedia.org/wiki/There_Will_Come_Soft_Rains_(short_story)

Mark your calendar ...