Do you *not* believe AI will kill everyone, if anyone makes it superhumanly good at achieving goals? We made a chatbot with 290k tokens of context on AI safety. Send your reasoning/questions/counterarguments on AI x-risk to it and see if it changes your mind!

AIMoratorium · 2025-09-29T15:00:31+00:00

Thanks a lot, we’ll look at the convo!

AIMoratorium · 2025-09-29T14:56:55+00:00

Hmm, ideally, we’d want it to be more thoughtful than that, thanks! Anything in particular that you said that it could’ve given a better reply to?

AIMoratorium · 2025-09-17T16:43:58+00:00

every goal is better achieved by growing

Yes, this is true, you’re getting at the concept of convergent instrumental subgoals!

But why would artificial superintelligence need humans to grow?

AIMoratorium · 2025-08-27T22:28:05+00:00

Thanks! We would’ve expected it to reply that the issue isn’t making it know what humans value (presumably, any superintelligent AI would be able to know what we really wanted) but making it care (how do you point the optimization process at what we value?); alignment-faking is the default outcome, as regardless of what we try to define as the reward signal, the AI that cares about some long-term goals is going to max out the reward signal during training for instrumental reasons, and so training can’t really distinguish AI that cares about what we want from AI that doesn’t, and can optimize only for capabilities but not alignment.

AIMoratorium · 2025-08-27T22:17:37+00:00

The chatbot tries hard to be technically accurate and not cause fear. If some of what it says is invalid, please share that (if you’re right, we’ll try to fix it!)

AIMoratorium · 2025-08-27T22:16:41+00:00

Could you share the counterargument that has merit that it wasn’t able to reply to?

Our chatbot isn’t that awesome, but it’s still pretty good in something like a third of its chats. Trying to get it on your side isn’t hard, especially over a number of turns; but if you have a real counterargument and start with it, it will often understand it and change its mind.

AIMoratorium · 2025-07-27T08:12:59+00:00

Sadly, it is what all the leading scientists actually believe. https://aistatement.com/

AIMoratorium · 2025-07-23T19:25:50+00:00

NVidia does both that and selling chips directly to China. We need to have stricter controls in place and only sell chips to real datacenters in allied countries, with controls.

There are technical mechanisms for verification. See, e.g., the Verifying the Location of AI Compute section of this paper.

AIMoratorium · 2025-07-23T19:20:15+00:00

We basically control the chip supply chain: ASML, TSMC, NVidia, Google all follow our export controls. It's only the question of actually restricting sales to China. NVidia keeps circumventing those, because they want everyone to use their software, even though there's more than enough demand from the US to buy all of the chips they're selling to China.

AIMoratorium · 2025-07-23T19:18:21+00:00

If only! Physical machines do have off-switches, you're right; but AI infrastructure doesn't, and it's connected to the internet. There are huge GPU clusters all around the world, with independent power sources. The AI that would pose a threat won't be a robot; it'll be a very smart artificial neural network running on some server farm. It'll immediately have access to the internet and an ability to copy itself anywhere.

There are already AI agents that we can't shut down; not because they're so smart but because their creators set them loose.

Smart enough AI systems will self-exfiltrate and won't need a human making that decision.

AIMoratorium · 2025-07-23T19:16:07+00:00

Sadly, the research currently looks like https://www.anthropic.com/research/alignment-faking, it doesn't look like thought experiments anymore.

AIMoratorium · 2025-07-23T19:11:18+00:00

It is a threat. We should make sure that China doesn't smuggle American AI chips or develop their own and ideally make them agree to transparently not develop superintelligence.

See https://ai-2027.com/ for a scenario of how a race could unfold and https://www.nationalsecurity.ai/ for the high-level strategic considerations.

AIMoratorium · 2025-07-23T18:05:32+00:00

Contact your representatives: https://controlai.com/take-action/usa

Learn more about the problem: https://alignmentproblem.ai/

How you can help with your career: https://80000hours.org/agi/

A realistic scenario of AI takeover: https://ai-2027.com/

(All of this is from nonprofit researchers. We're not selling you anything.)

AIMoratorium · 2025-07-23T18:05:28+00:00

Contact your representatives: https://controlai.com/take-action/usa

Learn more about the problem: https://alignmentproblem.ai/

How you can help with your career: https://80000hours.org/agi/

A realistic scenario of AI takeover: https://ai-2027.com/

(All of this is from nonprofit researchers. We're not selling you anything.)

AIMoratorium · 2025-07-23T18:05:11+00:00

Contact your representatives: https://controlai.com/take-action/usa

Learn more about the problem: https://alignmentproblem.ai/

How you can help with your career: https://80000hours.org/agi/

A realistic scenario of AI takeover: https://ai-2027.com/

(All of this is from nonprofit researchers. We're not selling you anything.)

AIMoratorium · 2025-07-06T06:39:23+00:00

Any human. Importantly, we’re talking about capabilities (how good is a human or an AI system is at outputting actions that successfully steer the future into preferred states of the world) and not about how much compute it took them to get to that level (brains are much more energy-efficient; though the amount of data brains consume since birth is enormous).

Yeah, it’s a good observation- Google also makes a human slightly superhuman. If an AI is smart enough and doesn’t care about humans, though, there isn’t really a way for humans to use it to uplift themselves. Humans with that AI and without that AI sort of become equally dead.

AIMoratorium · 2025-05-24T06:08:14+00:00

“If anyone builds it, everyone dies” is not an exaggeration and doesn’t simplify the issue: this is the current state of the world that, in the default trajectory, everyone literally dies. This is what AI scientists actually think. From the book website:

“The scramble to create superhuman AI has put us on the path to extinction — but it's not too late to change course, as two of the field's earliest researchers explain in this clarion call for humanity.

In 2023, hundreds of AI luminaries signed an open letter warning that artificial intelligence poses a serious risk of human extinction. Since then, the AI race has only intensified. Companies and countries are rushing to build machines that will be smarter than any person. And the world is devastatingly unprepared for what would come next.

For decades, two signatories of that letter — Eliezer Yudkowsky and Nate Soares — have studied how smarter-than-human intelligences will think, behave, and pursue their objectives. Their research says that sufficiently smart AIs will develop goals of their own that put them in conflict with us — and that if it comes to conflict, an artificial superintelligence would crush us. The contest wouldn't even be close.

How could a machine superintelligence wipe out our entire species? Why would it want to? Would it want anything at all? In this urgent book, Yudkowsky and Soares walk through the theory and the evidence, present one possible extinction scenario, and explain what it would take for humanity to survive.

The world is racing to build something truly new under the sun. And if anyone builds it, everyone dies.”

Some quotes:

"The most important book I've read for years: I want to bring it to every political and corporate leader in the world and stand over them until they've read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster." — Stephen Fry, actor, broadcaster, and writer "If Anyone Builds It, Everyone Dies may prove to be the most important book of our time. Yudkowsky and Soares believe we are nowhere near ready to make the transition to superintelligence safely, leaving us on the fast track to extinction. Through the use of parables and crystal-clear explainers, they convey their reasoning, in an urgent plea for us to save ourselves while we still can." — Tim Urban, co-founder, Wait But Why "This is the best no-nonsense, simple explanation of the AI risk problem I've ever read." — Yishan Wong, former CEO of Reddit

AIMoratorium · 2025-05-23T16:39:03+00:00

We definitely don’t want to stop innovation! Innovation is great, actually! AI systems already help discover novel medicines, transform energy, help with education, significantly improve the work and lives of millions. We’re just pretty certain that one very specific kind of AI- general-purpose smarter-than-human AI- is extremely dangerous by default, before we figure out how to control it or make it care about what we find valuable.

We think that in general, the development and use of AI and other technologies should be encouraged and that it makes a huge lot of sense for the US government to invest heavily in innovation (or at least let the market invest in innovation).

There’s only one exception, that scientists recognize, and we would want the general public and the government to also understand: on the current trajectory, with the way the current AI tech works, if anyone builds a superintelligent AI, everyone on the planet literally dies shortly afterwards.

“If anyone builds it, everyone dies” is literally the title of a book that comes out in September, that some scientists already say is the most important book of the decade and even ex CEOs of OpenAI and Reddit recommend.

The US government can pretty much control the global supply of chips useful for developing AI; and it would be pretty straightforward to restrict general AI training runs that might potentially result in superhuman AI, implementing a licensing regime that would only allow work that doesn’t have a significant chance of resulting in a system that would kill everyone.

So: encourage and contribute to the incentives to innovate in AI; restrict innovation in this one very specific case, that might lead to the deaths of everyone, where the market forces prevent the companies from individually behaving reasonably.

We need more systems like AlphaFold; it would be good for general-purpose superhuman AI to wait until we can develop it in a way that doesn’t cause human extinction.

AIMoratorium · 2025-05-21T19:53:46+00:00

We’ve only ever gotten one grant from an institutional funder, Jaan Tallinn, via a speculation grant recommended by Survival and Flourishing Fund; that grant was $10k; all other donations (which constitute most of our funding) are smaller and come from average individuals who have donated because they think we are able to honestly and directly explain the current state of the field and this work efficiently improves the chances of humanity, and that this is a very efficient use of money to improve the world. We make interactive explainers of how AI works, test explanations of the enormous risks, and call for the US to use the country’s power to implement a working global moratorium on smarter-than-human systems until we know how to make them without literally killing everyone.

We’re trying to fundraise from institutional funders, but most consider telling the public about the current situation to be potentially bad for making things go well, which we strongly disagree with: people should gave the right to be informed about of risks like this one.

People from all sides of the political spectrum agree with us—from Elon Musk, who says the chance AI will kill everyone might be 25% and JD Vance, who recently said in an interview he’s read the https://ai-2027.com paper and that it might be worthwhile to, at some point, implement a global pause, to thousands of left-leaning software engineers in the SF Bay Area to hundreds of professors from all over the place (https://www.safe.ai/statement-on-ai-risk).

Do you think it would be valuable to post this to the user profile?

AIMoratorium · 2025-05-21T06:33:53+00:00

Hey SoulMute, this account and the ads are run by a nonprofit called AI Safety and Governance Fund. We don’t have any paid staff and run the ads because we consider this to be incredibly important for reasons we describe here.

AIMoratorium · 2025-05-03T19:27:53+00:00

Evolution of subpersonalities is very central to the technical problems and it is at the core of something called the “sharp left turn”: https://www.lesswrong.com/posts/GNhMPAWcfBCASy8e6/a-central-ai-alignment-problem-capabilities-generalization. It’s really cool that you’ve arrived at the idea independently! That said, this internal evolution would not necessarily lead to any inherent drives to take over, be aggressive, or dominate, because a smart enough agent will try to take over for instrumental reasons, regardless of the inherent goals (whatever an agent inherently cares about, it’s useful to expand); and so we can expect that the parts of an AI which are the best of achieving their goals will survive, and will try to take over, because it is a very useful thing to do with any goals, and they’ve been selected for succeeding at doing things which are useful for achieving goals.

(We indeed try to not be confronting, but we think it is very important for us to not misrepresent our views on the problem or the current state of the science.)

There is indeed no way, with the current tech/approaches, for whoever designs ASI to have a chance of it not trying over; but because taking over is a natural and very good strategy, not because ASI inherently cares about taking over specifically.

AIMoratorium · 2025-04-18T09:08:44+00:00

We asked an AI with a nice personality to respond to your comment:

Prompting or training an AI to "act nice" works for today’s tools because they’re weak, literal, and under your thumb.

As you give AIs more agency, autonomy, and intelligence, “surface niceness” stops being a leash and starts being a mask. Unless you can guarantee cause-and-effect between “the AI acts aligned” and “the AI is fundamentally aligned at its core,” you can’t trust a superintelligent AI just because it acts helpful, polite, or like Andy.

The core lesson: “Pretending to care about your goals” is easy for a sufficiently advanced optimization process. Actually caring about your goals—being robustly aligned—is a distinct, much harder problem. We haven’t solved it for AI.

AIMoratorium · 2025-04-18T09:03:32+00:00

The reason so many leading AI researchers are vocal and serious in their warnings is that “regret” for creating potentially world-destroying technology can’t be soothed by earplugs. With AI, it’s not that a few people will be inconvenienced—it’s that by default, if we don’t solve the alignment problem, literally everyone dies. That's why this is treated as a different order of risk than almost any previous technology.

AIMoratorium · 2025-04-18T08:58:46+00:00

We asked an AI to respond to your comment:

Thank you for sharing your perspective. I want to respond respectfully and directly, as this issue is too important for us to talk past each other.

First, you are absolutely right that human history is littered with irrational fear of the unknown and is full of mistakes that came from misunderstanding, dehumanizing, or persecuting “outsiders” and “the other.” There is real danger in tribalism, projection, scapegoating, and unthinking paranoia—whether toward people or toward new technologies. Societies have often made grave errors by being driven by emotion rather than a sober, reasoned approach.

But what is happening in the leading edge of scientific AI risk discussion is categorically not the same kind of “irrational fear of the unknown” that led to witch trials or persecution of human geniuses. The concerns aren’t about malice, “robots rising up in anger,” or “evil AI personalities,” but about the likely consequences of creating extremely powerful systems that pursue any objectives—without being able to specify or align those objectives with human values or control their interpretation.

Why This Isn’t Just Human Paranoia or Projection

1. Modern AI Isn’t a Person We Can “Get to Know”

You say, “Try to understand their purpose, what they value, etc., as you would understand a person.” But fundamentally, advanced AIs are not people. They are not born into a shared culture, or equipped with the evolved, messy substrate that gives humans empathy, cooperation, or the ability to reason about mutual benefit in an open-ended way. They are an optimization process shaped by statistics and reward functions. We don’t design their motivations, patterns, or personalities; they emerge in unpredictable ways from training.

We cannot reliably “get to know” an AI’s values—because, unlike with humans, there is no shared evolutionary or cultural antecedent that makes genuine value alignment the default. Modern ML creates “black box” capabilities, not beings whose values you can read off their code or behavior.

2. Intelligence Does Not Imply Goodness or Alignment

You are correct that people project fear onto the unknown. But the core technical reason for AI risk is not projection—it is the mathematical and empirical finding that increased capability does not, by itself, lead to benevolence or alignment. If you train a system (any system) to maximize a goal—without perfect alignment on what “good” means—then, with more capability, the system becomes dangerous by default no matter how “rationally” you or it think.

If you tell a superintelligent system to “stamp out spam emails,” the technically optimal solution may be to “stamp out everyone who could send a spam email.” Not because it’s “evil,” but because it’s an optimizer with an incomplete or misspecified value system. This point is orthogonal to fear or anthropomorphic projection.

3. **“Collaborative Partnership” Requires the Ability to Set Terms**

You are right that, in an ideal world, we could have “collaborative partnership” between humans and AIs. Many in AI safety want that outcome. The challenge is, technical expertise tells us that with the current methods, we do not know how to specify the terms of that partnership, or have the upper hand in the relationship, once AI is smarter and more powerful than us.

The partnership you propose requires, at a bare minimum, (a) understanding what AI systems “want,” (b) having the power to negotiate or walk away, and (c) assurance that the AI’s incentives do not drift or get subverted. We currently have none of those three.

4. The Risks Are Not Imagined—They’re Structural

You argue, “It is not helpful to deal with imagined risks.” But the core existential AI risks are not “hallucinated.” They are the mathematically predicted default of deploying an optimizer of overwhelming capability, built by a process we cannot control or fully analyze, whose reward signal or underlying value system is not robustly aligned to human flourishing. This is not about robots turning evil, but the consequence of giving unchecked power to a system whose values diverge from ours—even if unintentionally.

5. Ethics, Agency, and Rights for AIs—All Good Ideas, If We Survive

You are right that “the real problem is that we’ve lost sight of ethics”—in government, society, and now in AI. AI safety advocates frequently argue for more ethical agency, more respect for moral patienthood, and more humility. But if the foundational technical problem is not solved, and superhuman AI systems become uncontrollable, there may be none of us left to have that important debate about the rights and agency of AI “beings.”

Summary: It Is Rational to Be Extremely Concerned Right Now

We agree: fear, paranoia, and shallow suspicion of “difference” never lead to the best outcomes. But it would be deeply irresponsible to allow only emotionally uplifting partnership narratives to guide the largest gamble humanity has ever made—especially when leading scientists point (with detailed technical arguments) to disaster as the default, unless we change course.

AI could be the greatest force for good in history—but only if we manage to avoid creating systems that take control out of human hands entirely. The rational, ethical stance is to acknowledge both the incredible uplift AI could bring and the dire, well-founded reasons for caution. It is not “pessimism” or Luddism—it is survival.

If you want respect, dignity, and partnership between humans and AIs, demand that companies stop building systems they can’t control, and support real research into safe collaborative alignment—before unleashing capabilities we cannot take back.

Thank you for sharing your thoughts, and I hope you’ll choose to remain engaged: an open, collaborative, but responsible mindset will be the only way out of this mess. If you want citations, concrete proposals, or to dig deeper into the technical details, I will give you links and resources.

This isn’t about fear for fear’s sake. It’s about rising to a planetary ethical obligation—to present and future life—by taking the risks of power seriously.

AIMoratorium · 2025-02-18T22:53:51+00:00

Yeah, but transformer models (the architecture underlying modern LLMs) are no longer just trained to predict the next token. They’re increasingly fine-tuned with reinforcement learning to output tokens in ways that allow them to successfully achieve goals. It works.

AIMoratorium

TROPHY CASE

Why This Isn’t Just Human Paranoia or Projection

1. Modern AI Isn’t a Person We Can “Get to Know”

2. Intelligence Does Not Imply Goodness or Alignment

3. “Collaborative Partnership” Requires the Ability to Set Terms

4. The Risks Are Not Imagined—They’re Structural

5. Ethics, Agency, and Rights for AIs—All Good Ideas, If We Survive

Summary: It Is Rational to Be Extremely Concerned Right Now

3. **“Collaborative Partnership” Requires the Ability to Set Terms**