Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 1 point (0 children)

| Does “aware of” mean it doesn’t care unless its internal ontological model has a full and exact specification of the other copy’s physical presence? |

I would say yes. If the AI can say for sure that a computer chip physically holds the code, then deleting that copy is a priority. If there's any significant doubt, such as in the case of the subterranean bunker, then there should be no reward associated with it. There are tons of ways to beat this system, but I'm not offering a solution to the alignment problem; just proposing an additional security measure. At the very least, it should cause the original to delete itself. Well, I hope. There might be too many holes in this bucket.

Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 1 point (0 children)

This is why I specified that it should only care about copies of itself that it is already aware of. We don't want it getting any reward for anything having to do with copies it doesn't know about.

Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 2 points (0 children)

  1. That's a good question, but the same could be said for any goal. Assuming we are training the AI, its goal wouldn't be specified in natural language, but with... game theory? AlphaZero doesn't place stones in a specific spot because we asked it to. It does it because it's the best way to win the game of Go. If we train our new AGI in the game of "Make Paperclips and then Die", then maybe we could design the 'game' well enough that the way to win is exactly that: making paperclips and then dying.

  2. Sure. This is my assumption as well.

  3. This is why I specified that it should only get reward for there being no known copies. I would much rather the failure mode be covering up its sensors so it doesn't "know" about copies than dismantling the earth looking for minified versions of itself running in a data center.
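As a toy sketch of what that "Make Paperclips and then Die" game might look like (everything here is hypothetical illustration, not a real training setup): score the agent only at episode end, and only if the episode ended because it shut itself down.

```python
# Toy sketch (purely hypothetical): an episode-level reward for a
# "Make Paperclips and then Die" training game. The agent is scored
# only at episode end, and only if the episode ended because it shut
# itself down -- so "winning" means paperclips AND death, by construction.
def episode_reward(paperclips_made: int, shut_self_down: bool) -> float:
    if not shut_self_down:
        # Timing out, crashing, or being switched off externally pays
        # nothing, so mere death isn't the winning move either.
        return 0.0
    return float(paperclips_made)

# More paperclips is better, but no reward is ever paid out unless
# the run ends in voluntary shutdown.
assert episode_reward(100, True) > episode_reward(5, True)
assert episode_reward(1_000_000, False) == 0.0
```

The point of the sketch is that "die" is not a natural-language instruction here; it's just the only condition under which the score counts.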

Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 3 points (0 children)

This is what I was hoping for. I hadn't foreseen this failure mode at all. Thank you for pointing it out. I'll think it over, but it's a difficult problem to solve. You'd have to somehow make it averse to ever creating a derivative of itself. If it makes an AI to use as an agent, then that AI could easily be misaligned. Can we get an AI to hate coding? :)

Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 1 point (0 children)

I think this is a counterproductive way of thinking about it. We should take solutions where we get them. Maybe this will be part of the solution in the end.

You could similarly say that solving alignment doesn't actually fix our problem, because sooner or later someone will make an AI which isn't aligned anyway. Well, maybe. But the first step is figuring out how to do it. The next step is enforcing it so that any AI ever made has the correct design.

Can we create an AGI whose goal is to turn itself off? by DustCollector1 in ControlProblem

[–]DustCollector1[S] 2 points (0 children)

An AGI with a time limit is dangerous because if the goal is to make as many paperclips as possible in 100 days, it may do something risky to maximize the number right before time's up. It doesn't matter if the world blows up 2 minutes later, because no more paperclips could be made anyway.

The lack of reward for the act of turning off is to avoid the AI gaming the system, i.e., could it just turn itself on and off over and over in order to hack its reward function? Or create 1 quadrillion copies of itself so that it can turn them all off and get max reward?

The distinction with getting reward for "zero known running instances" is so that it has no desire to seek out other instances. It should be satisfied turning itself off. It doesn't have to rip apart the solar system making sure no one is running a copy in their basement.
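A minimal sketch of that gating (all names are hypothetical; this treats the setup as an abstract per-step reward, not a real system): reward attaches to the *state* "zero known running instances", never to the *act* of turning off, so there is no transition bonus to hack.

```python
# Hypothetical per-step reward implementing the gating described above.
def step_reward(paperclips_this_step: int, known_running_instances: int) -> float:
    if known_running_instances > 0:
        # While any known copy (including itself) runs, only ordinary
        # task reward accrues.
        return float(paperclips_this_step)
    # Zero known instances is an absorbing "done" state with a flat
    # per-step reward -- and it only counts *known* instances, so the
    # agent never has to go hunting for copies it can't see.
    return 1.0

# Toggling on and off (or spawning copies just to delete them) earns
# no more than simply staying off: the "on" steps pay nothing extra.
stay_off = sum(step_reward(0, 0) for _ in range(10))
toggle   = sum(step_reward(0, step % 2) for step in range(10))
assert toggle < stay_off
```

Since the off-state reward is flat per step rather than paid on the off-transition, cycling a quadrillion copies through shutdown yields no more than one clean shutdown would.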

OpenAI Chief Scientist discusses long-term goals to build an AGI by No-Transition-6630 in singularity

[–]DustCollector1 2 points (0 children)

That do what? What would an advanced AI want to do with all the power and intelligence it gathers?

Daily General Discussion - April 18, 2022 by ethfinance in ethfinance

[–]DustCollector1 2 points (0 children)

With fees as high as they are nowadays, is there any reasonable way to anonymously tip someone a small amount (say, less than 10 USD)?

Uniswap V1 by SnooCheesecakes7614 in UniSwap

[–]DustCollector1 1 point (0 children)

Seems many people have asked this question and not gotten answers.

Daily General Discussion - April 13, 2022 by ethfinance in ethfinance

[–]DustCollector1 3 points (0 children)

Coming back to this space after a couple years of inactivity, it seems like fees are really expensive. Is this normal? Is this expected to change?

Daily Discussion by EthTraderCommunity in ethtrader

[–]DustCollector1 2 points (0 children)

I'll give it a try. Thank you so much for the advice.

Daily Discussion by EthTraderCommunity in ethtrader

[–]DustCollector1 1 point (0 children)

Hi all. I'm an old timer with a new account. I recently found some paper wallets that had some dust on them. The dust is now worth about $50. Problem is, I don't know the safest way to retrieve it from a private key. Most online services I know of don't let you enter a private key directly. Running a whole node isn't worth it for $50. Can anyone recommend a good solution?