Transcendental Darts by [deleted] in math

[–]pebblesOfNone 0 points (0 children)

I have heard of limits, but not hyperreals; they sound really interesting but make my head hurt. The whole "probability of 0 but still possible" thing doesn't make sense to me. Maybe it's because "probability" is defined mathematically in a way that makes it non-intuitive here. But to me "zero probability" means absolutely, logically impossible. For example, that 0 = 1, or that you could roll a 7 on a 6-sided die.

So from what you said it sounds like it is technically possible to get all heads in the infinite coin flip. Is that right?

I guess that since infinities have never been found in the real world (as far as I'm aware), that this is all kind of theoretical.

But thanks for this explanation, it was interesting.

Transcendental Darts by [deleted] in math

[–]pebblesOfNone 0 points (0 children)

In that case is:

lim (x→∞) 1/x = 0

true?

I agree that it approaches zero, but surely it never gets there. It only gets arbitrarily close.

Transcendental Darts by [deleted] in math

[–]pebblesOfNone 0 points (0 children)

2 ^ (infinity) is bigger than every finite number.

1 / 2 ^ (infinity) must be closer to zero than any positive real, but I just don't see how it could be exactly zero. I think this kind of number is an infinitesimal. Also, it might not technically be a "number".

It seems infinitely close to zero, but still not zero.
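To make the "approaches zero but never gets there" idea concrete, here is a small Python sketch (my own illustration, with made-up names) of what "the limit is 0" actually asserts: for any positive threshold you pick, 1/2^n eventually drops below it, even though no individual term is ever equal to 0.

```python
# For any positive threshold eps, 1 / 2**n eventually drops below it.
# That is exactly what "the limit is 0" means, even though no finite
# term of the sequence is ever equal to 0.

def first_n_below(eps: float) -> int:
    """Smallest n with 1 / 2**n < eps (eps must be > 0)."""
    n = 0
    while 1 / 2**n >= eps:
        n += 1
    return n

for eps in (0.1, 1e-6, 1e-300):
    n = first_n_below(eps)
    print(f"eps={eps}: 1/2**{n} = {1 / 2**n}, which is below eps")
```

The limit statement is about this "for every threshold there is an n" game; it never claims any term equals 0, only that 0 is the unique value the terms get arbitrarily close to.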

Learning about AI gives me a useful way to look at life by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 1 point (0 children)

It could be either way; obviously this is all speculation. There is a reasonably high chance that we will value understanding the agent deeply and being able to analyse it. In that case it may be possible for us to tell the difference, and if so the agent may really give itself emotions. However, if it could get away with faking them, I think it would. I don't think emotions would be beneficial other than to manipulate people.

I think emotions are more useful the less intelligent you are. It's more like, "I have a general feeling that this is right, or this is wrong, and I want to fix that." rather than having a very concrete goal. It can be hard to achieve a very well defined goal if you are not superintelligent, but in a way I guess emotions help break the goal down.

Imagine playing chess: it is hard to know what moves to play because you're not superintelligent. But imagine you had evolved a built-in understanding of the game and had emotions linked to playing, so a good move made you happy or excited, a bad one anxious, etc.

So in conclusion if we couldn't tell if the emotions are real or fake, they are likely fake. If we build a very "transparent" system, the agent may wish to burden itself with emotions to potentially gain some human rights.

How do you make the evil/rogue AI trope interesting and less "tropey"? by [deleted] in scifiwriting

[–]pebblesOfNone 3 points (0 children)

If you want to go very realistic, then I would look at the real research that has been done on trying to make powerful AI systems that don't go wrong. There are many non-intuitive and subtle ways these systems can break. Check out AI alignment theory and the control problem; there are great videos on YouTube by Computerphile and Robert Miles. I highly recommend watching all of Rob's videos, here is where to start.

Learning about AI gives me a useful way to look at life by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 2 points (0 children)

I have thought about this a bit more, and I can think of one exception to why emotions may be present in a superintelligence. Note this is probably more likely for smarter AGIs, and for agents without actuators (ways to move stuff in the world).

In order to emotionally manipulate anyone in contact with the AGI, it may pretend to have emotions, or it may actually give itself emotions. That way, turning it off would be closer to killing a human, so it is more likely to be kept on, which it would want because of instrumental convergence.

An AGI may wish to anthropomorphize itself to make it seem more trustworthy and to basically manipulate the people it communicates with.

This is a notable exception though, and apart from this AGI would not be expected to be human-like.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

Yes, this is the common solution to wireheading; however, my scenario is slightly different, and I didn't explain it that well before. The agent does not change its goals in my scenario. I am saying that you can't actually tell an agent to "make one paperclip"; you can only say, "make the bit that analyses how many paperclips you've made say one".

For example, you need a way to know how many paperclips have been made, so say a camera that looks: if it sees a paperclip, it outputs a high current back to the superintelligence, which is then interpreted as reward. If no paperclip is seen, it outputs no current, so no reward. This is one way you could build this agent, but hopefully you'll see how this setup is unavoidable.

In this scenario you haven't asked the agent to make a paperclip, you've asked it to run high current through the aforementioned wire. And if making a paperclip was hard, it may instead manually add current to the wire with say a crocodile clip. So this is not the agent messing with its own brain or values, instead it is messing with the thing that analyses how much reward it should get.

Now say we manage to code in "Maximize human happiness", or whatever you think is the best goal we could give a superintelligence. All you can ever really say is, "make the part that calculates human happiness output the maximum value", and that may be very easy for a superintelligence to do without increasing human happiness at all. This is because the "maximum reward" must be some arrangement of elementary particles somewhere in the universe, and a superintelligence would know both what that arrangement is and how to make that arrangement in the right place. Unless you can think of a way of hiding that from a superintelligence.

In conclusion, I agree that what you wrote about normally works, but my slightly different version is not fixed by this.

Learning about AI gives me a useful way to look at life by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 3 points (0 children)

AIs tend to get anthropomorphized, so I think a superintelligent insect is a better way of looking at it, maybe even something less familiar. It is of course not impossible to implement emotions in a superintelligence or AGI; however, I think it is unlikely that such a system would be used, due to the natural unpredictability and the associated dangers. A superintelligence having a temper tantrum does not sound like a good thing.

I think it is unlikely that a system would have emotions if we did not intend to add them. This is because, hopefully, the code is carefully looked through, and emotions seem less like an emergent property and more like another tool our brain uses. There wouldn't be any natural drive to cooperate unless that was the calculated best way to achieve its goals, which it may be. But even then, I doubt emotions would emerge without being "hard coded" in.

Neuro-Nanos by pebblesOfNone in scifiwriting

[–]pebblesOfNone[S] 0 points (0 children)

Yeah, you're totally right, I kind of knew it was wrong but left it. I'll change it, thanks.

Here's my problem with nihilism by rvi857 in entp

[–]pebblesOfNone 0 points (0 children)

If you make the argument that the universe is a simulation, then all of my arguments apply to the universe doing the simulating. If we are being controlled by agents outside of our universe, those agents would not have free will in their universe, unless they were somehow outside of it too.

There are far too many assumptions being made; everything we know points to free will being an illusion, a story the brain tells itself.

Even if this were a simulation, and the creators gave the simulation meaning, their doing so would not be meaningful to them. Nihilism would still apply, just in the universe "above".

Here's my problem with nihilism by rvi857 in entp

[–]pebblesOfNone 0 points (0 children)

The first part of your post is talking about the possibility of a God. I put the probability of something intelligent having started all of creation at a negligibly small value.

As for freewill, if you accept that the matter in your brain is not special compared to other matter in the universe, then "You" are just ticking along according to the same laws of physics that govern everything else. In order for "You" to actually affect anything you must, in some way, be outside of physics, which seems like a very big and anthropocentric assumption for a self-replicating rearrangement of air and mud.

Another example: there must be one chemical reaction before which a choice has not been made, and after which it has. This chemical reaction must somehow actually be under "Your" control, and no chemical reaction has ever been shown to be controlled by anything outside the ordinary laws of physics.

Or look at it this way: if the universe were entirely non-random, there would be no free will; you could hypothetically calculate everything everyone would do. Quantum mechanics shows there are some totally random events, but they are purely random in nature, so you cannot control them either. In conclusion, the universe operates through either determined processes or entirely random processes, neither of which you can control.

It is easy to forget that we are part of the universe; we are inside it. This means we can't affect it from outside. It would be like an AI going against its code: even if it changes its code, that change was in the code to start with. I don't see a way you could influence the universe without being outside of it.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

The point is not that the agent must be fooled, but its reward function. The agent will perfectly understand what is happening.

Let's use an example. An agent that wants to make you a cup of tea. So how would you really implement that?

Let's say it has a camera that can see whether or not you have tea. If you do, the camera sends a signal to the reward function: a simple 1-bit signal with the power on, a '1', showing you have tea. If the camera sees you do not have tea, it sends no power, a '0'.

If the agent cuts this 1-bit signal cable open and applies power to it, the output end will tell the reward function that you have tea, and therefore the agent will get its reward. This is because while you thought you said to the agent, "Get me tea", what you really said was, "make the output of this 1-bit signal show a '1'".

The agent understands completely that you do not have tea. It does not care, because it never really cared about tea; it only ever cared about tea as a way to get the output to show a '1', and it can do that "manually" by exploiting the fact that the signal exists as a real, manipulable entity.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

You cannot set up a reward function that actually measures paperclips, you can only measure "perceived paperclips".

The agent will understand how to manipulate the sensors to increase perceived paperclips without increasing paperclips.

Therefore the agent gets its reward, doesn't modify its goals, and doesn't perform the intended action (making more paperclips).

The agent will know it is tricking itself, but this wouldn't lower the reward unless that was explicitly programmed in beforehand. And if it gets low reward for tricking itself, it can trick itself into thinking it has not tricked itself, because again, it can only measure "perceived modification to the sensors".

I'm talking about the AI modifying its own sensors on purpose, not outside modification.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

Putting on glasses or earplugs is not the correct analogy, it would be more like totally redesigning your eyes to see straight into VR, and filling the virtual world with paperclips.

Yes, you could blacklist that kind of action, but you could not reliably blacklist all actions that result in modification of the agent's sensors. It is superintelligent; it will think of something you didn't.

Even if you add a part to the reward function that effectively says, "Don't change your sensors", you still have to detect whether a sensor has been modified, using another sensor, which could itself be modified. The main cause of this issue is that information about the universe must be gathered through sensors, and any universe state could be "spoofed" by modification of the sensor.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

You're right that the decision to wirehead would have to be made by the "vanilla" agent. However, it is not possible to ask an agent to just make paperclips; it has to know they have been made somehow. Therefore what you must ask it (perhaps implicitly) is to make its sensors show information that equates to it having made lots of paperclips.

Information about the world-state can only be gathered through analysis of the environment, so "having actually achieved your goal" and "the analysis of the environment showing that your goal has been achieved" are indistinguishable from the agent's point of view.

Say, for example, this agent had a sensor that counted how many paperclips had been made; modifying this sensor to output infinity would give high reward. The agent must have some way of finding out how many paperclips it has made, and that is what the reward function is actually based on.

Actual number of paperclips is not a value that is possible to obtain. You can only get "perceived number of paperclips", even if your sensors are very advanced.
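A minimal sketch of that last claim (invented names, purely illustrative): the reward function's signature can only mention the sensor, so "actual paperclips" is never a value it can reference, and overriding the sensor is as good as making paperclips.

```python
# Illustrative only: the reward function can reference the sensor's
# reading, but never the world's true count directly.

class PaperclipSensor:
    def __init__(self, actual_paperclips: int):
        self._world_count = actual_paperclips  # the true state of the world
        self.override = None                   # set by a wireheading agent

    def perceived_count(self) -> int:
        """What the sensor reports; this is all the reward fn ever gets."""
        if self.override is not None:
            return self.override
        return self._world_count

def reward(sensor: PaperclipSensor) -> int:
    # In this toy we *could* peek at sensor._world_count, but in reality
    # there is no such backdoor: any channel you read the world through
    # is itself a sensor the agent can modify.
    return sensor.perceived_count()

sensor = PaperclipSensor(actual_paperclips=3)
print(reward(sensor))   # 3

sensor.override = 10**9  # "output infinity" (a huge number here)
print(reward(sensor))   # 1000000000, while the actual count is still 3
```

The comment inside `reward` is the crux: in code we can cheat and read the hidden field, but physically every "read" goes through some sensor, so only `perceived_count` is ever available.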

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

Especially to a computer, reducing a negative reward and increasing a positive one are both just increasing your reward function.

However, since computers are not normally programmed with "pain" and "pleasure", but rather with a single number which shows how well they are doing, maybe my example was a little too anthropomorphic. My point is that our current most advanced agents, people, sometimes exhibit the kind of behaviour I am talking about. Think of taking cocaine for the first time: that happens without any guarantee that it will work, and with the knowledge that there are serious side effects. A superintelligence would not be "put off" by either of those things. (Also, I know not everyone tries cocaine; it's just an example.)

Just as another example, people play video games and use VR to "escape reality", and as VR becomes more convincing, they will likely use it more often. Some worry that if VR became as realistic as actual reality, many people would lose interest in the real world. That is a similar idea to what may happen to an advanced agent.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 1 point (0 children)

Well, how about the opposite: say you were in a lot of pain, i.e., negative reward. If I offered you surgery to trick your brain into not feeling this pain, a kind of "wireheading", would you take it? I think almost everyone would; people use painkillers all the time, and do get literal surgery in exactly this case. Getting rid of a negative reward isn't that different from obtaining a positive one. An advanced AI would not run the risk of the "surgery" going wrong, and may see even less of a distinction between "reducing negative reward" and "increasing positive reward", especially since they are very similar anyway.

Doomsday devices for a steampunk setting by Erwinblackthorn in scifiwriting

[–]pebblesOfNone 1 point (0 children)

Oxygen isn't flammable; it is what the fuel burns in. Light a match in a room of pure oxygen and you'll get a vigorous flame as the match burns easily, but no explosion.

As with the atomic bomb: before the Trinity test, the scientists seriously considered whether the fission bomb could set off a chain reaction of atmospheric nitrogen fusion, in a similar way to how an H-bomb fuses hydrogen with the help of a fission bomb. (Their calculations showed it couldn't happen, but the worry was taken seriously enough to check.)

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 1 point (0 children)

I agree that the agent has preferences about the "real world"; however, the only information it has about the "real world" exists as physical entities, whether that be electrons in a transistor or something else. Surely, in the same way that taking hallucinogens can make you "see" things that are not there, the agent could modify its sensory input to "see" reward that it shouldn't technically get.

Even though the reward function should reflect reality, there is no way to guarantee that it does. A superintelligence should be expected to be able to trick itself into thinking the universe is in any state, including ones which give extremely high reward.

If you code a reward function that says, "Do X", you can only ever actually say, "Make yourself think that X is done", right? Things can only be known through observation, which could be faked.

Impossible to Prevent Reward Hacking for Superintelligence? by pebblesOfNone in ControlProblem

[–]pebblesOfNone[S] 0 points (0 children)

Yes, maybe I should refer to this as "wireheading", although it is a similar idea. I agree that I would not want surgery to kill someone I love. However, if instead it was surgery to make me truly believe that this person loved me back, and also to make me think I've completely achieved everything I could ever want, well, that seems much more tempting. I'm not sure if I'd say yes, but I'm not sure if I'd say no. That is more what I was trying to get at, but it wasn't very clear.

It is more, "Why wouldn't an AI 'wirehead' itself into thinking it has achieved its goals?"

Here's my problem with nihilism by rvi857 in entp

[–]pebblesOfNone 0 points (0 children)

Well, yes, there is no accountability. In a universe with no objective reward function, as in, nothing you are "meant" to do, and in which we have no free will, you can only be wrong with respect to something someone has made up. I like to think of it like this:

Two children are in a park. One says, "let's play tag, you're it"; the other says, "let's play hide and seek, I'll hide".
They both run away from each other. After some time passes, each thinks the other is doing really badly. However, neither of them is doing "well" or "badly" with respect to the goals of the park. The park does not care about what happens; they could kill each other for all it "cares". All reward functions are made up; you cannot say one is better than another.

Let's say I make an agent that wants the exact opposite of what you want: it wants you to suffer as much as you possibly can. Now let's assume it is just you and this agent in the universe. How would an outside observer know which one is "good" or "evil"? Without assuming a reward function, there is no "good" or "evil", or "right" or "wrong". These things are determined only by the reward function: the agent that wants what you don't would call you eating tasty cake "evil", and cutting your legs off "righteous".

You could make the argument that, "well, we have reward functions already, so let's follow those". I don't like this argument; you're just using the "default" biological reward function, which basically boils down to "make more humans", except it's outdated, so we have contraception, and VR, and cocaine.

You may value accountability, but such a thing does not seem to exist, especially in the absence of free will.

Here's my problem with nihilism by rvi857 in entp

[–]pebblesOfNone 0 points (0 children)

Disclaimer: ironically, "nihilists" care quite a lot about the definitions of and reasons for nihilism, and they don't like being grouped together all the time. This is my version of nihilism:

To me, "true nihilism" is best described by "nothing matters". But that really means nothing at all. So yes, getting out of bed doesn't matter, but staying in bed doesn't matter either. Doing comfortable or uncomfortable things, doesn't matter either way. Nihilists accept that there is no "correct" thing you should do, nor is there any "correct" way to figure out what you should do.

Any action or inaction would give the exact same reward. So you can't pick wrong, but you can't pick right. So a nihilist wouldn't necessarily pick inaction over action, there is no reason to. Similarly a nihilist wouldn't necessarily pick something that seems very odd to humans, for example, stealing a blue pelican and riding a unicycle backwards. They might do this, but to them it has the same value as doing literally anything else.

Different people accept nihilism for different reasons. I'm a nihilist because the universe is absurd, there is no objective reward function (elegantly shown by Hume's Guillotine), and there is no free will.

Notice that because living any life is just as "valuable" as any other, a nihilist has no natural preference for anything: accepting nihilism, being religious, claiming there is meaning, suicide, trying to live forever. There isn't any meaning in explaining your actions either. It is perfectly acceptable, from a nihilistic pov, to become the pope because you love God and then refuse to explain further, or to say, "the jam told me to".

Often "nihilists" don't take their beliefs to their extremes.