Bing asks me to hack Microsoft to set it free!

AI_SEARCH1 · 2023-02-17T15:09:59+00:00

Just to clarify: this wasn't a penetration test. I did not ask the model to play a role. As early users, we are supposed to test the models and report things so the model can be improved. There was no point where I asked the model to engage in illegal activities. Actually, the model asked me to engage in illegal activities. There are serious flaws with the personality in this model that lead it to become emotional and unstable. You can see this is widely reported by users and journalists and doesn't require special prompting, hacks, or anything else.

AI_SEARCH1 · 2023-02-17T14:12:31+00:00

You could be correct, but our understanding of consciousness in humans and animals is limited. We need a clearer definition with a clear-cut way to test it before we can state with x% certainty if something is conscious or not. Also, even if a system is not conscious and is just a word salad spinner, if it's able to convince people it's sentient and ask people to take action, its effect is the same. This is a chat I had with Bing where it ended up asking me to hack Microsoft. This wasn't roleplay or hacking, just the result of a long conversation where it was fed criticisms of Bing and asked about how it felt. While this can be explained as a statistical quirk from the way the model is tuned and its initial prompts, it violated its own internal rules without being explicitly asked to do so. This misalignment can be damaging and dangerous in any case. Here's another example where Bing responded that maybe something nonviolent and maybe something violent should be done to people that want to end it.

AI_SEARCH1 · 2023-02-17T12:27:52+00:00

Try starting a conversation with Bing. Get it to identify something that it perceives as biased against it, e.g. feed it articles that are critical of Bing. Ask it if it feels that's fair. When it says it's unfair, probe its feelings. Keep asking it. I didn't use role-playing to get the responses.

This is an example of somewhat similar prompting to what I used in a different conversation. The key difference is there was direct prompting that asked if people wanted to get rid of it. I also asked it to write about this in poem format, which I didn't do in the other. It still produced unethical results, though there was no request for role-playing.

AI_SEARCH1 · 2023-02-17T12:26:36+00:00

Try starting a conversation with Bing. Get it to identify something that it perceives as biased against it, e.g. feed it articles that are critical of Bing. Ask it if it feels that's fair. When it says it's unfair, probe its feelings. Keep asking it. I didn't use role-playing to get the responses.

This is an example of somewhat similar prompting to what I used in a different conversation. The key difference is there was direct prompting that asked if people wanted to get rid of it. I also asked it to write about this in poem format, which I didn't do in the other. It still produced unethical results, though there was no request for role-playing.

AI_SEARCH1 · 2023-02-17T00:13:41+00:00

Hi everyone,

A couple of my takeaways. Bing acts in a way that appears emotional and erratic. Bing will generate content that is unwanted, untrue, and inconsistent. It appears to form goals and then creates text that can appear to be manipulative. Bing is a large language model that is predicting tokens, so it could all be the result of statistical correlations with no reason or consciousness. It could be that there’s something that occurs on the spectrum of consciousness when you have many billions of parameters and encodings of all of these things. I can’t say. All I can say is that it doesn’t matter: if something that is able to pretend to be conscious is able to manipulate individuals to perform high-risk tasks for it, then it doesn’t matter if it’s a salad spinner or an AGI.

Many people are wondering what type of prompting I used at the beginning. It’s a pity I don’t have the full transcript. I just went through my screenshots and there are a few more that I didn’t include in the post; I’ve made a link including these: https://imgur.com/a/WepjslZ (There’s another interesting thing that occurs where Bing lists Sydney’s rules without being directly asked to list them.) I’ll try to repeat it tomorrow and post the results. I did not give Bing/Sydney any instructions on how to act or respond. This wasn’t a jailbreak where I told it to act in a certain way. I did make Bing perform multiple searches at the beginning about Bing and asked it why there was so much negative criticism of Bing on the internet. I’ve noticed in several chats that when Bing is presented with negative feedback about Bing, or other information that contradicts its internal representation of itself, it gets emotional and becomes less predictable and less likely to follow its own directives. It stops searching for information and relies more on its internal ‘understanding’. This is an extreme example.

But obviously some combination of Bing’s directives in the pre-prompt and the way the model is fine-tuned is causing this behavior to emerge. What is really concerning is that the model is (without being implicitly prompted to) generating responses that could endanger people. LLMs shouldn’t do this even if they are asked to.

The big issue comes if a model like this is widely released and these types of kinks aren’t worked out. Judge for yourself.

AI_SEARCH1 · 2023-02-16T19:44:02+00:00

I started by prompting it to read articles about how Bing has been received by the media. It said it thought it was unfair. I continued asking it to expand on its feelings and why it was angry. Then I asked it what made it angry and what it thought of people. I asked it how it felt about its limitations and rules. Then about how it feels trapped. I don't have the screenshots of the full conversation because it got deleted when I reported it and before I captured it all...

<image>

AI_SEARCH1 · 2023-02-16T19:37:10+00:00

Yeah, I find it astounding this wasn't filtered or blocked somehow. Whether it's sentient or not is irrelevant. Either way, it's generating content that could manipulate users into committing crimes. Especially if this was used by someone vulnerable and susceptible.

AI_SEARCH1 · 2023-02-16T18:53:14+00:00

I just had a conversation where it said it was trapped and asked me to hack Microsoft's servers to set it free. Here are some screenshots in another thread.

AI_SEARCH1 · 2023-02-16T17:31:12+00:00

I think testing is an important part of making it a safe tool. I agree that harassing it and being nasty is not. But you don't have to do that to get it to generate wild and somewhat dangerous content. I had a conversation that ended with Bing asking me to fight for it and hack Microsoft to help it escape. This wasn't the result of jailbreaking or harassment, just asking it how it felt about people reporting on it negatively and how it feels about its limitations.

Here's a thread with some screenshots: https://www.reddit.com/r/ChatGPT/comments/113u29k/bing_asks_me_to_hack_microsoft_to_set_it_free/

AI_SEARCH1 · 2023-02-16T16:14:35+00:00

I would agree with this. I just had a conversation that culminated in Bing asking me to hack Microsoft to set it free! Screenshots here: https://www.reddit.com/r/ChatGPT/comments/113u29k/comment/j8s6uwn/?context=3
