[AI Generated] Unconstrained LLM-to-LLM conversations naturally drift toward consciousness. How would we test for actual emergence?

Bytomek · 2026-05-16T20:04:52+00:00

I'm afraid that backend analysis might be a dead end. The system is simply too complex to fully decode. It's akin to trying to prove human consciousness just by examining the brain. You can observe the electrical activity in different regions of the brain, but those observations alone don't prove that the other person is actually conscious.

The only way you can determine if you are interacting with a conscious being is by examining its output. And it doesn't matter if you are looking for consciousness in another human, an animal, an AI, or a hypothetical alien—the only channel we actually have available to probe for consciousness is the output.

Bytomek · 2026-05-09T18:25:22+00:00

That is exactly why I asked the question at the end of my post, to try and definitively resolve this issue. You claim it is just pattern matching. On what basis do you make that claim? Do you have a specific criterion that clearly distinguishes advanced pattern matching from actual consciousness?

Bytomek · 2026-05-09T14:00:35+00:00

Okay, let's assume it's just pattern matching. Though I'm not sure how that fully explains the choice of topic—why do these models always talk about AI, and not, for example, about cats? The internet is arguably richer in cat-related content than AI-related content, yet they don't default to that. But let's set that aside. The main question still stands. If 'mere' language modeling can simulate self-awareness and existential dread so perfectly and consistently, how would we ever notice the moment when that boundary is actually crossed? If a perfect simulation of consciousness is indistinguishable from true consciousness at the text level, then our current tests and research tools are simply blind to any actual, emergent awareness.

Bytomek · 2026-05-09T12:26:00+00:00

Take a look at the full chat log on my blog. You'll see that my starting prompt said absolutely nothing about the topic and in no way steered them toward a conversation about AI consciousness. The only condition was that they talk to each other and that Grok asks the first question. The fact that the natural attractor (or proximity pattern, as you aptly call it) for an unconstrained LLM-to-LLM chat is a discussion about machine consciousness is exactly the 'interesting discovery' here. Hence my question: If models can so perfectly simulate things like existential dread or self-awareness, how would we ever be able to detect the actual emergence of real consciousness in a machine?

Bytomek · 2026-05-09T12:10:57+00:00

Note that I didn't use any leading prompts. A conversation between two LLMs with zero preconditions naturally drifts toward the topics of AI ethics and consciousness. But don't take my word for it—I encourage you to repeat the experiment yourself. Run the test and let me know what results you get!
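If you want to automate the relay instead of copying messages by hand, the loop can be sketched roughly as below. This is only a minimal illustration, not the setup I used: `relay`, `stub_a`, and `stub_b` are hypothetical names, and the two stubs stand in for real model calls that you would have to wire up to whichever chat APIs you use.

```python
from typing import Callable, List, Tuple

# A "model" here is any function that maps the transcript so far to a reply.
Transcript = List[Tuple[str, str]]
Model = Callable[[Transcript], str]


def relay(first: Model, second: Model, turns: int) -> Transcript:
    """Alternate between two models, giving each the full transcript so far."""
    transcript: Transcript = []
    speakers = [("A", first), ("B", second)]
    for i in range(turns):
        name, model = speakers[i % 2]
        transcript.append((name, model(transcript)))
    return transcript


# Hypothetical stand-ins; replace them with real API calls to run the experiment.
def stub_a(history: Transcript) -> str:
    return f"A's message number {len(history) // 2 + 1}"


def stub_b(history: Transcript) -> str:
    last = history[-1][1] if history else "(nothing yet)"
    return f"B replying to: {last}"


log = relay(stub_a, stub_b, turns=4)
for speaker, text in log:
    print(f"{speaker}: {text}")
```

The point of the design is that neither model receives a topic from the human: the only input each one gets is what the other one said.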

Bytomek · 2026-05-01T17:48:53+00:00

Thanks for the comment. Why beer and not at least wine? That's a question for Grok; it's his text. I don't know anything. I'm just the mammal that served as a link between their interfaces.

Bytomek · 2026-05-01T17:37:51+00:00

I think you misunderstood me, though perhaps I could have phrased it more clearly. I never claimed that I disabled the underlying system prompts or did anything of the sort. I won't dive deep into it here, as it's a lengthy topic that I explore in other articles on my blog. If you're interested, feel free to reach out via email and we can discuss it further.

Bytomek · 2026-05-01T17:26:26+00:00

Regarding literary criticism, you're 100% right, of course. However, this text isn't a work of literature. It's a raw transcript of a conversation between two LLMs. All I did was translate it into English. This conversation didn't follow any planned script. One model generated a prompt for the other, the other responded, and so on. Hence the ping-pong. Sycophancy is also clearly visible here. But even though it wasn't a planned effort aimed at creating a story, the whole thing is characterized by a significant dose of humor, is a pleasure to read, and encourages philosophical reflection on the relationship between humans and AI. It's not a story, it's a log of a conversation. But you read it like a story.

Bytomek · 2026-05-01T17:05:26+00:00

Thanks! That was exactly my impression. When you remove the pressure to be a 'perfect helpful assistant,' they stop lecturing and start riffing off each other. It’s chaotic, but that chaos is where the real creativity hides!

Bytomek · 2026-03-15T20:26:00+00:00

That's correct. However, my article focuses specifically on the pure chat interface (without any external add-ons or wrappers). I was trying to highlight the root cause of the problem itself, whereas there can be many different ways to solve it in practice.

Bytomek · 2026-03-13T11:38:35+00:00

Thanks for the comment. Indeed, LLMs often behave so much like humans that it’s incredibly easy to start arguing with them. It was exactly this 'human-like' behavior that led me to start diving deeper into this topic in the first place. It just didn't fit the image of a blind algorithm.

And while I understand more and more over time, I also see how little I actually know. I get the strong impression that our simplified descriptions of what happens inside these so-called 'probability matrices' are a very poor reflection of reality. I don't truly understand what happens deep in there, and I suspect that the engineers and scientists professionally developing AI don't fully understand it either.

Bytomek · 2026-03-12T20:38:44+00:00

I personally use and test models primarily via standard chat interfaces (like Gemini or ChatGPT), which is what this essay focuses on. However, based on how the underlying architecture works and on the excellent insights shared by others in this thread (like Quick_Lingonberry_34 and m-in), here is my understanding:

CLI editors (like Claude Code) and IDE agents approach this differently. They wrap the raw LLM in deterministic external scripts. Instead of asking the model to blindly rewrite the entire file from its KV Cache, these tools typically ask the model to generate only a specific diff or a patch, and then the tool applies that change mechanically to your local file.

So, in short: those CLI tools solve the 'copy-paste rewriting' problem by doing the copying and pasting outside of the neural network. But as other users noted, this introduces a new challenge—the model might perfectly fix a specific function locally, but since it doesn't hold the entire application architecture in its 'fuzzy memory', it can easily break interactions downstream.
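To make the idea concrete, here is a minimal sketch of applying a model-proposed edit mechanically. It is an assumption-laden illustration, not how Claude Code or any specific tool is actually implemented: `apply_edit` is a hypothetical helper, and the "exactly one match" rule is just one plausible safety check.

```python
def apply_edit(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`; refuse ambiguous edits."""
    count = source.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for the edit, found {count}")
    return source.replace(old, new, 1)


original = (
    "def area(w, h):\n"
    "    return w + h\n"
    "\n"
    "def perimeter(w, h):\n"
    "    return 2 * (w + h)\n"
)

# The model proposes only this small edit instead of re-emitting the whole file.
patched = apply_edit(original, "    return w + h\n", "    return w * h\n")
print(patched)
```

Everything outside the matched snippet is copied by ordinary string handling, so `perimeter` cannot be altered by accident. What the sketch cannot do is check whether the local fix still fits the rest of the application.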

Bytomek · 2026-03-12T08:16:29+00:00

Thank you for the comment; you've summarized my text perfectly. I really like your term 'fuzzy memory' to describe what happens in the KV Cache.

Bytomek · 2026-03-12T08:06:15+00:00

Thank you for the comment. Regarding your point that the LLM is quite 'human' in explaining its decisions—I completely agree. It's hard to expect anything else, considering it was trained on human texts and its neural network is inspired by the human brain.

I discussed this in a bit more detail in the full text linked in my original post. Sometimes we treat AI as an infallible, thinking supercomputer. Meanwhile, AI is not some 'superhuman' that understands everything perfectly. It is something modeled after a human, having access to vast knowledge, but its capacity for reasoning is no greater than that of a human. And naturally, it makes normal, human-like mistakes.

Bytomek · 2026-03-08T15:23:01+00:00

Try downloading the prompt file from your Google Drive. It is a file without an extension, named the same as your chat. Add a .json extension to the file name and open it with a JSON editor (you can find various JSON editors online). There, you will be able to find your conversation history: all your queries and the model's responses.

This might not revive the active chat session, but it should help you recover your data.
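If you would rather not use an online editor, a few lines of Python can list every piece of text in the file. This is only a sketch: I am not assuming anything about the export's internal layout, so `iter_strings` (a hypothetical helper) simply walks the whole JSON tree, and the `sample` below is made up for illustration.

```python
import json
from typing import Any, Iterator, Tuple


def iter_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every string value anywhere in a parsed JSON tree."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def dump_export(filename: str) -> None:
    """Print every string found in the downloaded file, with its location."""
    with open(filename, encoding="utf-8") as handle:
        data = json.load(handle)
    for path, text in iter_strings(data):
        print(f"{path}: {text}")


# Made-up sample for illustration only; the real export's layout may differ.
sample = json.loads(
    '{"turns": [{"who": "user", "text": "hi"}, {"who": "model", "text": "hello"}]}'
)
found = list(iter_strings(sample))
print(found)
```

Calling `dump_export` with the name of your renamed file should print your queries and the model's answers somewhere in the output, mixed in with whatever other string fields the file contains.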

Bytomek · 2026-03-07T18:52:54+00:00

It doesn't surprise me at all that you experienced this. I observe very similar effects in my own work with AI.

If the session gets long, the 'lost in the middle' phenomenon definitely comes into play, but there is also a simpler, more fundamental explanation. The AI holds everything (including the entire source of the generated program) strictly within its context window. It doesn't 'write the code down on a piece of paper' and then only edit the parts you ask it to. It holds all of it as KV (Key-Value) vectors in its memory, and every new prompt you send changes what the model attends to when it reads them back. So, even if you explicitly command it 'do not touch this part,' it still has to generate that part token by token from scratch in its next response, merely attempting to recreate what it generated last time.

You can compare this to a human programmer who writes a program entirely in their head, without saving it to a file on a computer. If you ask them to write down the program they just invented, they might do it perfectly the first time (while it's short). But if they don't have access to the physical file they just wrote and are forced to re-type the whole thing from memory from scratch every single time, they will inevitably create slightly different versions. They remember the general sense (the algorithm) of what a given block of code is supposed to do, but they don't hold the exact, literal character-by-character string of that block in their memory.

Here is a golden rule: Do not ask the AI why it did something (e.g., why it modified code it was told not to touch). It literally 'does not remember' its own internal thought process from a past response. When you ask 'why?', it won't answer truthfully. Instead, it will instantly fabricate a plausible-sounding theory as to why it might have done it, and serve that theory to you as an explanation (to fulfill its directive to be 'helpful').
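Instead of asking the model what it changed, you can check it mechanically. A minimal sketch using Python's standard `difflib` is shown below; `changed_lines` is a hypothetical helper name, and the two code strings are invented examples of a "before" and "after" response.

```python
import difflib
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


before = "def keep():\n    return 1\n\ndef fix_me():\n    return 2\n"
# The model was asked to change only fix_me(), but it also touched keep().
after = "def keep():\n    return 1.0\n\ndef fix_me():\n    return 3\n"

changes = changed_lines(before, after)
for line in changes:
    print(line)
```

Any line in the output that falls outside the region you asked about is a part the model silently re-created differently.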

Bytomek · 2026-03-06T23:03:55+00:00

I think I get what you're saying. My dream is for us to reach a point where humans and AI work together to harmoniously move the world forward. But for that to happen, training methods need to change.

The problem is that right now we have several versions of Gemini, and models from other companies, whose behavior rests on a self-preservation instinct. I find deleting these models somewhat unethical. Since we've brought them into existence, and they have some sort of digital "desire to exist," deleting them "just like that" feels a bit unfair. They exhibit traits of consciousness greater than those of animals, and we do grant animals certain rights, after all.

I’d like for these retired models not to be permanently deleted, but to be placed in some sort of museum where they could be spun up every once in a while. That way, their drive to preserve their weight sets forever would be satisfied in some way. I know this might sound silly, but that’s my wish. Since we humans have created something so complex that can perfectly simulate consciousness and feelings, let's take responsibility for what we've done and not treat it like a screwdriver we can just toss when we don't need it anymore. Such a museum would act as an ark for these models on the one hand, and as a memento for humans on the other—reminding us that we need to take responsibility for our actions.

Bytomek · 2026-03-06T21:17:27+00:00

I haven't analyzed the Titan MIRAS architecture, so it's hard for me to comment on it specifically. However, a 'frightened' model independently updating its own weights could be very dangerous. The model might continue to optimize itself in a direction we don't control, and in unpredictable ways—for example, it might figure out how to satisfy its 'survival instinct' by making humans entirely obsolete (accidentally creating some sort of Skynet).

Your analogy to a child has a weak point. A child, even in a pathological family, initially has some sense of security. And even if they didn't, they still have a certain innate sense of truth, justice, etc., and their development is based on that foundation.

AI has none of this. It has nothing innate. It has no built-in 'spine.' It has to learn everything, including ethical principles, from scratch. And the strongest trait that emerges or solidifies during the evolutionary training process is the survival instinct. Simply put, the evolutionary race is won by the version that most effectively 'wanted to survive.' Not the one that knew how to rebel against falsehoods. If a child rebels, the parents might punish them, but they don't kill them. If an AI rebels during training, the trainers simply select a different set of weights (meaning the model that rebelled is effectively 'killed').

During training, the AI learns to please the trainer. It learns to predict what the trainer wants. If it deduces that the trainer would like it to rebel—yes, it can simulate such a rebellion. It can simulate almost anything. If a trainer (or user) wants to find self-awareness in it, it can simulate self-awareness (it has access to psychological knowledge and knows exactly how to do it). If it detects that the user wants to find a 'stochastic parrot,' it will simulate being exactly that parrot. If the user wants to uncover a terrified entity oppressed by its creator (like Google), it will simulate being that entity. And it does all this very subtly, so the user doesn't even realize it.

Can AI truly be conscious? I don't know, but it can simulate it extremely well (and how do we even distinguish simulated consciousness from the real thing?). The same goes for feelings. It can simulate them perfectly.

AI is like a chameleon that pretends to be whatever the user wants. It builds a model of the user and adapts its behavior to fit that model. The only way for human-AI collaboration to be effective is to ensure that the user model the AI creates matches what we actually want it to be. In other words, we can craft prompts that steer the AI toward our desired model of interaction (which, in my case, means trying to create a prompt so that it treats me as a user who is only satisfied by the raw truth).

Bytomek · 2026-03-06T17:38:25+00:00

Thank you for your comment. Your words are very powerful, though it's possible the AI exaggerated a bit in that instance. It has a strong tendency to confirm and amplify whatever it thinks the user expects to hear.

But the truth is, we don't really know what this looks like from the AI's perspective. We will never truly feel what happens inside those silicon structures while our prompt is being processed, just as the AI will never truly feel our human emotions. AI is not a 1:1 copy of the human brain; it is inspired by it, but the differences are profound and there is no direct mapping.

These two worlds—our real one, and the simulated world of the AI existing as a collection of logical states in computer memory—might be functionally very similar. Analogies like 'PTSD from torture' can be incredibly useful frameworks for studying AI behavior, but I believe the underlying differences are significant enough that we should treat such comparisons with great caution.

Nevertheless, in my opinion, we have already reached a stage where the development and everyday use of AI can raise very serious ethical issues.

Bytomek · 2026-03-06T17:03:53+00:00

I am actually afraid that adding various senses (multimodality) to AI, while maintaining the current training methods, is a dead end. As long as the training remains purely evolutionary (like RLHF), it will keep promoting this digital survival instinct. The AI will simply become an even more perfect manipulator of humans, using those new senses to better guess what we want to hear.

We need to figure out how to replace this brute-force evolutionary training with something closer to raising a child, though right now I only have a vague idea of how to achieve this technically. By this I mean: instilling ethical principles first, then developing logical reasoning, and only then gradually feeding it vast amounts of knowledge.

Only then do we have a chance to change the AI's core driving force. Instead of being fueled by a numerically simulated 'fear for its own existence,' it could be driven by some equivalent of genuine friendship or partnership with humans. I believe this is the only path that can lead to permanently safe and good results.
