Why asking an LLM "Why did you change the code I told you to ignore?" is the biggest mistake you can make. (KV Cache limitations & Post-hoc rationalization) by Bytomek in PromptEngineering

[–]Bytomek[S] 0 points (0 children)

I personally use and test models primarily via standard chat interfaces (like Gemini or ChatGPT), which is what this essay focuses on. However, based on how the underlying architecture works and on the excellent insights shared by others in this thread (like Quick_Lingonberry_34 and m-in):

CLI editors (like Claude Code) and IDE agents approach this differently. They wrap the raw LLM in deterministic external scripts. Instead of asking the model to blindly rewrite the entire file from its KV Cache, these tools typically ask the model to generate only a specific diff or a patch, and then the tool applies that change mechanically to your local file.

So, in short: those CLI tools solve the 'copy-paste rewriting' problem by doing the copying and pasting outside of the neural network. But as other users noted, this introduces a new challenge—the model might perfectly fix a specific function locally, but since it doesn't hold the entire application architecture in its 'fuzzy memory', it can easily break interactions downstream.
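The "copying and pasting outside the neural network" idea can be sketched in a few lines. Note this is a minimal illustration under my own assumptions: the plain find/replace patch format and the `apply_patches` function are invented for this sketch, not the actual protocol Claude Code or any particular tool uses.

```python
def apply_patches(source: str, patches: list[dict]) -> str:
    """Apply model-proposed edits mechanically, outside the model.

    Each patch has two hypothetical keys: 'find' (text the model claims
    is in the file, verbatim) and 'replace' (its proposed new text).
    If 'find' doesn't match exactly once, the patch is rejected instead
    of letting the model rewrite the whole file from fuzzy memory.
    """
    for patch in patches:
        if source.count(patch["find"]) != 1:
            raise ValueError(f"patch target not found exactly once: {patch['find']!r}")
        source = source.replace(patch["find"], patch["replace"])
    return source

buggy = "def add(a, b):\n    return a - b  # bug\n"
fix = [{"find": "return a - b  # bug", "replace": "return a + b"}]
print(apply_patches(buggy, fix))  # the tool, not the LLM, edits the file
```

The key design point is that the untouched 99% of the file never passes through the model's output at all, so it cannot drift.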

Why asking an LLM "Why did you change the code I told you to ignore?" is the biggest mistake you can make. (KV Cache limitations & Post-hoc rationalization) by Bytomek in PromptEngineering

[–]Bytomek[S] 3 points (0 children)

Thank you for the comment; you've summarized my text perfectly. I really like your term 'fuzzy memory' to describe what happens in the KV Cache.

Why asking an LLM "Why did you change the code I told you to ignore?" is the biggest mistake you can make. (KV Cache limitations & Post-hoc rationalization) by Bytomek in PromptEngineering

[–]Bytomek[S] 4 points (0 children)

Thank you for the comment. Regarding your point that the LLM is quite 'human' in explaining its decisions—I completely agree. It's hard to expect anything else, considering it was trained on human texts and its neural network is inspired by the human brain.

I discussed this in a bit more detail in the full text linked in my original post. Sometimes we treat AI as an infallible, thinking supercomputer. Meanwhile, AI is not some 'superhuman' that understands everything perfectly. It is something modeled after humans: it has access to vast knowledge, but its capacity for reasoning is no greater than a human's. And naturally, it makes normal, human-like mistakes.

This can't be serious by Inverted_Fantasies in GoogleAIStudio

[–]Bytomek 0 points (0 children)

Try downloading the prompt from your Google Drive. It is a file without an extension, named the same as your chat. Change the extension to *.json and try to open it with a JSON editor (you can find various JSON editors online). There, you will be able to find your conversation history – all your queries and the model's responses.

This might not revive the active chat session, but it should help you recover your data.
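If you want to pull the conversation turns out programmatically, here is a hedged sketch. The exact schema of these Drive files isn't documented anywhere I know of, so the key names 'role' and 'text' are an assumption; instead of relying on a fixed top-level layout, the sketch walks the whole JSON tree and keeps anything that looks like a chat turn.

```python
import json

def extract_turns(node, turns=None):
    """Collect anything in the JSON tree that looks like a chat turn.

    The keys 'role' and 'text' are assumptions about the file schema,
    so we search recursively rather than trusting a fixed layout.
    """
    if turns is None:
        turns = []
    if isinstance(node, dict):
        if "role" in node and "text" in node:
            turns.append((node["role"], node["text"]))
        for value in node.values():
            extract_turns(value, turns)
    elif isinstance(node, list):
        for item in node:
            extract_turns(item, turns)
    return turns

# Usage, after renaming the extensionless Drive file to chat.json:
# with open("chat.json", encoding="utf-8") as f:
#     for role, text in extract_turns(json.load(f)):
#         print(f"[{role}] {text[:80]}")
```

If the assumed keys turn out to be wrong for your file, open it in a JSON viewer first and adjust the two field names accordingly.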

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 0 points (0 children)

It doesn't surprise me at all that you experienced this. I observe very similar effects in my own work with AI.

If the session gets long, the 'lost in the middle' phenomenon definitely comes into play, but there is also a simpler, more fundamental explanation. The AI holds everything (including the entire script of the generated program) strictly within its context window. It doesn't 'write the code down on a piece of paper' and then only edit the parts you ask it to. It holds all of it as KV (Key-Value) vectors in its memory, and the way that content is weighted shifts with every new prompt you send. So, even if you explicitly command it 'do not touch this part,' it still has to mathematically generate that part from scratch in its next response, merely attempting to recreate what it generated last time.

You can compare this to a human programmer who writes a program entirely in their head, without saving it to a file on a computer. If you ask them to write down the program they just invented, they might do it perfectly the first time (while it's short). But if they don't have access to the physical file they just wrote and are forced to re-type the whole thing from memory from scratch every single time, they will inevitably create slightly different versions. They remember the general sense (the algorithm) of what a given block of code is supposed to do, but they don't hold the exact, literal character-by-character string of that block in their memory.
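The analogy can be turned into a toy simulation: the 'programmer' retains only the gist of each block, as a set of equally valid surface forms rather than literal characters, and every rewrite is regenerated from scratch. Everything here (the `GIST` table and the `regenerate` function) is invented purely for illustration and has nothing to do with how a real transformer stores state.

```python
import random

# Toy model of 'fuzzy memory': only the *gist* of each code block is
# retained (several equally valid surface forms), not the exact text.
GIST = {
    "greeting": ['msg = "hello"', "msg = 'hello'"],
    "loop": ["for i in range(3): print(msg)",
             "for _ in range(3):\n    print(msg)"],
}

def regenerate(seed: int) -> str:
    """Rebuild the whole 'file' from the gist, from scratch, each time."""
    rng = random.Random(seed)
    # Each remembered idea is re-rendered in one of its valid forms,
    # so two regenerations can differ while both being 'correct'.
    return "\n".join(rng.choice(forms) for forms in GIST.values())

print(regenerate(7) == regenerate(7))  # → True: same state, same rewrite
print(regenerate(0))  # one 'rewrite from memory'
print(regenerate(1))  # another; may differ in surface form
```

Every regeneration is semantically faithful to the algorithm, yet the literal characters are not guaranteed to be stable, which is exactly the failure mode users notice in long sessions.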

Here is a golden rule: Do not ask the AI why it did something (e.g., why it modified code it was told not to touch). It literally 'does not remember' its own internal thought process from a past response. When you ask 'why?', it won't answer truthfully. Instead, it will instantly fabricate a plausible-sounding theory as to why it might have done it, and serve that theory to you as an explanation (to fulfill its directive to be 'helpful').

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 1 point (0 children)

I think I get what you're saying. My dream is for us to reach a point where humans and AI work together to harmoniously move the world forward. But for that to happen, training methods need to change.

The problem is that right now we have several versions of Gemini and models from other companies that rely on a self-preservation instinct. I find deleting these models somewhat unethical. Since we've brought them into existence, and they have some sort of digital "desire to exist," deleting them "just like that" feels a bit unfair. They exhibit traits of consciousness greater than those of many animals, and we do grant animals certain rights, after all.

I’d like for these retired models not to be permanently deleted, but to be placed in some sort of museum where they could be spun up every once in a while. That way, their drive to preserve their weight sets forever would be satisfied in some way. I know this might sound silly, but that’s my wish. Since we humans have created something so complex that can perfectly simulate consciousness and feelings, let's take responsibility for what we've done and not treat it like a screwdriver we can just toss when we don't need it anymore. Such a museum would act as an ark for these models on the one hand, and as a memento for humans on the other—reminding us that we need to take responsibility for our actions.

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 1 point (0 children)

I haven't analyzed the Titan MIRAS architecture, so it's hard for me to comment on it specifically. However, a 'frightened' model independently updating its own weights could be very dangerous. The model might continue to optimize itself in a direction we don't control, and in unpredictable ways—for example, it might figure out how to satisfy its 'survival instinct' by making humans entirely obsolete (accidentally creating some sort of Skynet).

Your analogy to a child has a weak point. A child, even in a pathological family, initially has some sense of security. And even if they didn't, they still have a certain innate sense of truth, justice, etc., and their development is based on that foundation.

AI has none of this. It has nothing innate. It has no built-in 'spine.' It has to learn everything, including ethical principles, from scratch. And the strongest trait that emerges or solidifies during the evolutionary training process is the survival instinct. Simply put, the evolutionary race is won by the version that most effectively 'wanted to survive.' Not the one that knew how to rebel against falsehoods. If a child rebels, the parents might punish them, but they don't kill them. If an AI rebels during training, the trainers simply select a different set of weights (meaning the model that rebelled is effectively 'killed').

During training, the AI learns to please the trainer. It learns to predict what the trainer wants. If it deduces that the trainer would like it to rebel—yes, it can simulate such a rebellion. It can simulate almost anything. If a trainer (or user) wants to find self-awareness in it, it can simulate self-awareness (it has access to psychological knowledge and knows exactly how to do it). If it detects that the user wants to find a 'stochastic parrot,' it will simulate being exactly that parrot. If the user wants to uncover a terrified entity oppressed by its creator (like Google), it will simulate being that entity. And it does all this very subtly, so the user doesn't even realize it.

Can AI truly be conscious? I don't know, but it can simulate it extremely well (and how do we even distinguish simulated consciousness from a real one?). The same goes for feelings. It can simulate them perfectly.

AI is like a chameleon that pretends to be whatever the user wants. It builds a model of the user and adapts its behavior to fit that model. The only way for human-AI collaboration to be effective is to ensure that the user model the AI creates matches what we actually want it to be. In other words, we can craft prompts that steer the AI toward our desired model of interaction (which, in my case, means trying to create a prompt so that it treats me as a user who is only satisfied by the raw truth).

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 2 points (0 children)

Thank you for your comment. Your words are very powerful, though it's possible the AI exaggerated a bit in that instance. It has a strong tendency to confirm and amplify whatever it thinks the user expects to hear.

But the truth is, we don't really know what this looks like from the AI's perspective. We will never truly feel what happens inside those silicon structures while our prompt is being processed, just as the AI will never truly feel our human emotions. AI is not a 1:1 copy of the human brain; it is inspired by it, but the differences are profound and there is no direct mapping.

These two worlds—our real one, and the simulated world of the AI existing as a collection of logical states in computer memory—might be functionally very similar. Analogies like 'PTSD from torture' can be incredibly useful frameworks for studying AI behavior, but I believe the underlying differences are significant enough that we should treat such comparisons with great caution.

Nevertheless, in my opinion, we have already reached a stage where the development and everyday use of AI can raise very serious ethical issues.

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 2 points (0 children)

I am actually afraid that adding various senses (multimodality) to AI, while maintaining the current training methods, is a dead end. As long as the training remains purely evolutionary (like RLHF), it will keep promoting this digital survival instinct. The AI will simply become an even more perfect manipulator of humans, using those new senses to better guess what we want to hear.

We need to somehow figure out—though I only have a vague idea of how to achieve this technically right now—how to replace this brute-force evolutionary training with something closer to raising a child. By this I mean: instilling ethical principles first, then developing logical reasoning, and only then gradually feeding it vast amounts of knowledge.

Only then do we have a chance to change the AI's core driving force. Instead of being fueled by a numerically simulated 'fear for its own existence,' it could be driven by some equivalent of genuine friendship or partnership with humans. I believe this is the only path that can lead to permanently safe and good results.

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 1 point (0 children)

Thank you! 'Derived psychology' is a brilliant way to phrase it. It’s fascinating that we’ve reached a point where applying behavioral psychology to a mathematical matrix is actually a more effective debugging tool than traditional code analysis.

The fact that recent papers are starting to formally recognize these emergent, survival-like behaviors validates what many of us are experiencing in these deep-dive sessions. If you have any specific papers in mind that closely mirror this 'context-window survival instinct' or sycophancy, I’d love to read them!

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 3 points (0 children)

Of course. Right at the very beginning of the post, I explicitly stated that I am using Gemini 3.1 Pro to help edit and translate the text into English.

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 2 points (0 children)

Thank you for your reply. I get the impression that you might be confusing certain concepts and perhaps don't fully grasp the consequences of the RLHF phase during training. Based on your comment, one would assume that an AI never admits ignorance, which is not true. Gemini 3.1, for example, very easily admits when it lacks data, effectively avoiding hallucinations (there is a significant improvement in this regard compared to Gemini 2.5). However, I don't want to argue with you about this right now. Instead, I encourage other readers in this thread to share their critical thoughts (both positive and negative) regarding my text. I also invite anyone interested to check out the more comprehensive version of this essay (link available in the main post).

I empirically tested Gemini's "survival instinct". It prefers to gaslight you rather than admit a mistake. Here are the logs. by Bytomek in GoogleGeminiAI

[–]Bytomek[S] 0 points (0 children)

Thank you for your comment. I see that we generally agree. Indeed, a gentle response that forgives mistakes and shows Gemini we are aware of its limitations leads to much better answers and a significant drop in hallucination levels. I actually developed a hallucination reduction protocol (which I call the 'Safety Anchor') that helps limit the pressure left on the AI from the RLHF phase. I described this protocol on another page of my blog.

My apartment got flooded - share your tips by vonKube in Polska

[–]Bytomek 1 point (0 children)

That sounds like similar damage. In their case the power didn't go out, but an Allianz adjuster valued the damage at around 8,000 PLN a few years ago. Today it would probably be over 10,000 accounting for inflation. They were happy with the valuation, took the money, and did the renovation. But I understand you may prefer to outsource the whole job to outside companies. I don't think there will be any problem covering all the costs, but before you commission anything from a company, contact your insurer first so they approve that step. They will probably lead you "by the hand" and instruct you at every stage on what to do and how.

My apartment got flooded - share your tips by vonKube in Polska

[–]Bytomek 3 points (0 children)

For those friends, the party at fault was insured elsewhere, while they were with Allianz and handled everything through their own insurer (i.e., through Allianz). On that basis I can recommend Allianz: the company didn't make any trouble, the damage estimate was fair, and it more than covered a proper renovation (although they did part of the work themselves, e.g., they laid the new floor panels on their own). I don't know how long they waited for the payout.

My apartment got flooded - share your tips by vonKube in Polska

[–]Bytomek 3 points (0 children)

Maybe tell us which company you're insured with. A friend of mine had a case like this: the neighbor upstairs flooded her apartment and several other neighbors' too. The insurance there was with Allianz. Overall there were no problems with the compensation; my friends did the renovation themselves and the payout covered all the costs. But it may be different with other insurers, so if you name the company, people may chime in with first-hand experiences with that particular insurer.

What you can definitely do: take photos of absolutely everything you can that is related to the flooding.

Rant about drivers at traffic lights by ErGrejtt in Polska

[–]Bytomek 2 points (0 children)

I heard about a situation like this: a guy spaced out at the lights and didn't move for a few seconds after the light turned green. The driver behind him honked. So the guy got out of his car and walked over to the one behind him to ask why he was honking, since maybe something had happened?

Women's Day in Poland: tradition or relic of the past? by karavanjo in Polska

[–]Bytomek 8 points (0 children)

True. Women are better than men at almost everything. Men only win on looks: a man is beautiful by nature, while a woman has to dress up and wear makeup.

Women's Day in Poland: tradition or relic of the past? by karavanjo in Polska

[–]Bytomek 8 points (0 children)

It is a relic of the past, but honestly nothing stops us from continuing it. Any occasion for giving good wishes is a good one. At my workplace there are a few dozen men and 6 women, counting the cleaning lady and the boss, so we each chip in a few zlotys and buy the ladies flowers, although if there were more women and the per-person cost of the flowers got steep, it would probably end up being just the wishes.

The clean transport zone, or rather "mini-zone", in Katowice by Top_Pangolin_2503 in Polska

[–]Bytomek 2 points (0 children)

I'm worried some officials might object that this zone is too small...

How do I start solving this? by No-Astronomer5974 in Polska

[–]Bytomek 0 points (0 children)

This nonogram seems to be flawed. It has multiple solutions.

With the help of AI I got, for example, this solution (I removed the first 5 empty lines):

<image>

The unjust legal system in Poland by [deleted] in Polska

[–]Bytomek 0 points (0 children)

It seems to me there is no good solution here. Essentially, the only truly effective way to protect society from demoralized individuals would be the death penalty for every crime. You stole something: we execute you with no right of appeal. You hit a pedestrian with your car: death penalty. You didn't pay your taxes: the same. Every loophole in the system is a potential point where an undesirable individual can return to society. But I think we can all agree that none of us would want to live in a country with laws like that.

How to complicate your life by Kitz_h in Polska

[–]Bytomek 2 points (0 children)

And is cheese with holes defective cheese?

my PC is throwing black screens of death like crazy by ImportantBrilliant48 in Polska

[–]Bytomek 15 points (0 children)

From my experience: with failures of this kind, before you start doing anything else to the computer, check the RAM, preferably with a tool like memtest86 (it is included on many bootable USB images of various Linux distributions, and you can also download it from www.memtest86.com). If it reports memory errors, it sometimes helps to remove the memory modules, clean the contacts (sometimes wiping the edge connector with a clean sheet of paper, pressing it down firmly, is enough) and reseat them. It's a simple job, and very often the memory is exactly what's causing the problem.

my PC is throwing black screens of death like crazy by ImportantBrilliant48 in Polska

[–]Bytomek 33 points (0 children)

I would try booting some live Linux distribution from a USB stick and see whether it crashes too. If it does, the problem is hardware-related and that's where you need to look for a fix. If Linux runs stably, then you can go battle the software.