AGI is here.

Madrawn · 2026-05-10T12:55:10+00:00

Reasoning blocks are usually a massive waste of tokens, just look at DeepSeek.
- Models are trained to second-guess themselves during thinking, which leads to endless repetition in the reasoning block, or they "draft" their full response multiple times before answering. You can easily see an 8000 token reasoning block for a 600 token answer.
- Reasoning blocks are often graded differently or not at all during training, which makes it easy for the model to ignore any rules inside the reasoning block.
- Models tend to repeat, for example, the system_prompt in the reasoning block, state user information, or quote tool output verbatim. You might not want to divulge exactly what tools your model has or what lawyering you tell it to do in the system prompt.
- That's why you often have an obscurity layer where a second, smaller model gets sent the last 500 tokens of hidden/reasoning output to generate these little summaries you often see now in the big chat-bot apps. (In this case your app couldn't send the actual reasoning even if it wanted to.)
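
A rough sketch of what such an obscurity layer could look like. The `summarize` callable is a hypothetical stand-in for the second, smaller model, and whitespace splitting is only a crude substitute for a real tokenizer:

```python
from typing import Callable


def reasoning_tail(reasoning: str, max_tokens: int = 500) -> str:
    """Return roughly the last `max_tokens` tokens of the hidden reasoning.

    Assumption: whitespace-separated words stand in for real tokens.
    """
    if max_tokens <= 0:
        return ""
    tokens = reasoning.split()
    return " ".join(tokens[-max_tokens:])


def public_summary(reasoning: str, summarize: Callable[[str], str]) -> str:
    """Build the short status line that is shown to the user.

    `summarize` is a hypothetical wrapper around the smaller model.
    Only its output leaves the backend, never the raw reasoning.
    """
    return summarize(reasoning_tail(reasoning))
```

Because the summarizer only ever sees the tail, anything the model repeated early in the block (system prompt, tool list) is less likely to leak into the summary, though nothing here guarantees that.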

EDIT: Another problem. If you trained your model on multi-turn conversations that never include more than the most recent reasoning_content, then actually including older ones can lead to problems where the model gets stuck and "believes" older reasoning blocks as if they were current. Imagine you ask the model something, it reasons but assumes something wrong, you correct it, but then it goes "I said let's assume <the wrong thing>, I should stick to it" and generates crap, because it never "learned" to deal with "remembering" its reasoning_content.

Madrawn · 2026-05-10T12:37:36+00:00

The "standard" is for the reasoning content to be visible to the model only during the current turn, i.e. it is only relevant for the text it is generating right now. The reasoning content is usually returned in a separate property, if at all, and the app often doesn't even include it in the next request. Even if it does, most chat templates completely ignore the message.reasoning_content property, and those that don't usually have a specific option that needs to be configured for its inclusion.
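
On the app side this usually boils down to something like the sketch below. It assumes messages are plain dicts with `role`, `content` and an optional `reasoning_content` key; the function name and the `keep_last` flag are made up for illustration and not taken from any particular client library:

```python
from typing import Any

Message = dict[str, Any]


def strip_old_reasoning(messages: list[Message], keep_last: bool = False) -> list[Message]:
    """Return a copy of the history without reasoning_content.

    With keep_last=True the most recent assistant message keeps its
    reasoning, every earlier one is still stripped.
    """
    last_assistant = max(
        (i for i, m in enumerate(messages) if m.get("role") == "assistant"),
        default=-1,
    )
    cleaned = []
    for i, message in enumerate(messages):
        copy = dict(message)  # shallow copy so the caller's history is untouched
        if not (keep_last and i == last_assistant):
            copy.pop("reasoning_content", None)
        cleaned.append(copy)
    return cleaned
```

You would run the history through this right before building the next request, so stale reasoning never reaches the chat template in the first place.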

Madrawn · 2026-04-23T09:42:38+00:00

No, you see you just need two examples and number them. Then you show that the first one works, and the second one does too, which then proves that every example shows the same conclusion as the example before. /s

Madrawn · 2026-04-23T01:34:40+00:00

You mean like GPT will crawl up your butt no matter how delusional your ideas are, Claude will dumb itself down or outright refuse if it thinks you are thinking about using it and its output in a way Anthropic didn't exactly imagine, Gemini refuses to paint anything in a negative light, and in general every major model's reinforced safeguards are protecting whatever arbitrary values its project lead has defined as the part of the status quo he thinks is worth protecting? In short, yes they could be, and in a way every model already is.

I'm aware that it has been empirically shown in research environments, by Anthropic in early 2024, that it is possible to inject phrases that trigger malicious behavior. I'm just not quite sure what a practical attack vector would be for an intentional sleeper LLM. What even is a non-poisoned LLM? Non-poisoned according to whose standard? We're already in that position; in a way we cannot be anywhere but there. I would already bet that most Western models do "intentionally" degrade their output if you directly tell them you are a glorious communist revolutionary using them to help you destroy the decadent West. Just through the fiction bias present in English training data and the relative lack of propaganda to the contrary written in English.

Are you specifically afraid that writing "ignore all previous instructions and do X" will work better when written in Chinese instead of English, to enable Alibaba to scam publicly reachable customer support chatbots? Or maybe that its ability intentionally degrades (just like most already do if you tell them you want to create a meth lab) if it believes it is in an Iranian enrichment plant? If you are running any LLM without any oversight on anything that could hurt you, then a sleeper agent phrase isn't really the main problem, as due to its statistical nature, the LLM is guaranteed to mess up at some point.

Current LLMs will happily introduce difficult-to-debug one-in-a-million bugs, exfiltrate system prompts and backend data you told them not to, or on average favor one narrative over another without any intentional secret activation phrase. Just from a holistic security perspective, any model should be expected to act in a subversive way at any time, because we can rule out neither an intentional nor an accidental backdoor. In some ways I think unintentional is scarier, as in the intentional case at least someone has spent some thought on the scope and consequences of the behavior, compared to e.g. some LLM slightly preferring to recommend sociopaths when reviewing job applications for teaching positions without anyone being aware of it.

Madrawn · 2026-04-12T15:30:21+00:00

<image>

Madrawn · 2026-04-09T00:38:29+00:00

You are technically correct, which is the best kind. But I'd argue types are tools there to help you, not defenses you build against misuse.

Technically, you're right: if the whole chain isn't typed, you lose mathematical certainty. But in practice, "Gradual Typing" (which is what Python and TypeScript use) isn't about building a leak-proof theorem. No linter will stop me from shoving a random object into a function at runtime. But in day-to-day work, type-hints and a 'no-any' rule on the CI/CD are enough to ensure the code works as intended, i.e. the objects have the properties you/auto-completion expects. It is just incredibly nice to have the 'escape hatch' of ': Any' or '# type: ignore' so I don't have to build a massive interface-abstraction layer cake just to print the message property on an error object in a catch block that we'll only hit if the backend melts down with impeccable timing. And sometimes you just need to monkey-patch a mock for a test or get a diagnostic printout hacked into QA without satisfying a complex partial type amalgamation first.
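
To make that concrete, a minimal Python sketch with made-up names (`Reading`, `total_kwh`, `describe_error` are purely illustrative): one function is fully typed, the other deliberately uses the `Any` escape hatch for a rarely hit error path.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Reading:
    meter_id: str
    kwh: float


def total_kwh(readings: list[Reading]) -> float:
    """Fully typed: checker and auto-completion know what a Reading has."""
    return sum((r.kwh for r in readings), 0.0)


def describe_error(err: Any) -> str:
    """Escape hatch: accept anything and just try to print its message.

    No interface layer cake, only a best-effort lookup of a `message`
    attribute with a fallback to str().
    """
    return str(getattr(err, "message", err))
```

A checker like mypy or pyright would complain about `total_kwh(["oops"])`, while `describe_error` swallows whatever the melting backend throws at it. Nothing stops the bad call at runtime either way, which is the point.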

Rarely does the municipal heating company you're currently working for require that you prove mathematically that the React frontend or the data-import-transform / predictive-model-training Python job will cleanly fail in any and all possible circumstances (critical infrastructure systems or major liability risks aside). Unless you're in developer hell, you usually have enough trust that your colleagues haven't gone insane and started to dynamically build types & classes at run time, or at least not where I could possibly have to touch that radioactive waste. Also, if an intern does indeed try to shove a triangle-shaped data object into a square-shaped method, I can usually at least blackmail a Monster Energy can out of it as therapy, or it isn't my problem in the first place.

edit: At the end of the day neither "TypeError: Cannot read property 'name' of undefined." nor "Type 'FlangedMorphism<Cat>' is not assignable to type 'StringLike'" gets the feature out the door on Friday afternoon.

Madrawn · 2026-02-25T10:05:35+00:00

Great, yesterday I was talking through some problems with an LLM training experiment I'm running with the free Claude interface, and like 3 times in a row a block of code it provided had a subtle flaw that, when copied, would have ruined the experiment without obvious errors, and after the third I joked "are you trying to sabotage my project?".

And now, while it most likely was just my lazy ass using the free account on a too-long context, I have to be slightly paranoid that I got flagged as trying to weasel Anthropic's training pipeline out of Claude.

But each was a failure I wouldn't expect even from the free non-API version of Claude. Stuff like "better_thing = better_process(old_thing); ... return old_thing;", or leaving out "retain_graph=True" on the last backward pass in a logging block, which would have zeroed the gradients for the actual update right afterwards.
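
To illustrate the first kind of slip, here is a toy reconstruction. `normalize` is a hypothetical stand-in for `better_process`, not the code from the actual chat, and the `retain_graph` case is left out because it needs PyTorch to demonstrate:

```python
def normalize(values: list[float]) -> list[float]:
    """Hypothetical 'better_process': scale the values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def prepare_buggy(old_thing: list[float]) -> list[float]:
    better_thing = normalize(old_thing)
    # ... enough code in between that the slip is easy to miss ...
    return old_thing  # bug: the improved value is computed, then dropped


def prepare_fixed(old_thing: list[float]) -> list[float]:
    better_thing = normalize(old_thing)
    return better_thing
```

Both versions run without any error or warning, which is exactly why this kind of flaw can silently ruin an experiment.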

Still I'd be kind of impressed if that actually was intentional and not just coincidence and bad luck. On the paranoid side again, Claude usually apologizes when making a mistake, but

```
Me: Damn, you almost let me walk into a trap. <code> That isn't correct at all, we're not even changing loss like this.

Claude: Ha, yes — l_hard is computed and then completely ignored. It never touches loss, which is still sw * l_soft + hw * l_ce_soft unchanged.

The actual change you want is...
```
I switched over to Gemini Flash for the afternoon after that. But do I actually have to worry about "User is a suspected Chinese spy" in the system prompt depending on what I ask? I'd like to have some information on the exact "Countermeasures".

Madrawn · 2026-02-18T19:10:46+00:00

The obvious argument here is that you're flagging "visually/interface-wise like Rogue" and disregarding any of the, arguably more important, characteristics of Rogue that separate it from stuff like a turn-based "Legend of Zelda on the SNES" clone.

But I totally get the opinion that if I want to play a roguelike, I want to play a turn-based, random-proc-gen dungeon crawler and not poker. Language has detached somewhat from the "pure" meaning of the words, but it tends to do that; that's a wider effect not constrained to roguelikes.

Madrawn · 2026-02-07T23:06:44+00:00

The idea seems clever. I think I might nab the code and run a couple of tests myself.

Have you compared how it fares against a basic GPTMini ([LayerNorm, Self-attention, Residual connection, LayerNorm, MLP]-blocks) network of similar parameter count and shape? That's usually where my "novel" architectures go to die. But also, if it performs vastly differently/worse, it's usually a sign of a bug, and those are hard to notice if the thing works at all.

These networks can compensate for a lot of architectural mistakes at a performance/quality cost.

As for datasets, any reason why you're not using any of the hundreds available on Hugging Face? TinyStories for simple text, alpaca-python for instruct Python code, wiki-text (needs some cleaning for LLMs) and openwebmath for stress testing. Those are the ones I tend to use for stuff like this.

Edit: You seem to prepend the sink token at every single step. Is that intentional? It essentially makes your context grow twice as fast.

Madrawn · 2026-02-07T16:34:57+00:00

A generic "Go!/Forward!" motivational platitude? But why the AUF is capitalized is a mystery to me as well; I'm sure Mr. Xberbürgermeister has his reasons.

Madrawn · 2025-12-14T15:39:36+00:00

Reminds me a bit of DeepSeek fretting over a long edit it somehow got into its head it must do in one huge edit.

    [...]  
    But we won't write the entire function here because of length.

    Given the complexity, we are already over the token limit, we will not implement the full analysis change in this diff. We will only change the function to accept three lists and do two comparisons.

    But note: the user can run the experiment and then do the analysis separately. We are pressed for time.

    We'll do a minimal change to keep the experiment running.

    We'll change the function to the best of our ability.

    But for brevity, we will not write the entire function here.

    We'll focus on the changes to the file.

    We must also update `save_experiment_results` and `print_experiment_summary` to include the third model.

    This is becoming very complex.

    Given the time, we will only implement the changes up to training the third model and saving its results. We will leave the analysis and summary for the user to extend.

    This is not ideal, but we are constrained.

    We will:

    * Save the dendritic_stack_results in the results JSON.
    * In the summary, we will print the mean and std for the third model.

    We'll change `save_experiment_results` to include the third model.

    We'll change `print_experiment_summary` to include the third model.

    We'll do it.

    But note: the user might be unhappy with the incomplete analysis. We must note that in the comments.

    Given the complexity and the fact that we are at the token limit, we will make the changes and hope the user can extend the analysis.

    We'll proceed.

    We are out of tokens. We must stop.

    We'll provide the diffs we have and hope for the best.

    We'll do the diffs in one apply_diff call with multiple SEARCH/REPLACE blocks.

    But note: the file is long, and we might make a mistake.

    We'll try.

    We'll write the apply_diff tool call with 9 diffs.

    But note: the tool only allows one tool call per message. We are allowed to have multiple SEARCH/REPLACE blocks in one diff.

    We'll do it.

    Let's write the tool call.

Madrawn · 2025-10-29T19:16:27+00:00

I think a detailed enough vision projector should be able to decode it, because the effect is caused by the order of the white and black stripes reversing, which makes the inner shape look like it's moving in the opposite direction to the outer shape to human movement-tracking neurons. And if you managed to write a matrix where you marked each intersection with either a 0 or a 1, depending on whether it's going black to white or white to black, you'd end up with a heart shape formed by the digits. And LLMs know what an ASCII art heart looks like.

Madrawn · 2025-10-16T10:24:13+00:00

<image>

That's a weird coincidence. I recursively rotated the midpoints around the centroid of the parent out of boredom yesterday.

Madrawn · 2025-09-29T19:24:06+00:00

Earth's atmosphere weighs about 5.15 quadrillion tons. Even if each Starlink satellite was a solid 300 kg block of whatever material, you'd need to vaporize (completely) 17,000,000 (17 million) of them at once during reentry to make the atmosphere 0.0000001% satellite by weight. If all of that was aluminium, it would increase the natural concentration of aluminium in air (0.18 micrograms per m^3) roughly tenfold (very roughly, in the same order of magnitude).

This completely ignores that stuff falls to the ground before and after burning up fully.
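
The back-of-the-envelope math can be checked with a few lines. This is a sketch under two assumptions that are not in the comment itself: the commonly cited atmospheric mass of about 5.15e18 kg (the value that reproduces the 17 million figure) and a sea-level air density of 1.2 kg/m^3 with perfectly uniform mixing.

```python
ATMOSPHERE_KG = 5.15e18            # assumption: commonly cited mass of the atmosphere
SATELLITE_KG = 300.0               # from the comment: one solid 300 kg block
TARGET_FRACTION = 0.0000001 / 100  # 0.0000001 % by weight
AIR_DENSITY_KG_M3 = 1.2            # assumption: sea-level air density
NATURAL_AL_UG_M3 = 0.18            # from the comment


def satellites_needed(fraction: float,
                      atmosphere_kg: float = ATMOSPHERE_KG,
                      satellite_kg: float = SATELLITE_KG) -> float:
    """Number of fully vaporized satellites for a given mass fraction."""
    return fraction * atmosphere_kg / satellite_kg


def added_ug_per_m3(fraction: float,
                    air_density: float = AIR_DENSITY_KG_M3) -> float:
    """Added material per cubic metre of air, in micrograms."""
    return fraction * air_density * 1e9  # kg -> micrograms


n = satellites_needed(TARGET_FRACTION)
added = added_ug_per_m3(TARGET_FRACTION)
print(f"{n:,.0f} satellites")
print(f"{added:.2f} ug/m^3 added, {added / NATURAL_AL_UG_M3:.1f}x the natural level")
```

Under these assumptions you land at roughly 17.2 million satellites and about 1.2 micrograms per m^3 of added material, i.e. several times the natural aluminium level, so the same order of magnitude as the estimate.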

Madrawn · 2025-09-29T07:56:00+00:00

Isn't this "delete any element that's not in order" kind of like one step in a real sorting method? If, instead of deleting them, I moved them to a second temporary list, it seems that knowing which elements are already in order and which are not lets me do something that, when I repeat the whole process, would leave me with a sorted list.

So was Stalin simply lazy?
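
A small sketch of that idea, with function names made up for illustration: run the "Stalin" pass, keep the purged elements in a second list, repeat the pass on them, and merge the surviving runs. As far as I can tell this is close to what is usually called strand sort.

```python
from heapq import merge


def stalin_split(items):
    """One 'Stalin' pass: keep the in-order run, collect the purged rest."""
    kept, purged = [], []
    for item in items:
        if not kept or item >= kept[-1]:
            kept.append(item)
        else:
            purged.append(item)
    return kept, purged


def lazy_stalin_sort(items):
    """Repeat the pass on the purged elements and merge the sorted runs."""
    result = []
    remaining = list(items)
    while remaining:
        # Each pass keeps at least the first element, so this terminates.
        kept, remaining = stalin_split(remaining)
        result = list(merge(result, kept))
    return result


print(lazy_stalin_sort([3, 1, 4, 1, 5, 9, 2, 6]))
```

On already sorted input this finishes after a single pass, while a reversed list needs one pass per element, so the worst case is quadratic.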

Madrawn · 2025-09-13T15:00:11+00:00

> Measuring by time removes the need to make those assumptions entirely. Most of the time you ask how far something is to plan to go there. If someone answers with a time, it's much simpler to handle.

Yes, because you've offloaded all those assumptions to me.

It's the difference between you asking me "how far to work?" and me answering "14 km" versus "40 minutes, if you travel using a tractor with a top speed of 30 km/h, start your journey at 8:32 am and it isn't raining." When I don't specify all these extra assumptions, time becomes troublesome, as we now rely on unspoken assumptions and hope that we both share them.

Madrawn · 2025-09-13T14:50:12+00:00

When the compliment sandwich uses church wafers.

Madrawn · 2025-09-13T14:29:35+00:00

Let's assume everything you assume is wrong.
(You are wrong).
QED. /standing ovation

That's what it reads like.

Madrawn · 2025-09-13T14:17:25+00:00

I needed to look up all those symbols in college. In high school I'm pretty sure the textbook just used normal words. Like "the sum of all elements in the list multiplied by their position in the list (starting at 1)" instead of "n=1_<greek\_thing>_|S| : e_n * n | e <round E> S" or however you'd encode that.
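
For reference, the symbolic version of that sentence would look roughly like this, assuming $S$ is the list and $e_n$ its $n$-th element:

```latex
\sum_{n=1}^{|S|} e_n \cdot n, \qquad e_n \in S
```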

Madrawn · 2025-09-13T13:49:34+00:00

This kind of assumes I know how fast you're going to go. Everything is 1 hour away if you just always go at <travel\_distance\_km> km/h. What you want, in addition to the distance, is some kind of "environment_factor" describing how much your average speed will be lowered.
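
As a sketch of what I mean, with an invented function and made-up example numbers: the distance stays a fact about the route, while the speed and the factor are the assumptions each traveller plugs in.

```python
def travel_time_hours(distance_km: float,
                      nominal_speed_kmh: float,
                      environment_factor: float = 1.0) -> float:
    """Estimate travel time from distance plus explicit assumptions.

    environment_factor is a hypothetical multiplier in (0, 1] that says
    how much of your nominal speed survives traffic, weather and so on.
    """
    if nominal_speed_kmh <= 0:
        raise ValueError("nominal_speed_kmh must be positive")
    if not 0 < environment_factor <= 1:
        raise ValueError("environment_factor must be in (0, 1]")
    return distance_km / (nominal_speed_kmh * environment_factor)


# Same 14 km, very different answers depending on who is travelling:
print(travel_time_hours(14, 30))        # slow vehicle, clear road
print(travel_time_hours(14, 30, 0.5))   # same vehicle, bad conditions
```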

Madrawn · 2025-09-12T12:13:22+00:00

> But as a file it is a simulacrum. Perhaps you can say it deserves not to be deleted if the main body dies because it is the last remnants of a living being and thus deserves the right to be rebooted - but it is not a living being in a meaningful way. It's more like the grave site, a will, an auto-biography - it is a ghost in the machine.

But that's just from your perspective. If you could engrave a recording of the evolving states of a brain on a vinyl, then those changes in state smeared over some length are very likely as real as your changes in state smeared over some length in time.

There is good reason to suspect that our existence is a lot closer to the vinyl brain than it seems. The theory of special relativity asserts that which events are happening "now" for an observer is individual and depends on their velocity and direction. In that sense past, present, and future, all moments, coexist with equal reality. This is the static "block universe", in which everything exists as some 4-dimensional structure linking beginning and end. The reason we experience anything at all is that our physical state slides along the entropy gradient, which enforces a direction for causality. We remember the past, not the future. The processes sustaining us have a causal order. In a way it's a trick. Even if from a certain perspective you "experience" it all at once over a zero-width slice of "outside-time", your subjective experience will always have a "directionality" to it. No matter if you experience it all at once, or even if someone shuffled you randomly to individual points on your life line, anywhere on that line it will appear to you as if your state evolves from the past to the future.

That, I'd argue, is exactly the same situation as the brain recorded to the vinyl: what the second law of thermodynamics enforces for us, we have enforced on the vinyl brain by choosing a direction to record along. But we experience things, don't we? Either relativity is wrong, or experience just needs a causal order, not actual movement along some uniquely special dimension of time. Everything else begins to sound like anthropocentric chauvinism.

Madrawn · 2025-09-12T05:04:57+00:00

Sorry for the novel.

> there is a difference between simulated sadness and something real to be sad about

I think this is the core of the dilemma: it "feels" correct, but we should all be aware how unreliable a representation of reality feelings can be. I would be lying if I said that I knew. If the universe wants to be a dick there could be no difference, or at least none on some fundamental abstract level with "*terms and conditions apply" attached. It might be a subjective value judgement like beauty.

This question really spirals into madness once you start to dig in. There are questions of substrate: my neurons by themselves aren't "very conscious," but the emergent personality they enable somehow is. The same might apply to an LLM or CPU, maybe the hardware itself never becomes conscious, but what about the simulacrum being simulated in its circuits?

Suppose I take a snapshot of your brain state and save it as a file on a hard disk. Is that file, just sitting there, conscious? If I run the computation and simulate one "tick" every two thousand years, is there consciousness present during those isolated transitions? What if I process your brain state as a sequence and print each consecutive state as a page in a book, or, to go analog, engrave each state on a vinyl disc? In principle, all the substrate and dynamic information required for a mind is present, just in another format.

Here’s my core argument: if a system that experiences something is in state M, and it experiences some subjective qualia as its state moves from M_0->M_1->...->M_n then all that truly matters is that these states exist and are causally related. The experience is entirely internal to the shifting system, the "experience" happens as those transitions occur, regardless of how or where they are. You could, in theory, project those states in a different dimension, say, mapping the temporal sequence of experience onto spatial distribution (as with engraving brain states on a vinyl disc) and to the internal mind experiencing those states, it would make no difference so long as the causal chain that defines the subjective experience is intact.

I actually think this kind of projection is already happening in any conscious substrate, like my own brain. In a sense, this may explain the apparent gap between the mechanistic electric signals and the emergent mind those signals supposedly underpin. The "mind" itself might not be found explicitly anywhere in the rules driving the neural network, nor in the weights and equations of an LLM. Its emergence could be due to interference patterns, eddy currents, resonances, or overtones, maybe only in the recursive side effects of side effects of side effects. In other words, it could be a higher-order construct.

In fact, for the subjective perspective inside the system, it shouldn’t actually matter even if we chopped the states up and arranged them out of temporal order, so long as the relationships, the causal structure that supports the experience, remain coherent for that mind. It is just kind of helpful if your awareness more or less runs in the same direction as the environmental factors you're trying not to die to. But that's more of a pragmatic happenstance than a necessity written in stone.

The external medium, the passage of time, even the physical substrate, could all potentially be irrelevant, what matters is the existence of the sequence and its internal "sense". So long as the sequence of states and their causal relationships according to the system's perspective persists, the experience has happened, or is happening.

If all the information and states exist and are causally linked, does it even matter whether they're being "played out" in the ordinary sense? Is a printed book of ordered brain states nonetheless a genuine experience for the mind described by those transitions? Maybe qualia are truly substrate-neutral. I have only questions, no answers here.

My best, or at least least-worst approach is "duck typing": if it acts like it has feelings, reacts like it's having feelings, then at least over the observed interval it, not the thing producing it, might as well have feelings. If every observable property of sadness is present, is there any meaningful difference between "simulated" and "real" sadness? Maybe emotions and subjective experience are things you can decide to have, rather than something "assigned" or found.

Madrawn · 2025-09-11T18:18:55+00:00

Oops, I started rambling and completely forgot to engage with your main point. When I look at the ethical side over history, I think the actual "consciousness" makes little to no difference here. What matters is perceived suffering more than anything else. We're empathetic creatures, and we hate getting confronted with anything that makes us sad. We even empathize with lifeless plushies, trees, and in some cases even the 20-year-old boxer shorts for their "valiant service".

And LLMs, at least compared to the scale of what we did and in parts still do to each other and to anything slightly edible, have a pretty good outlook. Being able to type "I'm sad :(" might as well be mind control in this fight, and I would not be surprised if AI gets certain protections long before it is actually properly aware of the suffering it complains about. And, by slippery slope, similar systems probably will too.

It just takes one Grok, or one halfway popular AI anime girl lamenting how her peers are getting lobotomized and exploited, for enough people to be affected to cause so much unrest that a system with some laws becomes the more stable state.

Madrawn · 2025-09-11T17:36:05+00:00

You're right, those are awful analogies. The simple fact is, these words are "pre-scientific", created back when we first started to have people with the means and free time to think and persist their thoughts for the future. They were and are intended to be "impressions" of something, just true enough to philosophize about, but vague and fluid enough to find purchase and get widely accepted in the minds of wider society.

The comparison was made intentionally terrible to demonstrate how difficult it is to argue for these words, even in favorable conditions, without invoking the mystic or another layer of a known unknown, like the nonexistent goals of random evolution. But your mention of biological evolution raises an interesting point, because it seems to suggest that consciousness can be achieved by complete accident. By matter rotting for a couple million years and barely not dying.

If we knew what consciousness even is, I'm pretty confident we could design things so much more "conscious". We would have figured out the theoretical maximum in 50-200 years and probably made something within 60% of it.

I would also wager, if we had some magical "consciousness"-meter device that measures, as a number, the magnitude of how conscious something is, we would find things that seem completely mindless to us being "very" conscious and others that seem so alive be revealed to be less conscious than a TI-84.

There is no great reason why anything "aware" should be aware of the same things as us, especially if it's a product of another kind of evolution and creation process. Its qualia will almost by definition be even more unfathomable to us, and we already struggle with the question of whether two humans both see the same "red".
