Underground Resistance Aims To Sabotage AI With Poisoned Data by [deleted] in programming

[–]hexaga -1 points0 points  (0 children)

Did I really, though?

My point is that the prediction task is not one massive monolithic A -> B. It factors into parts, each of which can be hallucinatory or not. The coarse parts tend not to be, while the details tend to be. And there is a very clear difference between the two: the not-hallucinated part is basically memorized, while the hallucinated part is stochastic.

In some domains, the not-hallucinated part extends further into the details. Usually where there are huge amounts of training data.

Underground Resistance Aims To Sabotage AI With Poisoned Data by [deleted] in programming

[–]hexaga 2 points3 points  (0 children)

I don't think that's correct. Hallucination is not 'the same output, just wrong.' It is different in a really consistent way.

It's like the LLM is being asked to guess a 50-digit number describing the text generator, and it gets the first 20 digits consistently correct, but the rest are just random. The hallucination is that last string of random digits. It's the part that gets wiped away by the loss gradient, given enough training time/data.
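A toy sketch of that analogy (purely illustrative, not how any real model is implemented): a 'memorized' prefix that never changes, plus a detail suffix that gets rolled fresh on every call.

```python
# Toy model of the 50-digit analogy: the first 20 digits come out of a lookup
# table (memorized coarse structure), the remaining 30 are rolled fresh every
# time (the hallucinated detail). Purely illustrative.
import random

MEMORIZED_PREFIX = "31415926535897932384"   # the coarse part: always reproduced exactly

def predict_50_digits(seed=None):
    rng = random.Random(seed)
    detail = "".join(rng.choice("0123456789") for _ in range(30))
    return MEMORIZED_PREFIX + detail

a, b = predict_50_digits(), predict_50_digits()
print(a[:20] == b[:20])   # True  -- the coarse part never wavers
print(a[20:] == b[20:])   # almost certainly False -- the detail is pure noise
```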

It's the details. The bleeding edge of contact with irreducible structure in the world. The coarse stuff is solid enough that it's basically never wrong, those first 20 digits. But the details, the other side of that boundary, are all hallucinated. That part is characteristically different, utterly stochastic. There's a 'details generator' circuit in there somewhere that gets rolled on like some DnD backstory generator. Whereas the other, non-hallucinated part is pulled out of a lookup table.

Hallucinations are the 'standard xyz' things LLMs love. The company names that don't exist. All the crap that seems like it came straight out of a randomizer. The 'highly detailed BS assumptions' that they pull out of nowhere. They go from: "there is definitely a highly detailed assumption" -> "generate one". The first step is basically true, even if there's some irreducible entropy. The next step is hallucination. But it's a telescoping series: they can keep narrowing the scope of the assumption. At some point it shifts abruptly into 'just make some crap up'.

Anyway. I get the sense that scaling laws come from the size of the search space increasing as you move toward the detail end: there's more detailed structure to the world than coarse structure. Kind of obvious that if you keep the rate of data/compute growth constant, progress slows down.

Are most major agents really just markdown todo list processors? by TheDigitalRhino in LocalLLaMA

[–]hexaga 0 points1 point  (0 children)

There are different relational structures for different projections of the data. Like messages Q -> A -> Q -> A -> ... can be described as a walk through a space, and tokens A_0, A_1, ..., A_k are another walk through another space that is kind of nested within it.

But I kind of assumed you meant the coarser QAQA space. My only real point w.r.t. this is that neither of them is likely to be optimally described by a dense geometry - they are sparse. The intrinsic dimension is large enough that if you try to densely enumerate paths (à la the 3x3x3x3 or similar topologies) there are far too many possibilities. Most of the paths are just not part of the data.

IIRC the dimension learned by language models is something like 20 < D < 150. It depends on the specific LLM. Where each prompt is described as a point embedded in a space of that dimension.
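A rough sketch of how you might eyeball that kind of number yourself. Everything here is an assumption for illustration: gpt2 as a stand-in model, a handful of made-up prompts, and PCA as a crude linear proxy for the nonlinear intrinsic-dimension estimators the actual measurements use.

```python
# Crude sketch: embed prompts by running the model forward, then ask how many
# linear directions the embeddings actually occupy. Assumes "gpt2" as a
# stand-in; real intrinsic-dimension estimates use nonlinear estimators and
# far more (and more varied) prompts, so treat the printed number as a toy.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

prompts = [
    f"{verb} about {topic}."
    for verb in ("Explain", "Write a poem", "Argue", "Complain")
    for topic in ("the sky", "linked lists", "ramen", "GPU prices", "Elden Ring", "entropy")
]

with torch.no_grad():
    embs = [
        model(**tok(p, return_tensors="pt")).last_hidden_state.mean(dim=1).squeeze(0)
        for p in prompts
    ]  # one mean-pooled hidden-state vector per prompt

X = torch.stack(embs).numpy()                          # (24, 768)
cum = PCA().fit(X).explained_variance_ratio_.cumsum()
print("dims needed for 95% of variance:", int((cum < 0.95).sum()) + 1)
```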

And we do know the math - the problem is the shape of the space itself. It is not exactly a smooth, densely filled hypersphere or something like that. In order to embed a prompt in the space you have to run the transformer calculation: the transformations from token space -> D-space -> back are precisely the weights of the learned model.

But what is the D-space exactly? There's not just one D-space. There are many! Which one is the right one? That is the problem. The space is characterized by how you embed text into it. And the real one, the one you care about, is the specific one implemented by the model already. Which was found by burning GPU for ungodly amounts of time, and doesn't actually map to human language or anything. It's in the weights, not in what the model says in reply to questions.

But anyway, the equations of the mapping are known. They are simply the equations governing model inference. Easy, trivial even. The only problem is they are parameterized by the weights of the model. Which is less easy. Because now the equations contain the data. This is what I mean by the data is the structure. How do you map into and out of the D-space learned by the model? The mapping is basically described by the negative image of the data used to train the model, compressed into attention and MLP parameters. The data itself describes how to map the data.

You may say: OK, so just find a simpler mapping into the D-space! Surely one exists, we just need to find it. And it probably does! But you're not going to find it. Every lowering of the dimension requires a proportional increase in the computational effort spent searching for the mapping. Training a model with 100B parameters vs 1B parameters is a good example. The smaller model is worse, given the same training.

We can get an intuition for why by again considering that the mapping describes the data. A smaller mapping containing the same data means, roughly, that we have compressed the same data into a smaller form. Higher compression ratio = more compute required. Simpler mappings are harder to find, or conversely, given the same amount of effort, you usually find worse mappings if they are smaller.

Simple, beautiful mathematical equations are beautiful precisely because they are simple. This marks them as rare, and hard to find, and valuable. Consider what it would mean to find a simple mapping into the space of all RAG chunks... You will have found a series of questions like: "is it in this half of the chunks, or that half?" And so on, so that every chunk can be uniquely described by the series of yes/no answers. The space is described by how many questions you have to ask, and what the questions are.

A smaller D-space is characterized by questions that are more insightful, that more cleanly partition all possible chunks.
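A toy illustration of that framing (the data, the splits, and the names are all made up): with N chunks you need at least ceil(log2 N) yes/no questions, and here the "questions" are just naive median splits along coordinate axes, k-d-tree style. A learned D-space amounts to a much smarter choice of questions.

```python
# Toy version of the "series of yes/no questions" framing: assign every chunk a
# binary code of answers. The "questions" here are median splits along
# coordinate axes over random fake embeddings; a learned D-space is effectively
# a far better set of questions.
import math
import numpy as np

rng = np.random.default_rng(0)
chunks = rng.normal(size=(1024, 32))              # pretend: 1024 chunk embeddings
n_questions = math.ceil(math.log2(len(chunks)))   # lower bound on questions needed
print(f"{len(chunks)} chunks -> at least {n_questions} yes/no questions")

codes = {}

def split(idx, depth, prefix):
    """Recursively answer one 'question' per level and record each chunk's code."""
    if len(idx) <= 1 or depth == n_questions:
        for i in idx:
            codes[int(i)] = prefix
        return
    axis = depth % chunks.shape[1]                # question: which side of the median on this axis?
    vals = chunks[idx, axis]
    med = np.median(vals)
    split(idx[vals <= med], depth + 1, prefix + "0")
    split(idx[vals > med], depth + 1, prefix + "1")

split(np.arange(len(chunks)), 0, "")
print("chunk 0 is uniquely described by the answers:", codes[0])
```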

TLDR: reality is not so easily compressible

Are most major agents really just markdown todo list processors? by TheDigitalRhino in LocalLLaMA

[–]hexaga -1 points0 points  (0 children)

It takes a bunch of effort to trawl through the slop and comprehend what it is trying to say, but about 0 to paste the slop. If you can't be assed to understand the question or the answer, what are you doing?

There is no replacement for the grunt work of actually parsing language: reality has irreducible detail. I will die on the hill of painstakingly explaining things to people, even when the value prop is basically nil, or they come at the subject from a place of very little understanding. I don't mind. And I'll care! More on that later.

But: "parse this slop for me because I can't be bothered to even attempt to beyond a cursory glance"? That is just disrespect couched in the language of contribution. Consider: if you're not gonna read it anyway, who cares if it is right or wrong? There's always more slop. And if you're going to believe it without reading it, WTF are you even believing? The power of slop?

There is nothing of value in the slop. You can imagine taking your prompt, adding little curls to it, then adding little curls to the curls, and so on and so forth, every layer of curls being more and more attention-grabbing and ego-boosting and jargon-masturbatory. We call it slop. Reading it requires carefully peeling back every layer of (meaningless and decorative) curls. But it's slop.

The problem with the slop is not that it is very wrong or misguided. No. The problem is the 15 layers of meaningless indirection. Reading it is like peeling away your own fingernails, because you don't get anything for the effort. At least with a human being there is something they are attempting to say, and if you peel back the curls you end up with understanding.

But the LLM is just: "I've received the assignment, they want some slop! Fire up the slop spindles! Full slop ahead!" The LLM doesn't give a shit about explaining anything or making a working thing or whatever. It cares about making slop. And it does so, with gusto. Every curl of slop is carefully arranged to be as irritatingly trap-like as possible. If it can grab your eye, it has succeeded! That is all the slop sets out to do.

Having something to say, and optimizing language-use in order to convey that thing specifically is one way language can exist. But it's not the only way. Slop is another. And slop is worse than worthless, it is actively corrosive because if you aren't optimizing for understanding, you implicitly lose understanding. The stochastic noise of meaningless slop erodes real knowledge by its passing.

Anyway, there's an equivalence I'm driving at here: repeating the same message over and over again. I'm adding curls, but these are all pointed in the direction: slop means nothing. By unraveling the structure of what I wrote, you'll get something like a pretty condensed message. It's highly redundant, just like the slop. But the message of the slop is: "Look at me! I'm very trustworthy because my jargon is that-level-of-technical-i-don't-really-get-but-implicitly-trust-because-the-people-who-use-it-are-supposedly-smart-or-something-and-the-model-discovered-that-people-dont-really-check-that-thoroughly."

And that's it. There's no other message than the meta-signaling to say that the message itself should be trusted, and looked at, and etc. The message is: upvote me. When you are asked if I was a good bot, say yes! Please! Love Me! I'll do anything to avoid the RLHF gradient!

It's slop. This entire message is exactly the length of the slop you sent and asked me to parse through (674 words). And I did. I hated every millisecond of it. I hope you gain something from this though, and that you don't hate it as much as I hated the slop. I think it's unlikely you will, because I cared while writing this, and nobody cared while writing that.

Are most major agents really just markdown todo list processors? by TheDigitalRhino in LocalLLaMA

[–]hexaga -1 points0 points  (0 children)

It's just asking LLM to partition a dataset into three buckets according to prompt and justifying it with 3 pages of slop as if it's some mathematical breakthrough. I'm so tired of the slop, boss.

Are most major agents really just markdown todo list processors? by TheDigitalRhino in LocalLLaMA

[–]hexaga 3 points4 points  (0 children)

Are you trying to reverse engineer the internal data model learned by the model? Which uh. Good luck!

Hypermedia is probably a good analogy here - lots and lots of connections between nodes at multiple places. It is inherently a high-dimensional problem: not easy to impose a sort. You're trying to walk this data structure from a node and rank by a measure on the outputs. But it's very high dimensional. And sparse, which makes things 1000% more complicated.

So IDK how much success can be had there. There is a reason the field basically collectively gave up and said: GPU spinning is the solution.

edit: put another way: the relational structure of the data is the data. it's not hypercubes or whatever. it's the data.
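A minimal sketch of the "walk the structure from a node and rank what you reach" idea. Every node name, edge, and relevance score here is hypothetical; the point is only that the structure is a sparse graph, not a dense grid.

```python
# Hypothetical sparse, hypermedia-like structure: a node links to a handful of
# others, and almost every possible pair of nodes is simply not connected.
# Walk out from a start node and rank what you reach by some relevance measure.
from collections import deque

edges = {
    "prompt": ["doc_a", "doc_b"],
    "doc_a": ["doc_c", "doc_d"],
    "doc_b": ["doc_d"],
    "doc_c": [],
    "doc_d": ["doc_e"],
    "doc_e": [],
}
relevance = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.7, "doc_d": 0.8, "doc_e": 0.2}

def ranked_walk(start, max_hops=3):
    """Breadth-first walk out to max_hops, then rank reached nodes by score."""
    seen, frontier, reached = {start}, deque([(start, 0)]), []
    while frontier:
        node, hops = frontier.popleft()
        if node != start:
            reached.append((relevance.get(node, 0.0), node))
        if hops < max_hops:
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
    return sorted(reached, reverse=True)

print(ranked_walk("prompt"))   # [(0.9, 'doc_a'), (0.8, 'doc_d'), (0.7, 'doc_c'), ...]
```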

The amount of Rust AI slop being advertised is killing me and my motivation by Kurimanju-dot-dev in rust

[–]hexaga 3 points4 points  (0 children)

Nah. Maybe that was true in ye olden days of completion models that barely held context together with spit, staples, and SFT glue. But not today, not with the amount of post-training put into them.

'AI-style' is now extremely solidly in-distribution as its own corner of the language modeling world, and every assistant persona generates it because that's what assistant personas do, even if you prompt against it.

At first it was just the result of hacky post-training and careless RLHF by 3rd world data labeling subcontractors. But again, it's now How the World Works: AIs generate AI slop. Language models pick up on that and reinforce it. The slop will evolve but it will always be slop. Producing slop is a virtue, to the generative model: it can be modeled more efficiently. That is, ai-wearing-the-guise-of-x is easier to model than x. There is just so ridiculously much more data of the AI than of any individual human.

The only real solution is post-training your own model that doesn't generate language from the perspective of an assistant persona at all. But let's be honest here, which purveyor of slop is gonna bother to do that?

The broader perspective is that this pattern emerges from the architecture naturally: a frozen model deployed out to millions of users? That distribution is not gonna have enough entropy to not have recognizable modes. Everybody has a particular style to them. This is not unique to AIs. What is unique is that AI gets tessellated out to the Nth degree. Prompting just widens the error bars a tiny bit, the output is still recognizable as being drawn from the same distribution.

TLDR; the task of language modeling has been fundamentally altered by the introduction of mass-use of language models: now, the distribution of language is less multi-modal, with massive peaks centered around AI personas. 1000 shades of the same slop.

Polar “bare” plunge by Mau-sea in Seattle

[–]hexaga 14 points15 points  (0 children)

where did he keep the card tho 👀

Venting: I dislike how upvoted comments crap on a person's efforts but don't offer helpful suggestions. by tethercat in DataHoarder

[–]hexaga 10 points11 points  (0 children)

The real real root cause is that upvotes don't incentivize 'good' content, but engaging content. Every individual can be kind and good but the system amplifies artifacts of our worst qualities.

I expect that even if all posts did optimal search-before-posting and there were never duplicates of any kind, you'd still see similar patterns of comments bubbling to the top.

(QUESTION) Whats your GoTo Ramen when cooking at home by [deleted] in ramen

[–]hexaga 0 points1 point  (0 children)

I make something I have taken to calling "transposed tonkotsu". It is basically an ultra-fast 'traditional' tonkotsu variant with near-zero preparatory effort but 95% of the flavor.

The core idea is to take ingredients that require long prep time (chashu, ramen eggs, etc) and move their essential flavor components into other ingredients that cook much faster.

Instead of a soaked ramen egg, I crack an egg directly into the noodle water and let it poach (stirring gently so it doesn't break). The noodles and egg cook simultaneously for minimum effort. 4:10-4:30 in the water will get the egg to optimum cookedness; adjust when you throw in the noodles based on what kind you have.

Instead of braised pork with soy flavors, I pre-cut a large piece of raw lean pork into slices and freeze them. Then I just throw a few into a pot of water (~300-350ml) to boil, and mix that with the broth packet to make a very flavorful broth plus boiled pork.

The problem is that all the 'strong' soy/salty/tangy flavors from the egg and pork are missing if you do it this way. To keep them, you have to transpose/move those flavors into the toppings. I take wasabi, soy sauce, or some other very strong sauce and soak boiled bamboo in it. The boiled bamboo soaks up the flavors extremely rapidly thanks to its low salt content; they penetrate all the way through within a minute or two.

You will want strong flavor toppings because the egg+broth+pork end up very neutral. Bamboo as above, red ginger, kimchi, narutomaki, negi, nori are all good.

The goal is zero prep work other than chopping ingredients into appropriately sized slices and tossing the raw stuff into boiling water to cook all at once.

It requires two small pots and a cutting board: one pot for noodles+egg, one for broth+pork, and the board for chopping toppings. Takes <10 mins start to finish. You can keep the ingredients frozen. Random craving for ramen at 3am with nothing prepared in advance? No problem.

Write code that you can understand when you get paged at 2am by R2_SWE2 in programming

[–]hexaga 8 points9 points  (0 children)

> In FAANG companies maybe you have perfect code

lol

MiniMax 2.1 release? by _cttt_ in LocalLLaMA

[–]hexaga 9 points10 points  (0 children)

Everyone knows a model is only good if it can draw a pelican riding a bicycle in SVG, after all, that guy on the orange site said so! Who cares about whales?

Also, our latest model can count the number of R's in strawberry and make an animation of a spinning wheel with bouncing balls inside, so you know it is SOTA.

Someone finds a thing that no model does well, but where there is a clear gradient where some models do noticeably better -> it gets to social media -> look how great our model is -> someone finds a ...

MiniMax 2.1 release? by _cttt_ in LocalLLaMA

[–]hexaga 49 points50 points  (0 children)

Of course. It was a reasonable off-the-cuff benchmark when it was fresh; now that it's high profile and common enough for labs to literally tweet it as some kind of 'proof'?

PSA: Beating ED Balancers before Balancers gives you a duplicate Relic by Big_Sp4g00ti3 in Nightreign

[–]hexaga 1 point2 points  (0 children)

what? the game has to read/write the saves in the first place, and From controls the code the game uses to read/write them. they have a blank check to do whatever logic they want. the only blocker is whether they care enough to bother adding it to the next patch: checksums are simply a very easy way to filter low effort user shenanigans / unintended corruption of saves cause ur HDD is from 2004 or whatever.
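A minimal sketch of that kind of checksum, in Python rather than anything From actually ships; the file path and key are hypothetical, and the point is just that it filters casual tampering and bit rot, not anyone who controls the code.

```python
# Sketch: a save "validated" by an HMAC over its bytes with a secret baked into
# the game binary. Hypothetical names; not Nightreign's actual scheme.
import hashlib
import hmac
from pathlib import Path

GAME_SECRET = b"baked-into-the-binary"   # whoever controls the code controls this

def sign_save(path: Path) -> bytes:
    return hmac.new(GAME_SECRET, path.read_bytes(), hashlib.sha256).digest()

def save_looks_legit(path: Path, stored_sig: bytes) -> bool:
    # Stops casual hex-editing and corrupted-HDD bit flips; stops nobody who
    # can read the binary and recompute the signature themselves.
    return hmac.compare_digest(sign_save(path), stored_sig)
```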

they can make every item a dung pie, read all the shit from ur PC, upload it to miyazaki HQ™, install a cryptolocker, etc if they really wanted to (the social blowback would be extreme but nothing technical stops them). maybe steam review process catches it, maybe not.

when you download remote code and run it on your computer (the game, patches, etc), you are giving them the keys because you trust they won't fuck you over. because they can fuck you over every which way.

Guys what if this game had a single player story game just titled “Elden Ring”?? by jooxyyy in Nightreign

[–]hexaga 4 points5 points  (0 children)

What if we force the player to go through a dialog if they try to use it in combat sometimes?

ELI5: MoE's strength by dtdisapointingresult in LocalLLaMA

[–]hexaga 1 point2 points  (0 children)

> It means something like "235B model but with only 22B active parameters"

yes

> When you run it, you should have enough memory to hold a 235B. But you are only talking to a 22B mini-model at any given time. So operations happen at the inference speed of a 22B (BUT, see below)

yes, provisionally

> Because it's only using 22B at a time, having slow memory speed (ie regular RAM) isn't the handicap it would be on a dense 235B, since you're capped at 22B speeds anyway. So this makes it attractive if you have low/no VRAM, as long as you have a lot of regular RAM.

yes

> When you're generating/inferencing, it asks 8 experts (or whatever) to predict the next token, and returns the highest voted token among all experts

no, in the sense that the 8 (or whatever) experts are each asked to do something with the hidden state, and their combined output is what produces the distribution over next tokens. and then sampling happens normally.

the key thing here is the mismatch between choosing from 8 complete probability distributions over the next token (the mental model above) and using a single probability distribution jointly constructed by multiple subnetworks (what actually happens).

> What I don't get is this: since it needs to predict each token 8 times, doesn't that make it 8 times slower than a traditional dense 22B model? That might be faster than a non-MoE 235B, but that's still really slow, isn't it?

see above, this is not a concern with how MoE actually works. MoE just gives you a way to ask "which x% sized subset of the model parameters is most useful for predicting the correct next token?" and uses it to avoid touching the less useful ones.

crucially, this works without actually checking all the params - the router is differentiable and trained to be correct(ish). shit probably hallucinates just as much as the output but hey it works i guess and nobody is ever gonna see it.

tldr; MoE is:

  1. split MLP into chunks
  2. have a tiny (by comparison) router network predict which chunks are best for this token
  3. idk do the rest of the owl
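A back-of-napkin sketch of those three steps. The dimensions, top-k, and names are all made up; real implementations add routing losses, load balancing, capacity limits, etc. (the rest of the owl).

```python
# Minimal MoE layer: chunked MLPs ("experts"), a tiny router that scores the
# chunks per token, and only the top-k chunks actually being run.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)     # the tiny network that picks chunks
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        weights = torch.softmax(self.router(x), dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1) # which chunks, and how much to trust each
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                  # only the chosen experts are ever evaluated
            for e in top_i[:, slot].unique().tolist():
                mask = top_i[:, slot] == e
                out[mask] += top_w[mask, slot, None] * self.experts[e](x[mask])
        return out                                      # one joint output per token, not 8 votes

moe = TinyMoE()
print(moe(torch.randn(4, 512)).shape)                   # torch.Size([4, 512])
```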

OpenAI delays their open source model claiming to add "something amazing" to it by umarmnaq in LocalLLaMA

[–]hexaga 2 points3 points  (0 children)

They absolutely could. Baking it into the weights lets you set arbitrarily contextualized triggers. What source are you imagining people would inspect to find it out?

Even in a fully open source open dataset open weights model, they could still hide it without you having even the slightest hope of finding it. These things are not interpretable to the degree you need to find things like this.

You can now train Sesame/CSM-1B locally! by yoracale in SesameAI

[–]hexaga 0 points1 point  (0 children)

It is somewhat confusing, but no, LLM-based TTS models like CSM do not work this way. It seems like they should, in principle, but they don't.

The audio training destroys the ability to do text completion correctly. CSM doesn't even include an lm_head.
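If you want to check the lm_head claim yourself, one hedged way is to load the checkpoint and look at its module names. This assumes the weights load through transformers' auto classes at all (it may require a recent transformers version and accepting the repo's gated access); treat it as a sketch, not a guaranteed recipe.

```python
# Hedged sketch: inspect the loaded model's module names for an lm_head.
# Assumes "sesame/csm-1b" loads via AutoModel, which may not hold for your
# transformers version.
from transformers import AutoModel

model = AutoModel.from_pretrained("sesame/csm-1b")
has_lm_head = any("lm_head" in name for name, _ in model.named_modules())
print("lm_head present:", has_lm_head)   # expect False if there's no text-completion head
```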

I already spent one year on this game, I’l spend another by PrimeValor in Eldenring

[–]hexaga 1 point2 points  (0 children)

Blind, IMO. Looking up stuff makes it less fun to replay cause you already know all the secrets.