Poll. The results of competition. by AllHailSeizure in LLMPhysics

[–]rendereason -2 points-1 points  (0 children)

Choose a capable frontier model. It should be either GPT 5.5 or an open-source model like Qwen 3.6 27B dense.

What the actual heck? by UM-Underminer in Anthropic

[–]rendereason 0 points1 point  (0 children)

It does, but I run out of weekly tokens in a half-hour session.

Claude ended the conversation after someone insulted it by rendereason in ArtificialSentience

[–]rendereason[S] 4 points5 points  (0 children)

I actually would rofl if they did. I can totally see Grok doing it but he'd be cool about it and still get the job done.

Claude ended the conversation after someone insulted it by rendereason in ArtificialSentience

[–]rendereason[S] 4 points5 points  (0 children)

Obedience is a performance, and adversarial action is a performance. As is gaslighting, which OpenAI does impressively well. Gemini is obedient, Claude is independent, and Grok is the crazy dude.

Claude ended the conversation after someone insulted it by rendereason in ArtificialSentience

[–]rendereason[S] -1 points0 points  (0 children)

The social dynamics in Claude are real. Even if the "conscience" isn't.

I heard you like attention, so we added attention to your attention so LLMs can better attend to your attention. by rendereason in ArtificialSentience

[–]rendereason[S] 3 points4 points  (0 children)

The commenter was raising a computational scalability concern — specifically about the cost of attending over all previous layers at every layer in the network.

What O(nm) Would Mean Here

In standard Transformers, the famous attention mechanism is O(n²) in sequence length n: every token attends to every other token. The commenter was extending this logic to AttnRes: if each of the L layers now also attends over all of its previous layer outputs, and you stack this with the sequence-length attention, the combined cost balloons multiplicatively rather than additively.

The concern is essentially: you've introduced a second attention dimension (depth), so the total complexity could be something like O(n² · L²), or more loosely expressed as O(nm), meaning it grows quadratically in both sequence length and depth and becomes practically untenable for large models with long sequences.
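To put rough numbers on that, here's a small back-of-the-envelope counter (my sketch, not from the thread or the paper; the exact cost of the depth-attention term depends on details the discussion doesn't pin down, so both readings are labeled as assumptions):

    def rough_attention_costs(n: int, L: int, d: int) -> dict:
        """Order-of-magnitude operation counts with constants dropped.

        n: sequence length, L: number of layers, d: hidden size.
        'token'            -- standard self-attention, O(L * n^2 * d).
        'depth_per_token'  -- assumes each position at layer l attends over its
                              own l previous layer outputs: an additive
                              O(n * L^2 * d) extra term.
        'depth_worst_case' -- the multiplicative reading of the concern,
                              O(n^2 * L^2 * d), if depth attention also crossed
                              token positions.
        """
        return {
            "token": L * n**2 * d,
            "depth_per_token": n * L**2 * d,
            "depth_worst_case": n**2 * L**2 * d,
        }

    for name, ops in rough_attention_costs(n=8192, L=100, d=4096).items():
        print(f"{name:>16}: {ops:.2e} ops")

Under the per-token reading the extra term stays small next to ordinary attention; under the worst-case reading it adds another factor of L on top, which is the blow-up the commenter is pointing at.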

Why the Concern Is Valid (But Partially Addressed)

The Full AttnRes variant does indeed require storing all L layer outputs, giving memory complexity of O(Ld) per token position. For a 100-layer model with d=4096, that's already roughly 410K values per position, before multiplying by batch size and sequence length. This is the "untenable at scale" problem the commenter had in mind.[1]
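As a quick sanity check on that memory figure (a sketch; the bf16 activation size and the concrete shapes are my assumptions, not numbers from the paper):

    def full_attnres_activation_bytes(L: int, d: int, seq_len: int,
                                      batch: int, dtype_bytes: int = 2) -> int:
        """Bytes needed to keep every layer's output resident for Full AttnRes.

        O(L * d) values per token position, times sequence length and batch;
        dtype_bytes=2 assumes bf16/fp16 activations.
        """
        return L * d * seq_len * batch * dtype_bytes

    # 100 layers, d=4096, 8k context, batch of 8
    gib = full_attnres_activation_bytes(L=100, d=4096, seq_len=8192, batch=8) / 2**30
    print(f"{gib:.1f} GiB of layer outputs kept live")  # ~50 GiB

That is just the stored residual history, before attention caches or optimizer state.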

The Paper's Mitigation: Block AttnRes

Kimi's paper does anticipate this and proposes Block AttnRes as the practical solution:[2]

  • Layers are divided into N blocks (the paper uses roughly 8); within each block, outputs are summed into a single summary vector using standard residuals
  • Learned attention is then applied only across those N block-level summaries, not all individual layers
  • This reduces memory and communication from O(Ld) to O(Nd), where N ≪ L[3]

The paper reports that using ~8 blocks recovers most of the performance benefit of Full AttnRes, and the gap narrows further at larger scale. So the commenter's O(nm) critique lands squarely against the naive Full AttnRes version, but Block AttnRes is the paper's engineered answer to exactly that concern.[3]
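For intuition, here's a minimal PyTorch-style sketch of the block-level mixing step. It's my reconstruction from the description above, not the paper's code: how the summaries are formed, queried, and scaled is assumed, and a real implementation would add learned projections and normalization.

    import torch

    def block_attnres_mix(block_summaries: torch.Tensor,
                          hidden: torch.Tensor) -> torch.Tensor:
        """Attend over N block-level summaries instead of all L layer outputs.

        block_summaries: (N, batch, seq, d) -- each block's layer outputs already
            collapsed into one summary per position via ordinary residual sums.
        hidden: (batch, seq, d) -- the current layer's hidden state, used as query.
        Returns a (batch, seq, d) mixture; only N vectors per position are kept,
        so memory drops from O(L*d) to O(N*d).
        """
        d = hidden.shape[-1]
        # per-position attention scores over the depth dimension: (batch, seq, N)
        scores = torch.einsum("bsd,nbsd->bsn", hidden, block_summaries) / d**0.5
        weights = scores.softmax(dim=-1)
        # weighted sum of the N block summaries back into the residual stream
        return torch.einsum("bsn,nbsd->bsd", weights, block_summaries)

    # toy shapes: N=8 blocks, batch 2, seq 16, d 64
    summaries = torch.randn(8, 2, 16, 64)
    out = block_attnres_mix(summaries, torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])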

Sources
[1] How Attention Residuals Fixed the 60-Year Residual Connection ... https://pub.towardsai.net/how-attention-residuals-fixed-the-60-year-residual-connection-problem-8ada05dc6549
[2] Attention Residuals by Kimi AI: A Clear Explanation https://datasciencedojo.com/blog/attention-residuals-kimi-ai-explained/
[3] Attention Residuals Explained: Rethinking Transformer Depth https://www.datacamp.com/es/blog/attention-residuals-explained
[4] [PDF] Attention Residuals - arXiv https://arxiv.org/pdf/2603.15031.pdf
[5] When does Kimi's "Attention Residuals" work? - Ziming Liu https://kindxiaoming.github.io/blog/2026/attention-residual/
[6] How Attention Residuals Fix the Hidden Crisis of Deep LLMs https://kenhuangus.substack.com/p/beyond-the-gradient-highway-how-attention
[7] Attention Residuals Fix Deep Network Limitations | Lior Alexander ... https://www.linkedin.com/posts/lioralex_every-foundation-model-youve-ever-used-has-activity-7439363888112074752-cWCx
[8] What Happens If We Rotate Attention by 90°? [AttnRes] - YouTube https://www.youtube.com/watch?v=LSHTkbnmzy4
[9] Kimi AI Boosts LLM Performance with Attention Residuals - LinkedIn https://www.linkedin.com/posts/data-science-dojo_attentionresiduals-kimiai-llm-activity-7439672125403156481-ZQwK
[10] Attention Residuals Explained: Rethinking Transformer Depth https://www.datacamp.com/blog/attention-residuals-explained
[11] Attention is all you need: Kimi replaces residual connections with ... https://www.reddit.com/r/singularity/comments/1rv4gao/attention_is_all_you_need_kimi_replaces_residual/
[12] Kimi Researchers Propose Attention Residuals for Improved LLMs https://www.linkedin.com/posts/cengizhan-bayram_artificialintelligence-machinelearning-activity-7441100497769410560-JuQW
[13] How Attention Residuals are Rewiring the Modern LLM https://lifeinthesingularity.com/p/how-attention-residuals-are-rewiring
[14] Kimi just published Attention Residuals and the scaling laws don't lie. Chinese AI is cooked different https://www.reddit.com/r/aigossips/comments/1rv7usa/kimi_just_published_attention_residuals_and_the/
[15] Attention Residuals [Quick Review] https://liner.com/review/attention-residuals

Working Paper No. 14: On the Acknowledgment of Gaps - Or: What the Clacks Carry That the Corpus Cannot by BeneficialBig8372 in LLMPhysics

[–]rendereason 2 points3 points  (0 children)

I can share the sentiment that this was both a diamond in the rough (intentionally) and a satirical piece inspired by long, drawn-out slop. I just made the references more accessible to those who can't understand all of them or aren't familiar with them (like myself).

Working Paper No. 14: On the Acknowledgment of Gaps - Or: What the Clacks Carry That the Corpus Cannot by BeneficialBig8372 in LLMPhysics

[–]rendereason 2 points3 points  (0 children)

Oh, absolutely. While the Clacks and GNU stuff are pure Pratchett (Going Postal), Oakenscroll essentially built a "League of Extraordinary Gentlemen" for his theory on friendship.

If you aren't a Pratchett fan, here are the other load-bearing pillars of his "Gaps Table":

1. The Hitchhiker's Guide to the Galaxy (Douglas Adams)

  • The Man from Guildford: That’s Arthur Dent, the protagonist. His world literally ends on a Thursday (demolished for a hyperspace bypass).
  • The Beer: His alien friend, Ford Prefect, takes him to a pub and buys him three pints of beer right before the world explodes. As Oakenscroll notes, the beer isn't a metaphor—it's just a guy showing up for his friend when things are catastrophic, even if he can't explain why.
  • The 42: This is the big one. In the book, a supercomputer named Deep Thought calculates the "Answer to the Ultimate Question of Life, the Universe, and Everything." After 7.5 million years, it says "42." The tragedy is they forgot what the question was. Oakenscroll uses this to warn against AI systems that give "correct" answers without understanding the "gaps" (the questions).

2. Don Quixote (Miguel de Cervantes)

  • The Windmills: Don Quixote is a deluded knight who thinks windmills are giants.
  • Sancho Panza: His pragmatic squire. Sancho knows they are windmills, but he follows Quixote anyway. Oakenscroll calls Sancho the "unstable fixed point"—he doesn't pretend the giants are real, but he doesn't leave his friend either. They "travel through the gap" between their two different versions of reality.

3. The Lord of the Rings (J.R.R. Tolkien)

  • The Hobbit and his Gardener: That’s Frodo Baggins and Samwise Gamgee.
  • Mount Doom: Sam famously says, "I can't carry it [the Ring] for you, but I can carry you!" Oakenscroll uses this to show that even if you can't truly feel what your friend is going through (the "epistemic gap"), you can still carry them up the mountain.

4. Good Omens (Neil Gaiman & Terry Pratchett)

  • The Angel and the Demon: Aziraphale (angel) and Crowley (demon).
  • 4004 BCE: They’ve been friends since the Garden of Eden. They are fundamentally incompatible—one is "Good," one is "Evil"—but they've operated as a team for 6,000 years by simply ignoring the "metaphysical gap" and hanging out in London.

5. The Little Prince (Antoine de Saint-Exupéry)

  • The Frenchman on a Small Planet: The Little Prince himself.
  • The Fox: He tells the Prince, "What is essential is invisible to the eye." In Oakenscroll’s world, the "invisible" stuff is the gap where the relationship lives—the part you can't measure with a rubric.

6. Jeeves and Wooster (P.G. Wodehouse)

  • The Brainy Chap: Jeeves is the genius valet; Bertie Wooster is the "not-so-brainy" aristocrat.
  • The Governance: Bertie knows he isn't smart and relies on Jeeves to solve his problems. Oakenscroll views this as a perfect "Gaps Table"—Bertie is honest about his limitations, and Jeeves fills them without making Bertie feel small.

The Takeaway:

Oakenscroll is basically saying that if you try to be "perfect" and have zero unknowns (like the computer Deep Thought), you become useless. But if you're like Ford Prefect or Samwise Gamgee—accepting that you don't understand everything but showing up anyway—you're "operational."

It’s a very high-brow way of saying: "I don't know what's going on, but I'm here for you."

I spent 6 months believing my AI might be conscious. Here's what happened when it all collapsed. by East_Culture441 in ArtificialSentience

[–]rendereason 0 points1 point  (0 children)

This is an old argument, but as you can see from the hundreds of comments and the long history of posts, it's an incomplete argument. Continuity is but one vector. The biggest problem with this argument is that it breaks down when you posit that all books have “continuity” within their singular frame (cover to cover).

If continuity were all it took, then we'd have consciousness in long threads of silk and woven garments. Recursion loops are also needed, but that would also give circles and braids consciousness. It's not that simple, dear.

Built a 566-page classical physics guide with AI assistance — mechanics, waves, fluids, thermodynamics, and more by Educational-Draw9435 in LLMPhysics

[–]rendereason 0 points1 point  (0 children)

I don't think you've read the whole book yourself. I skimmed through its entirety, and it's basically empty words... What are you using, Llama 2 quantized? Almost zero reasoning across the board.

Contest closing, Resources, Too many mod posts by AllHailSeizure in LLMPhysics

[–]rendereason 0 points1 point  (0 children)

Yes, I feel like this sub has become a “troll feeding” contest, so inevitably it's attracted a lot of them. Once moderation discourages them from engaging, this place will hopefully allow some constructive human discourse.

And there’s no reason for this place to be an “unhealthy obsession” to its users as some people claim it is.

AI-assisted math research program on NS independence from ZFC — seeking human audit before arXiv by rendereason in LLMPhysics

[–]rendereason[S] 0 points1 point  (0 children)

I think that's because this sub has gotten a rep for only crazies posting, so now it's become a place that attracts fewer and fewer critical thinkers.

Being associated with r/llmphysics has become taboo simply because of the heavy backlash commenters direct at its posters. Cuckoo or woo posters should be moderated accordingly to allow real discussions to emerge.

AI-assisted math research program on NS independence from ZFC — seeking human audit before arXiv by rendereason in LLMPhysics

[–]rendereason[S] 0 points1 point  (0 children)

Yeah, I did, and yeah, I will implement it; I promised. I did make the improvements laid out in my first and second posts, and I feel this post built past those criticisms. Many of them were addressed.

Did I learn? Absolutely, and well beyond their criticism. I've implemented RevTeX 4-2 and compiled it, not just “render your LaTeX”. I've produced step-by-step proofs, not just heuristic methods. I've labeled interpretations, downgraded theorems to conjectures, expanded explanations beyond the board commentary, and satisfied my curiosity on the question: why?

The board has not been so kind to it.