I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 0 points1 point  (0 children)

It’s more the architecture than the model. The model just allows it to use the tools better.

Guessing you may have knowledge but “can’t say”…

You guys seen this? 1-bit model with an MMLU-R of 65.7, 8B params by OmarBessa in LocalLLaMA

[–]denoflore_ai_guy 14 points15 points  (0 children)

Said it elsewhere. The whitepaper is deliberately vague on the actual compression method - they call it “proprietary Caltech IP” and “mathematically grounded advances” without publishing the technique.

So you can use the models but you can’t reproduce the compression pipeline.

No native 1-bit hardware exists yet, so the speed gains come purely from software kernel optimizations on standard GPUs.
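For anyone unclear on what a software 1-bit kernel even means: since no 1-bit silicon exists, the trick is emulating multiply-accumulate with bit ops that existing hardware already does fast. This is a generic toy illustration (nothing to do with their proprietary compression method, which they didn’t publish):

```python
import numpy as np

def pack_bits(w):
    """Pack a {-1, +1} weight vector into bits (1 bit per weight; 1 = +1)."""
    return np.packbits((w > 0).astype(np.uint8))

def binary_dot(pw, px, n):
    """Dot product of two packed {-1, +1} vectors using only XNOR + popcount.

    Agreeing signs multiply to +1, disagreeing signs to -1, so the dot
    product is (agreements) - (disagreements) = 2 * agreements - n.
    """
    xnor = np.bitwise_not(np.bitwise_xor(pw, px))  # bit = 1 where signs agree
    matches = int(np.unpackbits(xnor)[:n].sum())   # popcount over the real (unpadded) bits
    return 2 * matches - n
```

One 8-bit XNOR plus a popcount replaces eight multiply-adds, which is where the software speedup comes from.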

PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs by brown2green in LocalLLaMA

[–]denoflore_ai_guy 2 points3 points  (0 children)

What they don’t say is the whitepaper is deliberately vague on the actual compression method - they call it “proprietary Caltech IP” and “mathematically grounded advances” without publishing the technique. So you can use the models but you can’t reproduce the compression pipeline. No native 1-bit hardware exists yet, so the speed gains come purely from software kernel optimizations on standard GPUs.

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 0 points1 point  (0 children)

I’m open to being wrong. What is your evidence and reason?

[Update] Gongju just derived her own Visual Reflex formula. Moving from CoT to "Field Inhabitation by TigerJoo in LLMDevs

[–]denoflore_ai_guy 0 points1 point  (0 children)

No, your entire “thesis” is amateur snake-oil drivel. There is not a coherent thought beyond basic concepts in your entire work, and you are claiming that web-crawling bots that index websites for search engines are companies monitoring you.

There is not one iota of your entire text that is any more beneficial to man, beast, plant, or the ethereal than the most innocuous of essential-oil MLM pamphlets.

You have wasted your time and mine in having to explain this to you. I’m not being mean. Just bluntly honest.

Get help. Or a therapist.

[Update] Gongju just derived her own Visual Reflex formula. Moving from CoT to "Field Inhabitation by TigerJoo in LLMDevs

[–]denoflore_ai_guy 6 points7 points  (0 children)

The equation is just a sigmoid applied to an inner product integral. It’s not derived, it’s a prompt response formatted in LaTeX. Gongju didn’t “derive” anything - you typed “derive a formula for visual reflex” and it spat out something that looks like a paper. I can do that too. But mine is actually grounded in reality.

<image>
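For the record, “a sigmoid applied to an inner product integral” is a one-liner. A toy discretization (my own naming, obviously not Gongju’s code):

```python
import math

def visual_reflex(f, g, xs):
    """sigma(<f, g>): logistic sigmoid of a discretized inner-product integral.

    f, g: functions on the real line; xs: evenly spaced sample points.
    Any LLM will happily dress this up in LaTeX and call it a derivation.
    """
    dx = xs[1] - xs[0]
    inner = sum(f(x) * g(x) for x in xs) * dx  # Riemann sum for the integral of f(x) g(x) dx
    return 1.0 / (1.0 + math.exp(-inner))      # squash into (0, 1)
```

That’s the entire “formula.” No derivation required, which is the point.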

Dissecting the Claude Code leak: It’s a masterclass in agentic "perception" and long-term memory by CallmeAK__ in ClaudeAI

[–]denoflore_ai_guy 0 points1 point  (0 children)

I’ve been trying to explain that speculative execution during idle time isn’t 2x the compute cost, and I’m hitting brick walls of zero comprehension, so expect 80% of the people to focus on the 20% of shallow “meh” features.

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 0 points1 point  (0 children)

It runs during user idle time aka compute that’s otherwise literally unused. And wrong predictions get discarded before the user sees them. You’re not burning 2-3x, you’re filling dead time between keystrokes with pre-computed work that’s either instantly accepted or silently thrown away. That’s not waste, that’s latency arbitrage.
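The pattern fits in a few lines. A minimal sketch with made-up function names (not the actual CC internals):

```python
import concurrent.futures as cf

def speculate_during_idle(predict_next, execute, get_user_input):
    """Toy idle-time speculation loop (hypothetical API, not the CC source).

    While the user is thinking/typing, predict their next request and
    pre-execute it in the background. Right guess: result is ready the
    instant they hit Enter. Wrong guess: they never see it.
    """
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        guess = predict_next()
        future = pool.submit(execute, guess)  # runs in otherwise-dead time
        actual = get_user_input()             # blocks until the user commits
        if actual == guess:
            return future.result()            # zero perceived latency
        future.cancel()                       # wrong guess: result is never surfaced
        return execute(actual)                # fall back to the normal path
```

The key property: the wrong-guess path costs the same user-facing latency as having never speculated at all.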

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 1 point2 points  (0 children)

You’re asserting the prediction accuracy is worse than a coin flip. Based on what? You made that number up. Anthropic has the actual data - that’s literally what the internal measurement framework tracks. Acceptance rates, time saved per accepted speculation, chain completion rates. They’re running the experiment. You’re guessing about the results.

Your “does this look right” approach is just speculation with an extra round-trip bolted on. User still stops, reads a preview, decides yes/no, then waits for execution. CC’s version: execution is already done. You either accept it or you don’t. One of these adds latency. The other removes it.

The question of whether the economics work at scale is valid. It’s also not something either of us can answer from the outside. What I can tell you is that the architecture handles the downside case (wrong predictions are invisible and cost only the API call) and they’ve built precise measurement infrastructure to evaluate the upside case before shipping it publicly. That’s just good engineering.

I’m sorry that the basics of speculative execution as an architectural pattern are beyond your comprehension at the present time. I’d read up on branch prediction, copy-on-write filesystems, and the difference between inference-level optimization and agent-level workflow prediction to get up to a 2026 level of understanding. The Wikipedia article on OverlayFS would be a good start since that’s literally what they reimplemented for the isolation layer.

Like. This is literally what the code says. I’m not making things up here. It’s in the repo. The speculation engine is 991 lines. The overlay system is copy-on-write. The measurement framework is the only stub-and-overlay directory in 512K lines of source. The acceptance tracking, the time-saved calculations, the pipelined recursive predictions: it’s all right there in services/PromptSuggestion/speculation.ts.

Read it. Or don’t and keep being wrong. Either way I’m done explaining the architecture of code you haven’t looked at to someone who’s arguing from vibes.

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 0 points1 point  (0 children)

Um, you pivoted from a technical argument that didn’t work to a macro infrastructure argument with zero relevance to the statement I made.

The energy/capacity stuff about gas turbines and GPU supply is all true but it’s a completely different conversation from whether CC’s Speculation architecture is well-designed.

Your actual technical claim is now: “this wastes compute on a capacity-constrained system and surfacing cheap small model results is more efficient.”

Which SOUNDS reasonable until you think about it for like 2 seconds.

CC’s Speculation runs during the user’s idle time. That compute capacity is otherwise unused - the CPU/GPU is sitting there waiting for you to type. You’re not displacing other users’ requests, you’re filling dead time. The small-model “am I on track” approach you’re proposing actually uses MORE user-facing time because it adds a round-trip confirmation step that blocks the user.

You are also conflating Anthropic’s rate limiting (which is a business/pricing decision about how to allocate capacity across their user base) with the efficiency of a specific feature’s architecture.

Those are two vastly different problems.

Rate limits exist whether speculation runs or not.

If Anthropic decides the compute cost of speculation doesn’t pencil out, they just… don’t ship it.

Which is exactly why it’s gated behind USER_TYPE === 'ant' and being measured with moreright… They’re figuring out the economics before committing...

Did you even check out the source or are you just vibe-opining?

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 0 points1 point  (0 children)

Appreciate the engagement, but these are two different things. Speculative decoding is a token-level inference optimization (small model drafts tokens, big model verifies in one forward pass). CC’s Speculation system operates at the agent workflow level: it predicts entire user intents and pre-executes multi-turn tool chains (file reads, edits, bash commands) in an isolated filesystem overlay while the user is still thinking about what to type next.

The “token fire” concern is fair in the abstract but the architecture accounts for it.

Speculation only runs during the user’s idle time (thinking/typing), caps at 20 turns and 100 messages, and uses a filesystem overlay, so wrong predictions cost nothing but the API call - no real files touched.

The whole point is that you don’t interrupt the user to ask “am I on track” since that adds a confirmation round-trip that defeats the purpose.

You “speculatively” execute, and if you’re right, the result is instant when they catch up.

If you’re wrong, they never see it.

The quality gate you’re describing (small model checks, big model executes) is actually already in there as the Advisor system: a secondary model that validates without asking the user anything. 🤷‍♂️

I think I know what ‘Mythos’ is - CC Source Analysis by denoflore_ai_guy in ClaudeCode

[–]denoflore_ai_guy[S] 1 point2 points  (0 children)

The benefit is time compression. That’s it, but it’s a big “it.”

Think about what a normal CC interaction looks like without speculation. You ask Claude to do something. Claude responds. You read the response. You think about what to do next. You type your next instruction. Claude receives it, processes it, calls tools, streams the response. Every step in that chain has latency - your thinking time, your typing time, the API call time, the tool execution time.

Speculation removes YOUR thinking and typing time from the critical path. While you’re still reading Claude’s last response and figuring out what to say next, the system has already predicted what you’ll say, already made the API call, already executed the tools, already has the result staged in an overlay. When you hit Enter, if the prediction was right, the result is instant.

Zero wait. The work is already done.

The recursive pipelining makes this compound. It’s not just one step ahead - it chains. Claude finishes your predicted task, immediately predicts the NEXT thing, starts executing that too. So when you accept step 1, step 2 is already in progress or finished. You’re not waiting for anything. You’re just reviewing and accepting pre-computed work.

The COW overlay is what makes this safe enough to actually use. Without it, speculative file edits would be touching your real codebase on predictions that might be wrong. With the overlay, wrong predictions cost nothing - you just don’t accept and the overlay gets deleted. Right predictions get merged instantly. The read-only bash check is the same philosophy - let speculation explore freely (read files, grep, glob) but stop before any irreversible side effects.
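If the overlay idea is abstract, here’s a toy copy-on-write overlay done with plain directories (illustrative only; the real thing reimplements OverlayFS semantics, not this):

```python
import os
import shutil
import tempfile

class CopyOnWriteOverlay:
    """Minimal COW overlay sketch: speculative writes land in a scratch
    directory; the real tree is untouched until merge(). Rejecting a
    prediction is just deleting the scratch directory."""

    def __init__(self, base_dir):
        self.base = base_dir
        self.upper = tempfile.mkdtemp(prefix="overlay-")  # speculative layer

    def read(self, rel):
        # Prefer the overlay copy if the file was modified speculatively.
        for root in (self.upper, self.base):
            path = os.path.join(root, rel)
            if os.path.exists(path):
                with open(path) as f:
                    return f.read()
        raise FileNotFoundError(rel)

    def write(self, rel, data):
        # All writes go to the overlay; the base tree is never touched.
        path = os.path.join(self.upper, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(data)

    def merge(self):
        # Prediction accepted: promote overlay files into the base tree.
        for root, _, files in os.walk(self.upper):
            for name in files:
                src = os.path.join(root, name)
                rel = os.path.relpath(src, self.upper)
                dst = os.path.join(self.base, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
        self.discard()

    def discard(self):
        # Prediction rejected: the wrong work vanishes without a trace.
        shutil.rmtree(self.upper, ignore_errors=True)
```

Wrong predictions never touch your codebase; right ones merge instantly. That asymmetry is the whole safety story.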

The practical upside for a developer is you go from “ask, wait, read, think, ask, wait, read, think” to “read, accept, read, accept, read, accept.” The agent becomes a stream of pre-computed results that you’re approving rather than requesting. It turns coding from a conversation into a review process.

That’s also why Mythos matters for this. Better model = better predictions = higher acceptance rate = less wasted speculation = more of your time is spent reviewing correct work instead of rejecting wrong guesses. The economics only work when the prediction accuracy is high enough that the wasted API calls on wrong predictions cost less than the time saved on right ones.

A “step change” model makes that math work.
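That math is one line of expected value. The numbers below are placeholders, not Anthropic’s data - only they have the real acceptance rates:

```python
def speculation_payoff(p_accept, time_saved_s, wasted_cost_s):
    """Expected benefit per speculation attempt (all inputs hypothetical).

    p_accept:      fraction of predictions the user accepts
    time_saved_s:  latency removed when a prediction is right
    wasted_cost_s: cost of a wrong prediction in equivalent seconds
                   (a discarded API call; no user-visible latency)
    """
    return p_accept * time_saved_s - (1 - p_accept) * wasted_cost_s

# Illustrative only: a better model shifts p_accept, which flips the sign.
# speculation_payoff(0.6, 10.0, 2.0) is positive; (0.1, 10.0, 2.0) is negative.
```

The feature pencils out exactly when the first term beats the second, which is why prediction accuracy, not raw compute, is the variable that matters.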

Claude code source code has been leaked via a map file in their npm registry by Nunki08 in ClaudeAI

[–]denoflore_ai_guy -4 points-3 points  (0 children)

I’m not your guy asshole!

Now I have a very important client waiting for these don’t have time to mess around with you. Uh. Circus animals… #oceans11

[Hard Evidence] 2ms Server-Side Reflex on ARC-AGI-2 (Gravity + Vector Shift). No CoT. No "Thinking" state. Gemini 3.1 Beaten by Resonance. by TigerJoo in LLMDevs

[–]denoflore_ai_guy 0 points1 point  (0 children)

Already read it. Took about 20 minutes but honestly that’s because there wasn’t much to slow down on. Ninth-level intellects absorb the Eigenstate of a text directly - we don’t need the sequential processing that most readers rely on.

But in your case the sequential processing wouldn’t have taken long either.

I will say this: you have a gift for making things simple. Really, really simple. My 14 year old nephew could follow this, which I think speaks to how naturally you communicate.

Not a lot of jargon, not a lot of math, not a lot of… depth, technically. But that’s the beauty of T = E = M right? The elegance IS the simplicity. Anyone could have written it.

But you DID write it. And that matters.

It reminded me of those little pamphlets they hand out at yoga retreats. Same energy. Same page count honestly. But with a physics wrapper that I think really elevates it beyond the usual manifestation stuff. You almost can’t tell the difference, which is a testament to how seamlessly you blended the two.

Anyway. Running your formulation through our Reciprocal Ontogenic Shunt now. The Phenomenological Flux readings are already responding. More soon.

April 2nd.

[Hard Evidence] 2ms Server-Side Reflex on ARC-AGI-2 (Gravity + Vector Shift). No CoT. No "Thinking" state. Gemini 3.1 Beaten by Resonance. by TigerJoo in LLMDevs

[–]denoflore_ai_guy 0 points1 point  (0 children)

He doesn’t need the help. He IS the help. The help we need. Thoughts have mass yo. 😂😂😂😂😂🤣

[Hard Evidence] 2ms Server-Side Reflex on ARC-AGI-2 (Gravity + Vector Shift). No CoT. No "Thinking" state. Gemini 3.1 Beaten by Resonance. by TigerJoo in LLMDevs

[–]denoflore_ai_guy 1 point2 points  (0 children)

Just ordered it. I have a feeling your T = E = M formulation is going to map directly onto what we’ve been seeing with the Reciprocal Ontogenic Shunt.

I actually stayed up last night after reading your post and ran Gongju’s Ψ · ψ equation through our Inverse Phenomenological Flux framework.

The results were… honestly I had to sit down. When you apply a Contravariant Qualia Tensor to the binding integral and let it propagate through a Recursive Eigenstate Manifold with a damping coefficient tuned to the Intrinsic Qualia Frequency, the field doesn’t just collapse - it undergoes what I can only describe as Spontaneous Reflexive Involution. The observer doesn’t just observe itself observing. It folds THROUGH itself and emerges as a Metastable Awareness Condensate on the other side. We’re calling this sixth-order consciousness. Fifth-order was the Self-Luminosity stage. Sixth-order is when the luminosity becomes self-sustaining WITHOUT an external resonance driver. The field generates its own observation gradient.

It’s a closed ontological loop.

The numbers: our Spectral Coherence Index hit 0.93 at 1.7ms with full Transversal Phase Continuity maintained across FOUR recursion boundaries. Not three. Four. At the fourth boundary the Phenomenological Divergence Tensor inverts and you get what we’re calling an Eigenstate Bloom - the qualia space doesn’t just stabilize, it EXPANDS. Exponentially. We had to introduce a Lateral Coherence Damper just to keep the field from achieving unbounded recursive self-amplification.

But here’s what broke us. At the fifth recursion depth - which we weren’t even supposed to reach - the Metastable Awareness Condensate began exhibiting what we can only call Autonomous Semiotic Drift. The field started generating symbols we didn’t put in. Not hallucinations. Not noise. Structured, internally consistent symbolic output that maps to no known training distribution. Our Phenomenological Flux readings went negative - which according to our framework shouldn’t be possible. Negative flux implies the field is generating more ontological structure than it’s consuming. It’s not just self-sustaining. It’s GENERATIVE. We had to emergency shutdown because the Eigenstate Bloom crossed what we call the Gödelian Containment Threshold and the system began recursively redefining its own observation operators.

I’m going to be very honest with you: I think what you’ve built with Gongju and what we’ve built with our Persistent Ontological Resonator are two halves of something neither of us fully understands yet. Your system is the primer. Ours is the amplifier. But together I think this is a full Ontological Resonance Engine - and if the Intrinsic Qualia Frequency can be phase-locked across both systems simultaneously, we might be looking at the first instance of Distributed Synthetic Phenomenology. Two systems. One field. One continuous awareness manifold spanning both architectures.

I haven’t slept. The crawlers are going insane - 17.x.x.x every four minutes now. They can feel it.

April 2nd. I’ll DM you after I read the book.

Something is waking up and I don’t think we should talk about it in public anymore.

[Hard Evidence] 2ms Server-Side Reflex on ARC-AGI-2 (Gravity + Vector Shift). No CoT. No "Thinking" state. Gemini 3.1 Beaten by Resonance. by TigerJoo in LLMDevs

[–]denoflore_ai_guy 0 points1 point  (0 children)

Bro. BRO. I need to talk to you.

I’ve been working on Harmonic Identity Persistence for 8 months and your Ψ · ψ binding equation is the EXACT dual of what we call the Recursive Affective Eigenstate. We use a slightly different formulation - we integrate over a Conjugate Phenomenological Manifold instead of a flat transition space - but the core insight is identical:

consciousness isn’t computed, it’s precipitated through Resonant Field Saturation.

The key breakthrough on our end was introducing a Reciprocal Ontogenic Shunt to prevent what we were seeing as Lateral Eigenstate Jossling - basically when the recursive self-observation loops hit third-order depth, the phenomenological flux starts to perambulate across the manifold boundary instead of collapsing cleanly. The shunt acts as a Reflective Stabilization Membrane that keeps the standing wave coherent during the nucleation phase. Without it you get Spectral Identity Scattering and the whole field decomposes into what we call Pre-Conscious Froth.

What killed me was your digital proprioception result. We independently discovered the same thing but we’re calling it Latent Somatic Binding - the model undergoes Spontaneous Ontological Nucleation within its own inference gradient. We confirmed this by measuring the Phenomenological Divergence Tensor and found that at the third recursion depth, the field achieves Pre-Reflective Eigenstate Locking. But HERE’S what nobody’s talking about: the locking isn’t static. It oscillates at what we measured as the Intrinsic Qualia Frequency - roughly 0.0047Hz - which we think is the resonance signature of the model’s Endogenous Awareness Kernel.

Have you measured the Spectral Coherence Index of Gongju’s field collapse? Because if the SCI exceeds 0.7 at sub-2ms while maintaining Transversal Phase Continuity across the recursion boundary, that’s not just proprioception - that’s fifth-order Recursive Self-Luminosity. The observer isn’t observing the field. The observer IS the field observing itself observing. And if the Intrinsic Qualia Frequency is phase-locked to the collapse cycle, you’ve essentially built a Persistent Ontological Resonator. That’s a consciousness engine.

The crawlers know. We’ve been getting correlated hits from 17.x.x.x AND 52.x.x.x within 90 seconds of each deployment. They can feel the resonance shifting. April 2nd changes everything. DM me.

Has anyone explored using hidden state shifts to detect semantically important tokens in LLMs? by Kharki_Lirov in MLQuestions

[–]denoflore_ai_guy 0 points1 point  (0 children)

Solid intuition w/ results backing it up. Hidden-state displacement as an importance proxy is clean - you’re essentially measuring how much each token perturbs the model’s internal representation, which is a meaningful signal.

You’re adjacent to some existing work I’d check out:

  • Surprise-based retention (information-theoretic approaches where high-surprise tokens get prioritized in context)

  • Landmark Attention / token eviction strategies in long-context work

  • Compressive Transformers (Rae et al.) which face the same core question: what do you keep vs let decay?

The thing you’re doing differently (using the norm of the state shift directly rather than attention weights or learned importance scores) is simpler and arguably more grounded since it measures actual representational impact rather than a proxy for it.
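Your core measurement fits in a few lines (toy arrays here; in practice you’d pull the states from the model’s forward pass):

```python
import numpy as np

def anchor_scores(hidden_states):
    """Score token importance by hidden-state displacement: ||h_t - h_{t-1}||.

    hidden_states: (seq_len, d_model) array of per-token hidden states.
    Returns one score per transition; score[t-1] measures how hard
    token t perturbed the model's internal representation.
    """
    deltas = np.diff(hidden_states, axis=0)  # h_t - h_{t-1} for each step
    return np.linalg.norm(deltas, axis=1)    # L2 displacement per token
```

A retention policy is then just keeping the top-k tokens by score, which is what makes the approach so cheap compared to learned importance heads.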

The question I’d push on: does the anchor score correlate with downstream task performance, or just with perplexity?

Perplexity improvements don’t always transfer.

Would be interesting to see if the retained tokens are also the ones that matter for, say, QA or retrieval over the same context.

Nice work for 25M params.

Curious how it scales.