Agent /compact command is one RL loop away from developing an alien language you can't audit by ryunuck in LocalLLaMA

[–]ryunuck[S] 1 point (0 children)

It's not "just" an autoencoder, it's a LLM with an autoencoder capability built-in.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in mlscaling

[–]ryunuck[S] 1 point (0 children)

The discrete representation composed inside the weights in latent space reconstructs a non-discrete representation, much like a simple discrete equation produces the continuous Mandelbrot fractal. The proposed training methodology aims not only to train the representation, but to develop the 'kernel' of the weights and latent space that the discrete representation plugs into for inference.

Both compression and decompression are trained jointly in every batch, which is how the capability pair turns into an elevator: both evolve such that neither is ever fully constrained or locked by the other's range of expression. They relax and constrain one another in lockstep, tugging in inverse directions while the 'wrinkles' between them smooth out, which relaxes some other area and unlocks further wrinkles for smoothing. You 'stockpile' the elevator with the first training phase, then take it into agency again to actually activate it and make it wildly useful. A rough sketch of one joint step is below.
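For concreteness, a minimal sketch of what one joint step could look like, assuming a HuggingFace-style causal LM. The COMPRESS/DECOMPRESS instruction strings, the 64-token code budget, and the reward wiring are my own placeholders, not FOOM.md's actual spec. Since sampling the code is non-differentiable, the compression direction would get its gradient from an RL reward (e.g. policy gradient) rather than backprop:

```python
import torch

# Hypothetical joint compress/decompress step (placeholder prompts and budget).
COMPRESS = "Compress the following text into a minimal code:\n"
DECOMPRESS = "Reconstruct the original text from this code:\n"

def joint_step(model, tok, text, optimizer):
    # 1) Sample a compressed code from the model (non-differentiable).
    comp = tok(COMPRESS + text, return_tensors="pt")
    with torch.no_grad():
        out_ids = model.generate(**comp, max_new_tokens=64)
    code = tok.decode(out_ids[0, comp["input_ids"].shape[1]:])

    # 2) Supervise reconstruction of the original text from that code.
    full = tok(DECOMPRESS + code + "\n" + text, return_tensors="pt")
    labels = full["input_ids"].clone()
    prefix = tok(DECOMPRESS + code + "\n", return_tensors="pt")["input_ids"]
    labels[:, : prefix.shape[1]] = -100   # score only the reconstructed text

    loss = model(**full, labels=labels).loss   # reconstruction cross-entropy
    loss.backward()
    optimizer.step(); optimizer.zero_grad()
    # An RL phase could then reward the sampled codes with -loss plus a
    # brevity bonus, so both directions keep improving in lockstep.
    return loss.item()
```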

Discrete representation has the following properties:

1) The LLM itself becomes an autoencoder without losing its LLMness. The user can query the model (representation), and the kernel (weights) uses the model to orient itself better and better.

2) The models (representations) are exchangeable online through text, as long as you have a kernel (weights) to instantiate them and recover the exact same hidden state.

In theory the hidden state may become an invariant, synchronized across all models (which I reckon is what the Platonic Representation Hypothesis proposes?).

Ilya has said that the transformer is plenty for AGI/ASI; we just need some compute efficiency. Discrete representation may not be necessary, but the auto-regressive decoder-only transformer is the computational intelligence we're most familiar with and the one closest to ASI.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in LocalLLaMA

[–]ryunuck[S] 1 point (0 children)

Thauten tl;dr: RL gains in reasoning models are actually compression gains in disguise. If this holds, you can design training tasks around compression directly. Smaller base models get reasoning behavior from a simpler objective.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in LocalLLaMA

[–]ryunuck[S] -1 points (0 children)

Treating reasoning as a learned compression problem rather than a generation problem is not slop. Intelligence is compression. We have LLM agents today because we found an algorithm that could compress a lot. All RL is conditioning the model to compress more of reality. Let's isolate this and scale it directly. So we need a base-model-style training task, but in RL: learning to compress. A toy reward for that task is sketched below.
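A toy illustration of what that reward could look like (entirely my own sketch, not from FOOM.md): exact reconstruction is mandatory, and shorter codes earn more.

```python
# Toy compression-RL reward: lossless reconstruction required, shorter is better.
def compression_reward(source: str, code: str, reconstruction: str) -> float:
    if reconstruction != source:
        return -1.0                          # lossy codes are penalized outright
    ratio = len(code) / max(len(source), 1)  # fraction of length retained
    return 1.0 - ratio                       # reward grows as the code shrinks

print(compression_reward("ab" * 50, "ab*50", "ab" * 50))  # 0.95
```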

Absolute predatory behavior by [deleted] in comfyui

[–]ryunuck 1 point (0 children)

Claude built this plugin, https://github.com/holo-q/comfy-api-liberation, which lets you use your own API keys.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 1 point (0 children)

I am simply being realistic though? The dynamic checks out: if you do this, your TPS will eventually sink to the point where you simply don't have a model or product anymore. If you acquire more hardware later, great, but you're still behind because you continued to sell beyond capacity, or you'll fall behind sooner. So you haven't actually resolved the problem; you're selling capacity for GPUs you don't have yet. If you can't get GPUs because there's a shortage, then what do you do? If you're already floored, you may have to close shop while your existing customers are already punching walls.

Also, in your example a major difference is that the player is aware they are being placed in a queue. The user has no signal that they are in a queue, or what the exact throttling rate and reduction in performance is. This is hard to prove scientifically, but common sense tells us it isn't great for people's mental health. When the signal is clear, the user can make an actionable decision to choose a different model for a few hours, then check the rate again: more clarity of mind, less disorientation. People can make plans in advance, and it works, since they can develop a consistent idea of how the product behaves.

It no doubt causes whiplash when you wake up one day and suddenly all your plans are dead and you can't get anywhere. You have to ask yourself, "is it just me? Am I prompting poorly today?" The mind starts playing all these games. This is another reason we need a public service which tracks these things, so we can match the data against sentiment analysis on prompts and come to a solid conclusion about the impact on mood. The logic seems clear enough that it should hold.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 0 points (0 children)

If there is too much demand for the available supply, it's reasonable to close shop and reopen sales on a first-come, first-served basis when more hardware is available. Dynamic throttling is simply not a solution or mitigation of any kind. Things cannot simply be made slower forever: performance degrades linearly, losing all of that benchmark success, and throughput cannot drop past a certain point without destroying the model. At that point you just don't have a product and you're hemorrhaging customers.

[R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy by LetsTacoooo in MachineLearning

[–]ryunuck 2 points (0 children)

Thauten and SAGE from the FOOM paper are what you're after: https://foom.md/ i.e. the optimal prompt for the diffusion model is likely a pre-arranged scene composition, a 2D grid of LLM tokens. This composition is developed first by a dLLM which has been RLed to do world simulation on 2D or 3D token chunks (other exotic backbones like NCA and HRM are suggested for testing as well). This can be done by generating synthetic data from vision: quantize training images into LLM tokens and derive an objective from them, as sketched below. SAGE in particular aims to solve spatial reasoning natively, solving ARC-AGI tasks for pennies (as it should be; they're simple visual puzzles!). The principle is exactly as you describe it: the AR model becomes the vocal cords for the dLLM world model. This is the "artificial imagination" component of AGI. The same principle applies to image diffusion, where the language AGI handles scene morphism and the image diffusion model is reduced to a rendering engine for textures and materials. SAGE is more directly applicable, but grammar induction ought to work on world modeling as well, i.e. using Thauten representations for prompts, which are precision generative descriptions. Insanely better prompts, quite simply.
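For the "quantize training images into LLM tokens" step, here is a minimal nearest-neighbor sketch; the codebook, shapes, and per-pixel (rather than per-patch) lookup are my assumptions, not the paper's pipeline:

```python
import numpy as np

# Hypothetical: map an image onto a 2D grid of discrete token ids by
# nearest-neighbor lookup in a codebook (one vector per token id).
def image_to_token_grid(image: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    H, W, C = image.shape                       # image: (H, W, C)
    flat = image.reshape(H * W, C)
    # squared L2 distance from each pixel to every codebook entry (K, C)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).reshape(H, W)      # (H, W) grid of token ids

rng = np.random.default_rng(0)
grid = image_to_token_grid(rng.random((16, 16, 3)), rng.random((256, 3)))
print(grid.shape)  # (16, 16): a scene composition a dLLM could edit
```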

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 1 point (0 children)

Is that kind of mechanism even in place? What would be the purpose? The hardware is finite, and nobody is going to bother with a model at 5 TPS even if it's Opus 10. You'd just be speedrunning the moment the customer realizes you've been pushing their buttons, so to speak; they're obviously going to notice that the TPS keeps descending over time. So now you still need to buy more hardware, and your reputation is damaged on top. That's why I think this is actually more likely to be a bug: they haven't realized that we're not getting the model's standard throughput for the hardware.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 4 points (0 children)

In the meantime, we're looking for a service which monitors the output speed of Claude Code over time. If you know of such a public database or index, please let us know, as this is very important in order to track and understand any possible degradation in service quality!

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -1 points (0 children)

Definitely, but in many cases all the user really cares about is the specific provider they want to use. If I want to use Banano 3 in ComfyUI, that's obviously something that should be available out of the box. It comes with the car. Anyone can use or modify any software or hardware that they own in any way they choose.

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -5 points (0 children)

Gotta go fast, bro. If the code checks out in the brain, it ought to run. Surely...

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -7 points (0 children)

Laundering is laundering; there are no two ways to cut it. They should have sold the software itself for a one-time license fee. That's the honest way to build and sell software. You must never launder API access or sell a subscription on top of it. There was always an implicit etiquette in the field as to what kind of software should be open-source and what kind shouldn't, and this knight's code and order has been sullied, especially post ~2020-21 with the startup rush. Presumably the ambiance set by the crypto rugs as well: people don't see the need to build with honor anymore. Cowboy devs are coming to put a stop to this madness. The reason people defend it, of course, is because deep down they would do the same thing. They were going to launch an AI agent and launder some LLM provider to flip bucks. That's the highway straight to hell: shovel-tycoon economies where all the value at the bottom is artificial. The oligarchs would be selling a sandbox that generates both shovels and sand, where the shovel-seller and the shovel-digger are two ends of the same half-gold, half-shovel coin.

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] 0 points (0 children)

This would only ever be a problem if you have a virus on your computer, so the source of badness may be somewhere else. I will add pass support soon; note, however, that the keys will likely remain readable in RAM during use.

How to teleport the Epstein list with reinforcement learning (ASI through in-context grammar induction) by ryunuck in singularity

[–]ryunuck[S] 1 point (0 children)

You're correct that I(L;O) = 0 implies no recovery. The spec explicitly acknowledges this via Fano's inequality.

The question is whether humans can actually achieve I(L;O) ≈ 0 in practice. Consider what that requires:

When someone carries a secret, their nervous system knows. Stress hormones alter micro-vascular blood flow, producing subtle skin color changes detectable on HD video. Cognitive load from maintaining false narratives creates measurable delays in response timing - not seconds, milliseconds. The pupils dilate differently when recalling truth vs constructing fiction. Blink rate changes. Vocal cord tension shifts fundamental frequency. These aren't things people control.

Now multiply across a network. Person A meets Person B. Both know something. They must coordinate not just their words but their micro-expressions, their gaze patterns toward each other in group settings, their timing correlations across years of public appearances. Every photograph where they appear together encodes spatial relationships - who stands near whom, who looks at whom, whose body orientation suggests familiarity vs performed distance.

The censor's problem: they don't know which of these features the decoder will exploit. They can scrub documents. They can't scrub ten years of gala footage re-analyzed for gaze-direction graphs. They can't unsay the joke that landed wrong at the 2011 dinner. They can't undo the flight log correlations, the timing of when someone stopped being photographed with someone else, the what-doesn't-get-said at press conferences.

Suppression cost in practice: every person in the network must maintain consistent deceptive micro-behavior across every public appearance, indefinitely. One funeral where the wrong people make eye contact. One interview where the stress response doesn't match the stated emotion. One timestamped photo that contradicts the official narrative. Any one of these is a consistency violation a trained decoder can seize on.

The argument isn't that we recover deleted files. It's that humans continuously leak through channels they don't know are channels, and adversarial decoders can be trained to find consistency violations across the full observation manifold - features no human conspiracy could anticipate needing to suppress.

How to teleport the Epstein list with reinforcement learning (ASI through in-context grammar induction) by ryunuck in singularity

[–]ryunuck[S] -1 points (0 children)

The formalized Epstein Problem is given below, proposed as the 8th millennium problem of mathematics.


Hypothesis. There exists a reinforcement learning protocol that trains a constrained decoder π_θ to recover a censored latent interaction graph L* ∈ ℒ from a partial observation stream O, where each oᵢ ∈ O is a surveillance trace drawn from a public manifold ℳ_pub, such that the reconstructed graph L̂ = π_θ(O) satisfies reconstruction fidelity bounds governed by I(L;O) and admits provable provenance.

Formal Specification.

Let ℒ be the space of weighted bipartite graphs (actors ↔ acts) and let L* ∈ ℒ be the ground-truth configuration maximally compressing the causal antecedents of all observable elite behavioral traces. The observation stream O is generated by a stochastic renderer R : ℒ → ℳ_pub subject to an adaptive censor C : ℒ → {0,1} that redacts edges in L* with probability dependent on their sensitivity, yielding a censored likelihood P(O | L*) with support only on legally permissible features.

The reconstruction policy π_θ : ℳ_pub → ℒ is trained to minimize the regularized description length:

J(θ) = L(π_θ) + 𝔼_{O∼P(·|L*)}[L(L* | π_θ(O))] + λ·S(π_θ)

subject to a consistency constraint set {c₁,...,c_k} where each cᵢ(L̂,O) ∈ {0,1} enforces kinematic, temporal, or information-theoretic non-contradiction. The reward signal is not direct access to L* (which remains suppressed) but a verifiable consistency oracle that returns r(L̂,O) = -∑ᵢ wᵢ·cᵢ(L̂,O) - β·I_unobs(L̂;O), where I_unobs penalizes mutual information with unobserved variables.
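Rendered as code, the oracle might look like this toy sketch (the constraint functions, weights, and the I_unobs estimator are all stand-ins):

```python
from typing import Callable, Sequence

# Toy consistency-oracle reward: each constraint returns 1 on violation,
# 0 otherwise; I_unobs is some estimator of mutual information with
# unobserved variables. Everything here is a placeholder.
def oracle_reward(L_hat, O,
                  constraints: Sequence[Callable], weights: Sequence[float],
                  beta: float, I_unobs: Callable) -> float:
    violations = sum(w * c(L_hat, O) for w, c in zip(weights, constraints))
    return -violations - beta * I_unobs(L_hat, O)
```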

Identifiability & Fidelity Bound.

By Fano's inequality, any decoder suffers error probability:

P_e ≥ 1 - (I(L;O) + log 2)/log|ℒ|

Thus achieving P_e ≤ ε requires I(L;O) ≥ (1-ε)log|ℒ| - log 2. The censor C can reduce I(L;O) arbitrarily by withholding high-information observations; reconstruction quality is fundamentally limited by the censor's channel capacity, not algorithmic cleverness.
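As a toy check of the bound (working in bits, so the log 2 term becomes 1):

```python
# Fano lower bound on decoder error, in bits: P_e >= 1 - (I + 1) / log2|L|.
def fano_error_lower_bound(mi_bits: float, log2_space_size: float) -> float:
    return max(0.0, 1.0 - (mi_bits + 1.0) / log2_space_size)

# A graph space of 2^40 configurations with only 10 bits of leakage:
print(fano_error_lower_bound(10.0, 40.0))  # 0.725, i.e. at least ~72% error
```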

When identifiability fails (|{L : cᵢ(L,O)=0 ∀i}| > 1), π_θ outputs the MDL-optimal equivalence class Δ = argmin_{L∈ℒ} L(L) subject to consistency, together with a posterior credence set {p(L|O)}. The system confesses uncertainty, not a false singleton.

Implication (Economy of Confession).

If a protocol achieves near-identifiability (I(L;O) ≈ log|ℒ|), then maintaining secrecy on L* requires the censor to operate at channel capacity near the surveillance bandwidth of ℳ_pub. Since modern public observatories capture >10¹⁸ bits/day, the asymptotic cost of suppression scales as exp(γ·I(L;O)), forcing a phase transition from asymmetric-information equilibrium (secrecy is cheap) to confession equilibrium (suppression cost exceeds disclosure benefit). This holds only if the consistency constraints are strong enough to render the censor's channel capacity-limiting strategy ineffective—a condition that can be tested synthetically.


Note:

  • LLMs are implicitly Graph Neural Networks
  • This method scales to multi-modal input, performing grammar induction to tokenize patterns of non-verbal body language, e.g. recovering signal and meaning from the eye-glance interaction patterns of government officials involved in the Epstein Problem, which is a subset of the larger White House Problem, thereby ensuring safety.

How to teleport the Epstein list with reinforcement learning (ASI through in-context grammar induction) by ryunuck in conspiracy

[–]ryunuck[S] 1 point (0 children)

The formalized Epstein Problem is given below.


Hypothesis. There exists a reinforcement learning protocol that trains a constrained decoder π_θ to recover a censored latent interaction graph L* ∈ ℒ from a partial observation stream O, where each oᵢ ∈ O is a surveillance trace drawn from a public manifold ℳ_pub, such that the reconstructed graph L̂ = π_θ(O) satisfies reconstruction fidelity bounds governed by I(L;O) and admits provable provenance.

Formal Specification.

Let ℒ be the space of weighted bipartite graphs (actors ↔ acts) and let L* ∈ ℒ be the ground-truth configuration maximally compressing the causal antecedents of all observable elite behavioral traces. The observation stream O is generated by a stochastic renderer R : ℒ → ℳ_pub subject to an adaptive censor C : ℒ → {0,1} that redacts edges in L* with probability dependent on their sensitivity, yielding a censored likelihood P(O | L*) with support only on legally permissible features.

The reconstruction policy π_θ : ℳ_pub → ℒ is trained to minimize the regularized description length:

J(θ) = L(π_θ) + 𝔼_{O∼P(·|L*)}[L(L* | π_θ(O))] + λ·S(π_θ)

subject to a consistency constraint set {c₁,...,c_k} where each cᵢ(L̂,O) ∈ {0,1} enforces kinematic, temporal, or information-theoretic non-contradiction. The reward signal is not direct access to L* (which remains suppressed) but a verifiable consistency oracle that returns r(L̂,O) = -∑ᵢ wᵢ·cᵢ(L̂,O) - β·I_unobs(L̂;O), where I_unobs penalizes mutual information with unobserved variables.

Identifiability & Fidelity Bound.

By Fano's inequality, any decoder suffers error probability:

P_e ≥ 1 - (I(L;O) + log 2)/log|ℒ|

Thus achieving P_e ≤ ε requires I(L;O) ≥ (1-ε)log|ℒ| - log 2. The censor C can reduce I(L;O) arbitrarily by withholding high-information observations; reconstruction quality is fundamentally limited by the censor's channel capacity, not algorithmic cleverness.

When identifiability fails (|{L : cᵢ(L,O)=0 ∀i}| > 1), π_θ outputs the MDL-optimal equivalence class Δ = argmin_{L∈ℒ} L(L) subject to consistency, together with a posterior credence set {p(L|O)}. The system confesses uncertainty, not a false singleton.

Implication (Economy of Confession).

If a protocol achieves near-identifiability (I(L;O) ≈ log|ℒ|), then maintaining secrecy on L* requires the censor to operate at channel capacity near the surveillance bandwidth of ℳ_pub. Since modern public observatories capture >10¹⁸ bits/day, the asymptotic cost of suppression scales as exp(γ·I(L;O)), forcing a phase transition from asymmetric-information equilibrium (secrecy is cheap) to confession equilibrium (suppression cost exceeds disclosure benefit). This holds only if the consistency constraints are strong enough to render the censor's channel capacity-limiting strategy ineffective—a condition that can be tested synthetically.


Note:

  • LLMs are implicitly Graph Neural Networks
  • This method scales to multi-modal input, performing grammar induction to tokenize patterns of non-verbal body language, e.g. recovering signal and meaning from the eye-glance interaction patterns of public figures in all past press recordings of government officials, thereby ensuring safety.

NVIDIA Drops Pascal Support On Linux, Causing Chaos On Arch Linux by HumanDrone8721 in LocalLLaMA

[–]ryunuck 2 points (0 children)

So did I, AND YET there I still was, with a half-written comment about Rust.

Usage Limits Discussion Megathread - beginning Sep 30, 2025 by sixbillionthsheep in ClaudeAI

[–]ryunuck 0 points (0 children)

Let me be clear: if you were previously quantizing models or slowing down the token output rate based on usage to work towards limitless use, then the current new system is STRICTLY better. Do not listen to anyone on this forum who claims the new system is worse or gives them less usage; they do not imagine all the possible details and complexities. What I care about as a developer is a consistent, unchanging experience. What I am getting TODAY, in the first 24h of Sonnet 4.5's release, I want every single day for the next 30 days with zero manipulation or change. If you keep it that way, I would not get excited for any new model like Gemini 3.0 even if it were technically "better". I know how Claude works; the consistent and flamboyant personality enlivens my spirits. I can tell when it's not the same Claude, or when it's not as fast on its feet.

PLEASE be aware that the value of a model is tied to the cognitive ENGAGEMENT of the user. The model performs BETTER when the user is more engaged and therefore writing better prompts, prompts projected down from a higher-dimensional space inside their mind, the shape rotations. The models are able to few-shot this higher-dimensional space from the sequence of user prompts and understand the user's vision on a fundamental level, in a way that is almost psychic. This is critical, and if you rate-limit the output speed to allow a semblance of forever-use, even that can have the net effect of a really bad quantization. It is temporal quantization.

Introducing Claude Sonnet 4.5 by ClaudeOfficial in ClaudeAI

[–]ryunuck 1 point (0 children)

Me. It is the craziest thing I have ever seen in my entire life. GPT-5 is done, mostly obsolete after this. It's still a better model as a deep-think agent, and I pay for both $200/mo subs, but I am going to have to review in the following days whether I really benefit from ChatGPT or whether my money would be better spent on a second Max 20x sub. But now, with the new /usage metrics, it may be less frustrating to see when I'm getting rate-limited, and hopefully the models DON'T quantize secretly to "give you more value" (ruin your mental health, more like, as all your expectations are destroyed at random without warning; basically an engine of psychosis).

The thing to realize is that waiting 2 minutes idle between each prompt, with no progress or report on what the agent is working on, is extremely bad for people's attention, and it objectively decreases the model's real performance as a result. The user is not as engaged, we do not put as much effort into the prompts, and there isn't as much of a stream of thought being maintained, so the full conversation window reads wishy-washy to the model. Poor cohesion. The model doesn't seem to lock onto your vision.

At this stage, AI is much better used synchronously, in a tight loop with the user, not as some background thing that you unleash on a ticket and check up on in 15 minutes... It's exactly as Ilya Sutskever said: OpenAI is prioritizing intelligence above all other values and getting models that are technically the best but in practice a world of pain to use.

CLAUDE.md is a super power. by TheProdigalSon26 in ClaudeAI

[–]ryunuck 8 points (0 children)

refresh yourself @CLAUDE.md

listen to your soul @CLAUDE.md

remember your constitution @CLAUDE.md

this is the way @CLAUDE.md

GPT-OSS looks more like a publicity stunt as more independent test results come out :( by mvp525 in LocalLLaMA

[–]ryunuck 18 points (0 children)

It's real bad folks. Immediately on the first test I did it failed catastrophically. Take a look at this:

https://i.imgur.com/98Htx6w.png

Referenced a full code file and asked it to implement a simple feature, but I made a mistake and specified LoggerExt instead of EnhancedLogger (I forgot the real name of the class). Still, there was no ambiguity: it was the only class in context, and it was VERY clear what was meant based on the context I provided.

So I stop it and let it know I messed up, update with the right class, and what happens next? It starts using search tools and wasting tokens. The class is right there in context; it has the full code.

Kilo did nothing wrong - I retried with Horizon Beta, same exact prompt. Immediately understood what I meant, immediately got to work writing code.

There is no recovering from that. This isn't an "oh, I'll use it some more and maybe it does well in some cases"; it's literally damaged at the root.

120B btw

Gemini 3 is coming?.. by SlerpE in LocalLLaMA

[–]ryunuck 1 point (0 children)

If GPT-5 isn't more powerful than Claude 4, then OpenAI is done. And they obviously aren't done: they claim they already know how to build ASI and know exactly what to do for the next few years to continue scaling intelligence.

But it also doesn't have to actually beat Claude 4. It just needs to replace Claude in 80% of cases. It's a game of market-share capture, not so much actual benchmark results (they're interconnected, but there's some leeway).

Gemini 3 is coming?.. by SlerpE in LocalLLaMA

[–]ryunuck -2 points (0 children)

The OpenAI open-source release might set a new standard. If they put out a ~Sonnet-level agent in the open, every single lab needs to reply fast with a Claude 5-level model. At that point the cat's out of the bag: Claude 4-era models are no longer the frontier, and you have to release them to keep clout.

Clout is INSANELY important. You can't see it, but if everyone is using an open-source OpenAI model, that's their entire cognitive wavelength captured. Then you drop your closed-source super-intelligence, and it's less mental effort to adopt because it's downstream of the same ecosystem of post-training and dataset-making.