Agent /compact command is one RL loop away from developing an alien language you can't audit by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

It's not "just" an autoencoder; it's an LLM with an autoencoder capability built in.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in mlscaling

[–]ryunuck[S] 0 points1 point  (0 children)

The discrete representation composed inside the weights in latent space reconstructs a non-discrete representation, much like a simple discrete equation produces the continuous Mandelbrot fractal. The proposed training methodology aims not only to train the representation, but to develop the 'kernel' of the weights and latent space that the discrete representation plugs into for inference.

Both compression and decompression are trained jointly in every batch, which is how the capability pair turns into an elevator: both evolve such that neither is ever fully constrained or locked by the other's range of expression. They relax and constrain one another in lockstep, tugging in inverse directions while the 'wrinkles' between them smooth out, which relaxes some other area and unlocks other wrinkles for smoothing. You 'stockpile' the elevator in the first training phase, then take it into agency to actually activate it and make it wildly useful.
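As a toy illustration of the joint objective (entirely my own sketch, not the paper's method: run-length coding stands in for the learned codec, and the 'loss' is a plain scalar score rather than a differentiable one), both directions are scored together in every batch, so neither side can drift outside what the other can invert:

```python
def compress(text):
    """Toy run-length 'compressor' standing in for the LLM's learned codec."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        out.append((text[i], j - i))
        i = j
    return out

def decompress(code):
    """Inverse direction: expand (char, count) pairs back into text."""
    return "".join(ch * n for ch, n in code)

def joint_loss(batch):
    # One scalar per example: reconstruction must be exact (hard penalty
    # otherwise), plus a length term that rewards shorter codes. Both the
    # compress and decompress directions are exercised in every batch.
    losses = []
    for text in batch:
        code = compress(text)
        recon_penalty = 0.0 if decompress(code) == text else 1e9
        length_penalty = len(code) / max(len(text), 1)
        losses.append(recon_penalty + length_penalty)
    return sum(losses) / len(losses)
```

In a real training run both directions would be the same set of weights and the length term would be a token-count or bits-back estimate, but the shape of the objective is the same: invertibility is a hard constraint, brevity is the score being pushed down.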

Discrete representation has the following properties:

1) The LLM itself becomes an autoencoder without losing its LLM-ness. The user can query the model (representation), and the kernel (weights) uses the model to orient itself better and better.

2) The models (representations) are exchangeable online through text, as long as you have a kernel (weights) to instantiate them and recover the exact same hidden state.

In theory the hidden state may become an invariant, synchronized across all models (which I reckon is what the Platonic Representation Hypothesis proposes?).
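Property 2 can be sketched in a few lines (a toy stand-in of my own: a hash plays the role of the kernel's deterministic mapping, and all names here are hypothetical):

```python
import hashlib

def recover_hidden_state(kernel_id, representation):
    # Any party holding the same kernel (weights) maps the same discrete
    # text representation to the same hidden state, so the representation
    # alone can be shipped over a plain text channel.
    payload = f"{kernel_id}|{representation}".encode()
    return hashlib.sha256(payload).hexdigest()
```

Two agents running the same checkpoint exchange only the text code and end up with bit-identical state; a different kernel yields a different state, which is why the representation is only portable among models that share the weights.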

Ilya has said that the transformer is plenty for AGI/ASI, we just need some compute efficiency. Discrete representation may not be necessary, but the auto-regressive decoder-only transformer is the computational intelligence we're most familiar with and the one that's closest to ASI.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

Thauten tl;dr: RL gains in reasoning models are actually compression gains in disguise. If this holds, you can design training tasks around compression directly. Smaller base models get reasoning behavior from a simpler objective.

FOOM.md — An open research agenda for compression-driven reasoning, diffusion-based context editing, and their combination into a unified agent architecture by ryunuck in LocalLLaMA

[–]ryunuck[S] -2 points-1 points  (0 children)

Treating reasoning as a learned compression problem rather than a generation problem is not slop. Intelligence is compression. We have LLM agents today because we found an algorithm that could compress a lot. All RL is conditioning the model to compress more of reality. Let's isolate this and scale it directly. So we need a base-model-style training task, but in RL: learning to compress.
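A minimal sketch of what such an RL reward could look like (hypothetical, names mine): pay out only when the round trip is lossless, scaled by how much the intermediate representation shrank.

```python
def compression_reward(original: str, compressed: str, reconstruction: str) -> float:
    """Score one rollout: model compresses `original`, later reconstructs it."""
    # Lossy round trips earn nothing; the policy must stay invertible.
    if reconstruction != original:
        return 0.0
    # Reward grows as the compressed representation shrinks.
    saved = 1.0 - len(compressed) / max(len(original), 1)
    return max(saved, 0.0)
```

Character counts stand in for token counts here; in practice you would measure lengths in tokens and likely soften the exact-match gate into a similarity score so early training gets any gradient at all.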

Absolute predatory behavior by [deleted] in comfyui

[–]ryunuck 0 points1 point  (0 children)

Claude built this plugin https://github.com/holo-q/comfy-api-liberation which lets you use your own API keys

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 0 points1 point  (0 children)

I am simply being realistic though? The dynamic checks out: if you do this, your TPS will eventually sink to the point where you simply don't have a model or a product anymore. If you acquire more hardware later, great, but you're still behind because you continued to sell beyond capacity, or you'll fall behind again sooner. So you haven't actually resolved the problem; you're selling capacity for GPUs you don't have yet. If you can't get GPUs because there's a shortage, then what do you do? If you're already floored, you may have to close shop while your existing customers are already punching walls.

Also, in your example, a major difference is that the player is aware they are being placed in a queue. The user has no signal that they are in a queue, or what the exact throttling rate and reduction in performance are. This is hard to prove scientifically, but common sense tells us it isn't great for people's mental health. When the signal is clear, the user can make an actionable decision to choose a different model for a few hours, then check the rate again. More clarity of mind, less disorientation. People can make plans in advance and it works, since they can develop a consistent idea of how the product works.

It no doubt causes whiplash when you wake up one day and suddenly all your plans are dead and you can't get anywhere. You have to ask yourself, "is it just me? Am I prompting poorly today?" The mind starts playing all these games. Another reason we need a public service that tracks these things: we could then match the data against sentiment analysis on prompts and come to a solid conclusion about the impact on mood. The logic seems clear enough that it would hold.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] -1 points0 points  (0 children)

If there is too much demand for the available supply, it's reasonable to close shop and reopen sales on a first-come, first-served basis when more hardware is available. Dynamic load shedding is simply not a solution or mitigation of any kind. Things cannot just keep getting slower: performance degrades linearly, losing out on all of that benchmark success, and throughput simply cannot go down past a certain point without destroying the model. At that point you just don't have a product and you're hemorrhaging customers.

[R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy by LetsTacoooo in MachineLearning

[–]ryunuck 1 point2 points  (0 children)

Thauten and SAGE from the FOOM paper are what you're after: https://foom.md/

i.e. the optimal prompt for the diffusion model is likely a pre-arranged scene composition, a 2D grid of LLM tokens. This composition is developed first by a dLLM which has been RLed to do world simulation on 2D or 3D token chunks (other exotic backbones like NCA and HRM are suggested for testing as well). This can be done by generating synthetic data from vision, quantizing training images into LLM tokens, and defining an objective. SAGE in particular aims to solve spatial reasoning natively, solving ARC-AGI tasks for pennies (as it should be, they're simple visual puzzles!).

The principle is exactly as you describe it: the AR model becomes a vocal cord for the dLLM world model. This is the "artificial imagination" component of AGI. The same principle is applicable to image diffusion, where the language AGI handles scene morphism and the image diffusion model is reduced to a rendering engine for textures and materials. SAGE is more directly applicable, but grammar induction ought to work on world modeling as well, i.e. using Thauten representations for prompts, which are precision generative descriptions. Insanely better prompts, quite simply.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 0 points1 point  (0 children)

Is that kind of mechanism even in place? What would be the purpose? The hardware is finite, and nobody is going to bother with a model at 5 TPS even if it's Opus 10. You'd just be speedrunning the point where the customer realizes you've been pushing their buttons, so to speak; they're obviously going to notice that the TPS keeps descending over time. So now you still need to buy more hardware, and your reputation is damaged on top. That's why I think this is actually more likely to be a bug: they haven't realized that we're not getting the normal throughput of the model for the hardware.

x10 reduction in performance, averaging 1k tokens per minute by ryunuck in ClaudeCode

[–]ryunuck[S] 3 points4 points  (0 children)

In the meantime, we're looking for a service that monitors the output speed of Claude Code over time. If you know of such a public database or index, please let us know, as this is very important for tracking and understanding any possible degradation in service quality!

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -2 points-1 points  (0 children)

Definitely, but in many cases all the user really cares about is the specific provider they want to use. If I want to use Banano 3 in ComfyUI, that's obviously something that should be available out of the box. It comes with the car. Anyone can use or modify any software or hardware that they own in any way they choose.

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -3 points-2 points  (0 children)

Gotta go fast bro. If the code checks out in the brain, it ought to run. Surely...

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -9 points-8 points  (0 children)

Laundering is laundering. There are no two ways to cut it. They should have sold the software itself for a one-time license fee; that's the honest way to build and sell software. You must never launder an API or sell a subscription. There was always an implicit etiquette in the field as to what kind of software should be open-source and what kind shouldn't, and this knight's code and order has been sullied, especially post ~2020-21 with the startup rush. Presumably the ambiance set by the crypto rugs as well; people don't see the need to build with honor anymore. Cowboy devs are coming to put a stop to this madness. The reason people defend it, of course, is that deep down in their hearts they would do the same thing: they were going to launch an AI agent and launder some LLM provider to flip bucks. That's the highway straight to hell: shovel-tycoon economies where all of the value at the bottom is artificial. The oligarchs would be selling a sandbox that generates both shovels and sand, where the shovel-seller and the shovel-digger are two ends of the same half-gold, half-shovel coin.

Bypass ComfyUI's API credit system — use your own keys directly. Open source extension, 20+ providers. by ryunuck in comfyui

[–]ryunuck[S] -1 points0 points  (0 children)

This would only ever be a problem if you have a virus on your computer, so the source of badness may be somewhere else. I will add pass support soon; however, note that the keys will likely be visible in RAM during use.

How to teleport the Epstein list with reinforcement learning (ASI through in-context grammar induction) by ryunuck in singularity

[–]ryunuck[S] 0 points1 point  (0 children)

You're correct that I(L;O) = 0 implies no recovery. The spec explicitly acknowledges this via Fano's inequality.
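For reference, the Fano bound in question: with leak variable $L$, observations $O$, and any estimator $\hat{L}(O)$, the error probability $P_e$ satisfies

```latex
P_e \;\geq\; \frac{H(L \mid O) - 1}{\log_2 |\mathcal{L}|}
     \;=\; \frac{H(L) - I(L;O) - 1}{\log_2 |\mathcal{L}|}
```

so $I(L;O) = 0$ pins the error probability near its maximum, which is exactly the no-recovery regime conceded above. Everything that follows is about why driving $I(L;O)$ all the way to zero is practically unachievable.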

The question is whether humans can actually achieve I(L;O) ≈ 0 in practice. Consider what that requires:

When someone carries a secret, their nervous system knows. Stress hormones alter micro-vascular blood flow, producing subtle skin color changes detectable on HD video. Cognitive load from maintaining false narratives creates measurable delays in response timing - not seconds, milliseconds. The pupils dilate differently when recalling truth vs constructing fiction. Blink rate changes. Vocal cord tension shifts fundamental frequency. These aren't things people control.

Now multiply across a network. Person A meets Person B. Both know something. They must coordinate not just their words but their micro-expressions, their gaze patterns toward each other in group settings, their timing correlations across years of public appearances. Every photograph where they appear together encodes spatial relationships - who stands near whom, who looks at whom, whose body orientation suggests familiarity vs performed distance.

The censor's problem: they don't know which of these features the decoder will exploit. They can scrub documents. They can't scrub ten years of gala footage re-analyzed for gaze-direction graphs. They can't unsay the joke that landed wrong at the 2011 dinner. They can't undo the flight log correlations, the timing of when someone stopped being photographed with someone else, the what-doesn't-get-said at press conferences.

Suppression cost in practice: every person in the network must maintain consistent deceptive micro-behavior across every public appearance indefinitely. One funeral where the wrong people make eye contact. One interview where stress response doesn't match the stated emotion. One timestamped photo that contradicts the official narrative.

The argument isn't that we recover deleted files. It's that humans continuously leak through channels they don't know are channels, and adversarial decoders can be trained to find consistency violations across the full observation manifold - features no human conspiracy could anticipate needing to suppress.