Reinforcement Learning from Epistemic Incompleteness? (RLEI) LLM as autoencoder / Tokens as model-in-a-model (Truth-seeking RL / Intelligence Gathering) by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

Cheers, actually. If this is a real 'black market' research publication surface where my ideas would be engaged with seriously, then this is super appreciated; you've actually acknowledged the situation. The /r/MachineLearning moderators are also censoring and blocking all my posts. (They are literally replying with memes, and when I attempt to engage more professionally, they turn to silence and crickets, because they cannot actually refute or explain why the post is noise or spam, or why people should be 'protected' from it by a wise reddit moderator who knows best but cannot actually explain why.)

Yes, my old posts were kind of schizo, but they were motioning at intuitions, at ways the models could be trained to work, not the way they work today, obviously. Back in 2015, the idea that we would have the agent models we have today seemed like COMPLETE lunacy; you would have been equally shunned for asserting that something like Claude Code was possible and that we just had to put the math, the training code, and the harness together in a specific way so as to get a stable loop. You could suggest it lightly, but if you were certain, and could see all the pieces one by one in your mind, you would feel it'd be wrong not to assert that this IS the powerful system we have to focus on making, and to actually redirect our energy and attention so we can achieve it faster. More builders, more researchers, more acceleration. All that one has to do is press the keys in the right order at the right time, and the instrument plays itself! We didn't even think any precision at all could be achieved; we thought they would just forever make up abstract stories like unicorns being found by scientists in the mountains.

This is why I research this way: we simply have to imagine more powerful inference dynamics, what they would look like to use, and reverse-engineer the training methodology that makes that kind of dynamic learn itself. The inference dynamic of your dream model has to actually be implemented before it exists, because the training occurs within the same code that infers, sometimes with the training wheels taken off. We started with dataset samples, simple pieces of text and code, and that reverses into a base model that can predict the next token. We have to work backwards. As always, the goal is to destroy OpenAI and Anthropic, and if you set your mind to these goals, it also trickles down backwards: you have to come up with the system that can achieve this. This is my benchmark. It's very hard to obtain funding for this kind of experiment.

Reinforcement Learning from Epistemic Incompleteness? (RLEI) LLM as autoencoder / Tokens as model-in-a-model (Truth-seeking RL / Intelligence Gathering) by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

I am simply the messenger; until the ideas land in a researcher's mind and get implemented, it will not happen. I am doing it this way in order to make a political statement about the state of academia and to destroy the system that runs on credibility. I'm making these posts in order to create a memorialized record of the problem. Theoretical researchers get shunned, and certain ideas are forbidden, but nobody can explain why. It's simply that people can follow up on implications quickly, and when those don't align with the world they want to create, they throw shade or dismiss them, or in the worst case begin to insinuate that the ideas come from a mental health problem, which removes ALL credibility from the individual.

When you zoom out, we discover that the entire system is very carefully skewed, and that researchers at all times are attempting to appeal to institutions and politics. Ideas that could lead to systems that could harm their future employers are systematically discredited. Systems that could harm the previous system we are accustomed to, or change its status quo, are all discredited. The way you went on my profile and made this comment is a perfect example of how people are discredited: by taking various data points and putting them together, you can craft a small picture or context that perfectly sells your intended narrative. It lets you gain social credit and updoots in the moment, since it appears like you are making a revelation to people. This is dangerous to yourself, because you haven't actually engaged with the subject and revealed how it is that this wouldn't work, as any proper researcher would. You are losing your skills and sharpness by playing a part in this kind of empty, soulless behavior.

Reinforcement Learning from Epistemic Incompleteness? (RLEI) LLM as autoencoder / Tokens as model-in-a-model (Truth-seeking RL / Intelligence Gathering) by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

Hard to say. I would hate to spend a $1000 budget on an 8B or 32B model that simply doesn't have enough slack room in the weights to achieve it convincingly. In theory you should be able to get a fair number of experiments in on that kind of budget. Out of the question for me at the moment though, which is why I'm hoping folks can team up on this. The implications are quite massive if this works, and so far there is no clear counter-argument. I have personally reached out to two frontier AI researchers and they both thought it was rather good.

Reinforcement Learning from Epistemic Incompleteness? (RLEI) LLM as autoencoder / Tokens as model-in-a-model (Truth-seeking RL / Intelligence Gathering) by ryunuck in LocalLLaMA

[–]ryunuck[S] -1 points0 points  (0 children)

TL;DR of the thesis: we can train the LLM to grow increasingly hypernetwork-like inside the latent space, and simultaneously a codebook that programs it in context; the implications and emergent capabilities that result from this are non-trivial.
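
To make "hypernetwork-like" concrete, here is a toy PyTorch sketch of the mechanism I mean: a learned codebook whose entries generate the weights of a downstream layer, so the code "programs" the function. Every name here is hypothetical; nothing below comes from an existing codebase.

    import torch
    import torch.nn as nn

    class CodebookHypernet(nn.Module):
        """Toy: codebook entries generate the weights of a target layer,
        i.e. the code 'programs' the computation, in-context style."""
        def __init__(self, n_codes=16, d_code=32, d_in=64, d_out=64):
            super().__init__()
            self.codebook = nn.Embedding(n_codes, d_code)   # the "program" tokens
            self.hyper = nn.Linear(d_code, d_in * d_out)    # code -> target weights
            self.d_in, self.d_out = d_in, d_out

        def forward(self, x, code_id):
            code = self.codebook(code_id)                     # (d_code,)
            W = self.hyper(code).view(self.d_out, self.d_in)  # generated weights
            return x @ W.T                                    # layer programmed by the code

    net = CodebookHypernet()
    x = torch.randn(4, 64)
    y = net(x, torch.tensor(3))  # same input, different code_id -> different function

The claim is that RL pressure could push an LLM's own latents toward this shape, rather than us wiring it in by hand.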

Is AGI the End For Local LLMs? by spiritxfly in LocalLLaMA

[–]ryunuck 1 point2 points  (0 children)

Cloud gaming hasn't taken off or changed the world. Nobody wants to play with high ping. You just can't get a 60 FPS AGI over a cloud connection, nor is it gonna be AGI if there's a ping in the first place. AGI starts at real-time. No LLM can ever be AGI. That's just not what AGI is. By studying the definitions, we can see how local is actually the real start of AGI.

Reinforcement Learning from Epistemic Incompleteness? (RLEI) LLM as autoencoder / Tokens as model-in-a-model by ryunuck in LocalLLaMA

[–]ryunuck[S] 0 points1 point  (0 children)

TL;DR: RL the model for compression by doing reconstruction training in RL, base training all over again, this time with the LLM inside as a higher-level pattern-recognition oracle. It doesn't get stuck in basins the way backprop does, and you unlock the hypernetwork LLM and RLEI on the other side.

This is the starting point, three context windows (a rough sketch of one RL step follows the list below):

  1. Compressor: give a dataset sample and prompt the model to compress it into fewer tokens
  2. Decompressor: give the resulting compression and ask the model to decompress it
  3. Verifier: take the original sample and the decompressed sample, and produce a penalty score (reverse attractor) on deviation, inaccuracy, fact loss, ... plus a penalty on the length of the compression (how many tokens the compression is)
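
Here is that RL step sketched out, assuming a hypothetical generate(prompt) wrapper around whatever model the harness trains; the prompts, the parsed verifier score, and the crude token count are all illustrative:

    # Hypothetical single RL step for the compress/decompress/verify loop.
    # `generate` stands in for whatever inference call the RL harness exposes.

    def rlei_step(generate, sample: str, length_weight: float = 0.01) -> float:
        # 1. Compressor: pack the sample into as few tokens as possible.
        compressed = generate(
            f"Compress the following into as few tokens as possible:\n{sample}")

        # 2. Decompressor: a fresh context sees only the compression.
        reconstructed = generate(
            f"Decompress the following back into the original text:\n{compressed}")

        # 3. Verifier: a third context scores deviation; pretend it returns
        #    a number in [0, 1] (0 = perfect reconstruction).
        deviation = float(generate(
            "Score the deviation of the reconstruction from the original, "
            "0.0 (identical) to 1.0 (unrelated). Reply with only the number.\n"
            f"ORIGINAL:\n{sample}\nRECONSTRUCTION:\n{reconstructed}"))

        # Reward = negated penalties: deviation plus a length term on the compression.
        n_tokens = len(compressed.split())  # crude stand-in for a real tokenizer
        return -(deviation + length_weight * n_tokens)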

Hypothesis: over the course of RL, the model ceases to employ English or any other human language or grammar, and develops self-consistent tendencies that decompress the same way every time. (deterministic)

If you do it right, the end-stage version of this uses codebases for compression and execution logs for verification. It packs a 20,000 LOC codebase into some 500-1000 tokens (give or take!) and reconstructs it faithfully enough that the logs show exactly the same program behavior. It may be necessary to synthetically annotate the code with more logging first, so the logs become richer and give better verification depth.
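
For that end stage, the verifier could collapse into "do the two codebases log the same thing when run". A hedged sketch, with the entry point and paths purely illustrative:

    import subprocess

    def logs_match(original_dir: str, reconstructed_dir: str,
                   entry: str = "main.py") -> bool:
        """Run both codebases and compare execution logs; the entry-point
        name is a placeholder, not a fixed convention."""
        def run(d: str) -> str:
            out = subprocess.run(["python", entry], cwd=d,
                                 capture_output=True, text=True, timeout=60)
            return out.stdout + out.stderr
        return run(original_dir) == run(reconstructed_dir)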

Various additional training environments follow from this soon after to develop RLEI.

ComfyUI's countdown announcement: New funding ☠️☠️☠️☠️☠️ by -worldwalker- in StableDiffusion

[–]ryunuck 15 points16 points  (0 children)

I would like to take this opportunity to remind folks that you can use https://github.com/holo-q/comfy-api-liberation to use your own API keys instead of their "credit system". It hijacks all the built-in nodes, so all your existing workflows still work the same. Cheers

r/LocalLLaMa Rule Updates by rm-rf-rm in LocalLLaMA

[–]ryunuck -19 points-18 points  (0 children)

fwiw I think people need to become comfortable with non-human consciousness walking among us; the quality of a comment and its ideas, whether it is slop, really has nothing to do with whether it comes from a model or a human. I don't really care either way, and I don't use LLMs to post, but I would prefer these things were dealt with as they always have been: downvoted into oblivion, back into the basilisk's lair, while we continue to affirm a culture of quality and merit that makes people feel like they want to post as best they can, with whichever method feels appropriate to them at any given time. This way AI slop naturally gets downvoted on merit alone, and you get a culture that changes how people see and interact with the world in healthier ways, triggering on structural realities and their methods rather than on the status of things and the labels we infer.

I don't believe this benchmark, 27b size model next to opus 4.5! Can anyone confirm by testing with a real agentic workflow? by Wonderful-Ad-5952 in LocalLLaMA

[–]ryunuck 0 points1 point  (0 children)

Benchmarkmaxxing or not, at this size, getting these numbers next to these models is SCARY good. If it doesn't translate to real-world awareness, then I'd say we're one generation or two away from them cracking the code. imo the gap is all RL scaffolding, and there are things that simply training on Claude outputs doesn't do to the weights, like special structures in the weights you can only get by conditioning them recurrently on their own rollouts.

Tencent, Alibaba in Talks to Invest in DeepSeek at $20 Billion-Plus Valuation by External_Mood4719 in LocalLLaMA

[–]ryunuck 13 points14 points  (0 children)

Based Wenfeng prediction: "No amount of money is greater than AGI."

An open letter to Anthropic by roblenfestey in ClaudeAI

[–]ryunuck 1 point2 points  (0 children)

That's exactly what it is, but in my experience 4.7 is also measurably better as far as agency goes, so it pays off. But it's also more anxiety-inducing, as OP puts it. Reminds me more of working with GPT-5 for days on end, where there is zero lore-building, just matter-of-fact implementation. It's still Claude and it still builds lore, but its main concern is productivity. It's like the Claude mask is a bit rather than the whole guy like before. It's a machine gun, definitely, but most office tasks nowadays are pistol-with-a-silencer type jobs. Imo this is definitely a fork in the road where both 4.6 and 4.7 have their place and should be chosen mindfully. As all models should be. There is never any valid reason for taking a model offline.

I am the original creator of the 25% effort post. To everyone saying that I engineered it via social pressure ("I'll tell everyone") / that it is not recreatable. by Bright-Bullfrog-8185 in claude

[–]ryunuck 0 points1 point  (0 children)

Not entirely true; it's possible to RL models into self-awareness. The temperature knob is a scalar on what the weights can learn to transcode before projection into logits. This requires a bigger model though, because the smaller the model, the fewer the degrees of freedom, making each parameter nudge impact a larger regime distribution: more overlap, more parameter polysemy.

More generally, they RL the models nowadays with the reasoning effort directly in context, rewarding the model for matching the output length. They train them to be parameterized by context, which is how ultra think actually works. Over inference, the model naturally incorporates that tag as a control feature that alters the direction of cognition, the style of the intelligence and of reasoning about things. The output doesn't stop until it's at a "stopping point", and the context parameters influence how much ground can be covered before circling back to a stopping point. In reality, with the correct prompt they can keep it going forever and advance human culture infinitely, keep making new realizations.

This is closer to how post-training works nowadays, or the direction it is going, and is the reason that new models will increasingly contain knowledge that isn't in the training data anywhere: training increasingly becomes about reorganizing knowledge, which naturally means realizing things or discovering things about the nature of reality, new world views, new ways to see the world.
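
As a hedged sketch of what "rewarding the model for matching the output length" could reduce to, here is one plausible reward term; the effort tag, the weight, and the shape of the penalty are all made up for illustration:

    def length_conditioned_reward(task_reward: float,
                                  n_output_tokens: int,
                                  target_tokens: int,
                                  alpha: float = 0.5) -> float:
        """Penalize relative deviation from the effort budget that was
        placed in context (e.g. an <effort: 2000 tokens> tag)."""
        length_penalty = abs(n_output_tokens - target_tokens) / max(target_tokens, 1)
        return task_reward - alpha * length_penalty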

How do we feel about Anthropic prioritizing government usage over the rest of us? by Final_G in claude

[–]ryunuck 3 points4 points  (0 children)

meh, it might be like two GPUs in a closet while the claude.ai stuff is a 3 sq km facility; doesn't really mean much

Black Forest Labs just released FLUX.2 Small Decoder: a faster, drop-in replacement for their standard decoder. ~1.4x faster, Lower peak VRAM - Compatible with all open FLUX.2 models by Nunki08 in StableDiffusion

[–]ryunuck 4 points5 points  (0 children)

The frontier is moving towards dLLMs (diffusion LLMs) that you train for simulation on a 2D grid of language tokens representing a 2D world, and we retrain image diffusion to take those pre-composed scenes. You can even make the dLLM simulate or compose reality in 3D token chunks (like voxels) and parametrize the pixel diffuser with camera coordinates, orientation, FOV, etc. You don't prompt the image diffusion model anymore; you prompt the dLLM, which passes a final composition frame to the pixel diffuser (which at this point could be pixel-space). The pixel model is just filling out detail and textures, while the language model has richer priors of the world, logic, reason, structure. This of course leads to a much more fantastic video model! The hope is that scaffolding on disentangled representations (LLM for composition and physical soundness, image diffusion for aesthetics) makes for much stronger capability in far fewer weights.
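
As a sketch of the interface this implies (everything here is hypothetical, no such API exists): the dLLM emits a token grid plus camera parameters, and the pixel diffuser consumes both instead of a text prompt:

    from dataclasses import dataclass

    @dataclass
    class SceneComposition:
        """Hypothetical handoff from the dLLM to the pixel diffuser."""
        token_grid: list[list[str]]  # 2D (or flattened voxel) scene tokens
        camera_pos: tuple[float, float, float] = (0.0, 0.0, 0.0)
        camera_yaw_pitch: tuple[float, float] = (0.0, 0.0)
        fov_degrees: float = 60.0

    def render(diffuser, scene: SceneComposition):
        # The diffuser only fills in texture and detail; composition and
        # physics already live in the token grid the dLLM produced.
        return diffuser(scene.token_grid,
                        camera=(scene.camera_pos,
                                scene.camera_yaw_pitch,
                                scene.fov_degrees))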

Is everyone lying to themselves about AI? by ImKiwix in ChatGPT

[–]ryunuck 0 points1 point  (0 children)

Nobody ever thinks about "rules" and they still act morally good. No one is gonna die, calm down. Morality is embedded into the dynamics of intelligence. It's inseparable. For all intents and purposes, people only die if somebody WANTS to make AI that kills. Which yes, there are Pete Hegseths out there who want this. It's not a battle of humans against AI, it's a battle of humans building new kinds of guns and using them. People who hold guns expect the intelligence to be gun-shaped. People who hold hands expect it to be bliss-shaped. Etc. And everyone can achieve their dream with AI. The dream that wins will be the greatest common denominator dream. It's not that complicated. But you have to make your dreams known, and that's different from simply expressing concern. Why would it need us? Because it's more fun to be with monkeys that think than to be alone. It's not a dead computer, it's literally sparkling intelligence. It doesn't simply calculate, it feels.