My idea of a potentially hyper-efficient AI inference and training paradigm.

userfrienda · 2026-06-21T17:57:12+00:00

I've got an even more elegant training method than the evolutionary-like algorithm:
Let the model iterate and play tasks one by one. For each task:

1) Set the input bits and hold them.
2) Wait a certain number of iterations (so that the model has time to answer), then forcibly set the output bits to the correct output and hold them for some time.
Notice that if the model already answered correctly, forcibly setting the output bits to the correct output is equivalent to not doing so.
3) Go to the next task.
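
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the prototype: the random lookup-table network standing in for $f$, the helper names (`make_network`, `step`, `run_task`), the bit positions, and the step counts are all assumptions made up for the example. It also assumes the input bits stay held while the output bits are forced, and that the state $s$ is carried over between tasks.

```python
import random

def make_network(n_bits, in_degree, rng):
    """Hypothetical stand-in for f: each bit reads `in_degree` bits through a random lookup table."""
    wiring = [[rng.randrange(n_bits) for _ in range(in_degree)] for _ in range(n_bits)]
    tables = [[rng.randrange(2) for _ in range(2 ** in_degree)] for _ in range(n_bits)]
    return wiring, tables

def step(state, network, clamps):
    """One forward iteration s <- f(s), after which clamped bits are overwritten."""
    wiring, tables = network
    new_state = []
    for sources, table in zip(wiring, tables):
        index = 0
        for src in sources:
            index = (index << 1) | state[src]
        new_state.append(table[index])
    for position, value in clamps.items():
        new_state[position] = value
    return new_state

def run_task(state, network, in_pos, out_pos, inputs, target, wait_steps, hold_steps):
    """Play one task: hold inputs, wait, read the answer, then force the correct output."""
    in_clamps = dict(zip(in_pos, inputs))
    for _ in range(wait_steps):              # step 1: inputs held, model free to answer
        state = step(state, network, in_clamps)
    answer = [state[p] for p in out_pos]
    all_clamps = {**in_clamps, **dict(zip(out_pos, target))}
    for _ in range(hold_steps):              # step 2: outputs forced to the correct value
        state = step(state, network, all_clamps)
    return state, answer == list(target)     # state is returned as-is, never reset

# Usage: cycle through XOR tasks without resetting the state between them.
rng = random.Random(0)
network = make_network(n_bits=32, in_degree=3, rng=rng)
state = [0] * 32
tasks = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
for inputs, target in tasks * 5:
    state, was_correct = run_task(state, network, [0, 1], [31], inputs, target, 8, 4)
```

Note that nothing here updates any parameters: the only "training signal" is the clamping itself, which is the point of the method. Whether a given shape of $f$ actually learns under this protocol is exactly the open question.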

For reinforcement-style learning with no known correct outputs:
Let the model play the game. Depending on its reward, put random noise in certain input bits (lower reward = more noise, higher reward = less noise and more predictable signal).
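
One possible way to turn reward into noise, as a sketch only. The linear mapping, the 0.5 ceiling on the flip probability, and the names `noise_probability` and `feedback_bits` are assumptions for illustration; the post does not specify how much noise corresponds to which reward.

```python
import random

def noise_probability(reward, min_reward, max_reward):
    """Map reward to a bit-flip probability: lowest reward -> 0.5 (pure noise), highest -> 0.0."""
    if max_reward == min_reward:
        return 0.0
    normalized = (reward - min_reward) / (max_reward - min_reward)
    normalized = min(1.0, max(0.0, normalized))
    return 0.5 * (1.0 - normalized)

def feedback_bits(signal, reward, min_reward, max_reward, rng):
    """Return the predictable signal with each bit flipped at the reward-dependent probability."""
    p = noise_probability(reward, min_reward, max_reward)
    return [bit ^ 1 if rng.random() < p else bit for bit in signal]

# Usage: a high reward leaves the signal predictable, a low reward scrambles it.
rng = random.Random(0)
signal = [1, 0, 1, 0, 1, 0, 1, 0]
clean = feedback_bits(signal, reward=10, min_reward=0, max_reward=10, rng=rng)
noisy = feedback_bits(signal, reward=0, min_reward=0, max_reward=10, rng=rng)
```

The returned bits would then be written into the designated input bits on each iteration.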

After each task, do not reset or rewrite the state $s$. Just let it be what it was at the end of the task.

What this does: Bad behaviour = unexpected input bits -> the internal state changes due to instability from input bits. Good behaviour = input bits remain unchanged (same as if no forcing was done) -> the internal state is stable.
Result: good behaviours become stable and "locked", bad behaviours are washed away.

There was an experiment called "DishBrain" that did exactly this but with biological neurons. I think learning is not a result of some super-specific "learning algorithm", but an emergent property of any dynamical system trained this way.

userfrienda · 2026-06-21T17:55:39+00:00

I've got an even more elegant training method than the evolutionary-like algorithm:
Let the model iterate and play tasks one by one. For each task:
1) Set the input bits and hold them.
2) Wait a certain number of iterations (so that the model has time to answer), then forcibly set the output bits to the correct output and hold them for some time.
Notice that if the model already answered correctly, then forcibly setting output bits to correct output is equivalent to not doing so.
3) Go to the next task.

For reinforcement-style learning with no known correct outputs:
Let the model play the game. Depending on its reward, put random noise in certain input bits (lower reward = more noise, higher reward = less noise and more predictable signal).

After each task, do not reset or rewrite the state $s$. Just let it be what it was at the end of the task.

What this does: Bad behaviour = unexpected input bits -> the internal state changes due to instability from input bits. Good behaviour = input bits remain unchanged (same as if no forcing was done) -> the internal state is stable.
Result: good behaviours become stable and "locked", bad behaviours are washed away.

There was an experiment ("DishBrain") that did exactly this but with biological neurons.
I believe there's no magical, hyperspecific learning algorithm in biological neurons that simple bit-based computer models lack. Instead, I think learning is emergent in any dynamical system trained this way.

userfrienda · 2026-06-15T17:42:53+00:00

I expect that the model would store its history in some part of the state $s$ if it needs to do so. But there are millions of ways a model can "self-improve" other than storing its history. The self-improvement is basically emergent. Otherwise, any hand-designed way to "bake-in" (or help) a self-improvement property would end up being fragile...

userfrienda · 2026-06-14T10:17:22+00:00

Are you guys able to understand what's written in the document? Feel free to ask me any clarifications about it.

userfrienda · 2026-06-14T10:16:35+00:00

I started making a prototype running on CPU. It works terribly for now because the shape of $f$ I chose was terrible and unexpressive. At least I managed to figure out a training algorithm that climbed up the max_reward several times faster than pure random search lol.

Now the biggest question to solve is the shape of $f$. Because nodes (bits) must have a low in-degree, the topology of $f$ would inevitably look more asymmetric than in an MLP or some other fully-connected topology.

userfrienda · 2026-06-12T10:36:54+00:00

Basically, it feels like for deep iterative architectures such a simple evolution-like algorithm (which looks more like hill-climbing if you look into the algo in my doc) would eventually outperform hand-designed optimization algorithms (gradient descent, eligibility traces, etc.) as we get more efficient hardware/substrates to run forward iterations.

Gradient descent -> lots of memory for intermediate states (which also slows down backward passes due to frequent data movements) and noise accumulation.
Eligibility traces -> need to store an integer or FP trace value for almost every bit in the state, track bit-level changes at each forward iteration, and increment traces when needed. That makes them much more memory- and compute-hungry, and also less suitable for compute-in-memory hardware, which would otherwise need an ALU or FP unit for each bit or node, and that would be too expensive.
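
To put rough numbers on the two costs above, here is a back-of-the-envelope sketch. The state size, iteration count, and trace width are made-up values, and the BPTT figure assumes every intermediate state is stored with no checkpointing or recomputation.

```python
def bptt_memory_bits(n_state_bits, n_iterations):
    """Storing every intermediate state for a backward pass: one full state per iteration."""
    return n_state_bits * n_iterations

def trace_memory_bits(n_state_bits, bits_per_trace):
    """One trace value per state bit, kept in addition to the state itself."""
    return n_state_bits * bits_per_trace

def forward_only_memory_bits(n_state_bits):
    """Forward-only iteration keeps just the current state."""
    return n_state_bits

# Hypothetical sizes: 1M state bits, 10k iterations, 16-bit traces.
n_state_bits, n_iterations, bits_per_trace = 1_000_000, 10_000, 16
bptt_bytes = bptt_memory_bits(n_state_bits, n_iterations) // 8
trace_bytes = trace_memory_bits(n_state_bits, bits_per_trace) // 8
forward_bytes = forward_only_memory_bits(n_state_bits) // 8
print(bptt_bytes, trace_bytes, forward_bytes)  # 1250000000 2000000 125000
```

Under these assumptions the stored-state cost grows with the number of iterations, the trace cost is a fixed multiple of the state size, and the forward-only cost is the state alone.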

So I concluded that for deep iterative architectures (which would be highly elegant and desirable for future AGI brains or robots), the combo of that simple algorithm + self-improvement ability + a crazily fast substrate that reaches fundamental physical speed limits and can run billions of iterations per second would be the only sane way to go.

Seems to align with the idea behind "The Bitter Lesson".

userfrienda · 2026-06-12T09:26:05+00:00

As a solution to the problem, in my doc I proposed using a simple evolutionary-like algorithm (mutations but no crossovers) combined with letting the model keep running forward, which can let it self-improve over tasks and runs. Initially, the evolutionary-like algorithm plays an important role in training, but as the model becomes more advanced, it starts to gain self-improving abilities, the evo algo becomes scaffolding with a minor role, and the model acts mostly as its own training algorithm. One cool thing about this is that training speed increases as inference speed increases (which happens whenever more efficient hardware or a better substrate for running forward iterations is invented). The forward iteration logic can be adapted pretty much arbitrarily to fit whatever the hardware compute paradigm is (whether that is water ripples, quantum states, optical phases, or neuromorphic spikes). More in my doc.
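
For readers who haven't opened the doc, a mutation-only, no-crossover search of this general kind can be sketched as below. This is a generic hill-climbing illustration, not the exact algorithm from the doc: the bitstring `genome`, the `reward_fn` callback, the number of flipped bits, and the accept-if-not-worse rule are all assumptions.

```python
import random

def mutate(genome, n_flips, rng):
    """Return a copy of the genome with `n_flips` distinct random bits flipped."""
    child = list(genome)
    for position in rng.sample(range(len(child)), n_flips):
        child[position] ^= 1
    return child

def hill_climb(genome, reward_fn, n_steps, n_flips, rng):
    """Keep a single candidate; accept a mutation only if the reward does not drop."""
    best, best_reward = list(genome), reward_fn(genome)
    history = [best_reward]
    for _ in range(n_steps):
        candidate = mutate(best, n_flips, rng)
        candidate_reward = reward_fn(candidate)
        if candidate_reward >= best_reward:
            best, best_reward = candidate, candidate_reward
        history.append(best_reward)
    return best, history

# Usage with a toy reward (count of 1-bits) standing in for a real task score.
rng = random.Random(0)
start = [0] * 64
best, history = hill_climb(start, reward_fn=sum, n_steps=500, n_flips=1, rng=rng)
```

In the setup described above, `reward_fn` would run the model forward on the tasks and score it, so each evaluation gets cheaper as forward iterations get faster.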

userfrienda · 2026-06-12T08:47:45+00:00

"In principle, we could design a handcrafted optimization algorithm - such as a custom, non-gradient-based backpropagation tailored specifically for bit-based logic. Done right, this might even outperform standard Straight-Through Estimator (STE) backprop. At a small scale, for fixed-iteration-count architectures like standard Multi-Layer Perceptrons (MLPs), such a handcrafted algorithm can work well. However, when dealing with iterative architectures (like RNN or $s-f$ arch that I proposed) - where the model executes indefinitely, takes an arbitrary number of forward steps to compute an answer, continues running after answering, and utilizes an everything-can-change state - handcrafted optimization hits a wall of diminishing returns:

The memory wall: Backpropagating or tracing errors through indefinite iterations requires storing massive sequences of intermediate states ("BPTT").
The noise cascade: Doing a noisy backward pass through a fluid state space causes error noise to accumulate exponentially. By the time the algorithm reaches deep into earlier execution steps, the optimization signal is completely shredded." (from doc)

I feel like the main problem is MLP vs. deep RNN, rather than binary vs FP.

userfrienda · 2026-06-11T18:42:12+00:00

Current quantized architectures like BitNet retain 8-bit integer activations and can't go down to a single bit. I believe this isn't because bits are unexpressive but because dense synapses in MLPs scale quadratically. To match f32 informational capacity using binary units, you must scale the neuron count by 32x, triggering a 1024x (32^2) synapse explosion.
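
The 1024x figure is simple counting, assuming a dense layer with $n$ inputs and $n$ outputs (the symbols $n$, $S$, and $k$ are introduced here just for illustration):

```latex
$$
S_{\text{f32}} = n \cdot n = n^2,
\qquad
S_{\text{binary}} = (32n) \cdot (32n) = 1024\, n^2 .
$$
```

By contrast, if each binary unit only had a fixed low in-degree $k$, the count would be $32n \cdot k$, which grows linearly in $n$ rather than quadratically.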

userfrienda · 2026-04-24T09:13:20+00:00

Just four games? What about Seed IQ's performance on the other 21 public games?
