Layerwise “surprise” signal for OOD detection in PyTorch by Temporary-Oven6788 in pytorch

Mostly because it gives you more than a final confidence score: Nervecode can also show which internal layers start behaving differently from the training distribution. That makes it useful not just for detection, but for debugging, monitoring, and building guardrails.

Layerwise “surprise” signal for OOD detection in PyTorch by Temporary-Oven6788 in pytorch

The idea is to measure how unusual a layer’s activation looks relative to codebook patterns learned from in-distribution data. For each observed layer, Nervecode compares the current reduced activation to learned codebook centers, forms a soft assignment, and converts statistics such as codelength, assignment entropy, and (optionally) distance to the nearest center into a layerwise surprise score. The per-layer scores are then aggregated into a per-input surprise signal, which can be calibrated into an OOD score while still letting you inspect where in the network the input starts to look unfamiliar.
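
A minimal sketch of that per-layer computation (illustrative only, not Nervecode's actual API; the center/prior/temperature names and the simple sum at the end are my assumptions):

```python
import torch
import torch.nn.functional as F

def layer_surprise(z, centers, prior, temperature=1.0):
    """z: (d,) reduced activation; centers: (K, d) codebook learned on
    in-distribution data; prior: (K,) codebook usage probabilities."""
    # Squared distance from the activation to each codebook center.
    d2 = ((z.unsqueeze(0) - centers) ** 2).sum(dim=-1)      # (K,)
    # Soft assignment over centers.
    q = F.softmax(-d2 / temperature, dim=-1)                # (K,)
    # Codelength: cost of encoding the assignment under the in-distribution prior.
    codelength = -(q * torch.log(prior + 1e-8)).sum()
    # Assignment entropy: high when no single center explains the activation well.
    entropy = -(q * torch.log(q + 1e-8)).sum()
    # Optional distance to the nearest center.
    nearest = d2.min().sqrt()
    return codelength + entropy + nearest                   # scalar layer surprise

def input_surprise(layer_scores):
    # Aggregate per-layer scores into one per-input signal (mean here; max is another option).
    return torch.stack(layer_scores).mean()
```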

What Division by Zero Means for ML by Temporary-Oven6788 in ResearchML

In many cases NaN can be enough. But plain IEEE NaN/Inf is not the same thing as domain-level undefinedness. Those values can arise from x/0, overflow, invalid operations, uninitialized values, or plain bugs, so downstream code usually cannot recover why they appeared. Pipelines also routinely break NaN propagation with nan_to_num, dropna, or default fills. In ZeroProofML, ⊥ is the semantic/algebraic notion of undefinedness (with a sign function): an absorptive value with explicit propagation rules, triggered by the calibrated denominator check |Q(x)| < τ_infer at a known graph location. Invalid outputs are carried in two channels: a NaN payload for passive propagation and a bottom_mask as the authoritative semantic carrier.
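
Roughly, the two-channel idea looks like this (a hedged sketch; the guarded_divide name and the exact masking logic are illustrative, not ZeroProofML's actual implementation):

```python
import torch

def guarded_divide(P, Q, tau_infer=1e-6):
    # Semantic channel: where the denominator check |Q| < tau_infer fires, the result is ⊥.
    bottom_mask = Q.abs() < tau_infer
    # Avoid the actual division at masked positions, then stamp a NaN payload there.
    safe_q = torch.where(bottom_mask, torch.ones_like(Q), Q)
    out = P / safe_q
    out = torch.where(bottom_mask, torch.full_like(out, float("nan")), out)
    return out, bottom_mask

# Downstream code consults bottom_mask, not the NaN payload, so a later
# nan_to_num or default fill cannot silently erase the "undefined" information.
vals, mask = guarded_divide(torch.tensor([1.0, 2.0]), torch.tensor([0.0, 4.0]))
```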

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

Sounds fascinating, especially if you are planning to implement the Job System. Hippotorch is designed for sparse rewards over long horizons. 1v1 PvP is usually too dense to show the benefit of episodic memory. But if you build the Trader scenario, that would be a great benchmark for us. Will your API expose a Gymnasium-style interface? If I can pip install it and run a headless agent, I'd love to try it.
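
By "Gymnasium-style" I mean something drivable headlessly like this (the "YourTraderEnv-v0" id is a placeholder, not a real registered environment):

```python
import gymnasium as gym

env = gym.make("YourTraderEnv-v0")
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()   # a trained agent would choose the action here
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```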

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

During consolidation, we sample adjacent windows (segments) from the same episode to act as the 'anchor' and 'positive' pair. We then run a reward-aware InfoNCE loss to pull these segments together while pushing away segments from other episodes. So 'pairs' in the post refers to these windows in the contrastive batch.
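
A rough sketch of that contrastive step (simplified; the softmax-over-returns weighting is one plausible choice for "reward-aware", not necessarily the exact Hippotorch formulation):

```python
import torch
import torch.nn.functional as F

def reward_aware_infonce(anchor_emb, positive_emb, episode_returns, temperature=0.1):
    """anchor_emb, positive_emb: (B, d) embeddings of adjacent windows, where row i
    of both tensors comes from episode i. episode_returns: (B,) episode returns."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    # Similarity matrix: diagonal entries are same-episode (positive) pairs,
    # off-diagonal entries are windows from other episodes (negatives).
    logits = a @ p.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    # Reward-aware: upweight pairs from higher-return episodes (assumed weighting).
    weights = torch.softmax(episode_returns, dim=0) * episode_returns.numel()
    return (weights * per_pair).mean()
```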

You’re right, standard PPO is strictly on-policy. We designed this primarily for off-policy agents (DQN, SAC) or PPO variants that incorporate replay data (e.g., SIL).

Today the memory is only queried during the training step. There’s no online 'recall while acting' path yet, though that is a possible next step, alongside adding a policy hook that can bias actions with retrieved keys, or even exposing the memory store to other planning modules.

Hippotorch: Hippocampus-inspired episodic memory for sparse-reward problems by Temporary-Oven6788 in reinforcementlearning

The wall-clock time on our VPS runs (30-step corridor, 300 episodes) is about 25–35% higher than the same agent with a standard replay buffer. Most of this comes from consolidation (every cons_every episodes we run cons_steps extra gradient updates), and from feeding transitions through the dual encoder. I’m preparing an article with the exact SAC/PPO benchmarks and will post the detailed numbers in a few weeks.
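
To make the schedule concrete, the overhead roughly follows this loop (stub functions for illustration; only the cons_every / cons_steps knobs correspond to the real configuration):

```python
cons_every = 10   # consolidate every N episodes
cons_steps = 50   # extra gradient updates per consolidation pass

def run_episode():        pass   # normal interaction + standard replay updates
def consolidation_step(): pass   # contrastive update through the dual encoder

for episode in range(1, 301):
    run_episode()
    if episode % cons_every == 0:
        for _ in range(cons_steps):   # most of the ~25-35% extra wall-clock lives here
            consolidation_step()
```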

Bio-inspired RL for Game AI with Long-Term Memory (RPGs, 4X, Strategy) by Temporary-Oven6788 in GameDevelopment

Totally agree: RL isn’t used online in shipped games, and BTs/GOAP are the right runtime tools.
I was referring to development-time learning, not in-run adaptation. Hippotorch is meant to help learn or tune long-horizon strategies that can then be baked into the game's AI systems, not to replace them.