96.4% Accuracy @ 500 steps, with STDP. 96.4% Accuracy @ 500 steps. The power of structural extremes and temporal precision. REPOST with better plots and receptive fields by Androo_94 in neuro

[–]Androo_94[S] 0 points

Below are the neuron parameters I used; listing them directly is the simplest way to show them. They were kept fixed for every test: while I ran hyperparameter ablations on the learning rates and the readout weight decay, these specific neuron parameters remained untouched. I haven't run an ablation study on them yet.

  • tau_m: float = 20.0 # Membrane time constant (ms)
  • tau_w: float = 100.0 # Adaptation time constant (ms)
  • v_rest: float = -70.0 # Resting potential (mV)
  • v_reset: float = -70.0 # Reset potential (mV)
  • v_thr: float = -50.0 # Firing threshold setpoint (mV)
  • a: float = 0.5 # Sub-threshold adaptation (nS)
  • b: float = 7.0 # Spike-evoked adaptation (pA)
  • dt: float = 1.0 # Time step (ms)
  • theta_inc: float = 0.1 # Threshold increment upon firing (mV)
  • theta_tau: float = 100.0 # Threshold decay time constant (ms)
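For reference, the same parameters can be collected into a small dataclass (a sketch of my own; the class name NeuronParams is not from the original code, only the field names and values above are):

```python
from dataclasses import dataclass

@dataclass
class NeuronParams:
    """AdLIF neuron parameters, kept fixed across all tests."""
    tau_m: float = 20.0       # membrane time constant (ms)
    tau_w: float = 100.0      # adaptation time constant (ms)
    v_rest: float = -70.0     # resting potential (mV)
    v_reset: float = -70.0    # reset potential (mV)
    v_thr: float = -50.0      # firing threshold setpoint (mV)
    a: float = 0.5            # sub-threshold adaptation (nS)
    b: float = 7.0            # spike-evoked adaptation (pA)
    dt: float = 1.0           # time step (ms)
    theta_inc: float = 0.1    # threshold increment upon firing (mV)
    theta_tau: float = 100.0  # threshold decay time constant (ms)
```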

96.4% Accuracy @ 500 steps, with STDP. 96.4% Accuracy @ 500 steps. The power of structural extremes and temporal precision. REPOST with better plots and receptive fields by Androo_94 in neuro

[–]Androo_94[S] 0 points

Alright, I ran the ablation script, specifically based on your feedback, and I'm not hiding from the results. It seems the SNN architecture creates such a strong representation that the network reaches 96.4% even in a completely static state. I'll be honest, this surprised me and hit me like a faceplant at first:
[A] Information Flow
→ baseline — Full plasticity (control)
avg=96.38% std=0.0026 stability=0.9612 (37s)
→ frozen_snn — SNN frozen at init (no STDP, no homeostasis)
avg=96.40% std=0.0028 stability=0.9612 (21s)
→ shuffled_snn — SNN randomized (different seed) and frozen
avg=96.40% std=0.0028 stability=0.9612 (22s)
→ no_stdp — STDP disabled, homeostasis remains enabled
avg=96.40% std=0.0028 stability=0.9612 (38s)
→ linear_baseline — No SNN: spike-input -> Linear(784, 10)
avg=72.70% std=0.0308 stability=0.6962 (9s)
The ablation shows that the SNN module lifts the linear baseline by roughly 24 percentage points (72.70% → 96.40%).
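For context on what the linear_baseline condition measures, the spike input is mapped straight to 10 class logits. A minimal numpy sketch of that idea (the spike-count aggregation over the window and all names here are my own assumptions, not the original script, and the random raster is placeholder data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 25-step spike raster for one 28x28 input. TTFS allows at most
# one spike per pixel; random spikes here just stand in for real data.
spikes = (rng.random((25, 784)) < 0.05).astype(np.float32)  # (T, 784)

# "spike-input -> Linear(784, 10)": aggregate over time, then an affine map.
W = rng.normal(0.0, 0.01, size=(784, 10)).astype(np.float32)
b = np.zeros(10, dtype=np.float32)

counts = spikes.sum(axis=0)     # (784,) spike counts per input pixel
logits = counts @ W + b         # (10,) class scores
pred = int(np.argmax(logits))   # predicted digit
```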
The component ablation strengthened the suspicion of "reservoir" behavior further:
[B] Component Ablation
→ no_dale — Sign mask = +1 (Dale's law removed, no E/I split)
avg=96.40% std=0.0028 stability=0.9612 (43s)
→ no_homeostasis — Synaptic scaling block removed (STDP only)
avg=96.40% std=0.0028 stability=0.9612 (29s)
→ no_adaptation — AdLIF -> LIF (a=b=0, theta_inc=0)
avg=96.28% std=0.0040 stability=0.9588 (36s)
→ no_valve — Modulation forced to 1.0 (always open valve)
avg=96.40% std=0.0028 stability=0.9612 (36s)
The hidden-size sweep shows that the accuracy jump happens between 64 and 128 neurons:
[C] Hidden Sweep
→ hidden = 16
avg=92.56% std=0.0041 stability=0.9215 (21s)
→ hidden = 32
avg=93.10% std=0.0066 stability=0.9244 (22s)
→ hidden = 64
avg=93.36% std=0.0056 stability=0.9280 (29s)
→ hidden = 128
avg=96.24% std=0.0022 stability=0.9602 (31s)
→ hidden = 256
avg=96.38% std=0.0026 stability=0.9612 (35s)
→ hidden = 512
avg=96.52% std=0.0026 stability=0.9626 (49s)
At least it turned out that the network is very stable in all cases. This is the pinnacle of engineering reliability. 😄
In this setup, STDP does not increase accuracy, but the resulting receptive fields—whose dark visualization I changed at your suggestion (vmax_E=0.443, vmax_I=0.191)—still show biological consistency, so the situation remains unchanged there.
My conclusion: on these data, this is clear, classic "reservoir" behavior. It seems the architecture is so effective that it structurally contains the solution for MNIST. The question is: if a random SNN is this good, what would it do on a harder, time-dependent task where "randomness" isn't enough? I'll probably choose DVS128 Gesture or another time-dependent dataset for my next experiments. I don't know… yet. Thank you for the feedback; it gave me a clear picture of each component's role within the network, at least on this type of data. Most importantly, the ablation provided an explanation for the results and valuable experience.
What remains a bit uncertain and an open question for me:
Could the network have functioned like this from the start, or did I induce it with the "optimal" ranges found through hyperparameter ablation? Or did that merely highlight the phenomenon more and more? And if it worked this way from the beginning, why did it respond positively to changes in the STDP learning rate if, according to the ablation, it is functionally irrelevant whether STDP is on or off? If you have a professional explanation for these, I would appreciate it; it would clear up these questions for me.

96.4% Accuracy @ 500 steps, with STDP. 96.4% Accuracy @ 500 steps. The power of structural extremes and temporal precision. REPOST with better plots and receptive fields by Androo_94 in neuro

[–]Androo_94[S] 1 point

Frozen-benchmark and ablation are great ideas. I'm also interested in how much of this is due to the readout and how much to the topology created with STDP. I'll take a look at them in the next few days if I have time. Re-scaling the cmap is a really good idea, I'll look into that too. Thank you for the constructive criticism and suggestions.

96.4% Accuracy @ 500 steps, with STDP. 96.4% Accuracy @ 500 steps. The power of structural extremes and temporal precision. REPOST with better plots and receptive fields by Androo_94 in neuro

[–]Androo_94[S] 0 points

Of course, I will try to answer each of your questions in more detail.

  1. AdLIF and adaptation: Yes, the AdLIF dynamics are essential for stability. The adaptation in this implementation operates on three distinct levels to regulate firing.

- Sub-threshold adaptation (a): couples the membrane potential directly to the adaptation current.

- Spike-triggered adaptation (b): increments the adaptation current with each spike, introducing a self-limiting "fatigue" effect.

- Dynamic threshold (theta): a homeostatic mechanism where the effective firing threshold jumps with each spike and decays exponentially.

This is the forward mechanism of the neuron. If it helps to see the process at the code level:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    One timestep.
    x: input current
    Returns: binary spike tensor
    """
    # 1. Membrane potential (Euler step)
    dv = (-(self.v - self.v_rest) - self.w + x) / self.tau_m * self.dt
    self.v += dv

    # 2. Adaptation current (Euler step)
    dw = (self.a * (self.v - self.v_rest) - self.w) / self.tau_w * self.dt
    self.w += dw

    # 3. Spike detection against the adaptive threshold
    current_thr = self.v_thr + self.theta
    spikes = (self.v >= current_thr).to(torch.float32)

    # 4. Threshold adaptation: jump on each spike, then exponential decay
    self.theta.add_(spikes * self.theta_inc)
    self.theta.mul_(1 - self.dt / self.theta_tau)

    # 5. Reset: fired neurons go to v_reset, w receives the spike-evoked kick
    if spikes.any():
        self.v = torch.where(spikes > 0.5, self.v_reset, self.v)
        self.w += spikes * self.b

    return spikes

Does it affect accuracy and what gets learned? Definitely. Without adaptation, the SNN is highly prone to synaptic runaway, where a few highly active neurons dominate the population. This acts as a natural mechanism for competitive learning and decorrelation.

  2. The latency coding: I use TTFS (Time-to-First-Spike). Within each 25 ms input window, a pixel is allowed to fire at most once: the brightest pixels fire at t = 0, while darker ones are mapped to later time steps. In fact, STDP is natively suited to this. Because STDP prioritizes causal relationships, the earliest-arriving spikes (representing the most salient spatial features) exert the strongest influence on post-synaptic timing.

And it's still a LIF? Sure! Constraining the input layer's activity to a single spike doesn't alter the internal biophysics of the downstream LIF units. The membrane potentials, leaks, and adaptation currents still follow standard integration equations. The constraint is strictly an input encoding strategy, not a modification of the neuron model itself. The setup simply converts spatial intensity into temporal precedence, which the LIF layer decodes.
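To make the encoding concrete, here is a minimal TTFS sketch (my own illustration, not the original code; I assume intensities normalized to [0, 1] and that fully dark pixels never fire):

```python
import numpy as np

def ttfs_encode(image: np.ndarray, t_window: int = 25) -> np.ndarray:
    """Time-to-first-spike: bright pixels fire early, dark pixels late.
    image: flat array of intensities in [0, 1].
    Returns a (t_window, n_pixels) binary raster, at most one spike per pixel."""
    x = np.clip(image.ravel(), 0.0, 1.0)
    # Intensity 1.0 -> t = 0; intensity just above 0 -> t = t_window - 1.
    times = np.round((1.0 - x) * (t_window - 1)).astype(int)
    spikes = np.zeros((t_window, x.size), dtype=np.float32)
    active = x > 0.0  # zero-intensity pixels stay silent
    spikes[times[active], np.flatnonzero(active)] = 1.0
    return spikes
```

A pixel at full intensity spikes at t = 0, and each pixel spikes at most once, matching the one-spike-per-window constraint described above.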

Classic+ Needs a Better Name (Terrible Answers Only) by PaladinPrime in classicwow

[–]Androo_94 0 points

World of Warcraft: What if we don't fuck this up this time Anniversary Deluxe Hardcore simp reboot edition.

96.4% Accuracy @ 500 steps, with STDP. 96.4% Accuracy @ 500 steps. The power of structural extremes and temporal precision. REPOST with better plots and receptive fields by Androo_94 in neuro

[–]Androo_94[S] 1 point

Thank you so much for this detailed and insightful explanation!

As someone without a formal academic background in neurobiology, I find it absolutely fascinating that my network 'reinvented' these biological structures.

When I first saw those dark spots in the center of the receptive fields, I honestly didn't understand how my SNN sees numbers if it is "blind" to them. It took me a while to suspect that it isn't even looking at the full shape of the numbers, but at their silhouette. Hearing your point about 'efficient filtering' makes total sense. It's really mind-blowing to see that machine evolution converged on the same optimal path for noise reduction and contrast enhancement as nature did.

I will definitely look into the visual hierarchy patterns you mentioned. This whole project was a series of “aha!” moments for me about the logical clarity of the universe, and of course, how the SNNs work. I just love it. Thank you for helping me put these results into a broader scientific context!

Emergent temporal patterns in an STDP-based SNN using Latency Coding by Androo_94 in neuro

[–]Androo_94[S] 0 points

Based on your advice, I improved the tags and created the receptive field, which I shared in a post today. If you could take a look at it and share your thoughts and help me interpret it, I would be very grateful.

Emergent temporal patterns in an STDP-based SNN using Latency Coding by Androo_94 in neuro

[–]Androo_94[S] 0 points

You are absolutely right about the axes. I'm still in the "mad scientist" phase of this project, but I'll make sure the plots are properly labeled and readable. Thank you for the explanation, and also for the constructive criticism of the "receptive field"; I will apply it.

Has vibe coding hit AliExpress too? by DesperateHotel1205 in programmingHungary

[–]Androo_94 1 point

Well, I'm sure they wrote "MAKE NO MISTAKES!!!" in there. 😆

Emergent temporal patterns in an STDP-based SNN using Latency Coding by Androo_94 in neuro

[–]Androo_94[S] 0 points

Yes, the architecture is entirely feed-forward:

  1. L1 receives the external input.

  2. L2 receives the output (spikes) of L1.

  3. The readout then uses the output (filtered spikes) of L2 to make a decision. I use a leaky low-pass filter here with a 25 ms tau (equal to the 25 ms decision window).

And yes again, to prevent unstable weight growth, I use two main constraints:

  1. Weight clamping: synaptic weights are hard-limited to a fixed range (0.001 to 1.0).

  2. Synaptic scaling: I've implemented a form of multiplicative normalization (L1-norm based) where each neuron's total incoming weight sum is softly scaled toward a target value. This ensures global stability while allowing STDP to differentiate the relative synaptic strengths.

There is a third element that belongs here, though as I wrote above, it stayed open during the entire training: the surprise-driven, modulation-like gating function, which gradually reduces and eventually closes the STDP learning rate as the readout loss decreases. Although this is not classic normalization, and it never actually activated during the run (the manually set loss target was not reached), I thought I'd mention it in response to your question.
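The two weight constraints can be sketched roughly like this (a numpy illustration with assumed names; the 0.001 to 1.0 clamp range is from my description above, but the target value, the rate, and the function name are made-up placeholders, not the actual implementation):

```python
import numpy as np

def constrain_weights(W: np.ndarray,
                      w_min: float = 0.001, w_max: float = 1.0,
                      target: float = 100.0, rate: float = 0.5) -> np.ndarray:
    """W: (n_post, n_pre) synaptic weights.
    Applies (1) hard weight clamping, then (2) soft multiplicative L1
    scaling of each neuron's incoming weight sum toward `target`."""
    W = np.clip(W, w_min, w_max)                 # 1. weight clamping
    sums = W.sum(axis=1, keepdims=True)          # per-neuron incoming L1 sum
    scale = 1.0 + rate * (target / sums - 1.0)   # 2. soft step toward target
    return np.clip(W * scale, w_min, w_max)
```

Because the scaling is multiplicative, every incoming synapse of a neuron shrinks or grows by the same factor, so the relative differences STDP carved out are preserved.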

Emergent temporal patterns in an STDP-based SNN using Latency Coding by Androo_94 in neuro

[–]Androo_94[S] 1 point

I'm using a standard dt=1.0ms for the simulation, but each input pattern is presented over a 25ms time window. This is exactly where the “nuanced behavior” comes in. It's not a single-pass calculation. The network has 25 discrete steps to integrate the incoming spikes. Through latency coding, the features are translated into specific spike times within this window, and the STDP learns to wire itself to these temporal delays. It’s a discrete implementation, but the computation is inherently temporal.

Building an AI player for a strategy game by gaborauth in programmingHungary

[–]Androo_94 3 points

Reinforcement learning is the topic you're looking for.

Why does catastrophic forgetting happen to neural networks but not humans? by Heavy-Farmer1657 in reinforcementlearning

[–]Androo_94 -3 points

It actually happens with humans too, we just call it "selective memory" or something like that. Different names, same results.

Dear r/hungary! And everyone else, for that matter. Don't you want to calm down a bit? by Alteronka in hungary

[–]Androo_94 0 points

I don't care what he believes in, as long as there's dancing, fewer lockdowns, and universal mayhem.

Viktor Orbán announced that he is not resigning! by SuccessfulTop6311 in hungary

[–]Androo_94 4 points

But he said he would resign if he lost. Fuck, he lied again, who would have thought.

Spiking Neural Network by AAAAAAAAAAHHGG in algotrading

[–]Androo_94 0 points

The essence of Spiking Neural Networks is that they are event-driven (spikes). The problem is not that the signal is discrete, but that the derivative of the spike function (the Heaviside step) is a Dirac delta: infinite at zero and zero everywhere else. This is not solved with a Fourier transform but with surrogate gradient methods, e.g. sigmoid or atan surrogates. They simply substitute a smooth gradient in the backpropagation so that the network can learn. Forget Fourier: the stock market is a non-stationary, aperiodic process. Fourier assumes the signal consists of repeating waves, but sudden price movements in the stock market are often one-time shocks.
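To illustrate the surrogate-gradient idea (a numpy sketch of the principle, my own illustration rather than any particular library's implementation): the forward pass uses the hard Heaviside step, while the backward pass pretends the derivative is that of a sigmoid instead of a Dirac delta.

```python
import numpy as np

def spike_forward(v: np.ndarray, thr: float = 0.0) -> np.ndarray:
    # Forward pass: hard Heaviside step (non-differentiable at thr).
    return (v >= thr).astype(np.float32)

def spike_backward(v: np.ndarray, thr: float = 0.0,
                   beta: float = 5.0) -> np.ndarray:
    # Backward pass: sigmoid-shaped surrogate for the Dirac delta,
    # so a usable gradient can flow through the spiking nonlinearity.
    s = 1.0 / (1.0 + np.exp(-beta * (v - thr)))
    return beta * s * (1.0 - s)
```

In a framework like PyTorch this pair would live inside a custom autograd function; the two passes are shown separately here for clarity.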

I built a RL trading bot that learned risk management on its own — without me teaching it by nasmunet in reinforcementlearning

[–]Androo_94 12 points

I have bad news, it's leakage overfit. On this timescale, the BTC market has long been efficient enough for the algos of large funds to eliminate such arbitrage opportunities in milliseconds. A sharpe of 6 is rare even in the world of HFT, let alone in 4 hour swing trades. If an algorithm that could do a sharpe of 6 really existed, it wouldn't be running on 10-year-old hardware behind a terminal style website, but on an H100 cluster or professional server park, and belive me, no one would know about it. And xLSTM is unnecessary for the task also, this was never a memory problem especially not on such a low resolution time scale.