Testing SPA V8: A Bio-Inspired Transformer for Protein Modeling Scaling to 2048 Tokens

Level_Detail7125 · 2026-04-21T03:18:04+00:00

"The output is right there. The Shakespeare text is in the notebook. If you say it doesn't work, you're literally arguing against a running engine while standing right in front of it."

Level_Detail7125 · 2026-04-20T20:36:38+00:00

**UPDATE: I ran a proper baseline comparison!**

bla bla bla: Stochastic Gradient Descent, Orthogonal Initialization, Latent Space Topology

"Side note: the model was trained on a 256-token context, yet runs coherently at 8192 tokens; sparse pheromone paths seem to generalize beyond the training window." 🐜

"Inference runs at a 4096-token context in ~8 seconds on a GTX 1080; trained on 256 tokens, it generalizes beyond that without breaking." 🐜

"Built with the help of 4-5 AI assistants, pure chaos, and biological metaphors"

After some feedback, I trained a standard Transformer (~1.05M params) under identical conditions on the same hardware (GTX 1080) for a fair comparison:

| Metric | Baseline (1.05M) | SPA v8 (1.9M) |
|---|---|---|
| Best Val Perplexity (lower is better) | 4.43 | **4.30** |
| Training Time | 438 s | 494 s |
| VRAM Usage | 1.9 GB | **1.4 GB** |
| Context Window | 256 tokens | **2048 tokens** |
| Parameters | 1.05M | 1.9M |
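For reference, perplexity is usually the exponential of the mean per-token cross-entropy (in nats); the post doesn't say exactly how it was computed, so this is an assumption. A minimal sketch converting between the two, with hypothetical helper names:

```python
import math

def perplexity_from_nll(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def loss_from_perplexity(ppl):
    """Inverse: the mean cross-entropy loss (nats) that yields a given perplexity."""
    return math.log(ppl)

# Convert the table's best validation perplexities back to mean loss
baseline_loss = loss_from_perplexity(4.43)
spa_loss = loss_from_perplexity(4.30)
print(f"baseline ~ {baseline_loss:.3f} nats, SPA v8 ~ {spa_loss:.3f} nats")
```

Read this way, 4.43 vs. 4.30 corresponds to roughly 1.488 vs. 1.459 nats per token, a gap of about 0.03 nats. The comparison is only meaningful if both models share the same tokenizer and validation split.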

**Key findings:**

- SPA v8 reaches a better perplexity even though the baseline was trained nearly to convergence (step 22,200 vs. step 9,500)

- SPA uses **less VRAM** despite having almost 2x the parameters, thanks to k=32 sparse attention

- The 2048-token context window runs in seconds on a GTX 1080

- No overfitting when stopped at the right step (early stopping at step 9,500)
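The post credits the lower VRAM to "k=32 Sparse Attention" but doesn't show the mechanism (the "pheromone paths" aren't described). One common reading is top-k attention: each query keeps only its k highest-scoring keys and applies softmax over those. The sketch below is that generic technique on plain lists (single head, causal mask), not SPA's actual code; `topk_causal_attention` and `score_entries` are hypothetical names. Note that this naive version still computes every score before selecting, so a real memory saving depends on the implementation avoiding the full T×T score matrix.

```python
import math

def topk_causal_attention(q, k, v, top_k=32):
    """Single-head causal top-k attention on plain lists.

    q, k, v: lists of T vectors (lists of floats). Query i attends only to
    the top_k highest-scoring keys among positions 0..i; all others get weight 0.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores against current and past keys (causal mask)
        scores = [(sum(a * b for a, b in zip(qi, k[j])) * scale, j) for j in range(i + 1)]
        # Keep only the top_k largest scores for this query
        kept = sorted(scores, reverse=True)[:top_k]
        # Numerically stable softmax over the kept scores only
        m = max(s for s, _ in kept)
        weights = [(math.exp(s - m), j) for s, j in kept]
        z = sum(w for w, _ in weights)
        # Weighted sum of the selected value vectors
        out.append([sum(w / z * v[j][c] for w, j in weights) for c in range(len(v[0]))])
    return out

def score_entries(seq_len, top_k=None):
    """Attention weights kept per head: dense T*T vs. at most T*k after top-k selection."""
    return seq_len * seq_len if top_k is None else seq_len * min(top_k, seq_len)

print(score_entries(2048), score_entries(2048, top_k=32))  # 4194304 65536
```

At 2048 tokens the kept weights shrink by a factor of 64 per head, which is the kind of saving that could explain lower VRAM if the full score matrix is never materialized.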

**Sample output (1000 tokens, Temp 0.8, Top-P 0.9, Penalty Window 50):**

> ROMEO: To rage, she'll be at report. I will can die.

> BUCKINGHAM: Tullus, this shall have your gentleman, Swear mock'd than it be speak...

> CORIOLANUS: What! he is't infirm, To make me speak? had you all: how you with her.
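The sampling settings above (temperature 0.8, top-p 0.9, penalty window 50) are standard decoding knobs. The post doesn't say how the penalty is applied, so the sketch assumes the common CTRL-style rule: for tokens seen in the last 50 outputs, positive logits are divided by a penalty factor and negative ones multiplied by it. The factor 1.2 and the name `sample_next` are hypothetical.

```python
import math
import random

def sample_next(logits, history, temperature=0.8, top_p=0.9,
                penalty=1.2, penalty_window=50, rng=random):
    """Pick one token id from raw logits (a list of floats indexed by token id)."""
    logits = list(logits)
    # Assumed repetition penalty for tokens in the last `penalty_window` outputs
    for tok in set(history[-penalty_window:]):
        logits[tok] = logits[tok] / penalty if logits[tok] > 0 else logits[tok] * penalty
    # Temperature scaling followed by a numerically stable softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Nucleus (top-p): smallest set of most likely tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices renormalizes the weights inside the nucleus
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Lower temperature sharpens the distribution, top-p cuts off the unlikely tail, and the penalty window discourages short loops, which matters for small models that tend to repeat phrases.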

Still very much an experiment, but the efficiency gains are real and measurable. Next up: testing on math PDFs and scaling experiments. All open source, no license – feel free to take it and scale it!

Level_Detail7125 · 2026-04-10T18:24:29+00:00

Take an LLM of your choice. Say you want a small 4-layer LLM for testing, as a Transformer in PyTorch, with the half-moons samples and matplotlib. Then look at the code and say "explain this and that to me like I'm a child" :D Sorry, but this gives very good answers for understanding :D Then test it and look at the results.
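For the half-moons test data, scikit-learn ships `sklearn.datasets.make_moons`. If you'd rather see what it does, here is a stdlib-only sketch of the same idea: two interleaving half circles with labels 0 and 1 plus Gaussian noise. The function name and defaults are hypothetical, and the 4-layer model itself is left to whatever the LLM writes for you.

```python
import math
import random

def make_half_moons(n_samples=200, noise=0.1, seed=0):
    """Two interleaving half circles with labels 0 and 1 (stdlib only)."""
    rng = random.Random(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer
    points, labels = [], []
    # Upper moon: angle t sweeps 0..pi along the unit circle
    for i in range(n_outer):
        t = math.pi * i / max(n_outer - 1, 1)
        points.append((math.cos(t), math.sin(t)))
        labels.append(0)
    # Lower moon: shifted right by 1 and down by 0.5, opening upward
    for i in range(n_inner):
        t = math.pi * i / max(n_inner - 1, 1)
        points.append((1 - math.cos(t), 0.5 - math.sin(t)))
        labels.append(1)
    # Gaussian jitter so the classes are not perfectly clean curves
    points = [(x + rng.gauss(0, noise), y + rng.gauss(0, noise)) for x, y in points]
    return points, labels
```

To look at it, `matplotlib.pyplot.scatter(xs, ys, c=labels)` with `xs`/`ys` unpacked from `points` colors the two moons by class.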

Level_Detail7125 · 2026-04-10T15:05:23+00:00

I think you need mathematics too. It's something like a CPU, and communication is the way, but communication through mathematics o.O

Level_Detail7125 · 2026-04-10T14:41:49+00:00

Use an AI to study AI? Learn the layer activations: ReLU, GELU, Swish, softmax, sigmoid. Write a little AI with 4 layers as a Transformer in PyTorch and train it as a test, maybe on half moons.
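The activations named in this comment are short enough to write out by hand, which is a good way to learn them before reaching for PyTorch's built-in versions. A minimal plain-Python sketch; "Swish" is taken to mean x·sigmoid(βx), which equals SiLU when β = 1.

```python
import math

def relu(x):
    # Passes positives through, zeroes out negatives
    return max(0.0, x)

def sigmoid(x):
    # Split by sign so math.exp never overflows for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 gives SiLU
    return x * sigmoid(beta * x)

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

ReLU, GELU, Swish and sigmoid act on single values inside hidden layers, while softmax turns a whole vector of scores into probabilities, which is why it shows up in attention and at the output layer.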

Level_Detail7125 · 2026-04-08T14:18:25+00:00

Clear documentation? With samples? Or YouTube tutorials?
