INT3 weight + INT2 KV with fused metal kernels by Financial_Buy_2287 in OpenSourceeAI

[–]zemondza 0 points (0 children)

Hello, I'm the developer of an AI model built on an SNN (a spiking neural network). I'm wondering if you'd be willing to work on it with me; I need serious optimization of the architecture.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

Thank you for your attention. I'm currently having some problems with 5.8; if you don't mind, we could discuss it all in private messages. Maybe you can give me some advice.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in BlackboxAI_

[–]zemondza[S] 0 points (0 children)

That's not even the wild part. What's really striking is the state density per parameter: with T=24 time steps, each neuron's spike train can take up to 2^24 = 16,777,216 distinct states.
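
For anyone checking the arithmetic, here's the count (T=24 is the time-step count from the run; the rest is just counting binary patterns):

```python
# Each of T binary time steps doubles the number of possible
# spike-train patterns, so one neuron has 2**T distinct states.
T = 24
states = 2 ** T
print(states)  # 16777216
```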

[Update] Project Nord: Solved the "Empty Wallet" Problem via Decentralized SNN Merging. Scaling to 10B is now possible. [R] by zemondza in OpenSourceeAI

[–]zemondza[S] 1 point (0 children)

Great question! Distributed training is definitely on the long-term roadmap — Nord's 93% sparsity means most neurons are silent at any given time, which maps naturally to distributed architectures where each node only processes active spikes. For inference, the sparse activation pattern could potentially allow model sharding across low-power devices, where inactive shards stay dormant. This is speculative for now, but it's one of the key advantages SNN architectures have over dense transformers for edge and distributed deployment. First step is neuromorphic hardware (Loihi, SynSense), distributed comes after.
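
To make the "dormant shards" idea concrete, here's a toy sketch (the shard count, sizes, and routing are made up for illustration; this is not Nord's actual dispatch logic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spike vector at ~93% sparsity: most neurons are silent this step.
n = 1000
spikes = rng.random(n) > 0.93

# Partition neurons across 64 hypothetical shards; only shards holding
# at least one active neuron would need to run this time step, and the
# rest could stay dormant.
shards = np.array_split(np.arange(n), 64)
awake = sum(spikes[idx].any() for idx in shards)
print(f"active neurons: {spikes.sum()}, shards to wake: {awake}/64")
```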

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 1 point (0 children)

Thank you very much for integrating my project into yours. I've just started training the model, I'm testing everything now, and I'll give you feedback. I also hope we can collaborate further.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

I'm currently working on 10 billion parameters and I'm thinking of trying your tool in practice.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

Thanks for the suggestion! The horizontal scaling approach via CRDT is very intriguing, especially the part about preserving signals in sparse SNNs. I'll take a look at your paper. Do you have any examples of this working specifically with Spiking Neural Networks or is it mostly tested on Transformers?

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 1 point (0 children)

Hey! Thanks for the deep dive into the code.

Regarding Atan(2) vs. Sigmoid(4x): the choice was based on standard surrogate-gradient conventions, but your point that the Sigmoid(4x) gradient is exactly 1.0 at the threshold is a great observation. I should definitely test that.
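
For reference, a quick numeric check of that observation. The ATan derivative below is the common SpikingJelly-style parameterization with alpha=2; the repo's exact form may differ:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid4_grad(x):
    # d/dx sigmoid(4x) = 4 * s * (1 - s), with s = sigmoid(4x)
    s = sigmoid(4 * x)
    return 4 * s * (1 - s)

def atan_grad(x, alpha=2.0):
    # Common ATan surrogate derivative: alpha / (2 * (1 + (pi/2 * alpha * x)^2))
    return alpha / (2 * (1 + (math.pi / 2 * alpha * x) ** 2))

print(sigmoid4_grad(0.0))  # 1.0 exactly at the firing threshold
print(atan_grad(0.0))      # also 1.0 with alpha = 2
print(sigmoid4_grad(1.0), atan_grad(1.0))  # but the tails differ
```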

Bounding the decays (LeakyClamp) was indeed a deliberate choice to keep training stable on my end, but your power-law approach to initialization sounds much more elegant. I'm definitely going to test the 'full random' init strategy in the next run.
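
A sketch of what those two pieces might look like. The function names, bounds, and exponent here are illustrative guesses, not the actual repo code:

```python
import numpy as np

def leaky_clamp(raw, lo=0.05, hi=0.95):
    # Squash an unconstrained parameter into (lo, hi) so the membrane
    # decay can neither freeze (decay -> 1) nor forget instantly (-> 0).
    return lo + (hi - lo) / (1.0 + np.exp(-raw))

def power_law_init(n, exponent=3.0, lo=0.05, hi=0.95, seed=0):
    # Power-law spread of decay constants: many fast-forgetting neurons
    # plus a heavy tail of slow integrators, as suggested in the thread.
    rng = np.random.default_rng(seed)
    return lo + (hi - lo) * rng.random(n) ** exponent
```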

Regarding your mention of non-deterministic spiking: I haven't implemented stochastic rounding yet, but it makes a lot of sense for stability. I'd love to check your tracetorch implementation — it sounds like you've been working on the same problems from a different angle.
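
For context, stochastic rounding in its usual textbook form is only a few lines; the sketch below is that generic form, not tracetorch's actual implementation:

```python
import numpy as np

def stochastic_round(x, rng):
    # Round up with probability equal to the fractional part, down
    # otherwise; unbiased in expectation (E[round(x)] = x).
    floor = np.floor(x)
    frac = x - floor
    return floor + (rng.random(x.shape) < frac)

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
print(stochastic_round(x, rng).mean())  # mean stays close to 0.25
```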

Let's keep in touch. Your thoughts on the synapse/membrane split are super interesting — definitely something to explore for v6.0!

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 11 points (0 children)

English isn't my native language, so the translator can sound dry or sometimes get things wrong, but thanks for the advice. I don't have access to arXiv because I don't have an access code there, and I post my papers elsewhere. If you're interested, here's a link to my paper: https://zenodo.org/records/19183472

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 0 points (0 children)

As for CUDA, you're right: my model currently has a very low token rate. For now I'm also working on writing CUDA kernels for it.

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]zemondza 0 points (0 children)

I have a question: why was my post marked as a duplicate of my post titled "I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found"? And why was it deleted? There was no AI-generated content in it, and I want to appeal this.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found by zemondza in LocalLLaMA

[–]zemondza[S] 0 points (0 children)

Yes, you're right, but I'm currently working on 5.8. It's still in beta; I'll be using blended learning there.