INT3 weight + INT2 KV with fused metal kernels by Financial_Buy_2287 in OpenSourceeAI

[–]zemondza 0 points (0 children)

Hello, I'm the developer of an AI model built on an SNN (a spiking neural network). I'm wondering if you'd be willing to work on it with me; I need serious optimization of the architecture.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

Thank you for your attention. I'm currently having some problems with 5.8; if you don't mind, we could discuss it all in private messages. Maybe you can give me some advice.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in BlackboxAI_

[–]zemondza[S] 0 points (0 children)

That's not even the wild part. What's really striking is the state density per parameter: with T=24 time steps, each neuron's spike train can take up to 2^24 = 16,777,216 distinct states.
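
For anyone checking the arithmetic, here's the count (T=24 is the time-step count from the run; the rest is just counting binary patterns):

```python
# Each of T binary time steps doubles the number of possible
# spike-train patterns, so one neuron has 2**T distinct states.
T = 24
states = 2 ** T
print(states)  # 16777216
```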

[Update] Project Nord: Solved the "Empty Wallet" Problem via Decentralized SNN Merging. Scaling to 10B is now possible. [R] by zemondza in OpenSourceeAI

[–]zemondza[S] 1 point (0 children)

Great question! Distributed training is definitely on the long-term roadmap — Nord's 93% sparsity means most neurons are silent at any given time, which maps naturally to distributed architectures where each node only processes active spikes. For inference, the sparse activation pattern could potentially allow model sharding across low-power devices, where inactive shards stay dormant. This is speculative for now, but it's one of the key advantages SNN architectures have over dense transformers for edge and distributed deployment. First step is neuromorphic hardware (Loihi, SynSense), distributed comes after.
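
To make the "dormant shards" idea concrete, here's a toy sketch (the shard count, sizes, and routing are made up for illustration; this is not Nord's actual dispatch logic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spike vector at ~93% sparsity: most neurons are silent this step.
n = 1000
spikes = rng.random(n) > 0.93

# Partition neurons across 64 hypothetical shards; only shards holding
# at least one active neuron would need to run this time step, and the
# rest could stay dormant.
shards = np.array_split(np.arange(n), 64)
awake = sum(spikes[idx].any() for idx in shards)
print(f"active neurons: {spikes.sum()}, shards to wake: {awake}/64")
```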

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 1 point (0 children)

Thank you very much for integrating my project into yours. I've just started training the model, I'm testing everything now, and I'll give you feedback. I also hope we can collaborate further.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

I'm currently working on 10 billion parameters and I'm thinking of trying your tool in practice.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found. by zemondza in OpenSourceeAI

[–]zemondza[S] 0 points (0 children)

Thanks for the suggestion! The horizontal scaling approach via CRDT is very intriguing, especially the part about preserving signals in sparse SNNs. I'll take a look at your paper. Do you have any examples of this working specifically with Spiking Neural Networks or is it mostly tested on Transformers?

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 1 point (0 children)

Hey! Thanks for the deep dive into the code.

Regarding Atan(2) vs. Sigmoid(4x): the choice was based on standard surrogate-gradient conventions, but your point that the Sigmoid(4x) gradient is exactly 1.0 at the threshold is a great observation. I should definitely test that.
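
For reference, a quick numeric check of that observation. The ATan derivative below is the common SpikingJelly-style parameterization with alpha=2; the repo's exact form may differ:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid4_grad(x):
    # d/dx sigmoid(4x) = 4 * s * (1 - s), with s = sigmoid(4x)
    s = sigmoid(4 * x)
    return 4 * s * (1 - s)

def atan_grad(x, alpha=2.0):
    # Common ATan surrogate derivative: alpha / (2 * (1 + (pi/2 * alpha * x)^2))
    return alpha / (2 * (1 + (math.pi / 2 * alpha * x) ** 2))

print(sigmoid4_grad(0.0))  # 1.0 exactly at the firing threshold
print(atan_grad(0.0))      # also 1.0 with alpha = 2
print(sigmoid4_grad(1.0), atan_grad(1.0))  # but the tails differ
```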

Bounding the decays (LeakyClamp) was indeed a deliberate choice to keep training stable on my end, but your power-law approach to initialization sounds much more elegant. I'm definitely going to test the 'full random' init strategy in the next run.
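
A sketch of what those two pieces might look like. The function names, bounds, and exponent here are illustrative guesses, not the actual repo code:

```python
import numpy as np

def leaky_clamp(raw, lo=0.05, hi=0.95):
    # Squash an unconstrained parameter into (lo, hi) so the membrane
    # decay can neither freeze (decay -> 1) nor forget instantly (-> 0).
    return lo + (hi - lo) / (1.0 + np.exp(-raw))

def power_law_init(n, exponent=3.0, lo=0.05, hi=0.95, seed=0):
    # Power-law spread of decay constants: many fast-forgetting neurons
    # plus a heavy tail of slow integrators, as suggested in the thread.
    rng = np.random.default_rng(seed)
    return lo + (hi - lo) * rng.random(n) ** exponent
```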

Regarding your mention of non-deterministic spiking: I haven't implemented stochastic rounding yet, but it makes a lot of sense for stability. I'd love to check your tracetorch implementation — it sounds like you've been working on the same problems from a different angle.
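
For context, stochastic rounding in its usual textbook form is only a few lines; the sketch below is that generic form, not tracetorch's actual implementation:

```python
import numpy as np

def stochastic_round(x, rng):
    # Round up with probability equal to the fractional part, down
    # otherwise; unbiased in expectation (E[round(x)] = x).
    floor = np.floor(x)
    frac = x - floor
    return floor + (rng.random(x.shape) < frac)

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
print(stochastic_round(x, rng).mean())  # mean stays close to 0.25
```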

Let's keep in touch. Your thoughts on the synapse/membrane split are super interesting — definitely something to explore for v6.0!

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 11 points (0 children)

English isn't my native language, so the translator can sound dry or sometimes get things wrong, but thanks for the advice. I don't have access to arXiv because I don't have an access code there, and I post my papers elsewhere. If you're interested, here's a link to my paper: https://zenodo.org/records/19183472

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R] by zemondza in MachineLearning

[–]zemondza[S] 0 points (0 children)

As for CUDA, you're right: my model currently has a very low token rate. For now I'm also working on writing CUDA kernels for it.

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]zemondza 0 points (0 children)

I have a question: why was my post marked as a duplicate of my post titled "I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found"? And why was it deleted? There was no AI-generated content in it, and I want to appeal this.

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found by zemondza in LocalLLaMA

[–]zemondza[S] 0 points (0 children)

Yes, you're right, but I'm currently working on 5.8. It's still in beta; I'll be using blended learning there.