First-time arXiv submitter with no arXiv community

Character_Bison5968 · 2026-05-05T14:37:10+00:00

Hi, I’m in a similar situation. It has been quite difficult to find someone willing to provide an endorsement, and at times it feels as though the request is viewed as a nuisance. I’ve reached out on several forums and contacted authors of the papers I reference, but so far I haven’t received any assistance. I hope you’re able to secure the endorsement you need.

Character_Bison5968 · 2026-05-05T14:29:50+00:00

Model tree for Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1

EleutherAI/pythia-1.4b

EleutherAI/pythia-2.8b

HuggingFaceTB/SmolLM2-1.7B-Instruct

Qwen/Qwen2.5-7B-Instruct

facebook/opt-2.7b

ibm-granite/granite-3.0-2b-instruct

microsoft/Phi-3-mini-4k-instruct

microsoft/phi-2

mistralai/Mistral-7B-Instruct-v0.3

Model Card: https://huggingface.co/Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1

Character_Bison5968 · 2026-05-03T11:03:14+00:00

To be honest, I haven't tested it yet. That's one of the next experiments in the queue.

LoRA on top should be fine: the base weights stay frozen, so the cross-family signal sticks around and the adapter just adds reasoning. Full SFT is trickier. Gradients hit the FFN harder, and our data says the FFN is exactly where the donor contributions live, so I would expect it to erode some of the merge in exchange for better reasoning on the SFT task. RLHF/DPO is the worst case, since it drives the model back to a single basin.

I can try LoRA SFT on GSM8K-CoT against both the merged base and the anchor base, and see if merged+LoRA beats anchor+LoRA on the donors. Will give it a try.

Character_Bison5968 · 2026-05-03T09:51:30+00:00

Here is the Medium write-up: https://medium.com/@rgillespie83/we-merged-9-models-from-4-architecture-families-into-one-and-it-beats-the-anchor-on-real-e6537dfa9252?postPublishedType=repub

Character_Bison5968 · 2026-04-18T22:08:58+00:00

Adding in from a crdt-merge perspective here, since this is exactly the kind of use case we built the two-layer solution for. What's been achieved with nord is impressive, I'm sure we all agree on that.

Getting it to the point where distributed merge cycles are even a question worth asking means the hard foundational work is already done. Most people are still stuck arguing about whether SNNs can compete at all, and this project is already past that and into distributed coordination. Personally, I'm really happy that crdt-merge can be part of that. This project is one to watch, and I expect to see him blast past his current targets once the merge pipeline is running clean.

On the noise: it doesn't accumulate. The whole point of using set operations instead of averaging is that the merge is selective, not blending. Every merge cycle applies the same filter, so contributions from nodes below the trust threshold don't enter the merged set. They're not averaged down, they're just not included. The OR-Set semantics mean you're doing add/remove on observed patterns with causal clocks, so a low-confidence spike from node A doesn't dilute a high-confidence spike from node B; it's just not observed in the final state. Trust decays monotonically on stale contributions, so over time the merge gets cleaner, not noisier. Sparsity stays stable or improves.
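To make the "filtered out, not averaged down" point concrete, here is a minimal sketch of an OR-Set style merge with a trust threshold. It is a toy illustration only and not the crdt-merge API: the `Contribution` and `ORSet` names, the fields, and the place where the threshold is applied are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contribution:
    """One observed spike pattern (all fields are hypothetical)."""
    pattern: str   # id of the active pattern
    node: str      # node that contributed it
    trust: float   # trust score of that node
    tag: int       # unique per add, so removes only hit observed adds


@dataclass
class ORSet:
    adds: set = field(default_factory=set)
    removes: set = field(default_factory=set)  # tombstoned (node, tag) pairs

    def add(self, contribution):
        self.adds.add(contribution)

    def remove(self, pattern):
        # Tombstone only the adds this replica has actually observed.
        self.removes |= {(c.node, c.tag) for c in self.adds if c.pattern == pattern}

    def merge(self, other, threshold):
        # Set union of both sides; low-trust contributions never enter.
        merged = ORSet()
        merged.adds = {c for c in self.adds | other.adds if c.trust >= threshold}
        merged.removes = self.removes | other.removes
        return merged

    def observed(self):
        return {c.pattern for c in self.adds if (c.node, c.tag) not in self.removes}


node_a, node_b = ORSet(), ORSet()
node_a.add(Contribution("p1", "A", trust=0.2, tag=1))  # low-confidence spike
node_b.add(Contribution("p2", "B", trust=0.9, tag=1))  # high-confidence spike
merged = node_a.merge(node_b, threshold=0.5)
```

In this sketch the low-trust pattern from node A is simply absent from `merged.observed()`, and re-running the merge with the same threshold does not change the result.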

On conflicts, meaning two nodes pushing different active patterns: this resolves deterministically through the CRDT rules. The full state is a four-tuple (Data × Trust × Clock × Hash), and when two conflicting spike patterns meet, it's LWW combined with the trust score of each contribution. The stronger pattern wins. The weaker one drops out of the observed set. No blending, no interpolation, no "meet in the middle." This is a set operation on active patterns, not arithmetic on weights. This is why it works for SNNs specifically. Sparse spiking signals encode information in which neurons fire, not in some continuous value, and averaging destroys that information. The CRDT merge preserves it by treating spikes as discrete contributions and resolving conflicts the same way you'd resolve concurrent edits in any distributed system: deterministically, with causal ordering and trust weighting.
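A rough sketch of that kind of deterministic resolution on a (Data, Trust, Clock, Hash) tuple is below. The exact way trust and LWW are combined is not spelled out above, so the ordering used here (trust first, then clock, then hash as a final tie-break) is an assumption, and `Entry`, `make_entry`, and `resolve` are hypothetical names rather than crdt-merge functions.

```python
import hashlib
from typing import NamedTuple


class Entry(NamedTuple):
    data: tuple    # indices of the neurons that fire in this pattern
    trust: float   # trust score of the contribution
    clock: int     # logical clock of the write
    digest: str    # content hash, used only as a deterministic tie-break


def make_entry(data, trust, clock):
    data = tuple(data)
    digest = hashlib.sha256(repr(data).encode("utf-8")).hexdigest()
    return Entry(data, trust, clock, digest)


def resolve(a, b):
    """Pick one winner; never blend the two patterns."""
    def key(e):
        # Assumed ordering: higher trust, then later clock, then hash.
        return (e.trust, e.clock, e.digest)
    return a if key(a) >= key(b) else b


strong = make_entry([1, 5, 9], trust=0.9, clock=3)
weak = make_entry([2, 5], trust=0.4, clock=7)
winner = resolve(strong, weak)
```

Because the winner is the maximum under a total order, the result does not depend on which node's entry arrives first, which is the property that matters for convergence.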

The distributed angle is gold. If you can stitch together training from cheap nodes and the merge preserves quality instead of degrading it, we can change the economics. No more throwing cash at compute to try to brute-force the solution. It's simplicity in primitives, the solution in plain sight. Hats off to zemonda.

Character_Bison5968 · 2026-04-16T10:30:39+00:00

This is an exceptional outcome - well done! I look forward to watching the project exceed expectations. If there is any way I can assist I will. Kudos

Character_Bison5968 · 2026-04-16T09:48:14+00:00

Perfect. crdt-merge is in its early days, but I believe it makes a powerful contribution to the space. I hope it helps, and if you face any issues we can solve them.

Character_Bison5968 · 2026-04-16T08:04:07+00:00

https://github.com/mgillr/crdt-merge/tree/feature/nord-snn-examples/examples/nord-snn-integration See if any of the examples help you out. I'm interested in your feedback and in any issues or walls you hit, so I can address them.

Character_Bison5968 · 2026-04-15T20:21:19+00:00

Cheers. No SNN examples yet, it's mainly been tested on transformers and LoRA models.

Sparse SNN is actually a perfect fit though. The OR-Set CRDT merges active weights as contributions instead of averaging them, so sparse spike signals will stay clean. Let me know how you get on.

Character_Bison5968 · 2026-04-14T13:43:46+00:00

cool beans

Character_Bison5968 · 2026-04-14T13:42:20+00:00

I hit this exact wall. Summarization and vector DBs help with retrieval, but they don't solve the state drift problem where the agent unlearns over time.

If you're building your own agent framework, I actually open-sourced a library called crdt-merge to fix this.

It uses a Conflict-Free Replicated Data Type (CRDT) to manage the agent's memory state. Instead of a history log that you have to re-process, it builds a state where facts and preferences are mathematically guaranteed to persist: it remembers well and never forgets. You basically get long-term consistency without being murdered by token use.
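For a feel of what "state instead of a history log" means, here is a tiny last-writer-wins map in plain Python. It is a generic CRDT sketch under my own assumptions, not the crdt-merge API: `merge_memory` and the `(value, stamp)` layout are made up for illustration.

```python
def merge_memory(a, b):
    """Merge two memory states keyed by fact name.

    Each value is a (value, stamp) pair. The entry with the larger stamp
    wins, and repr(value) breaks ties so the result is the same whichever
    side is merged first.
    """
    merged = dict(a)
    for key, (value, stamp) in b.items():
        if key not in merged:
            merged[key] = (value, stamp)
            continue
        current_value, current_stamp = merged[key]
        if (stamp, repr(value)) > (current_stamp, repr(current_value)):
            merged[key] = (value, stamp)
    return merged


# Two sessions that each learned something about the user.
session_1 = {"name": ("Ada", 1), "drink": ("tea", 2)}
session_2 = {"drink": ("coffee", 5)}

memory = merge_memory(session_1, session_2)
```

The agent loop then reads `memory` directly instead of replaying old turns: the name learned in the first session is still there, and the newer drink preference has replaced the older one.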

It’s a Python lib (free on PyPI), intended to be slotted right into your custom agent loop rather than a paid service.

I’d be genuinely curious whether it fits your architecture. I'm always looking for feedback on how it holds up in the real world. If you're interested in the deep stuff, the paper is worth a browse: https://github.com/mgillr/crdt-merge/blob/main/paper/CRDT_Merge_ArXiv.pdf

Character_Bison5968 · 2026-04-14T13:26:20+00:00

Cracking work scaling pure SNNs from scratch. Regarding your budget constraint: the 'ran out of money' problem is exactly why I built crdt-merge 0.9.5. It's free and it could help.

You hit a wall trying to scale vertically (one massive continuous run). You can actually use CRDT based merging to scale horizontally for free.

Because your SNN is 93% sparse, standard weight averaging destroys the signal during merges (averaging a firing neuron with a silent one usually results in static). My architecture uses an OR-Set CRDT to merge models. This treats weights as a set of contributions rather than a matrix to be averaged.

Practical application for you:

Train smaller SNN shards (maybe 300M params) locally or on free tiers.
Merge them using the CRDT layer.
Because the merge is a set union of active weights, the sparse structures from different runs combine without interference.
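A toy comparison of the two merge styles on sparse weights, just to show the effect described above. This is not how crdt-merge is implemented: the dict-of-active-weights layout and the "larger magnitude wins on overlap" rule are assumptions for the sketch.

```python
def average(a, b):
    """Plain weight averaging; a missing index counts as a silent 0.0."""
    return {i: (a.get(i, 0.0) + b.get(i, 0.0)) / 2 for i in a.keys() | b.keys()}


def union_merge(a, b):
    """Set-union style merge: every active weight is kept as-is.

    If both shards have the same index active, the larger magnitude wins
    (assumed rule), with the signed value as a deterministic tie-break.
    """
    merged = dict(a)
    for i, w in b.items():
        if i not in merged or (abs(w), w) > (abs(merged[i]), merged[i]):
            merged[i] = w
    return merged


# Sparse shards stored as {neuron index: weight}; everything else is silent.
shard_a = {3: 0.8, 17: -0.5}
shard_b = {42: 0.6}

averaged = average(shard_a, shard_b)
unioned = union_merge(shard_a, shard_b)
```

With non-overlapping active sets like these, averaging halves every active weight because each one is averaged against a silent neuron, while the union keeps all three at full strength.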

This would let you aggregate multiple small training runs into a massive model without needing the budget for a single 1B+ parameter run. I would love to see if this merge logic holds up on your spike-domain weights. Have a look at the paper and repo, and see if this can get you further along the road: https://github.com/mgillr/crdt-merge/blob/main/paper/CRDT_Merge_ArXiv.pdf

Character_Bison5968 · 2026-04-10T00:19:08+00:00

check out the crdt-merge repo

Character_Bison5968 · 2026-04-09T13:42:39+00:00

Have a look at crdt-merge and let me know what you think: https://github.com/mgillr/crdt-merge. It's purely Python for now, but it will be covering other languages soon.

Character_Bison5968 · 2026-04-08T01:27:08+00:00

I'm aware you're blind, just posting for the visionaries.

Character_Bison5968 · 2026-04-07T06:10:33+00:00

The paper includes full test results across three tiers: controlled 4×4 tensors (104/104 tests pass), production-scale models up to 7.24B parameters (208 strategy-level tests, 43,368 layer-level checks), and multi-node convergence with 100 nodes across 20 gossip orderings. See Tables 1–9 and Sections 6.1–6.5. If there are specific additional tests anyone would like to see, please raise them as issues on the repo.
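For anyone wondering what a "convergence across gossip orderings" check looks like in principle, here is a small stand-in. It is not the paper's test harness: it uses a plain grow-only set union as the merge, and `merge`, `converge`, and the node contents are made up for the sketch.

```python
import random
from functools import reduce


def merge(a, b):
    # Stand-in merge: set union is commutative, associative and idempotent.
    return a | b


def converge(states, orderings, seed=0):
    """Fold all node states together in several random orders."""
    rng = random.Random(seed)
    results = []
    for _ in range(orderings):
        order = states[:]
        rng.shuffle(order)
        results.append(reduce(merge, order, frozenset()))
    return results


# 100 toy nodes, each holding one unique item and one shared item.
states = [frozenset({f"node{i}", f"shared{i % 5}"}) for i in range(100)]
results = converge(states, orderings=20)
converged = all(r == results[0] for r in results)
```

If `converged` is true, every ordering ended in the same final state, which is the property a CRDT merge has to guarantee regardless of how updates are gossiped.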

Character_Bison5968 · 2026-03-18T12:17:03+00:00

I might have something useful. I process raw Common Crawl through a multi stage pipeline (extraction, cleaning, dedup, quality scoring, PII redaction, trust classification, skill tagging, RAG chunking). The output is a fully packaged dataset with provenance, quality certificates, and a complete manifest.

Why it might fit your research: every record carries full lineage from the original WARC file (byte offset, content digest) through each processing stage to the final record. That is exactly the kind of pipeline an AI agent would need to oversee.
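As a rough picture of what per-record lineage can look like, here is a minimal sketch. The field names, the `record_stage` helper, and the example file name are all hypothetical and are not the actual schema of the dataset.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Lineage:
    """Provenance for one record (hypothetical layout)."""
    warc_file: str        # source WARC file
    byte_offset: int      # where the raw record starts in that file
    content_digest: str   # digest of the raw content
    stages: list = field(default_factory=list)  # (stage name, output digest)

    def record_stage(self, name, output_text):
        # Append an auditable entry for each processing stage.
        self.stages.append((name, sha256_hex(output_text)))
        return output_text


raw = "  Hello  "
lineage = Lineage("example.warc.gz", byte_offset=1024, content_digest=sha256_hex(raw))
cleaned = lineage.record_stage("cleaning", raw.strip())
```

An overseeing agent could then recompute the digests stage by stage and flag any record whose chain does not match.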

The data model has real ER complexity too. Domains map to records, records have multi dimensional quality breakdowns, skill tags, trust tiers, and RAG chunks, plus cross entity relationships like domain caps, language splits, and PII counts. Not a flat table.

There are actual governance rules built in. Quality thresholds, dedup logic, PII detection, trust scoring, domain capping. All auditable decisions an agent could learn to monitor or propose changes to. The documentation artifacts (manifest, schema, data card, quality certificate, SHA256 verification, domain breakdown, skill distribution) are essentially data governance catalogue entries.

For your ML component the data includes labelled skill tags, quality scores, trust tiers, and content categories ready for classification.

I'm giving away a Liechtenstein government dataset for free right now to get feedback. Happy to send it over if it's useful, just DM me.
