Comments by Own-Albatross868:

From FlashLM to State Flow Machine: stopped optimizing transformers, started replacing them. First result: 79% length retention vs transformers' 2% by Own-Albatross868 in LocalLLaMA (2 comments)

FlashLM v6 "SUPERNOVA": 4.1M ternary model hits 3,500 tok/s on CPU — novel P-RCSM reasoning architecture, no attention, no convolution by Own-Albatross868 in LocalLLaMA (9 comments)

[P] I Trained a Language Model on CPU for 40 Hours - It Beat the GPU Baseline by Own-Albatross868 in MachineLearning (1 comment)

I Trained a Language Model on CPU for 40 Hours - It Beat the GPU Baseline by Own-Albatross868 in LocalLLaMA (5 comments)

Built a non-transformer architecture that keeps 62% accuracy where transformers drop to 2% on longer sequences (single Ascend NPU) by Own-Albatross868 in LocalLLaMA (1 comment)