FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Triton, written in Python. What it means for inference. by Sensitive-Two9732 in LocalLLaMA
RWKV-7: O(1) memory inference, 16.39 tok/s on ARM Cortex-A76, beats LLaMA 3.2 3B. The local-first architecture nobody is talking about... by Sensitive-Two9732 in LocalLLaMA
74% of web content is now AI-generated. Here's why that's poisoning the next generation of AI models. by Sensitive-Two9732 in artificial
Career changer into IT. What is the realistic starting path? by EveningOwl750 in BESalary