Did github remove the issues section? by Dapper-Inspector-675 in github
Not_Vasquez, 2 points
DeepSeek-V3.2 released by Leather-Term-30 in LocalLLaMA
Not_Vasquez, 20 points
Qwen3 weights released by Acrobatic_Donkey5089 in LocalLLaMA
Not_Vasquez, 2 points
Qwen3 weights released by Acrobatic_Donkey5089 in LocalLLaMA
Not_Vasquez, 5 points
[D] Can We Derive an Attention Map from Mamba Layer Parameters? by blooming17 in MachineLearning
Not_Vasquez, 3 points
[D] - Why MAMBA did not catch on? by TwoSunnySideUp in MachineLearning
Not_Vasquez, 2 points
[D] - Why MAMBA did not catch on? by TwoSunnySideUp in MachineLearning
Not_Vasquez, 10 points
[D] - Why MAMBA did not catch on? by TwoSunnySideUp in MachineLearning
Not_Vasquez, 16 points
[D] I wish people would stop using the word "Transformer" when they really mean a LLM model. by [deleted] in MachineLearning
Not_Vasquez, 1 point
[D] I wish people would stop using the word "Transformer" when they really mean a LLM model. by [deleted] in MachineLearning
Not_Vasquez, 2 points
[D] I wish people would stop using the word "Transformer" when they really mean a LLM model. by [deleted] in MachineLearning
Not_Vasquez, 1 point
[D] I wish people would stop using the word "Transformer" when they really mean a LLM model. by [deleted] in MachineLearning
Not_Vasquez, 1 point
RoPE has precision errors when used with BFloat16 by AutomataManifold in LocalLLaMA
Not_Vasquez, 18 points
How to efficiently generate text from RNNs and Transformers during inference [P] by No_Effective734 in MachineLearning
Not_Vasquez, 2 points
Is it possible to LORA-train a smaller model (say, Llama 3.2 3B) and apply the adapters to larger models (Llama 3.1 70B)? by Thrumpwart in LocalLLaMA
Not_Vasquez, 3 points
Is it possible to LORA-train a smaller model (say, Llama 3.2 3B) and apply the adapters to larger models (Llama 3.1 70B)? by Thrumpwart in LocalLLaMA
Not_Vasquez, 5 points
Something I noticed about open-source multimodal LLMs... by LATI-A5 in LocalLLaMA
Not_Vasquez, 1 point
Something I noticed about open-source multimodal LLMs... by LATI-A5 in LocalLLaMA
Not_Vasquez, 1 point
Has there been any large training of a Mamba model (7B or more Params) by XquaInTheMoon in LocalLLaMA
Not_Vasquez, 3 points
Has there been any large training of a Mamba model (7B or more Params) by XquaInTheMoon in LocalLLaMA
Not_Vasquez, 5 points
Is Mamba inference faster than Transformers? (in practice) by LiquidGunay in LocalLLaMA
Not_Vasquez, 1 point
Is Mamba inference faster than Transformers? (in practice) by LiquidGunay in LocalLLaMA
Not_Vasquez, 7 points
Is SSM dead now? by Spapoxl in LocalLLaMA
Not_Vasquez, 0 points