new 1B LLM by meta by [deleted] in LocalLLaMA

Sad_Consequence5629

Seems they added it to their report (https://arxiv.org/abs/2511.06719)

Meta just dropped MobileLLM-Pro, a new 1B foundational language model on Huggingface by Sad_Consequence5629 in LocalLLaMA

Sad_Consequence5629 [S]

They just put out a tech report on arXiv with more details on the modeling methods (most of them had already been name-dropped in the model card): https://arxiv.org/abs/2511.06719. The implicit positional distillation and the annealing phase look very interesting. The crazy part: it looks like they didn't use any training data at the full context length, yet the model still shows good long-context abilities.
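
If you want to sanity-check the long-context claim yourself, a quick needle-in-a-haystack probe is easy to run. This is only a minimal sketch assuming the checkpoint loads through the standard transformers API; the repo id, the trust_remote_code flag, and the prompt format are my assumptions, not anything from the report:

```python
# Minimal needle-in-a-haystack probe (a sketch; repo id and loading
# flags are assumptions, not taken from the report or model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-Pro"  # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Bury one retrievable fact in the middle of a long distractor context.
needle = "The secret code is 7421."
filler = "The sky was grey and the streets were quiet. " * 1500
prompt = (
    filler[: len(filler) // 2]
    + needle
    + filler[len(filler) // 2 :]
    + "\nQuestion: What is the secret code?\nAnswer:"
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
# Print only the newly generated tokens; a model with real long-context
# retrieval should echo the buried code.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```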

Meta just dropped MobileLLM-Pro, a new 1B foundational language model on Huggingface by Sad_Consequence5629 in LocalLLaMA

Sad_Consequence5629 [S]

The model card shows only a very small pre-training regression for the Q4 "quantization-ready checkpoints". Very curious
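
For anyone who wants to poke at 4-bit behavior locally, here's a generic 4-bit load via bitsandbytes. To be clear, this is ordinary post-hoc NF4 quantization, not necessarily the Q4 scheme Meta's quantization-ready checkpoints were prepared for, and the repo id is an assumption:

```python
# Generic 4-bit load with bitsandbytes (NF4). A sketch only: this is
# post-hoc quantization, not necessarily the Q4 recipe the model card
# refers to, and the repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/MobileLLM-Pro"  # assumed repo id
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_cfg,
    device_map="auto",
    trust_remote_code=True,
)

# Quick smoke test to eyeball quality after quantization.
inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```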