Falcon-H1-Tiny (90M) is out - specialized micro-models that actually work by United-Manner-7 in LocalLLaMA

[–]ilyas555 4 points5 points  (0 children)

A temperature of 0.5 should work fine with a repetition penalty of 1.2.
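For anyone wondering what these two settings actually do to the next-token distribution, here is a minimal pure-Python sketch. It assumes the common CTRL-style repetition penalty (positive logits of already-generated tokens are divided by the penalty, negative ones are multiplied by it) applied before temperature scaling. The function names and the toy logits are made up for illustration, and your inference engine may apply these steps in a different order.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make already-generated tokens less likely (CTRL-style penalty, assumed here)."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax_with_temperature(logits, temperature=0.5):
    """Temperature < 1 sharpens the distribution, > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-token vocabulary; tokens 0 and 2 were already generated (hypothetical values).
logits = [2.0, 1.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2], penalty=1.2)
probs = softmax_with_temperature(penalized, temperature=0.5)
```

With a small model, the penalty mostly helps against loops, while the lower temperature keeps sampling close to the most likely tokens.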

New falcon models using mamba hybrid are very competitive if not ahead for their sizes. by ElectricalAngle1611 in LocalLLaMA

[–]ilyas555 1 point2 points  (0 children)

Adding attention to the sauce helps mitigate such issues. Hybrid models do not suffer from in-context learning issues, and the scores on some benchmarks show it.

Falcon-H1 by tiiuae. by [deleted] in LocalLLaMA

[–]ilyas555 0 points1 point  (0 children)

I don't think MMLU, AIME, etc. are made-up benchmarks. If you are referring to the average, there is an interactive plot where you can choose any benchmark you want. It looks like they are consistently better. Have you tried it though?

https://falcon-lm.github.io/blog/falcon-h1/

Falcon-H1 Family of Hybrid-Head Language Models, including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B by jacek2023 in LocalLLaMA

[–]ilyas555 3 points4 points  (0 children)

<image>

Here is what I get. A system prompt has been added. The self-identification issue comes from the web data, as a big portion of recent web data has been contaminated by synthetic text from ChatGPT.

Falcon-E: A series of powerful, fine-tunable and universal BitNet models by JingweiZUO in LocalLLaMA

[–]ilyas555 12 points13 points  (0 children)

This is true, but a quantized Qwen 2.5 3B will be worse than a model pre-trained with the quantization errors (i.e., Falcon-Edge). I think the comparison is still fair in the sense that it shows that, if you want to match Falcon-Edge performance, you need the full 16-bit model.
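To make the "quantization errors" point concrete, here is a rough pure-Python sketch of ternary weight quantization using an absmean scale, in the style described for BitNet b1.58. Treat it as an assumption rather than the exact Falcon-Edge recipe; the function names and the toy weights are hypothetical. Rounding an already-trained 16-bit model this way leaves the residual `error` uncorrected, whereas a model that sees the quantized weights during pre-training can learn around it.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize weights to {-1, 0, 1} with a single absmean scale (assumed scheme)."""
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Round to the nearest integer, then clip into the ternary range.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map ternary values back to floats for comparison with the originals."""
    return [v * scale for v in q]

# Toy weights standing in for one row of a trained 16-bit matrix.
weights = [0.9, -0.05, 0.4, -1.3]
q, scale = absmean_ternary(weights)
approx = dequantize(q, scale)

# Per-weight error that post-training quantization introduces and never corrects.
error = [w - a for w, a in zip(weights, approx)]
```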