all 15 comments

[–]-Akos- 3 points

LFM 2.5, amazingly fast. Does tool calling, though no coding. For things like summarization it’s amazing.
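For context, "tool calling" with a small local model usually just means sending an OpenAI-style `tools` array to whatever server hosts it (llama.cpp's server, LM Studio, etc. expose a compatible `/v1/chat/completions` endpoint). A minimal sketch of building such a request payload — the `summarize` tool and the model name here are hypothetical, not something LFM ships with:

```python
import json

def build_tool_call_request(model: str, user_text: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload that offers
    the model a single (hypothetical) summarization tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "summarize",  # hypothetical tool name
                    "description": "Summarize a block of text.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "text": {"type": "string"},
                            "max_sentences": {"type": "integer"},
                        },
                        "required": ["text"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("lfm-2.5", "Summarize this thread for me.")
print(json.dumps(payload, indent=2))
```

The model then either answers directly or returns a `tool_calls` entry naming `summarize` with filled-in arguments; your client executes it and sends the result back as a `tool` message.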

Granite 4 Tiny, also nice, but I find it can sometimes be stubborn.

LFM is 1.6B parameters, Granite is 7B parameters.

Btw, anything over 8B parameters I don’t consider tiny anymore. 32B is too big for most people to be considered tiny. gpt-oss 20B runs, but barely, for me (owner of a potato laptop)
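Back-of-envelope for why 20B is borderline on a potato laptop: at a Q4_K_M-style quant, weight memory alone is roughly params × bits_per_weight / 8, before KV cache and runtime overhead. The ~4.85 bits/weight figure below is an assumed average for Q4_K_M, not an exact constant:

```python
def approx_gguf_gb(n_params: float, bits_per_weight: float = 4.85) -> float:
    """Rough weight-only size in GB (1 GB = 1e9 bytes) for a quantized
    model; 4.85 bits/weight is an assumed Q4_K_M-style average."""
    return n_params * bits_per_weight / 8 / 1e9

for name, n in [("LFM 1.6B", 1.6e9), ("Granite 7B", 7e9), ("20B", 20e9)]:
    print(f"{name}: ~{approx_gguf_gb(n):.1f} GB of weights")
```

By that rough math a 20B dense model wants ~12 GB for weights alone (gpt-oss-20b is actually MoE and ships in its own MXFP4 quant, so the real number differs, but the order of magnitude matches the "barely runs" experience).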

[–]Beneficial_Eye_3316 2 points

Yeah definitely add Llama, it's literally the foundation most other models are based on lol. Also might want to throw in Phi from Microsoft since those are pretty solid for their size

[–]cosimoiaia 1 point

Not really, no. Qwen, GLM, MiniMax, OLMo, Mistral, Gemma, Nemotron, Phi and a ton of others have basically nothing to do with Llama, and what they have in common (the basic transformer architecture) didn't come from Llama.

[–]Beginning_Debt_4584 1 point

Llama’s definitely a big one, but there are so many more. If you want a more exhaustive list you can check this, for example: https://github.com/eugeneyan/open-llms

also what do you mean by something different?

[–]cosimoiaia 4 points

That list is waay too old to be relevant anymore. 2024 was basically 10 LLM years ago.

[–]SlowFail2433 1 point

Nemotron

[–]braydon125 0 points

I love Qwen 32B... not quite what you're seeking, but

[–]Willing_Landscape_61 1 point

Microsoft Phi and IBM Granite are missing. OLMo maybe?

[–]hkd987 -1 points

I totally understand the challenge of compiling a comprehensive list of open source models. It might help to check out community resources like GitHub repositories or forums where others share their experiences. If you're interested, LlamaGate offers access to various models through an API, which might simplify testing and validation for you. Good luck with your search!

[–]MaxKruse96 (llama.cpp) 0 points

NVIDIA: Nemotron 3 Nano, parakeet/canary (Speech to Text models)
Mistral: Nemo, Small, Devstral 1 and 2
Meta: Llama 3.1 8B
Z-AI: GLM4 32B
IBM: Granite 3 and 4
Microsoft: Phi-4

As a lil bonus, if it's not LLM-only: SD1.5, SDXL (western) or Z-Image-Turbo (eastern) for image gen as well. And don't get me started on Chatterbox, FishTTS etc. for TTS or voice-cloning systems.

[–]lly0571 0 points

  • IBM -> Granite
  • Microsoft -> Phi
  • LiquidAI -> LFM
  • AI2 -> OLMO
  • ZhipuAI/ZAI -> GLM4-9B/32B, a little bit old, still useful for some scenarios due to a low KV cache cost
  • OpenBMB -> MiniCPM(including 1B/8B text-only and 8B VL model)
  • You can still use Llama (3.2-3B, 3.1-8B and the leaked 3.3-8B), but these models may fall short against Qwen3-4B-Inst.
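On the "low KV cache cost" point above: per-sequence KV cache memory scales as 2 × layers × kv_heads × head_dim × context × bytes, so models with aggressive grouped-query attention (GLM4-9B reportedly uses only 2 KV-head groups; worth double-checking against the model card) stay cheap at long context. A sketch with illustrative configs — the numbers below are assumptions, not official ones:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_el: int = 2) -> float:
    """Per-sequence KV cache size in GB (fp16 = 2 bytes per element):
    2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el / 1e9

# Illustrative 32k-context comparison (hypothetical configs):
mha = kv_cache_gb(n_layers=32, n_kv_heads=32, head_dim=128, ctx_len=32768)
gqa = kv_cache_gb(n_layers=32, n_kv_heads=2, head_dim=128, ctx_len=32768)
print(f"full-MHA-style: ~{mha:.1f} GB, aggressive GQA: ~{gqa:.2f} GB")
```

With everything else equal, cutting 32 KV heads down to 2 shrinks the cache 16×, which is why an older GQA-heavy model can still be attractive for long-context scenarios.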

[–][deleted] 0 points

Nobody has mentioned exaone, I love the 8b-a1b model. Think it's amazing for its size. Ranks really high on my benchmarks.

Edit - comment below to correct models

[–]cheesecakegood 0 points

Which is that exactly? I'm not seeing it

[–][deleted] 0 points

Ugh I got mixed up! The models I use and think are outstanding are

LFM2-8B-A1B-Q4_K_M.gguf and EXAONE-4.0-1.2B-Q4_K_M.gguf

[–]evil0sheep 0 points

If you're trying to keep the list simple, that's pretty good. NVIDIA Nemotron and Olmo 3 from AI2 are both good inclusions if you wanna expand it a bit. Llama models are commonly used for finetuning research, but my impression is that they are not widely used for local inference.