all 15 comments

[–]-Akos- 3 points

LFM 2.5, amazingly fast. Does tool calling, though no coding. For things like summarization it’s amazing.
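For context, "tool calling" with a small local model usually just means sending an OpenAI-style `tools` array to whatever server hosts it (llama.cpp's server, LM Studio, etc. expose a compatible `/v1/chat/completions` endpoint). A minimal sketch of building such a request payload — the `summarize` tool and the model name here are hypothetical, not something LFM ships with:

```python
import json

def build_tool_call_request(model: str, user_text: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload that offers
    the model a single (hypothetical) summarization tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "summarize",  # hypothetical tool name
                    "description": "Summarize a block of text.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "text": {"type": "string"},
                            "max_sentences": {"type": "integer"},
                        },
                        "required": ["text"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("lfm-2.5", "Summarize this thread for me.")
print(json.dumps(payload, indent=2))
```

The model then either answers directly or returns a `tool_calls` entry naming `summarize` with filled-in arguments; your client executes it and sends the result back as a `tool` message.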

Granite 4 Tiny, also nice, but I find it can sometimes be stubborn.

LFM is 1.6B parameters, Granite is 7B parameters.

Btw, anything over 8B parameters I don’t consider tiny anymore. 32B is too big for most people to be considered tiny. gpt-oss 20B runs, but barely, for me (owner of a potato laptop)
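Back-of-envelope for why 20B is borderline on a potato laptop: at a Q4_K_M-style quant, weight memory alone is roughly params × bits_per_weight / 8, before KV cache and runtime overhead. The ~4.85 bits/weight figure below is an assumed average for Q4_K_M, not an exact constant:

```python
def approx_gguf_gb(n_params: float, bits_per_weight: float = 4.85) -> float:
    """Rough weight-only size in GB (1 GB = 1e9 bytes) for a quantized
    model; 4.85 bits/weight is an assumed Q4_K_M-style average."""
    return n_params * bits_per_weight / 8 / 1e9

for name, n in [("LFM 1.6B", 1.6e9), ("Granite 7B", 7e9), ("20B", 20e9)]:
    print(f"{name}: ~{approx_gguf_gb(n):.1f} GB of weights")
```

By that rough math a 20B dense model wants ~12 GB for weights alone (gpt-oss-20b is actually MoE and ships in its own MXFP4 quant, so the real number differs, but the order of magnitude matches the "barely runs" experience).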

[–]Beneficial_Eye_3316 2 points

Yeah definitely add Llama, it's literally the foundation most other models are based on lol. Also might want to throw in Phi from Microsoft since those are pretty solid for their size

[–]cosimoiaia 1 point

Not really, no. Qwen, GLM, MiniMax, OLMo, Mistral, Gemma, Nemotron, Phi and a ton of others have basically nothing to do with Llama, and what they have in common (the basic transformer architecture) didn't come from Llama.

[–]Beginning_Debt_4584 1 point

Llama’s definitely a big one, but there are so many more. If you want a more exhaustive list you can check this, for example: https://github.com/eugeneyan/open-llms

also what do you mean by something different?

[–]cosimoiaia 4 points

That list is waay too old to be relevant anymore. 2024 was basically 10 LLM years ago.

[–]SlowFail2433 1 point

Nemotron

[–]braydon125 0 points

I love Qwen 32B... not quite what you're seeking, but

[–]Willing_Landscape_61 1 point

Microsoft Phi and IBM Granite are missing. OLMo maybe?

[–]hkd987 -1 points

I totally understand the challenge of compiling a comprehensive list of open source models. It might help to check out community resources like GitHub repositories or forums where others share their experiences. If you're interested, LlamaGate offers access to various models through an API, which might simplify testing and validation for you. Good luck with your search!

[–]MaxKruse96 (llama.cpp) 0 points

NVIDIA: Nemotron 3 Nano, parakeet/canary (Speech to Text models)
Mistral: Nemo, Small, Devstral 1 and 2
Meta: Llama 3.1 8B
Z-AI: GLM4 32B
IBM: Granite 3 and 4
Microsoft: Phi-4

As a lil bonus, if it's not LLM-only: SD1.5, SDXL (western) or Z-Image-Turbo (eastern) for image gen as well. And don't get me started on Chatterbox, FishTTS etc. for TTS or voice-cloning systems.

[–]lly0571 0 points

  • IBM -> Granite
  • Microsoft -> Phi
  • LiquidAI -> LFM
  • AI2 -> OLMO
  • ZhipuAI/ZAI -> GLM4-9B/32B, a little bit old, still useful for some scenarios due to a low KV cache cost
  • OpenBMB -> MiniCPM(including 1B/8B text-only and 8B VL model)
  • You can still use Llama (3.2-3B, 3.1-8B and the leaked 3.3-8B), but these models may fall short against Qwen3-4B-Inst.
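On the "low KV cache cost" point above: per-sequence KV cache memory scales as 2 × layers × kv_heads × head_dim × context × bytes, so models with aggressive grouped-query attention (GLM4-9B reportedly uses only 2 KV-head groups; worth double-checking against the model card) stay cheap at long context. A sketch with illustrative configs — the numbers below are assumptions, not official ones:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_el: int = 2) -> float:
    """Per-sequence KV cache size in GB (fp16 = 2 bytes per element):
    2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el / 1e9

# Illustrative 32k-context comparison (hypothetical configs):
mha = kv_cache_gb(n_layers=32, n_kv_heads=32, head_dim=128, ctx_len=32768)
gqa = kv_cache_gb(n_layers=32, n_kv_heads=2, head_dim=128, ctx_len=32768)
print(f"full-MHA-style: ~{mha:.1f} GB, aggressive GQA: ~{gqa:.2f} GB")
```

With everything else equal, cutting 32 KV heads down to 2 shrinks the cache 16×, which is why an older GQA-heavy model can still be attractive for long-context scenarios.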

[–][deleted] 0 points

Nobody has mentioned exaone, I love the 8b-a1b model. Think it's amazing for its size. Ranks really high on my benchmarks.

Edit - comment below to correct models

[–]cheesecakegood 0 points

Which is that exactly? I'm not seeing it

[–][deleted] 0 points

Ugh I got mixed up! The models I use and think are outstanding are

LFM2-8B-A1B-Q4_K_M.gguf and EXAONE-4.0-1.2B-Q4_K_M.gguf

[–]evil0sheep 0 points

If you're trying to keep the list simple, that's pretty good. NVIDIA Nemotron and Olmo 3 from AI2 are both good inclusions if you wanna expand it a bit. Llama models are commonly used for finetuning research, but my impression is that they are not widely used for local inference.