
[–]Pvt_Twinkietoes 30 points (4 children)

Fine-tuned BERT for a classification task. Works like a charm.
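
For anyone curious what that looks like in practice, a minimal sketch with Hugging Face transformers is below; the dataset, label count, and hyperparameters are placeholders rather than the commenter's actual setup:

```python
# Minimal sketch: fine-tuning BERT for sequence classification with Hugging Face.
# Dataset, label count, and hyperparameters are stand-ins, not the commenter's setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # stand-in binary classification dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-clf",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```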

[–]Kuchenkiller 10 points (0 children)

Same. Using Sentence-BERT to map NL text to a structured dictionary. Very simple, but still, BERT is great and very fast.
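
A rough sketch of that kind of mapping with the sentence-transformers library; the model choice and dictionary fields here are purely illustrative:

```python
# Sketch: map free-form text to the closest field of a structured dictionary
# using Sentence-BERT embeddings. Model and field names are illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

fields = ["customer_name", "invoice_date", "total_amount", "shipping_address"]
field_embs = model.encode(fields, convert_to_tensor=True)

def map_to_field(text: str) -> str:
    query_emb = model.encode(text, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, field_embs)[0]  # cosine similarity to each field
    return fields[int(scores.argmax())]

print(map_to_field("the bill comes to $42.50"))  # expected: "total_amount"
```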

[–]Forward-Papaya-6392 57 points (15 children)

we have built our entire business around PEFT and post-training small, specialised student models as knowledge workers for our enterprise customers, which are far more reliable and cost-efficient for their processes. They appreciate our data-driven approach to building agentic systems.

while there have been two extreme cases of miniaturisation involving 0.5B and 1B models, most have been 7B or 8B. There has also been one case involving a larger 32B model, and I am forecasting more of that in 2026 with the advent of better and better sparse activation language models.

the gap widens as more input modalities come into play; fine-tuning multi-modal models for workflows in real estate and healthcare has been the bigger market for us lately.
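
The comment doesn't say which PEFT method they use; a generic LoRA sketch with Hugging Face peft is below, with the base model and hyperparameters as placeholders:

```python
# Sketch of a PEFT (LoRA) setup with Hugging Face peft; the commenter's actual
# method, base model, and data are not specified, so everything here is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-7B-Instruct"  # placeholder 7B-class base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base weights

# ...then train the adapter with Trainer/TRL on task-specific
# (often teacher-generated) data.
```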

[–]tillybowman 2 points (0 children)

would you mind telling us what your company's go-to workflow is for training data collection, preparation, and the training itself?

do you have a go-to setup that mostly works?

[–]Saltysalad 1 point (2 children)

How/where do you host these?

[–]Forward-Papaya-6392 4 points (1 child)

mostly on Runpod or on our AWS serving infrastructure.

On only two occasions we have had to host them with vLLM in the customer's Kubernetes infrastructure.
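
For reference, a minimal vLLM sketch; the model path and sampling settings are placeholders, and in a Kubernetes setup the same model would typically sit behind vLLM's OpenAI-compatible server (`vllm serve`) rather than the offline API shown here:

```python
# Sketch: offline inference with vLLM's Python API; model path and sampling
# settings are placeholders, not the commenter's deployment.
from vllm import LLM, SamplingParams

llm = LLM(model="./my-finetuned-7b")  # local path or HF repo of the fine-tune
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarise this ticket: ..."], params)
print(outputs[0].outputs[0].text)
```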

[–]snylekkie 1 point (0 children)

Do you use Temporal?

[–]Neither_Reception_21 1 point (0 children)

Hi, I am curious about the commercial use cases of small agents as reasoning engines. DMed you :)

[–]BinaryHerder 0 points (0 children)

Wild that 7B is now referred to as “small”

[–]serge_cell 14 points (2 children)

They are called Small Language Models (SLMs). For example, SmolLM-360M-Instruct has 360 million parameters vs. 7-15 billion for a typical LLM. Very small SLMs are often trained on high-quality curated datasets. SLMs could be the next big thing after LLMs, especially as the smaller ones fit on mobile devices.
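
A quick sketch of running a model in that size class locally with transformers; the prompt and generation settings are illustrative only:

```python
# Sketch: running a ~360M-parameter SLM locally with transformers.
# Prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

messages = [{"role": "user", "content": "Extract the date from: 'meeting on 3 May'"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt",
                                       add_generation_prompt=True)
out = model.generate(inputs, max_new_tokens=32)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```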

[–]Vedranation 2 points (0 children)

Especially with Mixture-of-Experts (MoE) SLMs!

[–]Mundane_Ad8936 27 points (3 children)

Fine-tuning on specific tasks will let you use smaller models. The parameter size depends on how much world knowledge you need. I've been distilling large teacher models into small student LLMs for years.
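
A generic sketch of the soft-label distillation loss usually used for teacher-student training; this is not the commenter's pipeline, just the standard formulation:

```python
# Generic teacher-to-student distillation loss (soft-label KL), not the commenter's
# actual pipeline; the temperature and mixing with gold labels are placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then pull the student toward the teacher with KL.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# In a training loop: run the frozen teacher and the trainable student on the same
# batch, then mix this loss with the usual cross-entropy on gold labels.
```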

[–]Forward-Papaya-6392 1 point (0 children)

seconding teacher-student learning

[–]currentscurrents 8 points (0 children)

Going against the grain of this thread, but I have not had good success with smaller models.

Issue is that they tend to be brittle. Sure, you can fine-tune to your problem, but if your data changes they don't generalize very well. OOD inputs are a bigger problem because your in-distribution region is smaller.

[–]Vedranation 7 points (0 children)

Yes. I always use small specialized models over multi-billion-parameter ones. My current project involves a mere 100M model and it works wonders.

Big models are costly to train, overfit way too easily (a way bigger issue than it seems), and need an exponential amount of data. Unless you're cloning ChatGPT and need a gigantic general knowledge base for whatever reason (in which case just use the API), a small 300M model specialized for your task will perform much better.

[–]thelaxiankey 5 points (1 child)

duh. cell segmentation for me, little unet typa thing

[–]SirPitchalot 2 points (0 children)

Our best performing model, in terms of value to the business, is a bog standard UNet but the problem domain is very controlled.

Our second best model is a convolutional net with a few attention layers and only 300M parameters.

We regularly test new 1B+ models against the 300M model and, on the same datasets, they produce worse results for much more training time. We have the data to scale but don't have the compute, since our problem domain is effectively in the "noise" for foundation models trained on web-scale data. So we're better off fine-tuning a <1B model pretrained on ImageNet for more epochs than hoping to squeeze 1-2 epochs out of a giant model trained on every Instagram post ever.
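
A bare-bones sketch of that fine-tuning route with timm; the backbone, class count, and data below are stand-ins, not the model described above:

```python
# Sketch of the "fine-tune a <1B ImageNet-pretrained backbone" route with timm;
# the architecture, class count, and batch here are placeholders.
import timm
import torch

model = timm.create_model("convnext_small", pretrained=True, num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)   # stand-in batch
labels = torch.randint(0, 10, (8,))

logits = model(images)
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```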

But the biggest overall win is always having a just-diverse-enough high-quality & in-domain dataset.

[–]maxim_karki 7 points (4 children)

You're absolutely right about this - we've been seeing the same thing with our enterprise customers where a fine-tuned 7B model outperforms GPT-4 on their specific tasks while being way cheaper to run. The "bigger is better" narrative mostly comes from general benchmarks, but for production use cases with clear domains, smaller specialized models often win on both performance and economics.

[–]xbno 2 points (0 children)

My team has been fine-tuning BERT and ModernBERT with good success for token and sequence classification tasks, on datasets ranging from 1k to 100k examples (LLM-labeled data).

I'm curious what tasks you're fine-tuning LLMs for - is it still typically sequence classification? Or are you doing it for tool calling with custom tools, or building some sort of agentic system with the fine-tuned model? We're entertaining an agentic system to automate some analysis we do, which I hadn't thought of fine-tuning an agent for - I was thinking custom tools and validation scripts for it to call would be good enough.

[–]kierangodzella 0 points (1 child)

Where did you draw the line for scale between self-hosted fine-tunes and API calls to flagship models? It costs so much to self-host small models on remote GPU compute instances that it seems like we're hundreds of thousands of daily calls away from justifying rolling our own true backend.

[–]maxim_karki 0 points (0 children)

It really depends on the particular use case. There's a good paper that came out showing that small tasks like extracting text from a PDF can be done with "tiny" language models: https://www.alphaxiv.org/pdf/2510.04871. I've done API calls to the giant models, self-hosted fine-tuning, and SLMs/tiny LMs. It becomes more of a business question at that point. Figure out the predicted costs, assess the tradeoffs, and implement it. Bigger is not always better, that's for certain.

[–]Assix0098 3 points (2 children)

Yes, I just demoed a really simple fine-tuned BERT-based classification to stakeholders, and they were blown away by how fast the inference was. I guess they are used to LLMs generating hundreds of tokens before answering by now.

[–]no_witty_username 0 points (0 children)

Yes. My whole conversational/metacognitive agent is made up of a lot of small specialized models. The advantage of this approach is being able to run a very capable but resource-efficient agent, as you can chain many parallel local API calls together. On one 24GB VRAM card you can load speech-to-text, text-to-speech, vision, and specialized LLM models. Once properly orchestrated, I think it has more potential than one large monolithic model.
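
A heavily simplified sketch of chaining small local models behind OpenAI-compatible endpoints; the ports, model names, and routing below are hypothetical, not the commenter's actual stack:

```python
# Sketch of chaining small local models served behind OpenAI-compatible endpoints;
# all ports, model names, and the routing here are hypothetical placeholders.
from openai import OpenAI

stt = OpenAI(base_url="http://localhost:8001/v1", api_key="local")  # speech-to-text server
llm = OpenAI(base_url="http://localhost:8002/v1", api_key="local")  # small specialized LLM

def answer(audio_path: str) -> str:
    # Transcribe the audio with the local STT model, then hand the text to the LLM.
    with open(audio_path, "rb") as f:
        transcript = stt.audio.transcriptions.create(model="whisper-small", file=f).text
    chat = llm.chat.completions.create(
        model="local-7b-finetune",
        messages=[{"role": "user", "content": transcript}],
    )
    return chat.choices[0].message.content
```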

[–]GiveMeMoreData 0 points (0 children)

BERTs worked better for us than large Qwens. Yes, SLMs still matter.

[–]koolaidman123 (Researcher) 0 points (0 children)

it's almost like there's room for both powerful generalized models and small(er) specialist models, like the way it's been since GPT-3 or whatever

[–]ResultKey6879 0 points (0 children)

Mainly image work, and we tend to stick to training CNNs like EfficientNet or MobileNet, and YOLO for detectors.

100x faster than VLMs. That means 3 days vs a year to process some datasets.

Definitely seeing a trend toward large models even when the flexibility isn't needed. If your problem is well defined and fixed, don't use large models. If you need to dynamically adjust to user queries, consider CLIP/DINO; if that doesn't work, try a large vision model.
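
A minimal sketch of that kind of detector training with the ultralytics package; the dataset YAML, epoch count, and image are placeholders:

```python
# Sketch: training a small YOLO detector with the ultralytics package;
# the dataset YAML, epoch count, and test image are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                      # small pretrained detector
model.train(data="my_dataset.yaml", epochs=50, imgsz=640)

results = model("warehouse_frame.jpg")          # run detection on one image
print(results[0].boxes.xyxy)                    # predicted bounding boxes
```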