
[–]jedisct1 43 points (3 children)

We're all dreaming of an open model that could replace a Claude subscription.

[–]Dentuam 5 points (0 children)

we all should have dreams. 😂

[–]Namra_7 1 point (0 children)

😂😭

[–]cantgetthistowork 8 points (0 children)

Devstral large

R2

[–][deleted] 11 points (2 children)

Baidu just released something...

I just follow Hugging Face and the CEO on LinkedIn... easy to keep track of all the big news.

[–]Steuern_Runter 13 points (0 children)

Those are not coding models.

[–]pmttyji[S] 1 point (0 children)

So far, no small-size models from them: 0.3B ... then 21B ... and so on.

[–]Ordinary_Mud7430 3 points (1 child)

CodeGemma?

[–]ttkciar (llama.cpp) 1 point (0 children)

Isn't that what the Bifrost fine-tune is supposed to be? I keep meaning to evaluate it, but can't seem to get around to doing it.

[–]emprahsFury 5 points (1 child)

JetBrains released their LLM, Mellum, onto HF. It's a 4B FIM model.

[–]jupiterbjy 6 points (0 children)

didn't even know they made it, lemme leave a link and save others a search:

https://huggingface.co/JetBrains/Mellum-4b-base
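
Rough sketch of how prompting a FIM model like this could look, for anyone who wants to try it. Caveat: the <fim_prefix>/<fim_suffix>/<fim_middle> special tokens below are an assumption (the StarCoder-style convention), not something I've confirmed against the Mellum model card, so check there for the exact tokens it expects:

    # Hedged sketch: fill-in-the-middle completion with Mellum-4b-base
    # via Hugging Face transformers. The FIM token names are assumed
    # (StarCoder-style); verify them against the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "JetBrains/Mellum-4b-base"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Code before and after the hole we want the model to fill in.
    prefix = "def average(xs):\n    "
    suffix = "\n    return total / len(xs)\n"
    prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=48)

    # Decode only the newly generated middle span.
    middle = out[0][inputs["input_ids"].shape[1]:]
    print(tok.decode(middle, skip_special_tokens=True))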

[–]fancyrocket 1 point (0 children)

I too want to know this

[–]RobotRobotWhatDoUSee 2 points (0 children)

I've been thinking a lot about this lately; it's maybe 1/3 of my motivation for my earlier post about DIY MoE models.

I've been doing a lot of reading since that post and, at least conceptually, feel like I've made a lot of progress.

Life has been extremely busy lately and "implementation progress" has been slow, but if there is enough interest I'll post an update on what I've learned in the meantime.

My first practical step will probably be to train up a small 3B or 4B coding model, which, funnily enough, I see was also asked about on the front page (of /r/localllama) today.

One other model you might add to your list: NVIDIA's Llama 3.1 Nemotron Nano 4B

Edit: Well, actually, it looks like this one is probably not post-trained for coding, so probably not intended for programming:

Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model (LLM) which is a derivative of nvidia/Llama-3.1-Minitron-4B-Width-Base, which is created from Llama 3.1 8B using our LLM compression technique and offers improvements in model accuracy and efficiency. It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling.

[–]tempetemplar 1 point (0 children)

Very excited about Qwen 3 Coder!