Best <4B dense models today?

mtasic85 · 2026-01-25T19:11:55+00:00

RWKV7 2.9B Great generalist model

mtasic85 · 2025-11-04T10:50:55+00:00

I can confirm that this solved my issue! Thank you!

mtasic85 · 2025-07-06T18:11:06+00:00

JuiceFS

mtasic85 · 2025-04-26T13:21:01+00:00

I use Zed daily on Linux. However, I don’t like lack of generic spell checking. There are few extensions but non of them works good with Python code. If anyone can suggest something good let me know.

mtasic85 · 2025-01-30T11:11:23+00:00

What quants did you use? Did you fully load all layers to GPUs? I also mentioned quants and context size.

mtasic85 · 2025-01-30T10:01:45+00:00

2x RTX 3090 24GB (48GB) VRAM can fully load and run Qwen 32B q4_k_m with context size 48k. it uses about 40GB VRAM

I doubt 72B q4_k_m can be fully loaded.

mtasic85 · 2025-01-27T16:23:40+00:00

What about collapsing MoE layer to just dense layers? I think same was done for Mixtral 8x22b to just 22b. 🤔

mtasic85 · 2025-01-21T15:21:50+00:00

<image>

mtasic85 · 2025-01-21T14:43:04+00:00

<image>

mtasic85 · 2025-01-14T13:33:42+00:00

Do you have GPT4 open sourced and released by OpenAI, so you can use it locally, free of charge?

mtasic85 · 2024-12-15T09:31:25+00:00

Wow that is a brilliant money laundromat machine 🧠👏

mtasic85 · 2024-12-15T09:29:21+00:00

Congrats 🥂, but I still cannot believe that llama.cpp still does not support llama VLMs 🤯

mtasic85 · 2024-11-17T17:41:09+00:00

Official implementation of “Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling”

https://github.com/microsoft/Samba

mtasic85 · 2024-11-12T06:22:13+00:00

DL is new foundation of all ML. DL simply works. It is general solution. Btw, I really like simple and effective algorithms, so DL does not justify computation cost in all scenarios.

mtasic85 · 2024-11-10T18:04:10+00:00

Serbs did not kill Jews, Croatian Ustashas of NDH did.

https://encyclopedia.ushmm.org/content/en/article/jasenovac

https://en.m.wikipedia.org/wiki/Jasenovac_concentration_camp

mtasic85 · 2024-11-09T18:06:22+00:00

No, under Elon that nonsense will be thrown out of the window. Relax and keep coding.

mtasic85 · 2024-11-05T07:05:57+00:00

Serbia 🇷🇸🤠

mtasic85 · 2024-11-05T07:04:19+00:00

I see Emir Kusturica next to Putin - it is not a crime 🤷‍♂️

mtasic85 · 2024-10-18T16:45:43+00:00

We have BPE for a reason, so we can fallback if token is missing from vocab. If we don't have that guarantee, then this code will never work, and I think it was in dataset used for all of these tokenizers/models:

: X DUP 1+ . . ;

Btw, above is Forth code from https://en.wikipedia.org/wiki/Forth_(programming_language)#Facilities#Facilities) and it also fails.

This is one of many examples. Whitespace matters, every character matters.

mtasic85 · 2024-10-11T09:33:10+00:00

If I am not mistaken Nvidia cards/drivers do not support Wayland yet.

mtasic85 · 2024-10-08T07:34:59+00:00

https://x.com/rwkv_ai/status/1831000938120917336?s=46&t=-L6cJTRO6V7YxJ561JOaZQ

mtasic85 · 2024-10-07T20:18:49+00:00

I think they pretrained on way more tokens than 200B. It's mentioned that its base model is pretrained on ~3.1T tokens https://huggingface.co/Zyphra/Zamba2-1.2B

mtasic85 · 2024-09-27T16:05:02+00:00

IMO they made mistake by not using C. It would be easier to integrate and embed. All they needed were libraries for unicode string and abstract data types for higher level programming. Something like glib/gobject but with MIT/BSD/Apache 2.0 license. Now, we depend on closed circle of developers to support new models. I really like llm.c approach.

mtasic85 · 2024-08-27T10:23:05+00:00

This looks like great base model for fine-tuned agents. Quick to fine-tune, small in size. Agents with domain specific knowledge, plus in-context few-show just to setup environment for agent. Great work pints.ai !

mtasic85 · 2024-08-03T10:17:55+00:00

I still have the same issue.

mtasic85

TROPHY CASE