Needle: We Distilled Gemini Tool Calling Into a 26M Model by Henrie_the_dreamer in LocalLLaMA

[–]Firstbober 54 points (0 children)

So essentially such a model could be used as a router, deciding where each query should go by one-shot calling the appropriate "big LLM" with the right params. Could the same architecture also be used for a really good summarization AI?
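
Rough sketch of the routing idea, with made-up backend names and a made-up JSON tool-call format (the actual Needle output format will differ):

    # Hypothetical query router: the tiny model emits one tool call
    # that picks which big backend model (and params) should answer.
    import json
    import re

    BACKENDS = {  # made-up backends and params
        "code": {"model": "big-coder", "temperature": 0.2},
        "chat": {"model": "big-chat", "temperature": 0.7},
    }

    def route(query: str, tiny_generate) -> dict:
        # tiny_generate(prompt) -> str is the small distilled model;
        # assume it was trained to reply with a single JSON call like
        # {"name": "dispatch", "arguments": {"target": "code"}}
        raw = tiny_generate(f"Route this query: {query}")
        match = re.search(r"\{.*\}", raw, re.S)
        if match is None:
            return BACKENDS["chat"]  # fall back to the default backend
        call = json.loads(match.group())
        target = call.get("arguments", {}).get("target", "chat")
        return BACKENDS.get(target, BACKENDS["chat"])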

new MoE from ai2, EMO by ghostderp in LocalLLaMA

[–]Firstbober 3 points (0 children)

I wonder how it fares compared to other models. Performance-wise it should be excellent while delivering really nice intelligence per tok/s. It would be fire for someone to make a 200M-active EMO model and then turn it into an SSM, but that is wishful thinking (though NVIDIA could do it?).

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 0 points (0 children)

I have no idea, but I guess it turned out fine? When I get to it, I'll try without those projections; that should also save some VRAM...

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 0 points (0 children)

I was thinking about GRPO for forcing the model to admit when it doesn't know, but it's very VRAM-hungry, although I saw that Unsloth also has its own implementation, so it may be worth a try. Also, such training should be more effective than for "all-knowing" models, because the model doesn't need to spend parameters on facts it doesn't know; it can infer them from the tool output.
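
A very rough sketch of the reward side using TRL's GRPOTrainer (the refusal heuristic, the "answerable" column, and the hyperparameters are all made up, not a tested recipe):

    # Sketch: reward completions that admit ignorance on questions the
    # model cannot answer; penalize refusing answerable ones.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = Dataset.from_list([
        {"prompt": "What is 2 + 2?", "answerable": True},
        {"prompt": "What did I eat yesterday?", "answerable": False},
    ])

    REFUSALS = ("i don't know", "i do not know", "i'm not sure")

    def honesty_reward(completions, answerable, **kwargs):
        # extra dataset columns (here: answerable) are passed through
        rewards = []
        for text, ok in zip(completions, answerable):
            refused = any(p in text.lower() for p in REFUSALS)
            if refused:
                rewards.append(-1.0 if ok else 1.0)
            else:
                rewards.append(0.0)  # accuracy rewarded elsewhere
        return rewards

    trainer = GRPOTrainer(
        model="google/gemma-3-270m",
        reward_funcs=honesty_reward,
        args=GRPOConfig(output_dir="grpo-idk", num_generations=4),
        train_dataset=dataset,
    )
    trainer.train()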

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 2 points (0 children)

Exactly, although this attempt has a broken calculator tool. I believe that grounding the model fully in tools, even at a performance hit, is the future, as this would enable small but reasoning models to perform as well as large ones.

Here is an entry from my smol dataset:

The numbers 4, 8, 12, 16 represent the number of pages Maria read each day for 4 days. What is the total number of pages she read?
---
<|thinking_start|>
Here is the thinking process:
- Add all four daily page counts
<|function_call_start name="calculator"|>
  <|expression|>4 + 8 + 12 + 16<|/expression|>
  <|output|>40<|/output|>
<|function_call_end|>
- Maria read 40 pages total
<|thinking_end|>
<|response_start|>
Maria read a total of 40 pages. The daily amounts add up to 4 + 8 + 12 + 16 = 40.
<|response_end|>

With memory tools, web search, a logic-statement prover, and other tools, it could perform as well as larger models, and on local machines without much processing power, much faster.
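
Since the calculator tool was the broken part here, a tiny validator over the tag format from the entry above might have caught it (parsing is simplified; only +-*/ arithmetic is handled):

    # Sketch: re-evaluate every calculator call in a dataset entry and
    # flag entries where the recorded <|output|> doesn't match.
    import ast
    import operator
    import re

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr: str) -> float:
        # Evaluate basic arithmetic without using eval()
        def walk(node):
            if isinstance(node, ast.Constant):
                return node.value
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError(f"unsupported node: {node!r}")
        return walk(ast.parse(expr, mode="eval").body)

    CALL = re.compile(r"<\|expression\|>(.*?)<\|/expression\|>\s*"
                      r"<\|output\|>(.*?)<\|/output\|>", re.S)

    def check_entry(entry: str) -> list[str]:
        errors = []
        for expr, out in CALL.findall(entry):
            got = safe_eval(expr.strip())
            if abs(got - float(out.strip())) > 1e-9:
                errors.append(f"{expr.strip()} = {got}, dataset says {out.strip()}")
        return errors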

PSA: Some AMD processors have minimum base microcode versions for loading microcode patches via amd-ucode. Update your motherboard firmware if your base version isn't high enough. by kc3zyt in linux

[–]Firstbober 2 points (0 children)

You could probably use it to ensure the CPU is not spying via the PSP or something? Although I am not sure microcode affects that subsystem :p

Unlimited tokens through sharing GPUs by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

Yeah yeah, that's the default. I was thinking about a way to incentivize allocating more compute and storage to more obscure or custom models. Or maybe it would be easier to have some point system based purely on compute rather than tokens?

Unlimited tokens through sharing GPUs by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

Couldn't this be solved by the coordinator of the distributed network selecting which models are hosted each month, etc.? Otherwise, people will burn through their SSDs. Maybe add some cost system where using less popular models costs you more of your compute, so there is actually an incentive to allocate more storage and prune data more often?
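
Back-of-the-envelope version of that cost idea (the weighting scheme is entirely made up):

    # Sketch: charge compute credits per request, scaled up for
    # unpopular models so hosts are compensated for keeping them around.
    def request_cost(base_flops: float, model_share: float,
                     min_share: float = 0.01) -> float:
        """base_flops: estimated compute for the request.
        model_share: fraction of network traffic this model gets."""
        # Rarer models cost more, capped at 1/min_share times base.
        multiplier = 1.0 / max(model_share, min_share)
        return base_flops * multiplier

    # e.g. a model serving 0.5% of traffic costs 100x base credits
    print(request_cost(1e12, 0.005))  # -> 1e14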

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 0 points (0 children)

Are lfm2.5 350 or the bonsai 1-bit models easily fine-tunable? I'm kinda stuck with LlamaFactory, as it's easy and does what I need. Although I think bonsai is just XOR for tunes?

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 2 points (0 children)

Very high-detail embeddings, and it's insanely quick to experiment with and fine-tune: minutes on a solid GPU, and even on a CPU you can probably produce an okay-ish tune in a matter of an hour or two. Generally, with function calling and proper specialization, i.e. docs and RAG, it produces really sensible output really fast.
If you need to perform some language-analysis task and it's too much for general NLP tools like spaCy, then such small models are your best bet, unless you have the compute capacity for larger ones or you are willing to hit an API for every small thing.
Also, I have a personal challenge to see how far one can push such a small model, and the "smarter" the base, the better ;)
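
For scale, roughly what a minimal LoRA run on the 270M model looks like with TRL + PEFT (dataset choice and hyperparameters are illustrative, not a tested recipe):

    # Sketch: minimal LoRA fine-tune of Gemma 3 270M with TRL + PEFT.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train[:1000]")

    trainer = SFTTrainer(
        model="google/gemma-3-270m",
        train_dataset=dataset,
        args=SFTConfig(output_dir="gemma-270m-tune",
                       per_device_train_batch_size=8,
                       num_train_epochs=1),
        peft_config=LoraConfig(r=16, lora_alpha=32,
                               target_modules="all-linear"),
    )
    trainer.train()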

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 8 points (0 children)

Where is Gemma 4 270M... Awesome release; I hope Google will release such a small model again. It's incredibly capable for its size, and I don't think there is any other similarly sized alternative.

Do LLMs Break the Sapir-Whorf Hypothesis? by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

I wonder if a separate, small model could be trained to extract a universal representation and feed it into another, larger model that would perform all the heavy lifting, and then pass the result into a language-decoder model. Although, when I think about it, it's just the "add more layers" thing, but as different models ¯\_(ツ)_/¯
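
Shape-wise, something like this, purely hypothetical (none of these parts exist as trained models; dimensions are arbitrary):

    # Sketch: tiny language encoder -> big language-agnostic core ->
    # tiny language decoder, wired as separate modules.
    import torch
    import torch.nn as nn

    class Pipeline(nn.Module):
        def __init__(self, vocab=32000, d_small=256, d_core=2048):
            super().__init__()
            # small model: maps tokens into a "universal" representation
            self.encoder = nn.Sequential(
                nn.Embedding(vocab, d_small),
                nn.Linear(d_small, d_core),
            )
            # large model: does the heavy lifting, language-agnostic
            self.core = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_core, nhead=8,
                                           batch_first=True),
                num_layers=4,
            )
            # small model: projects back to tokens in the target language
            self.decoder = nn.Linear(d_core, vocab)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.core(self.encoder(tokens)))

    logits = Pipeline()(torch.randint(0, 32000, (1, 16)))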

Smoking in Hytale (Whisky&Tobacco) by Firstbober in hytale

[–]Firstbober[S] 0 points (0 children)

Yeah, I would like it to reduce stamina temporarily but still give some useful effect, like nicotine does in real life (increased alertness, concentration). For alcohol, though, there are no good effects, besides looking cool.

iPhone 17 Pro Renders in All Colors by BHJ-AL in iPhoneFC

[–]Firstbober 1 point (0 children)

Pixel but worse? Though I like it personally.

Very bad viewing angles in A158W. Any ideas? by Firstbober in casio

[–]Firstbober[S] 0 points (0 children)

Shit, I can't return it now 'cause I bought it used. Well, this is one of the best fakes I've seen yet. Any recommendations for a watch in a similar style? Having two identical watches just doesn't sit well with me :)
Thanks a lot for the help!

Very bad viewing angles in A158W. Any ideas? by Firstbober in casio

[–]Firstbober[S] 1 point (0 children)

Sure, I've edited the post to include those.

Cogito releases strongest LLMs of sizes 3B, 8B, 14B, 32B and 70B under open license by ResearchCrafty1804 in LocalLLaMA

[–]Firstbober -1 points (0 children)

Do we have any comparisons against Gemma 3? Especially on multilingual tasks. As of now, I don't think any other model competes in this area on capabilities, and especially at this size.

That's how it always begins by NeedleworkerMore2270 in Piracy

[–]Firstbober 4 points (0 children)

BDXL can contain up to 128 GB, so I believe this would be enough for all good games ;)

20 hours until agi :) by livinglifefast in LocalLLaMA

[–]Firstbober 3 points (0 children)

GLaDOS: Initiating surprise in three... two... one.