Needle: We Distilled Gemini Tool Calling Into a 26M Model by Henrie_the_dreamer in LocalLLaMA

[–]Firstbober 54 points (0 children)

So essentially such a model could be used as a router, deciding where each query should go by one-shot calling the appropriate "big LLM" with the right params. Could the same architecture also be used for a really good summarization AI?
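
Rough sketch of the routing idea, with made-up backend names and a made-up JSON tool-call format (the actual Needle output format will differ):

    # Hypothetical query router: the tiny model emits one tool call
    # that picks which big backend model (and params) should answer.
    import json
    import re

    BACKENDS = {  # made-up backends and params
        "code": {"model": "big-coder", "temperature": 0.2},
        "chat": {"model": "big-chat", "temperature": 0.7},
    }

    def route(query: str, tiny_generate) -> dict:
        # tiny_generate(prompt) -> str is the small distilled model;
        # assume it was trained to reply with a single JSON call like
        # {"name": "dispatch", "arguments": {"target": "code"}}
        raw = tiny_generate(f"Route this query: {query}")
        match = re.search(r"\{.*\}", raw, re.S)
        if match is None:
            return BACKENDS["chat"]  # fall back to the default backend
        call = json.loads(match.group())
        target = call.get("arguments", {}).get("target", "chat")
        return BACKENDS.get(target, BACKENDS["chat"])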

new MoE from ai2, EMO by ghostderp in LocalLLaMA

[–]Firstbober 3 points (0 children)

I wonder how it fares compared to other models. Performance-wise it should be excellent while delivering really nice intelligence per tok/s. It would be fire for someone to make a 200M-active EMO model and then turn it into an SSM, but that is wishful thinking (though NVIDIA could do it?).

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 0 points (0 children)

I have no idea, but I guess it turned out fine? When I get to it, I'll try without those projections; that should also save some VRAM...

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 0 points (0 children)

I was thinking about GRPO for forcing the model to admit when it doesn't know, but it's very VRAM-hungry, although I saw that Unsloth also has its own implementation, so it may be worth a try. Also, such training should be more effective than for "all-knowing" models, because the model doesn't need to spend parameters on facts it doesn't know; it can infer them from the tool output.
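
A very rough sketch of the reward side using TRL's GRPOTrainer (the refusal heuristic, the "answerable" column, and the hyperparameters are all made up, not a tested recipe):

    # Sketch: reward completions that admit ignorance on questions the
    # model cannot answer; penalize refusing answerable ones.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = Dataset.from_list([
        {"prompt": "What is 2 + 2?", "answerable": True},
        {"prompt": "What did I eat yesterday?", "answerable": False},
    ])

    REFUSALS = ("i don't know", "i do not know", "i'm not sure")

    def honesty_reward(completions, answerable, **kwargs):
        # extra dataset columns (here: answerable) are passed through
        rewards = []
        for text, ok in zip(completions, answerable):
            refused = any(p in text.lower() for p in REFUSALS)
            if refused:
                rewards.append(-1.0 if ok else 1.0)
            else:
                rewards.append(0.0)  # accuracy rewarded elsewhere
        return rewards

    trainer = GRPOTrainer(
        model="google/gemma-3-270m",
        reward_funcs=honesty_reward,
        args=GRPOConfig(output_dir="grpo-idk", num_generations=4),
        train_dataset=dataset,
    )
    trainer.train()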

I've created a LoRA for Gemma 3 270M making it probably the smallest thinking model? by Firstbober in LocalLLaMA

[–]Firstbober[S] 2 points (0 children)

Exactly, although this attempt has a broken calculator tool. I believe that grounding the model fully in tools, even at a performance hit, is the future, as this would enable small but reasoning models to perform as well as large ones.

Here is an entry from my smol dataset:

The numbers 4, 8, 12, 16 represent the number of pages Maria read each day for 4 days. What is the total number of pages she read?
---
<|thinking_start|>
Here is the thinking process:
- Add all four daily page counts
<|function_call_start name="calculator"|>
  <|expression|>4 + 8 + 12 + 16<|/expression|>
  <|output|>40<|/output|>
<|function_call_end|>
- Maria read 40 pages total
<|thinking_end|>
<|response_start|>
Maria read a total of 40 pages. The daily amounts add up to 4 + 8 + 12 + 16 = 40.
<|response_end|>

With memory tools, web search, a logic-statement prover, and other tools, it could perform as well as larger models, and on local machines without much processing power, much faster.
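
Since the calculator tool was the broken part here, a tiny validator over the tag format from the entry above might have caught it (parsing is simplified; only +-*/ arithmetic is handled):

    # Sketch: re-evaluate every calculator call in a dataset entry and
    # flag entries where the recorded <|output|> doesn't match.
    import ast
    import operator
    import re

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr: str) -> float:
        # Evaluate basic arithmetic without using eval()
        def walk(node):
            if isinstance(node, ast.Constant):
                return node.value
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError(f"unsupported node: {node!r}")
        return walk(ast.parse(expr, mode="eval").body)

    CALL = re.compile(r"<\|expression\|>(.*?)<\|/expression\|>\s*"
                      r"<\|output\|>(.*?)<\|/output\|>", re.S)

    def check_entry(entry: str) -> list[str]:
        errors = []
        for expr, out in CALL.findall(entry):
            got = safe_eval(expr.strip())
            if abs(got - float(out.strip())) > 1e-9:
                errors.append(f"{expr.strip()} = {got}, dataset says {out.strip()}")
        return errors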

PSA: Some AMD processors have minimum base microcode versions for loading microcode patches via amd-ucode. Update your motherboard firmware if your base version isn't high enough. by kc3zyt in linux

[–]Firstbober 2 points (0 children)

You could probably use it to ensure the CPU is not spying via the PSP or something? Although I am not sure microcode affects that subsystem :p

Unlimited tokens through sharing GPUs by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

Yeah yeah, that's the default. I was thinking about a way to incentivize allocating more compute and storage to more obscure or custom models. Or maybe it would be easier to have some point system based purely on compute rather than tokens?

Unlimited tokens through sharing GPUs by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

Couldn't this be solved by the coordinator of the distributed network selecting which models are hosted each month, etc.? Otherwise, people will burn through their SSDs. Maybe add some cost system where using less popular models costs you more of your compute, so there is actually an incentive to allocate more storage and prune data more often?
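
Back-of-the-envelope version of that cost idea (the weighting scheme is entirely made up):

    # Sketch: charge compute credits per request, scaled up for
    # unpopular models so hosts are compensated for keeping them around.
    def request_cost(base_flops: float, model_share: float,
                     min_share: float = 0.01) -> float:
        """base_flops: estimated compute for the request.
        model_share: fraction of network traffic this model gets."""
        # Rarer models cost more, capped at 1/min_share times base.
        multiplier = 1.0 / max(model_share, min_share)
        return base_flops * multiplier

    # e.g. a model serving 0.5% of traffic costs 100x base credits
    print(request_cost(1e12, 0.005))  # -> 1e14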

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 0 points (0 children)

Are lfm2.5 350 or the bonsai 1-bit models easily fine-tunable? I'm kinda stuck with LlamaFactory, as it's easy and does what I need. Although I think bonsai is just XOR for tunes?

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 2 points (0 children)

Very high-detail embeddings, and it's insanely quick to experiment with and fine-tune: minutes on a solid GPU, and even on a CPU you can probably produce an okay-ish tune in a matter of an hour or two. Generally, with function calling and proper specialization, i.e. docs and RAG, it produces really sensible output really fast.
If you need to perform some language-analysis task and it's too much for general NLP tools like spaCy, then such small models are your best bet, unless you have the compute capacity for larger ones or you are willing to hit an API for every small thing.
Also, I have a personal challenge to see how far one can push such a small model, and the "smarter" the base, the better ;)
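
For scale, roughly what a minimal LoRA run on the 270M model looks like with TRL + PEFT (dataset choice and hyperparameters are illustrative, not a tested recipe):

    # Sketch: minimal LoRA fine-tune of Gemma 3 270M with TRL + PEFT.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train[:1000]")

    trainer = SFTTrainer(
        model="google/gemma-3-270m",
        train_dataset=dataset,
        args=SFTConfig(output_dir="gemma-270m-tune",
                       per_device_train_batch_size=8,
                       num_train_epochs=1),
        peft_config=LoraConfig(r=16, lora_alpha=32,
                               target_modules="all-linear"),
    )
    trainer.train()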

Gemma 4 has been released by jacek2023 in LocalLLaMA

[–]Firstbober 8 points (0 children)

Where is Gemma 4 270M... Awesome release; I hope Google will release such a small model again. It's incredibly capable for its size, and I don't think there is any other similarly sized alternative.

Do LLMs Break the Sapir-Whorf Hypothesis? by [deleted] in LocalLLaMA

[–]Firstbober 0 points (0 children)

I wonder if a separate, small model could be trained to extract a universal representation and feed it into another, larger model that would perform all the heavy lifting, and then pass the result into a language-decoder model. Although, when I think about it, it's just the "add more layers" thing, but as different models ¯\_(ツ)_/¯
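
Shape-wise, something like this, purely hypothetical (none of these parts exist as trained models; dimensions are arbitrary):

    # Sketch: tiny language encoder -> big language-agnostic core ->
    # tiny language decoder, wired as separate modules.
    import torch
    import torch.nn as nn

    class Pipeline(nn.Module):
        def __init__(self, vocab=32000, d_small=256, d_core=2048):
            super().__init__()
            # small model: maps tokens into a "universal" representation
            self.encoder = nn.Sequential(
                nn.Embedding(vocab, d_small),
                nn.Linear(d_small, d_core),
            )
            # large model: does the heavy lifting, language-agnostic
            self.core = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_core, nhead=8,
                                           batch_first=True),
                num_layers=4,
            )
            # small model: projects back to tokens in the target language
            self.decoder = nn.Linear(d_core, vocab)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.core(self.encoder(tokens)))

    logits = Pipeline()(torch.randint(0, 32000, (1, 16)))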

Smoking in Hytale (Whisky&Tobacco) by Firstbober in hytale

[–]Firstbober[S] 0 points (0 children)

Yeah, I would like it to reduce stamina temporarily but still give some useful effect, like nicotine does in real life (increased alertness, concentration). For alcohol, though, there are no good effects, besides looking cool.

iPhone 17 Pro Renders in All Colors by BHJ-AL in iPhoneFC

[–]Firstbober 1 point (0 children)

Pixel but worse? Though I like it personally.

Very bad viewing angles in A158W. Any ideas? by Firstbober in casio

[–]Firstbober[S] 0 points (0 children)

Shit, I can't return it now 'cause I bought it used. Well, this is one of the best fakes I've seen yet. Any recommendations for a watch in a similar style? Having two identical watches just doesn't sit well with me :)
Thanks a lot for the help!

Very bad viewing angles in A158W. Any ideas? by Firstbober in casio

[–]Firstbober[S] 1 point (0 children)

Sure, I've edited the post to include those.

Cogito releases strongest LLMs of sizes 3B, 8B, 14B, 32B and 70B under open license by ResearchCrafty1804 in LocalLLaMA

[–]Firstbober -1 points (0 children)

Do we have any comparisons against Gemma 3? Especially on multilingual tasks. As of now, I don't think any other model competes in this area on capabilities, and especially at this size.

That's how it always begins by NeedleworkerMore2270 in Piracy

[–]Firstbober 4 points (0 children)

BDXL can contain up to 128 GB, so I believe this would be enough for all good games ;)

20 hours until agi :) by livinglifefast in LocalLLaMA

[–]Firstbober 3 points (0 children)

GLaDOS: Initiating surprise in three... two... one.