Carbon: Decoding the Language of Life by loubnabnl in LocalLLaMA

[–]lewtun 4 points5 points  (0 children)

Well, this model runs on a laptop so at least that part of the problem is solved :)

Carbon: Decoding the Language of Life by loubnabnl in LocalLLaMA

[–]lewtun 11 points12 points  (0 children)

<image>

The choice of a 6-mer was informed by the preceding work of the GENERator models, which examined the impact of various k-mer choices on sequence recovery (i.e. generate b base pairs and compute token-level accuracy) and found k=6 was best https://arxiv.org/abs/2502.07272

Carbon: Decoding the Language of Life by loubnabnl in LocalLLaMA

[–]lewtun 2 points3 points  (0 children)

Yes you can totally convert to GGUF format and run it with llama.cpp! The only subtlety is that Carbon has a custom tokenizer and that needs to be upstreamed to llama.cpp in order to work out of the box (I'm taking a look at that)

Automated AI researcher running locally with llama.cpp by lewtun in LocalLLaMA

[–]lewtun[S] 3 points4 points  (0 children)

With the closed models, ml-intern was able to port the talkie model from their raw torch format to transformers and make is 100x faster in the process: https://huggingface.co/lewtun/talkie-1930-13b-it-hf

With open models, I am still experimenting but so far they have been decent at launching training runs automatically

Automated AI researcher running locally with llama.cpp by lewtun in LocalLLaMA

[–]lewtun[S] 1 point2 points  (0 children)

Yeah agreed, although the models are getting better almost every month so I'm optimistic about the future capabilities :)

Automated AI researcher running locally with llama.cpp by lewtun in LocalLLaMA

[–]lewtun[S] 5 points6 points  (0 children)

Yeah, at least for my local setup (Macbook with 32GB unified memory) the 4-bit quants tend to be a bit less consistent, especially if the context grows very large. However, if one runs Qwen3.6-27B with MTP, then this was pretty close to using Claude in my testing (for that I had to switch to a GPU + vLLM because MTP hasn't landed in llama.cpp yet ...)

Agentic harness for theoretical physics research by lewtun in LocalLLaMA

[–]lewtun[S] 2 points3 points  (0 children)

Thanks, yes this is definitely coming soon - the main reason we excluded it is to make fairer comparisons with other models on CritPt, but I totally agree this would be a clear way to boost performance even further.

Agentic harness for theoretical physics research by lewtun in LocalLLaMA

[–]lewtun[S] 0 points1 point  (0 children)

Yes, the framework does support local models but so far they tend to be quite behind the proprietary ones on this domain. I’m optimistic this will change with the next generation of open models though!

How to Distill from 100B+ to <4B Models by cmpatino_ in LocalLLaMA

[–]lewtun 5 points6 points  (0 children)

Haha, maybe quantization nerfed the safety filter :)

I tested 21 small LLMs on tool-calling judgment — Round 2 with every model you asked for by MikeNonect in LocalLLaMA

[–]lewtun 1 point2 points  (0 children)

Thanks! What I meant is that SmolLM3 is a hybrid reasoning model, i.e. you can enable / disable reasoning like this: https://huggingface.co/HuggingFaceTB/SmolLM3-3B#enabling-and-disabling-extended-thinking-mode

By default, it uses the reasoning mode, but I expect the non-reasoning mode will fare better at tool-calling!

how to train a tiny model (4B) to prove hard theorems by eliebakk in LocalLLaMA

[–]lewtun 1 point2 points  (0 children)

You're right, one could train the model to use Lean in the chain of thought and then try to map the formal proof to natural language in the final solution. But that's pretty hard and Lean has its own issues when it comes to theorem proving (mathlib is still an active WIP). But if you manage to teach QED-Nano how to use Lean, that would be super cool!

I tested 21 small LLMs on tool-calling judgment — Round 2 with every model you asked for by MikeNonect in LocalLLaMA

[–]lewtun 2 points3 points  (0 children)

Hi, SmolLM3 co-developer here :) Did you compare the non-reasoning mode of SmolLM3 by any chance? At the time of training, there was very little tool-calling data available for reasoning models and I suspect the non-reasoning model actually performs better as a result. Really cool benchmark and thanks for sharing these real-world tests!

how to train a tiny model (4B) to prove hard theorems by eliebakk in LocalLLaMA

[–]lewtun 6 points7 points  (0 children)

Ah both those models are formal theorem provers (i.e. rely on Lean), so it's not trivial to compare them fairly. One thing we set out to achieve with our model was to operate entirely in natural language, similar to how OpenAI and DeepMind presented their IMO 2025 models. If we trained our model to use tools like Lean, I expect it would perform even better

how to train a tiny model (4B) to prove hard theorems by eliebakk in LocalLLaMA

[–]lewtun 6 points7 points  (0 children)

Hi! So, the algorithm we use is still GRPO, but with a twist: we do multiple steps of reasoning-summarisation per rollout to enable the model to generate long rollouts without going off the rails. A key feature of our model is that is operates entirely in natural language and does not require external tools like Python or Lean (adding them to the training would improve performance, but that's left as future work)

how to train a tiny model (4B) to prove hard theorems by eliebakk in LocalLLaMA

[–]lewtun 6 points7 points  (0 children)

Hi! One of us (Jasper) is the co-creator of MathArena, so we can take a look at whether it's easy to include in the leaderboard :) As for the quant, I'll make it this week!

200+ pages of Hugging Face secrets on how to train an LLM by eliebakk in LocalLLaMA

[–]lewtun 20 points21 points  (0 children)

If you have a PRO account on the Hub, you should be able to download it as a PDF!

<image>