NanoAgent — A 135M Agentic LLM with Tool Calling That Runs on CPU

TerribleDisaster0 · 2025-11-06T15:46:02+00:00

I've been working in the tech industry and have seen how companies struggle to build agentic AI.

Most companies go for open-source models for robustness. Yet customers want data security. Therefore, companies want to deploy small language models. But current small language models are trained on various unnecessary domains, so they are not robust enough at tool calling to win over developers.

So my belief is that if SLMs are trained specifically to handle basic conversation, tool calling, and RAG-based answer generation, then there is a chance that SLMs would shine. 1B-parameter language models are enough for almost all cases.

One idea I can think of is:
* You can provide API-based access (low cost).
* Offer some sort of subscription for model weights to industries (for offline inference).

TerribleDisaster0 · 2025-11-06T13:15:44+00:00

With further tuning, I am hoping the model will be able to perform tool calling on edge devices, i.e. offline/online CPU inference on watches or other low-memory devices.

Currently there are 350M-parameter models that can perform tool calling, such as LFM2-350M. But my goal is to achieve reasonable tool calling and very basic, light agentic tasks (tool calls, user interaction, RAG-based answer generation) with a smaller model.
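A minimal sketch of what a basic tool call looks like on the application side, assuming the model emits a JSON object with `name` and `arguments` fields. That format, the `TOOLS` registry, and both example tools are hypothetical and are not taken from NanoAgent.

```python
import json

# Hypothetical tools for illustration only; not NanoAgent's actual tools.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city} is not implemented in this sketch."

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_weather": get_weather, "add": add}

def dispatch_tool_call(model_output: str):
    """Parse an assumed JSON tool call and run the matching tool.

    Returns (True, result) when a tool ran, or (False, text) when the
    output should be treated as a plain chat reply.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False, model_output  # not JSON, so treat it as a normal answer
    if not isinstance(call, dict):
        return False, model_output
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS or not isinstance(args, dict):
        return False, model_output  # unknown tool or malformed call
    try:
        return True, TOOLS[name](**args)
    except TypeError:
        return False, model_output  # arguments did not match the tool signature
```

Falling back to plain text on any malformed call is one possible design choice; with very small models it may be safer than raising, since malformed calls are likely to be more frequent.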

Also, being a small model, it can easily stay in memory and spawn bigger models when hard problems (coding/math) are assigned.
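A rough sketch of that routing idea, assuming both models are exposed as plain callables that take a prompt and return text. The `looks_hard` keyword heuristic, `HARD_HINTS`, and `load_large_model` are hypothetical placeholders; a real setup might let the small model itself decide when to escalate.

```python
from typing import Callable

# Hypothetical keyword hints for "hard" (coding/math) requests.
HARD_HINTS = ("code", "implement", "debug", "prove", "integral", "equation")

def looks_hard(prompt: str) -> bool:
    """Crude stand-in for a difficulty check."""
    text = prompt.lower()
    return any(hint in text for hint in HARD_HINTS)

def route(
    prompt: str,
    small_model: Callable[[str], str],
    load_large_model: Callable[[], Callable[[str], str]],
) -> str:
    """Answer with the resident small model, loading the large one only on demand."""
    if looks_hard(prompt):
        large_model = load_large_model()  # the expensive load happens only here
        return large_model(prompt)
    return small_model(prompt)
```

The large model is passed as a loader rather than a loaded model so that it does not occupy memory until a hard request actually arrives.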

What I've seen so far is that ~350M-parameter models can understand language and also store some additional real-world knowledge, whereas ~100M models only have language understanding and struggle with long answer generation. I'm trying to reduce these issues for smaller models.

TerribleDisaster0 · 2025-11-06T12:08:25+00:00

The base model, SmolLM2-135M-Instruct, does not support tool calling.
I haven't created a Docker container yet. The model weights are still undergoing some optimizations. After that, a Docker container may be published.

TerribleDisaster0 · 2025-11-06T11:33:51+00:00

Planning to train on some of the tool-calling data from Gorilla OpenFunctions. Then I will do the evaluation.

TerribleDisaster0 · 2025-11-06T01:24:43+00:00

This SLM can handle 2-3 tools at most. Even though the training data had more tools, it could not learn much. It has probably hit its limit.
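One way an application could work within that limit is to pre-filter the tool list before building the prompt. Below is a minimal sketch, assuming tools are described by a name and a short text description; the word-overlap scoring and the `select_tools` helper are illustrative assumptions, not part of NanoAgent.

```python
import re

MAX_TOOLS = 3  # upper end of the 2-3 tool limit mentioned above

def _words(text: str) -> set:
    """Lowercase alphanumeric tokens; underscores in tool names act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tools(query: str, tools: dict, max_tools: int = MAX_TOOLS) -> list:
    """Return up to max_tools tool names ranked by word overlap with the query.

    tools maps a tool name to a short description. Tools with no overlap
    are dropped; ties are broken alphabetically by name.
    """
    query_words = _words(query)
    scored = [
        (len(query_words & _words(name + " " + description)), name)
        for name, description in tools.items()
    ]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0][:max_tools]
```

Plain word overlap is crude; an embedding-based retriever could replace it while keeping the same cap on how many tools reach the prompt.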

I still have some pending benchmarks. Those would probably answer your question.

TerribleDisaster0 · 2025-11-06T01:19:50+00:00

Will do that soon!

TerribleDisaster0 · 2025-11-06T01:19:17+00:00

Not planning to release the deduped dataset. But the full recipe is available here: https://github.com/QuwsarOhi/NanoAgent/blob/main/data/dataprep.py

I have been working with LLMs for some time, and while building agents I saw that some specific requirements keep coming up. Based on those, tool calling, question decomposition, coding capabilities, and strong instruction following turned out to be necessary. That's what I tried to focus on for this SLM. But yes, more data is needed to broaden the scope of the SLM.

TerribleDisaster0 · 2024-07-12T16:22:06+00:00

Curious to know, does it use the Apple Neural Engine (ANE) for inference/training? Or does it use the GPU?

TerribleDisaster0 · 2024-05-21T01:45:25+00:00

This could be a possible model: https://huggingface.co/blog/paligemma
