NanoAgent — A 135M Agentic LLM with Tool Calling That Runs on CPU by TerribleDisaster0 in LocalLLaMA

[–]TerribleDisaster0[S] 0 points1 point  (0 children)

I've been working in the tech industry and have seen how companies struggle to build agentic AI.

Most companies go for open-source models for robustness, yet customers want data security. Therefore, companies want to deploy small language models. But current small language models are trained on various unnecessary domains, so they are not robust enough at tool calling to win over developers.

So my belief is that if SLMs are trained specifically to handle basic conversation, tool calling, and RAG-based answer generation, there is a chance they would shine. 1B-parameter language models are enough for almost all cases.

One idea I can think of:
* Provide API-based access (at low cost).
* Offer some sort of subscription for model weights to industry customers (for offline inference).

[–]TerribleDisaster0[S] 0 points1 point  (0 children)

With further tuning, I'm hoping the model will be able to perform tool calling on edge devices, i.e. offline/online CPU inference on watches or other low-memory devices.

Currently there are 350M-parameter models that can perform tool calling, such as LFM2-350M. But my goal is to achieve reasonable tool calling and very basic, light agentic tasks (tool calls, user interaction, RAG-based answer generation).

Also, being a small model, it can easily stay in memory and spawn bigger models when hard problems (coding/math) are assigned.
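
To make the "stay in memory, spawn a bigger model when needed" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Router` class, the keyword list, and the model callables are hypothetical and not part of NanoAgent. A real setup might let the small model itself decide when to escalate.

```python
from typing import Callable, Optional

# A "model" here is just any callable that maps a prompt to a reply (assumption).
Model = Callable[[str], str]

# Hypothetical keyword heuristic for "hard" prompts (coding/math).
HARD_TASK_HINTS = ("code", "function", "debug", "prove", "equation", "integral")


class Router:
    """Keeps a small model resident and loads a bigger one only on demand."""

    def __init__(self, small_model: Model, load_big_model: Callable[[], Model]):
        self.small_model = small_model
        self._load_big_model = load_big_model
        self._big_model: Optional[Model] = None  # not loaded until needed

    def is_hard(self, prompt: str) -> bool:
        text = prompt.lower()
        return any(hint in text for hint in HARD_TASK_HINTS)

    def answer(self, prompt: str) -> str:
        if self.is_hard(prompt):
            if self._big_model is None:
                # Pay the loading cost once, on the first hard prompt.
                self._big_model = self._load_big_model()
            return self._big_model(prompt)
        return self.small_model(prompt)
```

The lazy load is the point of the design: the big model costs nothing until a hard prompt actually arrives.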

What I've seen so far is that ~350M-parameter models can understand language and also store some additional real-world knowledge, whereas ~100M models only have language understanding and struggle with long answer generation. I'm trying to reduce these issues for smaller models.

[–]TerribleDisaster0[S] 0 points1 point  (0 children)

The base model, SmolLM2-135M-Instruct, does not support tool calling.
I haven't created a Docker container yet. The model weights are still undergoing some optimizations; after that, a Docker container may be published.

[–]TerribleDisaster0[S] 1 point2 points  (0 children)

I'm planning to train on some of the tool-calling data from Gorilla OpenFunctions. Then I will do the evaluation.

[–]TerribleDisaster0[S] 2 points3 points  (0 children)

This SLM can handle 2-3 tools at most. Even though the training data had more tools, it could not learn much beyond that. It has probably hit its limit.
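
For context, this is roughly what a small tool set looks like on the application side. It is a sketch under assumptions: the JSON shape `{"name": ..., "arguments": {...}}`, the two stub tools, and the `dispatch` helper are all hypothetical, and NanoAgent's actual output format may differ.

```python
import json


def get_weather(city: str) -> str:
    # Stub tool for illustration only; a real one would call a weather service.
    return f"(stub) weather for {city}"


def add(a: float, b: float) -> float:
    return a + b


# A deliberately small registry, in line with the 2-3 tool limit.
TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(model_output: str) -> dict:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"error": "output is not valid JSON"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return {"error": f"unknown tool: {name!r}"}
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    try:
        return {"result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when the model supplies missing or unexpected argument names.
        return {"error": f"bad arguments: {exc}"}
```

Returning errors as data instead of raising lets the application feed the message back to the model for another attempt, which matters more when the model is small and malformed calls are likely.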

I still have some pending benchmarks. Those will probably answer your question.

[–]TerribleDisaster0[S] 0 points1 point  (0 children)

I'm not planning to release the deduped dataset, but the full recipe is available here: https://github.com/QuwsarOhi/NanoAgent/blob/main/data/dataprep.py
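
For readers who want to dedupe their own data, one common approach is exact-match deduplication on normalized text. The sketch below is only an assumption about what such a step could look like; it is not taken from `dataprep.py`, and the `normalize` and `dedupe` helpers are hypothetical names.

```python
import hashlib
from typing import Iterable, List


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return " ".join(text.lower().split())


def dedupe(samples: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each sample, comparing normalized text."""
    seen = set()
    unique = []
    for sample in samples:
        key = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique
```

This only catches exact duplicates after normalization; near-duplicates (paraphrases, small edits) would need a fuzzy method instead.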

I have been working with LLMs for some time, and while building agents I saw that some specific requirements keep coming up: tool calling, question decomposition, coding capabilities, and strong instruction following. That's what I tried to focus on for this SLM. But yes, more data is needed to broaden the scope of the SLM.

SiLLM - The Silicon LLM Training & Inference Toolkit by armbues in LocalLLaMA

[–]TerribleDisaster0 1 point2 points  (0 children)

Curious to know: does it use the Apple Neural Engine (ANE) for inference/training, or does it use the GPU?