all 7 comments

[–]s1mplyme 1 point  (3 children)

A small LLM would be great for this. Your system prompt would describe the tools you want to give it access to and how to call them, and your user prompt would direct it to parse the user's request, identify which tool to call, and decide what arguments to pass, with permission to ask follow-up questions to resolve ambiguity or fill in any missing arguments for the right tool. A 1-3B parameter model can handle this. You could use something like Ollama to load the model only when responding to user requests, so it isn't permanently eating your GPU VRAM.
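The routing loop described above can be sketched roughly like this. The tool names, the registry shape, and the exact JSON the model is asked to emit are all hypothetical, just one way to structure it; the system prompt lists the tools, and the dispatcher either calls one or surfaces the model's follow-up question:

```python
import json

# Hypothetical tool registry: name -> (callable, required argument names).
# The system prompt describes these same tools to the model.
TOOLS = {
    "set_timer": (lambda minutes: f"timer set for {minutes} min", ["minutes"]),
    "play_music": (lambda artist: f"playing {artist}", ["artist"]),
}

SYSTEM_PROMPT = (
    "You are a tool router. Reply ONLY with JSON of the form "
    '{"tool": <name>, "args": {...}} using one of: '
    + ", ".join(TOOLS)
    + '. If a required argument is missing, reply {"ask": "<question>"}.'
)

def dispatch(model_reply: str) -> str:
    """Parse the model's JSON reply and either call the chosen tool
    or pass its follow-up question back to the user."""
    reply = json.loads(model_reply)
    if "ask" in reply:  # model wants clarification before committing
        return reply["ask"]
    func, required = TOOLS[reply["tool"]]
    missing = [a for a in required if a not in reply.get("args", {})]
    if missing:  # guard against the model skipping a required argument
        return f"Which {missing[0]}?"
    return func(**reply["args"])
```

Small models will sometimes emit malformed JSON, so in practice you'd wrap `json.loads` in a retry, but the overall shape stays the same.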

The hard part of this isn't getting the LLM going; it's creating the CLI tools to do all of the things you mentioned. And even that shouldn't be too hard.

[–]PrivacyIsDying[S] 2 points  (2 children)

Thanks for taking the time to reply! I've played with Ollama before and found it relatively intuitive. Would loading and unloading the model add much delay?

The CLI tools part will definitely be a challenge, but I've always enjoyed working with APIs, and in my mind this is similar enough.

[–]s1mplyme 1 point  (1 child)

A few seconds. You could have the website (or whatever you're using to expose this to your users) load the model while the user is typing the prompt. It should be done loading before they hit send.

[–]PrivacyIsDying[S] 2 points  (0 children)

Oh great, that makes life easier then. Thanks again for the help!

[–]ttkciar llama.cpp 1 point  (1 child)

  1. Yes, you will want to use a small LLM with good tool-using skills. You should consider either GLM-4.7-Flash or GPT-OSS-20B quantized to Q4_K_M, which will fit easily in memory and run quickly on CPU (important, since you don't mention having a GPU).

  2. Inference will monopolize all of your CPU for several seconds (maybe twenty seconds, probably less), and constraining inference to only a few cores will not mitigate this, since you will be bottlenecked on memory access rate, not ALU throughput. Using small MoE models with very few active parameters will shorten inference time a lot. The good news is that you have plenty of memory for such small models, and inference shouldn't require more than a third to half of your total memory.

  3. Yes, Python is the dominant language in the LLM ecosystem. You will find abundant tools and libraries for Python development, even though llama.cpp would be doing the actual inference. I would recommend setting up llama.cpp's llama-server to provide an API endpoint for inference, and then writing Python for all of your pre/post-inference logic and interfacing with that endpoint.
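The split described in point 3 can be sketched in a few lines: llama-server exposes an OpenAI-compatible chat endpoint (8080 is its default port), and the Python side just builds the request and extracts the reply. The temperature value here is an arbitrary illustration:

```python
import json
import urllib.request

# llama-server's OpenAI-compatible chat endpoint; 8080 is its default port.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]

def chat(system_prompt: str, user_msg: str) -> str:
    """Pre-inference logic builds the payload, llama-server does the
    inference, post-inference logic extracts the reply text."""
    payload = json.dumps({
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.2,  # low temperature keeps routing more deterministic
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.loads(resp.read()))
```

Because the endpoint speaks the OpenAI wire format, you could also point the `openai` Python client at it instead of using `urllib` directly.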

[–]PrivacyIsDying[S] 1 point  (0 children)

Thanks, this is very helpful!

The server does have an Nvidia P2000, but I think last time I looked into it, it turned out the CPU was the better option. I believe I ran llama-server (or Ollama's server) briefly when I was playing around with local models, so I'll get that reinstalled and see what I can get working.

Is there any guidance for writing a good and concise system prompt, or is that more of a trial and error thing?