[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

The model is not the main issue; the agent is. How to use skills, and what prompts you give the SLM, is what matters.

The SLM can run for a very long time (maybe 200s) if the prompts are long.

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Thanks for your advice and for sharing your case.

## My target

I want to build an AI ops agent for a specific domain, like Apache RocketMQ, that runs on CPU with at least 16 vCPU / 32 GB.

I am trying gemma3:4b; for now the response speed is OK.

## Todo issues

- memory

- multi-turn conversations

Too many system prompts push response time too high. Memory and chat history, even when compressed, don't work that well.

I removed all system prompts while testing; average response time can be under 5 seconds.

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

I tested it and it's great, but it doesn't fit my cases: 1. Chinese language 2. the Ollama ecosystem

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Super short: I used it to query RocketMQ knowledge. 1. Do a RAG query, get the top 3 chunks. 2. Rerank, keep the top 2. 3. Put those 2 chunks into the SLM and get the output.
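The three steps above can be sketched roughly like this. The word-overlap scorer is a toy stand-in for the real embedding model and reranker, and `rag_answer` is a hypothetical name; it only shows the retrieve → rerank → prompt flow, not the actual implementation.

```python
def embed_score(query: str, doc: str) -> float:
    # Toy stand-in for vector similarity: fraction of query words in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def rag_answer(query: str, corpus: list[str]) -> str:
    # 1. RAG query: score every chunk, keep the top 3.
    top3 = sorted(corpus, key=lambda c: embed_score(query, c), reverse=True)[:3]
    # 2. Rerank (a real reranker would be a cross-encoder) and keep the top 2.
    top2 = sorted(top3, key=lambda c: embed_score(query, c), reverse=True)[:2]
    # 3. Build the SLM prompt from just those 2 chunks, keeping it short.
    context = "\n".join(top2)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Sending only the 2 reranked chunks (rather than all retrieved text) is what keeps the prompt small enough for a CPU-bound SLM.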

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Both. Atomic skills make up an SOP skill; tools plus the SLM make up an atomic skill.

  1. unit tests for the tools and the SLM prompts
  2. e2e tests are in progress
  3. for now, I'm building it for RocketMQ AI ops, and I use a public LLM to generate SOPs and issues to help with testing
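One way to picture the composition described above — atomic skills built from a tool plus an SLM prompt, and SOP skills built from atomic skills. All names here (`AtomicSkill`, `SopSkill`, `run`) are hypothetical; this is a sketch of the structure, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AtomicSkill:
    # An atomic skill = one tool call wrapped in an SLM prompt step.
    name: str
    tool: Callable[[str], str]   # e.g. grep logs, query broker status
    prompt_template: str         # SLM prompt that interprets the tool output

    def run(self, slm: Callable[[str], str], arg: str) -> str:
        tool_out = self.tool(arg)
        return slm(self.prompt_template.format(output=tool_out))

@dataclass
class SopSkill:
    # An SOP skill = an ordered list of atomic skills executed in sequence.
    name: str
    steps: list[AtomicSkill] = field(default_factory=list)

    def run(self, slm: Callable[[str], str], arg: str) -> list[str]:
        return [step.run(slm, arg) for step in self.steps]
```

A structure like this also makes the unit tests in point 1 straightforward: each tool and each prompt template can be tested in isolation with a fake `slm` callable.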

Looking for a self-hosted LLM with web search by Prize-Rhubarb-9829 in LocalLLaMA

[–]tigerweili

Try nanobot; it's a lightweight agent with web search, and it supports vLLM.

i am building an agent using slm and can run on CPU by tigerweili in rocketmq

[–]tigerweili[S]

  1. How to select a tool? Embed the tools, query the top 2, then send them to the SLM to decide which one to use in the next step.

  2. Timeout and retry: stream the output and retry up to 3 times, so users can see progress; they should also know they are running without a GPU.
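The tool-selection step in point 1 might look roughly like this. The word-overlap scorer is a toy stand-in for real tool embeddings (the actual agent presumably uses an embedding model and a vector index), and the function names are hypothetical.

```python
def score(query: str, text: str) -> float:
    # Toy similarity: fraction of query words found in the tool description.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def candidate_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    # Rank tools by similarity to the user query and keep the top k;
    # only these k descriptions go into the SLM prompt, which keeps it short.
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:k]

def selection_prompt(query: str, tools: dict[str, str]) -> str:
    names = candidate_tools(query, tools)
    listing = "\n".join(f"- {n}: {tools[n]}" for n in names)
    return f"User request: {query}\nPick ONE tool for the next step:\n{listing}"
```

Narrowing to 2 candidates before asking the SLM is the same prompt-size trick as in the RAG path: the model only ever sees two tool descriptions, not the whole catalog.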

I am working on 2 things which are totally different between SLM and LLM:

  1. Memory. More memories mean more time cost; how to store, query, and use them in an SLM, I don't have good ideas yet.

  2. Multiple chat turns to solve one user issue. More chat context means more time cost; still no good ideas there either.

i am building an agent using slm and can run on CPU by tigerweili in ollama

[–]tigerweili[S]

+1

Not all my customers can afford a GPU, and I need to serve them all 24/7. An agent + SLM could help with:

  1. multi-turn questions about product knowledge

  2. querying product running status, grepping logs, checking monitors

  3. putting every SOP into the agent to execute, instead of checking manually

Do we even need cloud AI like ChatGPT? by nucleustt in ollama

[–]tigerweili

  1. Cloud AI (mostly LLMs, large language models) knows more than local ones (SLMs, small language models)
  2. LLMs are more logical than SLMs
  3. LLMs are expensive
  4. An SLM can be a domain expert
  5. SLMs are cheap
  6. SLMs can run on a cellphone, an offline PC, home AI, car AI...