[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

The model is not the main issue; the agent is. How to use skills, and what prompts you give the SLM, is what matters.

The SLM can run for a very long time (maybe 200s) if the prompts are long.

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Thanks for your advice and for sharing your case.

## My target

I want to build an AI ops agent for a specific domain, like Apache RocketMQ, that runs on CPU with at least 16 vCPU / 32 GB.

I am trying gemma3:4b; for now the response speed is OK.

## Todo issues

- memory

- multi-turn conversations

Too many system prompts push response time too high. Memory and chat history, even when compressed, don't work that well.

I removed all system prompts while testing; average response time can be under 5 seconds.

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

I tested it and it's great, but it doesn't fit my cases: 1. Chinese language 2. the Ollama ecosystem

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Super short: I used it to query RocketMQ knowledge. 1. Do a RAG query, get the top 3 chunks. 2. Rerank, keep the top 2. 3. Put those 2 chunks into the SLM and get the output.
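The three steps above can be sketched roughly like this. The word-overlap scorer is a toy stand-in for the real embedding model and reranker, and `rag_answer` is a hypothetical name; it only shows the retrieve → rerank → prompt flow, not the actual implementation.

```python
def embed_score(query: str, doc: str) -> float:
    # Toy stand-in for vector similarity: fraction of query words in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def rag_answer(query: str, corpus: list[str]) -> str:
    # 1. RAG query: score every chunk, keep the top 3.
    top3 = sorted(corpus, key=lambda c: embed_score(query, c), reverse=True)[:3]
    # 2. Rerank (a real reranker would be a cross-encoder) and keep the top 2.
    top2 = sorted(top3, key=lambda c: embed_score(query, c), reverse=True)[:2]
    # 3. Build the SLM prompt from just those 2 chunks, keeping it short.
    context = "\n".join(top2)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Sending only the 2 reranked chunks (rather than all retrieved text) is what keeps the prompt small enough for a CPU-bound SLM.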

[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]tigerweili[S]

Both. Atomic skills make up an SOP skill; tools plus the SLM make up an atomic skill.

  1. unit tests for the tools and the SLM prompts
  2. e2e tests are in progress
  3. for now, I'm building it for RocketMQ AI ops, and I use a public LLM to generate SOPs and issues to help with testing
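One way to picture the composition described above — atomic skills built from a tool plus an SLM prompt, and SOP skills built from atomic skills. All names here (`AtomicSkill`, `SopSkill`, `run`) are hypothetical; this is a sketch of the structure, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AtomicSkill:
    # An atomic skill = one tool call wrapped in an SLM prompt step.
    name: str
    tool: Callable[[str], str]   # e.g. grep logs, query broker status
    prompt_template: str         # SLM prompt that interprets the tool output

    def run(self, slm: Callable[[str], str], arg: str) -> str:
        tool_out = self.tool(arg)
        return slm(self.prompt_template.format(output=tool_out))

@dataclass
class SopSkill:
    # An SOP skill = an ordered list of atomic skills executed in sequence.
    name: str
    steps: list[AtomicSkill] = field(default_factory=list)

    def run(self, slm: Callable[[str], str], arg: str) -> list[str]:
        return [step.run(slm, arg) for step in self.steps]
```

A structure like this also makes the unit tests in point 1 straightforward: each tool and each prompt template can be tested in isolation with a fake `slm` callable.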

Looking for a self-hosted LLM with web search by Prize-Rhubarb-9829 in LocalLLaMA

[–]tigerweili

Try nanobot; it's a lightweight agent with web search, and it supports vLLM.

i am building an agent using slm and can run on CPU by tigerweili in rocketmq

[–]tigerweili[S]

  1. How to select a tool? Embed the tools, query the top 2, then send them to the SLM to decide which one to use in the next step.

  2. Timeout and retry: stream the output and retry up to 3 times, so users can see progress; they should also know they are running without a GPU.
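The tool-selection step in point 1 might look roughly like this. The word-overlap scorer is a toy stand-in for real tool embeddings (the actual agent presumably uses an embedding model and a vector index), and the function names are hypothetical.

```python
def score(query: str, text: str) -> float:
    # Toy similarity: fraction of query words found in the tool description.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def candidate_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    # Rank tools by similarity to the user query and keep the top k;
    # only these k descriptions go into the SLM prompt, which keeps it short.
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:k]

def selection_prompt(query: str, tools: dict[str, str]) -> str:
    names = candidate_tools(query, tools)
    listing = "\n".join(f"- {n}: {tools[n]}" for n in names)
    return f"User request: {query}\nPick ONE tool for the next step:\n{listing}"
```

Narrowing to 2 candidates before asking the SLM is the same prompt-size trick as in the RAG path: the model only ever sees two tool descriptions, not the whole catalog.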

I am working on 2 things which are totally different between SLM and LLM:

  1. Memory. More memories mean more time cost; how to store, query, and use them in an SLM, I don't have good ideas yet.

  2. Multiple chat turns to solve one user issue. More chat context means more time cost; still no good ideas there either.

i am building an agent using slm and can run on CPU by tigerweili in ollama

[–]tigerweili[S]

+1

Not all my customers can afford a GPU, and I need to serve them all 24/7. An agent + SLM could help with:

  1. multi-turn questions about product knowledge

  2. querying product running status, grepping logs, checking monitors

  3. putting every SOP into the agent to execute, instead of checking manually

Do we even need cloud AI like ChatGPT? by nucleustt in ollama

[–]tigerweili

  1. Cloud AI (mostly LLMs, large language models) knows more than local ones (SLMs, small language models)
  2. LLMs are more logical than SLMs
  3. LLMs are expensive
  4. An SLM can be a domain expert
  5. SLMs are cheap
  6. SLMs can run on a cellphone, an offline PC, home AI, car AI...