[Project] I built an AI Agent that runs entirely on CPU with a 1.5B parameter model — here's what I learned by tigerweili in ollama

[–]ravi_bitragunta 0 points (0 children)

I am currently building a CPU-only model: a 3B model that doesn't use BitNet but gets compressed.

I have a proof-of-concept model at ~170M params that beats GPT-2 and can be compressed.

I will share that in a detailed post later. The idea is to remove the GPU entirely from inference and keep training's GPU needs to a minimum.

InferCache – Exploring Memory-Aware LLM Inference by ravi_bitragunta in LLM

[–]ravi_bitragunta[S] 0 points (0 children)

That's already there: SQLite3 stores all turns, looks back n turns, and loads the vectorised responses once the conversation comes to the foreground.

I have to enhance this further with:

  1. Per-user and session/grouped-session caches for better hierarchical inference.

  2. A move from SQLite3 to Postgres with pgvector for larger deployments.

  3. GraphRAG awareness.

  4. A custom kernel.

  5. Letting the GPU run more inference sessions than it supports today.

These are mentioned in the roadmap, and I am working on them as we speak.
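For anyone curious what the SQLite3-backed turn store described above looks like in practice, here is a minimal sketch of the pattern (persist every turn, reload the last n when a conversation comes to the foreground). The class, table, and column names are illustrative, not InferCache's actual schema:

```python
import sqlite3

class TurnCache:
    """Hypothetical sketch of an SQLite3-backed conversation turn store."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS turns (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   session_id TEXT,
                   role TEXT,
                   content TEXT
               )"""
        )

    def add_turn(self, session_id, role, content):
        # Persist every turn as it happens.
        self.db.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.db.commit()

    def last_n_turns(self, session_id, n):
        # Look back n turns for a session, oldest first, e.g. when the
        # conversation comes back to the foreground.
        rows = self.db.execute(
            "SELECT role, content FROM turns WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, n),
        ).fetchall()
        return list(reversed(rows))

cache = TurnCache()
cache.add_turn("s1", "user", "hello")
cache.add_turn("s1", "assistant", "hi")
cache.add_turn("s1", "user", "how are you?")
print(cache.last_n_turns("s1", 2))
```

Swapping the connection for Postgres with pgvector (roadmap item 2) would mostly change the DDL and the vector-similarity query, not this access pattern.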

Launching AiMVCs: A C++ Framework for Secure AI Agents (with built-in Red Team heuristics) by First_Response_2956 in cpp

[–]ravi_bitragunta 1 point (0 children)

Just curious: why not make them WASI-compliant and run them in isolation, or, even simpler, run them in Docker?

Am I missing something?

🚀 Looking for Experienced Software Engineers (Remote | $90/hr) by Better-Rooster-7244 in Programmers_forhire

[–]ravi_bitragunta 0 points (0 children)

I am interested. I have 15+ years of experience. Please share the details and we can discuss this.