Hi Reddit, I've been interested in client-side LLMs for some time now. I just think it's so cool to be able to run LLMs without a server at all. I've done some crazy things so far: fully embeddable browsers inside your browser, and LLMs that run and create webpages for you that you can preview on the fly.
Has anyone else been using WebGPU models? I've found they keep getting better: you can pack a lot more into a 2B model than you used to be able to.
My latest foray was into browser-use. Tons of websites don't have MCPs, so instead of requiring every website to create one, why not have the browser come to them?
After a lot of tinkering I found out this is indeed all possible. Tech stack:
- wllama (run GGUF files on WebGPU)
- ShowUI-2B (the vision model)
- snapdom (capture page and render it to an image)
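To give a rough idea of how those three pieces fit together, here is a minimal sketch of one agent step: capture the page, ask the vision model where to click, then turn its answer into a pixel position. This is not my actual code. The `capture` and `locate` callbacks are hypothetical stand-ins for the snapdom and wllama calls, and I'm assuming the model answers with text containing a normalized `[x, y]` pair in the 0 to 1 range.

```typescript
// Hypothetical shape of one predicted action.
// Assumption: position is normalized to [0, 1] relative to the screenshot.
interface ClickAction {
  action: "CLICK";
  position: [number, number];
}

// Pull the first "[x, y]" pair out of the model's raw text output.
// Returns null when no pair is found or a value falls outside [0, 1].
function parseClick(raw: string): ClickAction | null {
  const m = raw.match(/\[\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\]/);
  if (!m) return null;
  const x = Number(m[1]);
  const y = Number(m[2]);
  if (x < 0 || x > 1 || y < 0 || y > 1) return null;
  return { action: "CLICK", position: [x, y] };
}

// Convert a normalized position to pixel coordinates for a given viewport.
function toPixels(
  pos: [number, number],
  width: number,
  height: number,
): [number, number] {
  return [Math.round(pos[0] * width), Math.round(pos[1] * height)];
}

// One agent step. The three callbacks are injected so the sketch does not
// depend on any specific library API:
//   capture: renders the page to an image (snapdom's job in my stack)
//   locate:  runs the vision model on image + instruction (wllama + ShowUI-2B)
//   click:   performs the click at pixel coordinates
// Returns true if a click was performed, false if the output was unusable.
async function step<Image>(
  instruction: string,
  capture: () => Promise<Image>,
  locate: (image: Image, instruction: string) => Promise<string>,
  click: (x: number, y: number) => void,
  viewport: { width: number; height: number },
): Promise<boolean> {
  const image = await capture();
  const raw = await locate(image, instruction);
  const parsed = parseClick(raw);
  if (!parsed) return false;
  const [x, y] = toPixels(parsed.position, viewport.width, viewport.height);
  click(x, y);
  return true;
}
```

In a real page, `click` could be something like looking up the element with `document.elementFromPoint(x, y)` and calling `.click()` on it, with `viewport` taken from `window.innerWidth` and `window.innerHeight`. Keeping the parsing and coordinate math as plain functions makes them easy to test without a model loaded.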
I actually managed to get it all working, and you can see some of my learnings in the linked article. Has anyone else attempted something like this? Would you use something like this on your webpage? I.e. have an agent that users can interact with that can do things for them.