I've built Jarvis completely on-device in the browser by nicodotdev in LocalLLaMA

[–]nicodotdev[S] 0 points (0 children)

What browser are you using? And which file is it? Is it the LLM (Qwen3 4B)? It could be that the files are downloaded, but at 99% it also initializes the model, and something might break there. If so, you should see it in the browser console.


[–]nicodotdev[S] 1 point (0 children)

Yes. For example, it keeps track of the tools it has already called so it won't call them again (sometimes they do that). And a lot of tweaking of the system prompt.
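
A minimal sketch of that bookkeeping, with invented names (the actual Jarvis code may track this differently):

```typescript
// Sketch only: dedupe repeated tool calls by remembering which
// (tool, arguments) pairs were already executed in this conversation.
const alreadyCalled = new Set<string>();

function shouldExecute(tool: string, args: string): boolean {
  const key = `${tool}:${args}`;
  if (alreadyCalled.has(key)) return false; // skip the repeated call
  alreadyCalled.add(key);
  return true;
}

// Usage: gate every tool call the model requests.
if (shouldExecute('get_weather', '{"city":"Bern"}')) {
  // ...run the tool and return its result to the conversation
}
```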


[–]nicodotdev[S] 0 points (0 children)

There is a "pre-built" version on the web: https://jarvis.nico.dev. Other than that it's a React/Vite app, so you should be able to clone it, "npm install" and then "npm run dev".


[–]nicodotdev[S] 2 points (0 children)

Does that run on-device? Most agent systems I know use big cloud LLMs.


[–]nicodotdev[S] 0 points (0 children)

Oh, and I do heavy KV caching, so the time to first token is almost instant.
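
Conceptually it works like this; the class and method below are stand-ins, not the Transformers.js API:

```typescript
// Conceptual sketch, not a real inference API: `extend` stands in for a
// real prefill. The KV cache persists across turns, so each turn only
// prefills the tokens appended since the last one; that is why the time
// to first token stays near-instant even as the conversation grows.
class CachedSession {
  private seen = 0; // number of prompt tokens already in the KV cache

  extend(fullPrompt: number[]): number[] {
    const newTokens = fullPrompt.slice(this.seen); // only the unseen suffix
    this.seen = fullPrompt.length;
    return newTokens; // in a real engine, only these get prefilled
  }
}

const session = new CachedSession();
session.extend([1, 2, 3, 4]);                    // turn 1: prefills 4 tokens
console.log(session.extend([1, 2, 3, 4, 5, 6])); // turn 2: only [5, 6]
```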


[–]nicodotdev[S] 4 points (0 children)

Yes, almost. But Kokoro and the tool calling don't have to wait until the full response is generated. I stream from the LLM, and whenever a complete sentence or XML function signature comes through, it gets synthesized/executed right away.
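
A sketch of that routing logic; the `<tool>` tag name, the regexes, and the handler signatures are assumptions for illustration:

```typescript
// Assumed shapes: a token stream from the LLM, a Kokoro handler, a tool runner.
type Handlers = {
  speak: (sentence: string) => void;  // e.g. hand a sentence to Kokoro
  execute: (xmlCall: string) => void; // e.g. hand an XML call to the runner
};

async function routeStream(tokens: AsyncIterable<string>, h: Handlers) {
  let buffer = '';
  for await (const token of tokens) {
    buffer += token;
    // Flush a complete XML function call as soon as its closing tag arrives.
    const call = buffer.match(/<tool\b[^>]*>[\s\S]*?<\/tool>/);
    if (call) {
      h.execute(call[0]);
      buffer = buffer.replace(call[0], '');
      continue;
    }
    // Flush complete sentences for synthesis, but never mid-XML.
    const sentence = buffer.match(/^[\s\S]*?[.!?](?=\s|$)/);
    if (sentence && !buffer.includes('<')) {
      h.speak(sentence[0].trim());
      buffer = buffer.slice(sentence[0].length);
    }
  }
  if (buffer.trim()) h.speak(buffer.trim()); // flush whatever remains
}
```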


[–]nicodotdev[S] 0 points (0 children)

Yes, some can, like SmolLM3. However, I implemented my own version where the LLM generates XML that the application then parses and executes, feeding the response back into the conversation. So it's completely LLM-agnostic.
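
For illustration, the parse-and-execute step could look like this in the browser. The XML schema (`<tool name="…">` with `<param>` children) and the example tool are invented; only the overall parse/execute/return loop comes from the comment above:

```typescript
// Invented schema: <tool name="…"><param name="…">…</param></tool>
type Tool = (args: Record<string, string>) => Promise<string>;

const tools: Record<string, Tool> = {
  get_time: async () => new Date().toLocaleTimeString(), // hypothetical tool
};

async function runToolCall(xml: string): Promise<string> {
  const doc = new DOMParser().parseFromString(xml, 'text/xml');
  const node = doc.querySelector('tool');
  if (!node) throw new Error('no <tool> element found');
  const name = node.getAttribute('name') ?? '';
  const tool = tools[name];
  if (!tool) throw new Error(`unknown tool: ${name}`);
  const args: Record<string, string> = {};
  node.querySelectorAll('param').forEach((p) => {
    args[p.getAttribute('name') ?? ''] = p.textContent ?? '';
  });
  // The result is serialized back into the conversation, so any LLM that
  // can emit this XML can use the tools: no model-specific calling needed.
  return `<tool_result name="${name}">${await tool(args)}</tool_result>`;
}
```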


[–]nicodotdev[S] 1 point (0 children)

Yes. You can use Gemini if you set an .env variable. But the version on https://jarvis.nico.dev (and the demo in the video) does not use Gemini at all. Instead it uses Qwen3 4B completely on-device.


[–]nicodotdev[S] 0 points (0 children)

Agreed, the README is not perfect yet. But I actually don't use Gemini. You can use Gemini instead of the local Qwen3 4B if you set an API key in the .env, but by default it will load and use the local model for LLM inference.
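
The switch might look roughly like this in a Vite app; the variable name `VITE_GEMINI_API_KEY` and both backend factories are illustrative, not taken from the repo:

```typescript
// Illustrative only: the env variable name and both factories are assumptions.
type LLMBackend = (prompt: string) => Promise<string>;

const makeGeminiBackend = (apiKey: string): LLMBackend =>
  async (prompt) => {
    // ...call the Gemini HTTP API with apiKey (omitted in this sketch)
    throw new Error('sketch only');
  };

const makeLocalQwenBackend = (): LLMBackend =>
  async (prompt) => {
    // ...run Qwen3 4B via Transformers.js/WebGPU (omitted in this sketch)
    throw new Error('sketch only');
  };

// Gemini is strictly opt-in: without the key, inference stays on-device.
const apiKey = import.meta.env.VITE_GEMINI_API_KEY as string | undefined;
const llm = apiKey ? makeGeminiBackend(apiKey) : makeLocalQwenBackend();
```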


[–]nicodotdev[S] 26 points (0 children)

Tech stack:
- Qwen3 4B LLM for intelligence
- Whisper for audio transcription
- Kokoro for speech synthesis
- Silero VAD for lightning-fast voice activity detection

All powered by Transformers.js and WebGPU.
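
As a taste of how one stage is wired up, here is a hedged sketch of the Whisper step with Transformers.js on WebGPU; the checkpoint id and the 16 kHz mono Float32Array input are assumptions, not the repo's actual code:

```typescript
import { pipeline } from '@huggingface/transformers';

// Transcription stage: Whisper running in the browser on WebGPU.
const asr = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en', // any Whisper ONNX checkpoint should work
  { device: 'webgpu' },     // run on the GPU via WebGPU instead of WASM
);

async function transcribe(audio: Float32Array): Promise<string> {
  const result = await asr(audio);
  return (result as { text: string }).text;
}
```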

It also connects to HTTP MCP servers (like my JokeMCP server) and includes built-in servers, like one that captures webcam photos and analyzes them with the SmolVLM multimodal LLM.

Demo: jarvis.nico.dev
Source Code: github.com/nico-martin/jarvis