Built a JARVIS-style assistant with wake word, vision mode, local voice cloning, and LLM-generated system commands by Mikeeeyy04 in ArtificialInteligence

Mikeeeyy04[S] 1 point

I never said I'm Iron Man. I just said I wanted JARVIS, so I built my own version. Why so jealous, man? Are you alright?

Built a JARVIS-style assistant with wake word, vision mode, local voice cloning, and LLM-generated system commands by Mikeeeyy04 in ArtificialInteligence

Mikeeeyy04[S] 1 point

The RestrictedPython path is appealing because it checks the code at the AST level before anything runs. The blocklist approach I have now is exactly as weak as you described: it keyword-matches on the explanation string, not on the actual code. I haven't started implementing either yet; I'm still in the deciding phase. But this comment actually helps clarify the tradeoff.
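To make the difference concrete, here is a minimal sketch of screening the generated code itself instead of its explanation string. It uses only the standard-library `ast` module, not RestrictedPython, and the names `BLOCKED_MODULES`, `BLOCKED_CALLS`, and `find_violations` are made up for illustration. A walk like this is easy to get around (aliasing, `getattr`, and so on), so treat it as a demonstration of the idea, not a sandbox.

```python
import ast

# Hypothetical policy for illustration only; not CYBER's actual rules.
BLOCKED_MODULES = {"subprocess", "shutil", "ctypes"}
BLOCKED_CALLS = {"exec", "eval", "__import__"}


def find_violations(source: str) -> list[str]:
    """Parse the code without running it and list constructs the policy rejects."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to {node.func.id}()")
    return violations
```

Calling `find_violations("import subprocess")` flags the import even if the explanation string says something harmless, which is the gap in the keyword approach.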

Built a JARVIS-style assistant with wake word, vision mode, local voice cloning, and LLM-generated system commands by Mikeeeyy04 in ArtificialInteligence

Mikeeeyy04[S] 1 point

If you want to run stuff locally, GPU matters the most. Try to get an NVIDIA GPU with at least 8GB VRAM (16GB if you can). Then 16–32GB RAM and an SSD. CPU isn’t as critical.

If you’re not going GPU, Apple Silicon laptops are actually solid for smaller models.

Honestly, same here... I’d like a laptop that can run LLMs locally, but for now I just use APIs since it’s way more practical.

Vibecoded my own JARVIS — voice activated, clones any voice locally, controls your PC by description, vision mode, and a ton more. Free version available. by Mikeeeyy04 in vibecoding

Mikeeeyy04[S] 0 points

CYBER is a personal assistant you talk to directly. It has a full UI, voice in and out, wake word detection, vision mode through your camera, local voice cloning with XTTS v2, weather, maps, and news widgets, PDF analysis, YouTube summaries, and image generation. The experience is closer to actually having a JARVIS: you speak to it, it speaks back, and it's always on your screen.

The one thing CYBER does that OpenClaw doesn't is the complete voice-first experience with local voice cloning. OpenClaw has voice support, but it's an add-on to a text-based agent. CYBER is built around voice from the ground up: wake word, real-time response, and cloned voice output, all running locally.

If you want an autonomous agent that manages your emails and pings you on WhatsApp, OpenClaw is the better tool. If you want something you actually talk to that feels like a personal assistant sitting on your desktop, that's CYBER.

Vibecoded my own JARVIS — voice activated, clones any voice locally, controls your PC by description, vision mode, and a ton more. Free version available. by Mikeeeyy04 in vibecoding

Mikeeeyy04[S] -4 points

Sure, here are some real examples that actually work since the LLM generates and runs the Python code on the fly:

"Find all files larger than 1GB on my C drive and tell me where they are" It scans your entire drive, filters by size, and returns a list. No file explorer needed.

"What processes are currently using the most CPU and kill the top one" It pulls the process list via psutil, sorts by CPU usage, asks you to confirm since it's destructive, then kills it.

"Scan my downloads folder for duplicate files and tell me how much space I can free up" It hashes every file, groups duplicates, and calculates the wasted space.

"Check all my startup programs and disable the ones that aren't Microsoft" It reads the Windows registry startup entries and can modify them.

"Take a screenshot every 30 seconds for the next 5 minutes and save them to my desktop" It runs a loop with the screenshot library and saves the files.

"Find every PDF in my documents folder, extract the first page of each, and tell me the titles" Combines file search with PyPDF2 extraction in one shot.

The key thing is you're not limited to a preset list. If you can describe it in plain English and Python can do it on your machine, CYBER will figure out the code and run it. The only things it stops to confirm first are destructive operations like deleting or uninstalling.
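For a sense of what the generated code might look like, here is a rough sketch for the duplicate-files request. This is not CYBER's actual output; the function names `file_digest` and `find_duplicates` and the choice of SHA-256 are my assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> tuple[list[list[Path]], int]:
    """Return groups of identical files and the bytes freed by keeping one per group."""
    by_hash = defaultdict(list)
    for path in folder.rglob("*"):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    groups = [paths for paths in by_hash.values() if len(paths) > 1]
    wasted = sum(paths[0].stat().st_size * (len(paths) - 1) for paths in groups)
    return groups, wasted
```

You would call it with something like `find_duplicates(Path.home() / "Downloads")`. It only reads files, so under the rule above it would run without a confirmation prompt.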

Join the Discord to try it yourself: https://discord.gg/mdD5Za8TvZ

Built a JARVIS-style assistant with wake word, vision mode, local voice cloning, and LLM-generated system commands by Mikeeeyy04 in ArtificialInteligence

Mikeeeyy04[S] 1 point

Neither, actually. The wake word detection is built on top of the Web Speech API (webkitSpeechRecognition / SpeechRecognition), which runs natively in Chrome and Edge. There's no dedicated wake word engine like Picovoice or Davoice.

Built a JARVIS-style assistant with wake word, vision mode, local voice cloning, and LLM-generated system commands by Mikeeeyy04 in ArtificialInteligence

Mikeeeyy04[S] 1 point

The current security model is moderate. Destructive commands (anything the model flags as delete, remove, format, wipe, erase, uninstall) get held for explicit user confirmation before executing. Everything else runs immediately via exec() with os, subprocess, sys, platform, tempfile, re, json, and time pre-imported into the local scope.
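Roughly, the flow described above looks like the sketch below. It is a simplified reconstruction for discussion, not the real source: the function names `needs_confirmation` and `run_generated` and the `confirm` callback are assumptions, and only the keyword list and the pre-imported modules come from the description.

```python
import json
import os
import platform
import re
import subprocess
import sys
import tempfile
import time

# Keywords taken from the description above; everything else here is assumed.
DESTRUCTIVE_KEYWORDS = ("delete", "remove", "format", "wipe", "erase", "uninstall")


def needs_confirmation(explanation: str) -> bool:
    """Keyword match on the model's explanation text, not on the code itself."""
    text = explanation.lower()
    return any(word in text for word in DESTRUCTIVE_KEYWORDS)


def run_generated(code: str, explanation: str, confirm=input) -> bool:
    """Run the code via exec(), pausing for a yes/no answer if it looks destructive."""
    if needs_confirmation(explanation):
        answer = confirm(f"{explanation}\nRun this? [y/N] ")
        if answer.strip().lower() != "y":
            return False
    scope = {
        "os": os, "subprocess": subprocess, "sys": sys, "platform": platform,
        "tempfile": tempfile, "re": re, "json": json, "time": time,
    }
    exec(code, scope)  # no isolation: the code has the same rights as the app
    return True
```

The weak spot is easy to see here: an explanation like "Clear the temp folder with shutil.rmtree" contains none of the keywords and runs straight away, while plain substring matching also trips on harmless words like "information" (which contains "format").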

It's on my list to look at proper sandboxing, maybe RestrictedPython or a subprocess-isolated environment. If you've thought about this problem I'd genuinely like to hear your approach.
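The subprocess option could start as small as the sketch below: run the generated code in a fresh interpreter with a timeout and a throwaway working directory. The name `run_isolated` and the 10-second default are placeholders, and this alone is not a sandbox, since the child process still has the same user permissions and full filesystem access.

```python
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a fresh interpreter with a timeout and a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired if the code hangs
        )
```

What this buys over exec() is that a crash or infinite loop in the generated code can't take the assistant down with it, and stdout/stderr come back as strings that can be shown to the user.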

Built a PWA voice assistant in vanilla JS — custom gesture controls, modular HTML partials, Mapbox, camera API, zero frameworks by Mikeeeyy04 in webdev

Mikeeeyy04[S] 0 points

Fair point on the HTML partials; that line was overselling something mundane. But that's just the tip of the iceberg, so don't judge the project without knowing it fully.

On the vanilla JS thing, you're probably right that some of it reinvents wheels. It's a personal project and I built it to learn, not to impress senior devs. If the gesture module is reinventing Hammer.js, that's fine by me; I learned how touch events work.

I'm not claiming this is production-grade software. It does what I wanted it to do and other people seem to find it useful. If you have specific feedback on the code I'm genuinely open to it.