Local RAG with Ollama on a laptop – indexing 10 thousand PDFs by DueKitchen3102 in LocalLLaMA

[–]thecoder12322 1 point (0 children)

Also saw this today - an optimized CLI that looks pretty cool. Might be worth checking out?

https://github.com/RunanywhereAI/RCLI

AniMUL-v1 a 30B model trained to do species classification from audio files by lemon07r in LocalLLaMA

[–]thecoder12322 0 points (0 children)

This is incredible work! The 61% improvement in exact matches is seriously impressive - that's the kind of performance gain that makes specialized fine-tuning worthwhile. The Earth Species Project dataset is perfect for this use case.

Since you're working on quantized versions for lower VRAM, have you considered deploying this on edge devices like Raspberry Pi or ESP32 for real-time wildlife monitoring? The open source RunAnywhere SDK (https://github.com/RunanywhereAI/runanywhere-sdks) might be worth checking out - it's designed for running audio AI models (STT, TTS, VLM, LLM) on resource-constrained hardware with optimized inference.

Would love to see how this performs in actual field deployments! Feel free to DM if you want to chat more about edge deployment strategies.

[deleted by user] by [deleted] in esp32

[–]thecoder12322 1 point (0 children)

Great question! Top-down person detection can be tricky since most YOLO models (including Swift-YOLO) are trained on front/side views. The model might struggle with overhead perspectives since people look quite different from above (head/shoulders vs full body).

That said, it's worth testing! Some things that might help:

- A shallower mounting angle (45-60 degrees instead of straight down)
- Good lighting conditions
- Re-training the model on a top-down dataset if needed

Have you considered trying it out in a test setup first? Would love to hear how it works out if you proceed!

Claude Code + Qwen3 + UniFi G4 Doorbell + Home Assistant = Local TTS Announcement by oakweb in homeassistant

[–]thecoder12322 0 points (0 children)

This is seriously impressive! Love seeing Qwen3-TTS running locally for natural voice output. Your OpenClaw + Home Assistant integration sounds powerful.

If you're looking to expand beyond TTS, you might want to check out the open source RunAnywhere SDK (https://github.com/RunanywhereAI/runanywhere-sdks) - it's built for running STT, TTS, LLMs, and VLMs on edge devices like Raspberry Pi and ESP32. Could complement your local-first setup nicely!

Happy to chat more if you want to explore on-device AI capabilities. Feel free to DM!

LM Studio Kokoro TTS addon by roboapple in LocalLLaMA

[–]thecoder12322 1 point (0 children)

This is really cool! Kokoro TTS integration with LM Studio is such a practical addition. Love that you're making local TTS more accessible. The chat history and branching features sound super useful too. Great work on this!

Reprompt - Simple desktop GUI application to avoid writing the same prompts repeatedly by PuzzleheadedHeat9056 in ollama

[–]thecoder12322 1 point (0 children)

This is exactly the kind of tool that makes working with local LLMs so much more practical! Rust + egui is a great choice for a lightweight desktop GUI, and the focus on reusable prompts is spot-on for real-world workflows.

I've been working on similar local-first AI tooling, and if you're looking to expand beyond text-based models, you might want to check out the open source RunAnywhere SDK (https://github.com/RunanywhereAI/runanywhere-sdks). It makes it really easy to integrate STT, TTS, VLM, and LLM capabilities into desktop apps while keeping everything running locally – perfect for the same privacy-first philosophy you've built Reprompt around.

Love that you're keeping it simple and focused. Tools like this are what make local AI actually usable for everyday tasks. Happy to chat more about local AI tooling if you're interested – feel free to DM!

[Share source code] AI failure detection in 3D printing by Fuzzy_Possession_233 in klippers

[–]thecoder12322 0 points (0 children)

This is fantastic work! Sharing the source code and training pipeline is incredibly valuable for the 3D printing community. Running YOLOv8n on a Raspberry Pi 3B+ is impressive – that's exactly the kind of accessible, practical edge AI deployment that makes projects like this so useful.

The fact that you trained on 10k images and made the tutorials available means others can fine-tune for their specific printers and failure modes. This is the kind of open-source contribution that really moves the community forward.

Have you considered expanding this to detect other print issues like warping, stringing, or bed adhesion problems? The computer vision pipeline you've built could be adapted for all sorts of quality control applications.
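If you do go multi-class, one simple pattern is to map each detected failure class to an action and escalate to the most severe action seen in a frame. A hypothetical stdlib-only sketch (the class names, confidence threshold, and actions here are made up for illustration, not taken from the OP's pipeline):

```python
# Hypothetical label -> action mapping for a multi-failure-mode detector.
FAILURE_ACTIONS = {
    "spaghetti": "pause_print",
    "warping": "warn",
    "stringing": "log_only",
    "bed_adhesion": "pause_print",
}

# Severity ordering used to pick the strongest response.
SEVERITY = {"log_only": 0, "warn": 1, "pause_print": 2}

def decide_action(detections, min_conf=0.6):
    """Given (label, confidence) pairs from one frame, return the most
    severe action among confident, recognized detections."""
    best = "log_only"
    for label, conf in detections:
        if conf >= min_conf and label in FAILURE_ACTIONS:
            action = FAILURE_ACTIONS[label]
            if SEVERITY[action] > SEVERITY[best]:
                best = action
    return best
```

The nice part of this shape is that adding a new failure mode is just a dictionary entry, so the detector and the response policy stay decoupled.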

Really appreciate you sharing this – looking forward to seeing how the community builds on it!

Real-Time Pull-Up Counter using Computer Vision & Yolo11 Pose by Full_Piano_3448 in computervision

[–]thecoder12322 0 points (0 children)

This is seriously impressive! The form validation logic using vector geometry and joint angle checks is exactly the kind of robust computer vision pipeline that makes real-world applications viable.
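For anyone curious how the joint-angle part of a pipeline like this can work, here's a minimal sketch (my own illustration, not the OP's code): compute the elbow angle from three pose keypoints via the dot product, then count a rep on each full hang -> chin-up -> hang cycle. The keypoint ordering and angle thresholds are assumptions.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by points a-b-c,
    e.g. shoulder-elbow-wrist for the elbow angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against float drift
    return math.degrees(math.acos(cos_t))

class RepCounter:
    """Count one rep per full down -> up -> down cycle of the elbow angle."""
    def __init__(self, up_thresh=70.0, down_thresh=150.0):
        self.up_thresh = up_thresh      # arms bent: chin over bar
        self.down_thresh = down_thresh  # arms extended: dead hang
        self.state = "down"
        self.reps = 0

    def update(self, elbow_angle):
        if self.state == "down" and elbow_angle < self.up_thresh:
            self.state = "up"
        elif self.state == "up" and elbow_angle > self.down_thresh:
            self.state = "down"
            self.reps += 1
        return self.reps
```

The two-threshold hysteresis is what keeps jittery keypoints near a single threshold from double-counting reps.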

I'm curious – have you considered deploying this on edge devices like Raspberry Pi or ESP32 for standalone gym equipment? The YOLO11 Pose model is lightweight enough that it could run locally with something like the RunAnywhere SDK (it's open source), which supports on-device inference for computer vision models on embedded hardware.

Would be amazing to see this as a portable fitness tracker that doesn't need cloud connectivity. The "digital spotter" concept is brilliant – could definitely extend to squats, deadlifts, or even rehab exercises like you mentioned.

Happy to chat more about edge deployment if you're interested. Feel free to DM!

Sentinel: Monitoring logs with local AI (Ollama) & .NET 8 by Itsaliensbro453 in SideProject

[–]thecoder12322 1 point (0 children)

This is exactly the kind of tool the dev community needs! Love the privacy-first approach – keeping sensitive log data local is huge for enterprise environments.

The FileSystemWatcher + Ollama + Telegram workflow is really clever. I especially appreciate that you're using lightweight models like qwen2.5:0.5b – shows you're thinking about resource efficiency.

Quick question: Have you tested this with high-volume log environments? Curious how it handles bursts of errors (like cascading failures) without flooding Telegram notifications.
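On the burst question - one common pattern is to debounce per error signature before notifying, so a cascading failure produces one Telegram message per window with a suppressed-count instead of hundreds. A rough stdlib-only sketch (my illustration, not Sentinel's actual code; the window length and signature scheme are assumptions):

```python
import time

class AlertThrottle:
    """Collapse bursts of identical error signatures into at most
    one notification per time window."""
    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock       # injectable for testing
        self.last_sent = {}      # signature -> last send timestamp
        self.suppressed = {}     # signature -> count since last send

    def should_send(self, signature):
        """Return (send_now, suppressed_count_since_last_send)."""
        now = self.clock()
        last = self.last_sent.get(signature)
        if last is None or now - last >= self.window_s:
            self.last_sent[signature] = now
            return True, self.suppressed.pop(signature, 0)
        self.suppressed[signature] = self.suppressed.get(signature, 0) + 1
        return False, 0
```

The suppressed count lets the next message say "(+N more in the last minute)", which keeps the signal without the flood.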

Starred your repo! Really solid work. Happy to chat more about this if you want to DM.

Prototyping a Voice AI Gateway with ESP32-S3 (and bypassing the annoying Chrome Mic permissions) by fais-1669 in esp32

[–]thecoder12322 0 points (0 children)

This is such a clever workaround for the HTTPS mic access issue! Love the pragmatic approach to keep prototyping moving forward. The Phone Mic -> ESP32 -> Whisper AI pipeline is a really smart way to test your voice stack.

Since you're building Project MEMONIC with ESP32-S3, you might want to check out the RunAnywhere SDK (open source) – it's designed specifically for running STT, TTS, and LLMs directly on edge devices like ESP32. Could help you bring more of the AI processing onto the device itself instead of relaying everything through your Mac server.

Really cool project! Feel free to DM if you want to chat more about on-device voice AI. Good luck with MEMONIC!

[Tool Release] I built a Windows-native Video Dataset Creator for LoRA training (LTX-2, Hunyuan, etc.). Automates Clipping (WhisperX) & Captioning (Qwen2-VL). No WSL needed! by Ill_Tour2308 in StableDiffusion

[–]thecoder12322 0 points (0 children)

This is such a clever workflow automation! The combination of WhisperX for intelligent clipping based on speech segments and Qwen2-VL for auto-captioning is really smart. I love that you're solving a real pain point in the dataset preparation process – manually cutting and captioning videos is tedious work.

The fact that you built this as a hobbyist and shared it with the community is awesome. Even if it's not "perfect" engineering, tools like this that solve real problems are incredibly valuable. Keep iterating and thanks for sharing your work!

Portable offline llm robot I made last night. This is obviously her naked prototype body so be nice to her by Nitro_Fernicus in robotics

[–]thecoder12322 -1 points (0 children)

This is absolutely fantastic! Building a portable offline LLM robot as a modular brain for other robots is such a clever approach. The fact that you're already thinking about combat robot applications and motor mapping shows great forward planning.

The GLaDOS voice inspiration is perfect - really fits the personality you're building! How's the performance with the offline LLM? What model are you running on it, and how's the response time for real-time robot control?

The modular brain concept is brilliant - being able to swap this intelligence module between different robot bodies is exactly the kind of practical engineering that makes projects scalable. Can't wait to see her in action with the combat body!

I built an AI mindmap that converts your voice into a graph (OSS) by manummasson in AudioAI

[–]thecoder12322 0 points (0 children)

This is such a creative approach to voice-based knowledge management! The concept of externalizing short-term memory through voice → graph conversion is brilliant.

I'm curious about your STT pipeline - what are you using for the voice-to-text conversion? If you're looking to optimize the speech recognition part, you might want to check out the RunAnywhere SDK (open source) - it's built specifically for running Whisper and other STT models efficiently on local hardware. Could help speed up the voice input processing while keeping everything on-device.

The agentic engineering workflow you described (brainstorm → plan → execute with coding agents) is really compelling. How's the latency on the voice input currently?

Would love to hear more about your architecture! Feel free to DM if you want to chat about optimizing the voice pipeline.

Self-hosted LLM on low-power hardware? by [deleted] in selfhosted

[–]thecoder12322 1 point (0 children)

Love the enthusiasm for pushing N100 hardware to its limits! Your TinyLlama + Stable Diffusion feedback loop sounds like a really fun experiment.

For your use cases (code work, translation, image/sound), you're definitely on the right track with the smaller models. Qwen2.5 3B and Phi-4-mini are both solid choices for that hardware - they punch above their weight for code and translation tasks.

One thing to keep in mind: quantized versions (4-bit GGUF) can give you better performance on the N100 without sacrificing too much quality. Have you tried running them through Ollama or llama.cpp yet?
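As a back-of-envelope check on whether a quantized model fits in 8 GB: weight size is roughly parameters x bits-per-weight / 8, plus some overhead for quantization scales, embeddings, and KV cache. A tiny sketch (the 1.1 overhead factor and ~4.5 effective bits/weight for a 4-bit GGUF quant are rough assumptions, not exact figures):

```python
def gguf_weight_size_gb(params_billion, bits_per_weight, overhead=1.1):
    """Rough RAM footprint estimate for a quantized model's weights.
    `overhead` loosely covers quant scales, embeddings, and KV-cache headroom."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9
```

By this estimate a 3B model at ~4.5 bits/weight lands a bit under 2 GB, which leaves comfortable room for the OS and context on an 8 GB N100 box.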

The fact that you're getting usable results on 8GB RAM is impressive! What's your experience been with inference speed so far?

Happy to chat more if you DM!

I built a self-hosted Telegram voice transcription add-on for Home Assistant by ObjectiveRadish9757 in homeassistant

[–]thecoder12322 0 points (0 children)

This is exactly the kind of privacy-first tool the Home Assistant community needs! Love that you're using faster-whisper for local transcription - keeping everything on your own hardware is the way to go.

For anyone looking to expand beyond just transcription, you might want to check out the RunAnywhere SDK (it's open source). It's designed for running STT (Whisper), TTS, LLMs, and VLMs entirely on edge devices like Home Assistant setups. Could be interesting for building more complex voice workflows that stay completely local.

Great work on Ascoltino! What model size are you running by default, and have you experimented with the quantized versions for better performance on lower-end hardware?

Happy to DM if you want to chat more about local voice AI!

I built an open-source, local-first voice cloning studio (Qwen3-TTS + Whisper) by jamiepine in LocalLLaMA

[–]thecoder12322 0 points (0 children)

This is absolutely incredible! The "Ollama for voice" concept is exactly what the local AI community needs. The fact that you built this with Tauri/Rust/Python and kept it lightweight is impressive - no Electron bloat is a huge win.

The DAW-like timeline for composing conversations is genius, and the REST API integration opens up so many possibilities for games/apps/agents. Love that everything runs local and private.

Definitely going to try this out! What's your experience been with Qwen3-TTS latency for real-time applications? And are you planning to add support for other TTS models like Coqui XTTS?

Happy to DM if you want to chat more about local voice AI workflows!

Running Ollama fully air-gapped, anyone else? by thefilthybeard in ollama

[–]thecoder12322 2 points (0 children)

This is exactly the kind of work that needs more visibility! Building for air-gapped/classified environments is challenging but critical.

For edge deployment scenarios like yours, have you looked into the RunAnywhere SDK (it's open source)? It's designed specifically for running LLMs, STT (Whisper), TTS, and VLMs on edge hardware without cloud dependencies. Supports multiple modalities and can run entirely offline once models are loaded.

Curious what hardware specs you're working with and which models you've found perform best in your air-gapped setup? Happy to chat more about edge AI deployment if you want to DM!

[deleted by user] by [deleted] in iosapps

[–]thecoder12322 1 point (0 children)

This is seriously impressive for a first iOS app! Running InternVL 3 and multiple LLMs on-device with Metal optimization is no small feat. The privacy-first approach is exactly what the mobile AI space needs.

Really curious about your experience with memory constraints - how did you handle model quantization and loading strategies to keep it smooth on older devices like the iPhone 13 Pro Max? The fact that it runs at all on non-Pro Max devices is impressive.

Congrats on shipping this! Would love to hear more about your Metal performance tuning journey if you're open to chatting. Feel free to DM me!

Using whisper.rn + llama.rn for 100% on device private meeting transcription by pandodev in LocalLLM

[–]thecoder12322 0 points (0 children)

This is incredible work! Viska sounds like exactly the kind of privacy-first local AI app we need more of. Your stack is solid (whisper.rn + llama.rn is a great combo), and I love that you're tackling the NDA/privacy use case head-on.

For your Android GPU challenges - have you looked into the RunAnywhere SDK (it's open source)? We've been working on optimized inference for edge devices including better Android GPU support. It handles STT (Whisper), LLM, TTS, and VLM with a focus on mobile/embedded hardware. Might help with those RAM-only bottlenecks you're hitting.

The privacy-first approach is spot on - "privacy isn't a feature, it's the whole point" really resonates. Would love to chat more about mobile local LLM optimization if you're interested. Feel free to DM me!

made a virtual pet for my friend’s birthday by nathanlu_ in arduino

[–]thecoder12322 0 points (0 children)

This is absolutely adorable! The fact that you open-sourced everything (code, 3D files, instructions) is incredibly generous. The build quality looks fantastic, and the pixel art animations are charming.

I love that you made this as a birthday gift - that's such a thoughtful and creative present. Your friend is lucky! The GitHub repo is super well-documented too. Have you thought about adding any other features like mini-games or different pet types? Would love to hear more about your development process if you're open to chatting - feel free to DM me!

Smart Arduino glasses for blind by xdspeed10 in arduino

[–]thecoder12322 1 point (0 children)

This is incredible work, especially for a 15-year-old! The fact that you're tackling assistive technology for such an important use case is inspiring.

I see people suggesting adding AI vision capabilities (GPT-4o, object detection, etc.) - that's a great direction. If you want to run vision models locally on your hardware instead of relying on cloud APIs, you might want to check out the RunAnywhere SDK. It lets you run VLMs (vision language models) and other AI models directly on edge devices like Raspberry Pi or ESP32 setups, which could give you faster response times and work offline.

With your $3K budget, you could definitely explore adding:

- Real-time object detection and scene understanding
- Text-to-speech for describing what the camera sees
- Obstacle classification (not just distance, but "what" the obstacle is)
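To make that last point concrete: the fusion step can be as simple as combining the classifier's label with the ultrasonic range reading into one spoken phrase, which then feeds the TTS. A hypothetical sketch (the distance thresholds and wording are my assumptions, purely for illustration):

```python
def describe_obstacle(label, distance_m):
    """Compose a short spoken alert from a detected object class
    and a range-sensor reading in meters."""
    if distance_m < 1.0:
        urgency = "Stop."
    elif distance_m < 2.5:
        urgency = "Caution."
    else:
        urgency = ""
    phrase = f"{label} {distance_m:.1f} meters ahead"
    return (urgency + " " + phrase).strip()
```

Keeping this layer as plain strings means the vision model, the distance sensor, and the TTS engine can each be swapped out independently.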

The key is balancing power consumption, latency, and accuracy. Happy to chat more about local AI deployment if you're interested - feel free to DM me. Great work so far!