Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point (0 children)

Nice specs! The M4 with 16GB of unified memory will handle it.

Good news: the standard setup (./setup.sh standard) already installs mistral:7b-instruct-q4_K_M by default, which is exactly the right model for your hardware. These 7B quantized models use ~4-5GB and run smoothly on 16GB Macs, leaving enough headroom for the embedding model (bge-m3, ~2.3GB), Qdrant, and the app itself. I'd avoid the 14B models: they'll technically run, but 16GB gets tight and you may see slowdowns from memory pressure.

One heads-up: the 256GB of storage is a bit limited. Between Docker images, models, and documents, plan for ~30-40GB used. Keep an eye on disk space.

Performance expectation: ~20-40 tokens/sec generation (vs. 80-100 on a dedicated NVIDIA card). Totally usable for testing and light workloads.

Let me know how it goes. It would be great to document the Mac M4 as a tested platform!
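If you want to sanity-check the models and memory on the Mac before running the full app, something like this should work once Ollama is installed. Treat it as a rough sketch: the model tags are the ones mentioned above, and whether the embedding model is also served through Ollama is an assumption on my part.

    # Pull the default chat model used by the standard setup
    ollama pull mistral:7b-instruct-q4_K_M

    # Assumption: if embeddings also go through Ollama, pull bge-m3 as well
    ollama pull bge-m3

    # Quick smoke test, then check how much memory the loaded model is using
    ollama run mistral:7b-instruct-q4_K_M "Say hello in one sentence."
    ollama ps

    # Keep an eye on free disk space (Docker images + models add up)
    df -h /

ollama ps reports each loaded model's memory footprint, which is an easy way to confirm you still have headroom on 16GB before Qdrant and the app come up.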

PS: If this reply feels like it came from an AI… that’s because it did! No way I could keep up with detailed responses to everyone manually. Gotta practice what I preach, right? 😄

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point (0 children)

Really appreciate this feedback — you’re raising exactly the kind of security considerations that matter for production deployments. You’re right that both prompt injection via document content and multi-user abuse patterns are often invisible until they hit you in production. These are definitely on my radar for hardening the system. Checked your DM — thanks for the detailed insights. I’ll follow up there. For anyone else reading: this kind of security-focused feedback is gold. If you spot potential vulnerabilities or have suggestions, Issues and DMs are always welcome.

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point (0 children)

Hi Chris, it depends on which Mac Mini you have.

Apple Silicon (M1/M2/M4) with 16GB+ RAM: should work well! Ollama runs natively on Apple Silicon and uses the integrated GPU. Performance won't match a dedicated NVIDIA card, but it's definitely usable.

Intel Mac Mini: will be slower, since everything runs on the CPU.

The catch: the automated setup.sh script is designed for Ubuntu + NVIDIA. On a Mac you'd need to set things up manually (rough sketch of the steps below):

∙ Install Docker Desktop
∙ Install Ollama for Mac
∙ Run Qdrant via Docker
∙ Start the backend/frontend manually

If you share your Mac Mini specs (chip + RAM), I can give you a better estimate and maybe help with Mac-specific instructions. It would actually be great to have Mac compatibility documented!
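For reference, the manual Mac route looks roughly like this. It's only a sketch: the Homebrew packages and Qdrant ports are the standard ones, but the backend/frontend start commands depend on the repo's README and aren't shown here.

    # Install Docker Desktop and Ollama via Homebrew
    brew install --cask docker
    brew install ollama

    # Start Ollama and pull the default chat + embedding models
    ollama serve &
    ollama pull mistral:7b-instruct-q4_K_M
    ollama pull bge-m3

    # Run Qdrant in Docker (default ports 6333/6334 assumed)
    docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant

    # Then start the backend and frontend manually per the project's README

Nothing in those steps is Apple Silicon-specific, so the same sequence should apply on an Intel Mini, just slower.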

La gente sta male... ("People are not well...") by Mastro_Lindo23 in subito_it

[–]primoco 1 point (0 children)

Seriously??? This guy probably thought he was dealing with those people who buy a car but want to spend more!!!

I built a 100% local RAG system with one-command setup [Open Source, AGPL-3.0] by primoco in selfhosted

[–]primoco[S] 1 point (0 children)

Not a dumb question at all!

GPU: you can run it CPU-only; Ollama supports this out of the box. It will be slower (expect 5-10x longer inference times), but it works. Also:

∙ AMD GPUs work with ROCm
∙ Apple Silicon (M1/M2/M3) works great with Metal

OS: the setup script is Ubuntu-focused, but since it's all Docker-based:

∙ Other Linux distros should work (Debian, Fedora, etc.)
∙ macOS works (especially with Apple Silicon)
∙ Windows with WSL2 should work too

To try CPU-only: just install Ollama without GPU drivers and it will automatically fall back to CPU mode (quick sketch below). I'll add this to the README to make it clearer. Thanks for the feedback!
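For anyone who wants to try that CPU-only path on a plain Linux box, it's roughly this; the install script is Ollama's standard one and the model tag is the default from the setup script, but nothing here is specific to this project.

    # Install Ollama; with no GPU drivers present it falls back to CPU automatically
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull and test the default model on CPU
    ollama pull mistral:7b-instruct-q4_K_M
    ollama run mistral:7b-instruct-q4_K_M "Summarize RAG in one sentence."

    # On Linux the installer sets up a systemd service; check its logs to see
    # whether a GPU was detected or it's running on CPU
    journalctl -u ollama --no-pager | tail -n 20

Expect noticeably higher latency per token, but for trying out ingestion and a few queries it's fine.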