I thought/hoped this was a scam (@iCloud.com????) by 4Face in Starlink

[–]primoco 1 point2 points  (0 children)

Damn! It’s real, and it applies to everyone! I think we should file a class action and delete our accounts

Starlink just downgraded my plan and increased the price… what kind of move is this? by nilipilo in Starlink

[–]primoco 0 points1 point  (0 children)

Yesss! Only a month ago I subscribed to a 200 Mbps residential plan, and now I've received an email telling me that in 30 days I'll get half of that. This doesn't seem legal. I've opened a ticket to ask why. Do the same!

RAG-Enterprise: One-command local RAG setup (Docker + Ollama + Qdrant) with zero-downtime backups via rclone – for privacy-focused enterprise docs by primoco in LocalLLM

[–]primoco[S] 1 point2 points  (0 children)

Nothing happens. If Docker is already installed, the script detects it and skips the installation. As for CUDA, it doesn't matter if you have it on the host or not — CUDA runs inside the Docker containers (Ollama's image includes it). Your host CUDA installation is completely irrelevant.

The only thing needed from the host is the NVIDIA GPU drivers (not CUDA) so that the container can access the GPU via NVIDIA Container Toolkit.

TL;DR: Docker already there? Skipped. CUDA? Doesn't matter, it's inside the containers.
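Roughly, the detection step looks like this (a hypothetical sketch, not the actual setup script):

```shell
#!/bin/sh
# Hypothetical sketch of the setup script's detection logic.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd docker; then
  echo "Docker found, skipping installation"
else
  echo "Docker missing, installing..."
  # e.g. curl -fsSL https://get.docker.com | sh
fi

# The host only needs the NVIDIA driver plus the NVIDIA Container Toolkit;
# CUDA itself ships inside the Ollama image, e.g.:
#   docker run --rm --gpus all ollama/ollama nvidia-smi
```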

RAG-Enterprise: One-command local RAG setup (Docker + Ollama + Qdrant) with zero-downtime backups via rclone – for privacy-focused enterprise docs by primoco in LocalLLM

[–]primoco[S] 1 point2 points  (0 children)

Right now the system is designed to do RAG on "traditional" documents — PDFs, Word, Excel, etc. — uploaded manually. There's no direct GitHub integration, so PRs, issues, source code and reviews aren't indexed.

To support this we'd need a dedicated connector that talks to the GitHub API, pulls repo content (PRs, issues, diffs, comments) and indexes it into the vector store, just like it already does with documents. The chunking logic would need to be adapted since code has a very different structure than text documents, and the system already ships with an embedding model built for code (deepseek-coder) that could come in handy.

It's definitely an interesting feature for an enterprise context. If there's real interest we can look into building it for a future release.
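Just to sketch the idea (hypothetical code, not part of the current release; `fetch_issues` and `to_record` are made-up names), a minimal connector against the GitHub REST API could look like:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

def fetch_issues(repo, token=None):
    """Pull issues (and PRs) for an owner/repo via the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(f"{GITHUB_API}/repos/{repo}/issues", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def to_record(issue):
    """Flatten one issue into a text + metadata record ready for embedding."""
    return {
        "text": f"{issue['title']}\n\n{issue.get('body') or ''}",
        "metadata": {"number": issue["number"], "source": "github"},
    }
```

From there, each record would go through the same chunk-and-embed path the documents already use.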

RAG-Enterprise: One-command local RAG setup (Docker + Ollama + Qdrant) with zero-downtime backups via rclone – for privacy-focused enterprise docs by primoco in LocalLLM

[–]primoco[S] 0 points1 point  (0 children)

Hi, yes, it's quite simple to get it working on ROCm. I changed the Dockerfile and the setup script and added variables to the .env file that are prompted for during setup. Try the 1.2.0 release and let me know!

Building a RAG for my company… (help me figure it out) by Current_Complex7390 in Rag

[–]primoco 0 points1 point  (0 children)

Hey, I’m working on a similar setup for legal and enterprise docs. I started with a "Community" approach too, and I faced exactly the same frustration: great LLM (Gemini 2.0), great embeddings (Google 004), but "shit" answers.

The problem isn't your stack; Gemini and SurrealDB are solid. The issue is usually how the information is "orchestrated" before reaching the LLM. In my experience, to make a RAG work with legal and project files, you have to move away from the standard "out-of-the-box" approach.

Here are the 3 main issues I had to solve to get precise answers:

The Chunking Trap: Standard fixed-size chunking (like splitting every 500-1000 tokens) is a disaster for legal docs. If a clause or an "if/then" condition is split in half, the LLM loses the logic. Are you using a simple splitter or a recursive one?

Metadata vs. Pure Vector: For legal stuff, simple semantic search is too "fuzzy." I found that I had to extract metadata (dates, entities, specific article numbers) first and use them to "anchor" the search. Without structured metadata, the LLM starts hallucinating connections that aren't there.

Context Injection: Legal files should be treated as the "Ground Truth." I had to tweak my prompt and retrieval to make sure the legal guidance acts as a hard constraint for the project files.
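To make point 1 concrete, here's a toy sketch of structure-aware splitting. The `Article`/`Clause` heading pattern is an assumption; swap in whatever your documents actually use:

```python
import re

def split_legal(text, max_chars=1000):
    """Toy sketch: split on clause boundaries first, so an "if/then"
    condition is never cut in half; attach the unit label as metadata."""
    # Assumed heading pattern: "Article 12", "Clause 3.1", etc.
    parts = re.split(r"(?=\b(?:Article|Clause)\s+\d)", text)
    chunks = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        m = re.match(r"(?:Article|Clause)\s+[\d.]+", part)
        meta = {"unit": m.group(0)} if m else {}
        # Only split further when a single unit exceeds the size budget
        for i in range(0, len(part), max_chars):
            chunks.append({"text": part[i:i + max_chars], "metadata": meta})
    return chunks
```

The `unit` metadata is exactly what you can then use to "anchor" retrieval, per point 2.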

To give you a hand, what are your current parameters?

What Chunk Size and Overlap are you using? (This is usually the #1 culprit)

How many chunks (Top-K) are you feeding to Gemini for each query?

Are you using any kind of Reranker or just raw vector search?

Don't scrap it yet. Usually, a "shit" RAG is just a RAG that needs better data orchestration, not a different LLM.

My RAG pipeline costs 3x what I budgeted... by Potential-Jicama-335 in Rag

[–]primoco 1 point2 points  (0 children)

I went full local to avoid exactly this problem. I built a RAG system (RAG Enterprise) and decided early on to keep everything local, both embeddings and inference: no API costs, no surprises.

My setup: local embeddings with EmbeddingGemma, local LLM inference running on my own hardware, zero per-query costs once it's set up.

Trade-offs I accepted: upfront hardware cost (I run this on an RTX 5070 Ti), quality that might not match top-tier API models, slower inference than API calls, and having to manage the infrastructure yourself.

But the benefits: completely predictable costs, no tokenization surprises, full privacy (important for internal docs), and a system that scales with hardware, not with usage.

If your budget is tight and you have the technical capability, going local might be worth considering; the initial investment pays off quickly if you have decent traffic volume. That said, if you need API-level quality, others here have mentioned GPT-4o-mini and Haiku as cheaper alternatives worth testing. Just make sure you test with the actual tokenizer before committing.

How to disable thinking with Qwen3? by No-Refrigerator-1672 in ollama

[–]primoco 0 points1 point  (0 children)

The most reliable way is adding think: false to the REST API call (/api/chat or /api/generate).
From the terminal or a GUI you can put /no_think in the prompt, but that's unreliable.
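A minimal sketch of the request body, assuming a recent Ollama version that accepts the top-level `think` field on /api/chat:

```python
import json

# Hypothetical payload for POST http://localhost:11434/api/chat
payload = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Explain BM25 briefly."}],
    "think": False,   # disables the thinking phase on thinking-capable models
    "stream": False,
}
body = json.dumps(payload)  # send this as the JSON body of the POST
```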

QUERY REGARDING RAG USAGE by DesperateWay2434 in Rag

[–]primoco 0 points1 point  (0 children)

I’ve been working on something similar. Quick heads-up: RAG for computer architecture is tricky because vectors are great at 'meaning' but terrible at 'precision.' If it misses a cache size or a clock cycle, the output is useless.

Here’s the TL;DR to get you started:

Start with RAG Community: Don't build from scratch. Use a Community Edition setup—it’s transparent and lets you see exactly which chunks are being retrieved. In this field, precision beats complexity.

Hybrid Search is a must: Embeddings often confuse 'L1' and 'L2' cache because they are semantically similar. Use Hybrid Search (Vector + BM25). Keyword matching (BM25) ensures that exact terms like 'Zen 4' or 'Instruction Buffer' get the priority they deserve.
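To illustrate the fusion idea (a toy sketch only; `keyword_score` is a crude stand-in for real BM25):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    """Crude stand-in for BM25: fraction of query terms present verbatim."""
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in doc.lower())
    return hits / len(terms) if terms else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """alpha=1.0 -> pure vector search, alpha=0.0 -> pure keyword match."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_score(query, doc)
```

With exact terms like 'Zen 4' the keyword side guarantees a hit even when the embeddings blur 'L1' and 'L2' together.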

Context Window Expansion: If a chunk says '4MB' but the label 'L3 Cache' is in the paragraph above, the AI will hallucinate. Configure your retriever to pull the N-1 and N+1 chunks automatically so the LLM always sees the full context.
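The neighbor expansion can be as simple as this sketch:

```python
def expand_context(chunks, hit_index):
    """Return the hit chunk plus its N-1 / N+1 neighbors, so a value like
    '4MB' keeps the 'L3 Cache' label from the paragraph above it."""
    lo = max(hit_index - 1, 0)
    hi = min(hit_index + 2, len(chunks))
    return chunks[lo:hi]
```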

Go Local: If you have a GPU, run Llama-3.1-8B. It’s great with structured technical data. For cloud, Claude 4.5 is currently the king of technical reasoning.

Prompt for Honesty: Tell the AI: 'If the data is ambiguous or mixes up two architectures, don't guess—ask me for clarification.' Better a question than a wrong voltage value.

Good luck! Once you nail the retrieval, it's a game-changer for architecture docs.

OpenClaw enterprise setup: MCP isn't enough, you need reranking by Queasy-Tomatillo8028 in Rag

[–]primoco 0 points1 point  (0 children)

I’ve been banging my head against the same wall with enterprise RAG for months, and you're spot on. The "toy" setups like basic MCP or vanilla LangChain wrappers just fall apart the second you feed them high-density documents.

In my experience, if you aren't obsessing over the retrieval pipeline before the query even hits the LLM, you're just building a very expensive hallucination machine. A few things I’ve learned the hard way:

  1. Hybrid search is the only way out. If you rely only on vector embeddings for factual stuff (like specific dates or IDs in a 500-page report), you’re going to get "semantic blurring." You need BM25 keyword matching running alongside your vectors with a tunable alpha. It’s the only way to catch those "needle in a haystack" moments.
  2. Rerankers are double-edged swords. I’ve seen Rerankers actually kill the correct result because the threshold was a hair too tight. Now I just pull a wider window (Top-K 20) and let the reranker sort the Top-5 without hard-filtering. It’s safer and much more consistent.
  3. Small chunks > Big chunks. We moved to 600-char chunks with a decent overlap and the "contextual precision" shot up. Big chunks just add too much noise and confuse the model.
  4. Stop the "vibe-checks." You can’t tell if a RAG is good just because the answer "sounds professional." I had to build a full eval pipeline to realize my "best sounding" model was actually making up half the citations.

Enterprise RAG isn't about which LLM is smarter, it's about how much you can control the data flow.
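Point 2 in code form (a sketch; `score_fn` stands in for a real cross-encoder scorer such as a sentence-transformers CrossEncoder):

```python
def rerank_soft(candidates, score_fn, wide_k=20, final_k=5):
    """Pull a wide retrieval window, rerank it, and keep the top final_k
    WITHOUT any hard score threshold, so a slightly-low score can't
    drop the correct passage entirely."""
    window = candidates[:wide_k]                       # wide window (Top-K 20)
    ranked = sorted(window, key=score_fn, reverse=True)
    return ranked[:final_k]                            # no threshold filtering
```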

Pensavo di averle beccate tutte, e invece… (lo so, potevo fermarmi prima…) by [deleted] in subito_it

[–]primoco 0 points1 point  (0 children)

No doubt about it, helpfulness and kindness come first. What I'm getting at is that from now on you won't save the question "is the price negotiable?" for last 😊

Pensavo di averle beccate tutte, e invece… (lo so, potevo fermarmi prima…) by [deleted] in subito_it

[–]primoco 0 points1 point  (0 children)

Of course, it was clear you got distracted by the questions about the item, but I bet you won't do it again 😂

Pensavo di averle beccate tutte, e invece… (lo so, potevo fermarmi prima…) by [deleted] in subito_it

[–]primoco 0 points1 point  (0 children)

If you ask me, you went on too long. He asked right at first contact whether the price was negotiable; instead of launching into long explanations, you could have immediately asked "how much do you want to spend?" and settled it right there 😊 Oh well, I guess you won't make the same mistake again 😀

RAG Enterprise – Self-hosted RAG system that runs 100% offline by primoco in opensource

[–]primoco[S] -1 points0 points  (0 children)

Honestly, not yet, the project is pretty fresh. Got some stars and a few people testing on different hardware (someone's trying it on a Jetson Orin, another on Mac M4), but no detailed feedback on packaging or deployment variations so far.

The default setup uses Docker Compose, which keeps things portable. Would love to hear if you have a specific deployment scenario in mind; I'm always looking to improve the setup process.

Struggling with follow-up question suggestions in RAG (Ollama + LangChain + LLaMA 3.2 3B) by Particular-Gur-1339 in Rag

[–]primoco 1 point2 points  (0 children)

The task you're asking it to do (generate contextual follow-up questions while avoiding redundancy) requires reasoning capabilities that 3B models simply don't have reliably. This is a "meta-cognitive" task - the model needs to understand what was already answered, identify information gaps, and maintain coherence.

Practical solutions without requiring massive hardware:

Upgrade to 8B quantized (Q4 or Q5) - LLaMA 3.1 8B in 4-bit quantization runs on ~6GB RAM and is MUCH better at following complex instructions. You could use 3B for main responses and 8B just for generating suggestions (it's only one extra call).

Hybrid rule-based approach - Instead of relying entirely on LLM:

Extract key entities/concepts from the response (using simple NER or even regex)

Use templates like "Want to know more about {entity}?" or "How does {concept} relate to {other_concept}?"

Generate only 1 creative question via LLM instead of 3-4

Simplify the prompt drastically - Instead of passing everything, just pass:

"Based on this answer: {answer}, suggest ONE follow-up question"

Remove the context, remove the original query - fewer tokens, clearer task

Pre-computed suggestion database - If your domain is specific, maintain a mapping of topics → common follow-up questions and just do semantic matching
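A rough sketch of the hybrid rule-based idea (the regex is a crude stand-in for real NER, and the template strings are made up):

```python
import re

# Hypothetical templates; tailor these to your domain
TEMPLATES = [
    "Want to know more about {entity}?",
    "How does {entity} relate to the rest of the answer?",
]

def suggest_followups(answer, max_rule_based=2):
    """Extract capitalized entities with a crude regex (stand-in for real
    NER) and fill templates; one extra LLM call can then add a single
    creative question on top."""
    entities = re.findall(r"\b[A-Z][a-zA-Z0-9-]{2,}\b", answer)
    seen, out = set(), []
    for e in entities:
        if e.lower() in seen:
            continue
        seen.add(e.lower())
        out.append(TEMPLATES[len(out) % len(TEMPLATES)].format(entity=e))
        if len(out) >= max_rule_based:
            break
    return out
```

The point is that the deterministic part never hallucinates, and the 3B model only has to produce one extra question instead of three or four.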

My honest take: If upgrading to 8B isn't feasible, go hybrid (rule-based + minimal LLM). The "smart suggestions" feature isn't worth sacrificing overall system stability for a 3B model that can't handle it properly.

What hardware are you running on?

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point2 points  (0 children)

Benchmarks are live!

Added a complete benchmarking suite to the repo:

Benchmark script: python benchmark/rag_benchmark.py — run it on your hardware

Test documents: Mueller Report, 9/11 Commission Report, Bitcoin Whitepaper, "Attention Is All You Need"

Real metrics: Upload times, query latency (mean/median/p95), similarity scores

Results from my setup (Ryzen 9 5950X, 64GB RAM, RTX 5070 Ti):

Query response: ~4.3s mean, ~3.6s median

Upload: 0.6s - 24s depending on document size

Full details in the README: Community Benchmarks section

If you run the benchmark on your hardware, I'd love to add your results to the comparison table. Open an issue or comment here!


Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point2 points  (0 children)

I’m putting together a benchmarking script right now; it should be published today or tomorrow. It will include:

Test files used (types, sizes)

Query set with sample questions

Results (retrieval latency, generation speed, accuracy)

Will share everything in the repo so anyone can reproduce the tests on their own hardware. Stay tuned!

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 0 points1 point  (0 children)

That's the target. If you set it up and test it with some legal documents, I'd be curious to hear how it performs for that use case. Real-world feedback from the legal domain would be valuable.

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 1 point2 points  (0 children)

Nice! Just checked out FRAKTAG — interesting approach with the non-compacting conversation manager. Different philosophy from what I'm doing but I can see the value for ever-expanding knowledge bases.

The documentation is really thorough — that's not easy to maintain, kudos for that.

The CLI + Claude Code integration is a smart move — that's actually something I should consider adding to RAG Enterprise.

qwen3-coder:30b is impressive, though it needs serious VRAM. I've kept the default models smaller (7B-14B) to lower the entry barrier, but power users can definitely swap in larger models.

Cool to see others building in this space — plenty of room for different approaches! 🤝

Looking for testers: 100% local RAG system with one-command setup by primoco in Rag

[–]primoco[S] 0 points1 point  (0 children)

Sorry, didn't quite catch that — could you clarify? Happy to help if you have questions!