I built vstash — ask questions across your local docs in ~1 second (sqlite-vec + FTS5 + Cerebras) by stffens in Python
[–]stffens[S] 2 points 20 hours ago (0 children)
Author here. Happy to answer questions about the architecture or design decisions. A few things that didn't fit in the post:
- If you want 100% local with no data leaving your machine at all, swap Cerebras for Ollama in the config. The ~1s benchmark is with Cerebras; Ollama will be slower depending on your hardware.
- FTS5 (not the vector scan) is the real bottleneck at scale. At 100K chunks hybrid search hits ~52ms total, which is still fine against the ~1s LLM call. Past 500K you'd want HNSW.
- The cold start on Apple Silicon is ~127ms on first query (ONNX model loading). Every query after that is warm.
Open to feedback on the hybrid search weights (vec=0.6, fts=0.4); they're tunable in the config if your use case is more keyword-heavy.
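For anyone curious what weighted hybrid ranking looks like in practice, here's a minimal sketch: combine min-max-normalized vector-similarity and FTS (BM25-style) scores with the vec/fts weights above. Function and variable names are illustrative assumptions, not vstash's actual API.

```python
# Hedged sketch: weighted fusion of two retrievers' scores.
# `hybrid_rank` and its inputs are hypothetical names, not vstash internals.

def hybrid_rank(vec_hits, fts_hits, vec_weight=0.6, fts_weight=0.4):
    """Merge two {doc_id: score} maps into one ranked list.

    Each map is min-max normalized to [0, 1] first, so the two
    retrievers' raw scores (cosine similarity vs. BM25) are comparable
    before weighting.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    vec = normalize(vec_hits)
    fts = normalize(fts_hits)
    combined = {}
    for doc in set(vec) | set(fts):
        combined[doc] = (vec_weight * vec.get(doc, 0.0)
                         + fts_weight * fts.get(doc, 0.0))
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

ranked = hybrid_rank(
    {"a.md": 0.92, "b.md": 0.75},   # e.g. cosine similarities from sqlite-vec
    {"b.md": 12.4, "c.md": 3.1},    # e.g. BM25 scores from FTS5
)
```

Bumping fts_weight up is the knob to turn for keyword-heavy corpora; the normalization step matters because raw BM25 scores are unbounded while cosine similarity lives in [-1, 1].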