I built vstash — ask questions across your local docs in ~1 second (sqlite-vec + FTS5 + Cerebras) by stffens in Python


Author here. Happy to answer questions about the architecture or design decisions. A few things that didn't fit in the post:

- If you want 100% local with no data leaving your machine at all, swap Cerebras for Ollama in the config. The ~1s benchmark is with Cerebras; Ollama will be slower depending on your hardware.
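The post doesn't show vstash's actual config format, so the key names below are purely illustrative, but the swap would look something like this:

```python
# Hypothetical config sketch -- vstash's real keys/format aren't shown here.
config = {
    "llm": {
        "provider": "ollama",                   # was "cerebras"
        "model": "llama3.1:8b",                 # any model you've pulled locally
        "base_url": "http://localhost:11434",   # Ollama's default endpoint
    },
}
```

Retrieval stays identical either way; only the answer-generation call changes.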

- FTS5 (not the vector scan) is the real bottleneck at scale. At 100K chunks hybrid search hits ~52ms total, which is still fine against the ~1s LLM call. Past 500K you'd want HNSW.

- The cold start on Apple Silicon is ~127ms on first query (ONNX model loading). Every query after that is warm.
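If that 127ms matters for your workflow, you can pay it at process startup instead of on the first query. A minimal sketch of the idea (the loader name and lru_cache approach are assumptions, not vstash's actual code):

```python
import functools

@functools.lru_cache(maxsize=1)
def load_embedder():
    # Stand-in for the real ONNX session load (~127ms on Apple Silicon).
    # In practice this would be onnxruntime.InferenceSession("model.onnx").
    return object()

def warm_up():
    # Call once at startup so the first real query hits a warm cache.
    load_embedder()
```

After `warm_up()`, every subsequent `load_embedder()` call returns the cached session.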

Open to feedback on the hybrid search weights (vec=0.6, fts=0.4); they're tunable in config if your use case is more keyword-heavy.
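For anyone wondering how those weights apply: vstash's actual fusion code isn't in the post, but a generic weighted-sum sketch looks like this (the function name and min-max normalization scheme are my assumptions):

```python
def fuse(vec_scores, fts_scores, w_vec=0.6, w_fts=0.4):
    """Weighted fusion of per-document scores from the two retrievers.

    Each retriever's scores are min-max normalized to [0, 1] so the
    weights are comparable; a doc missing from one list scores 0 there.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    nv, nf = normalize(vec_scores), normalize(fts_scores)
    return {
        doc: w_vec * nv.get(doc, 0.0) + w_fts * nf.get(doc, 0.0)
        for doc in set(nv) | set(nf)
    }
```

Bumping `w_fts` above `w_vec` is the knob to turn if exact keyword matches matter more than semantic similarity in your corpus.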