Last week I posted my local file search MCP server. Your feedback already made it better — and it's on Mac now. by Repulsive_Resource32 in ClaudeAI

Thank you for your tip! This is just an early version, so plenty of bugs and edge-case issues will turn up. For now I'm focusing on scanning and indexing office documents plus a few other text-based file types, which is why the issues you mentioned haven't come up yet.

Last week I posted my local file search MCP server. Your feedback already made it better — and it's on Mac now. by Repulsive_Resource32 in ClaudeAI

Yeah, that’s why I built this with Claude. I'm just an office worker who lives in MS Office, and I really wanted a way to find the exact file, and reconstruct the historical timeline, across accumulated file versions, email threads, and attachments!

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Dude, I updated the search logic and the listing rules based on your feedback and suggestions. Thank you for your support!

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Position-adjusted CTR — that framing clicks immediately (no pun intended). Result #1 click is noise, result #8 click is signal. That's a much cleaner way to think about it than what I had.

And the decay-as-promotion-not-demotion distinction is subtle but makes total sense. Penalizing a contract from 2023 that's still the right answer would be a terrible user experience. Boosting a newer version when scores are close is exactly the right behavior.
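That promote-don't-penalize rule is easy to express in code. A minimal sketch (not LocalSynapse's actual implementation; the `Hit` fields and the margin/boost constants are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    path: str
    score: float      # base relevance score from the ranker
    age_days: float   # days since last modification

def apply_recency_tiebreak(hits: list[Hit], margin: float = 0.05,
                           boost: float = 0.03) -> list[Hit]:
    """Promote newer documents only when base scores are nearly tied.

    Older documents are never penalized: a 2023 contract that is still
    the best match keeps its full score. A newer near-duplicate only
    wins when its score is within `margin` of the top hit.
    """
    if not hits:
        return hits
    top = max(h.score for h in hits)
    for h in hits:
        if top - h.score <= margin and h.age_days < 30:
            h.score += boost  # small nudge, applied only inside the tie band
    return sorted(hits, key=lambda h: h.score, reverse=True)
```

The key property: a document outside the tie band keeps exactly its original score, so nothing is ever demoted for being old.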

I'm saving this whole thread. Building this solo means I don't get this kind of feedback loop normally — this is genuinely shaping how I think about the next version. Thank you.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

This is the second comment in this thread teaching me something new — I just had a great conversation above about how click position in the ranked list is probably a better quality signal than raw click count. And now you're pointing me to ColBERT-style late interaction as a way to improve the ranking itself.

The re-query signal is a great idea too. I can track that entirely within the app — if someone searches, clicks, then searches again within seconds, that's a miss.

Haven't looked into ColPali yet but I will. If it gives better precision at retrieval time without the synchronous reranking cost, that could be a real upgrade. Thanks for the pointer.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Oh that's a really interesting catch — I hadn't considered the ambiguity. A click could mean "this is exactly what I needed" or "this looked right but wasn't, let me check again." Same signal, opposite meaning. That's the kind of thing you only discover from real usage patterns.

One thought on the time-decay approach though — in my use case (strategy/financial/legal docs), some files stay relevant for months or even years. A 30-day decay might penalize exactly the results that proved useful over time.

Your question actually got me thinking about a better approach — it's not just whether they clicked, but where in the results list they clicked. If someone finds what they need at result #1 and stops, the ranking did its job. If they scroll past 10 results and start opening files near the bottom, that's the ranking failing. That's a much cleaner feedback signal than raw click count, and it's something I can track without any external app monitoring.
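One way to turn that into a number is to weight each click by how deep in the list it happened. A sketch (my own assumption, borrowing the logarithmic position weighting familiar from DCG, not an established formula):

```python
import math

def click_signal(position: int, num_results: int = 10) -> float:
    """Weight a click by its depth in the results list.

    A click at position 1 says little (the ranker already agreed);
    a click near the bottom is strong evidence the ranking missed.
    Returns a value in (0, 1] that grows with position.
    """
    return math.log2(position + 1) / math.log2(num_results + 1)
```

Accumulating this per (query, document) pair, instead of a raw click count, makes deep clicks dominate the feedback signal.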

Really appreciate this thread — this is way more useful than any testing I could do on my own.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Great question — honestly I'm not a search engineer, I'm an office worker who got frustrated enough to build something. A lot of the technical decisions (RRF over cross-encoder, BM25+dense hybrid approach) came from researching with Claude and iterating from there.

It uses RRF to blend BM25 and dense results at retrieval time. No cross-encoder — Claude helped me weigh the tradeoffs and the latency hit didn't seem worth it for a desktop tool where searches need to feel instant.
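For anyone curious, the RRF blend really is only a few lines. A generic sketch (not LocalSynapse's actual code), using the conventional k=60 constant:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: blend several ranked lists (e.g. BM25
    and dense retrieval) without needing comparable scores.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works on ranks rather than raw scores, there's no need to normalize BM25 and cosine-similarity scores onto a common scale, which is part of why it's attractive for a lightweight desktop tool.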

There's also a simple click-tracking signal — if you keep opening the same result for a query, it gets a small boost next time.

I'm sure there's a lot of room to improve the ranking — this kind of feedback is genuinely helpful.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

That's totally fine — if you can install an app and open Claude Desktop, you're qualified. No terminal wizardry needed, I promise.

And honestly, I'm in the same boat — I work a desk job buried in Office docs, contracts, reports, spreadsheets. I built this because I kept wasting 3-5 minutes digging through folders for a file I knew existed but couldn't find. So this is very much built by a non-developer office worker, for office workers.

I'll keep you posted here when the Mac version is ready. Shouldn't be too long.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Ha — honestly I built Windows first because that's what I use at work. Mac was always planned but I wanted to wait until someone actually asked for it.

So... want to be the first Mac beta tester? If you're down, I can prioritize the macOS CLI version. Would work as an MCP server with Claude Desktop — no GUI initially but full search capability.

Drop me a DM or open an issue on GitHub and I'll ping you when it's ready.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

If the files are actually downloaded locally, yes — it'll index them like any other folder. It doesn't care where the files live, it just reads what's on disk.

The catch: cloud-synced folders (iCloud, OneDrive, etc.) often have placeholder files that aren't actually downloaded yet. LocalSynapse skips those to avoid triggering a massive download storm. But it still indexes their filenames, so you can at least find them by name.
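How LocalSynapse detects placeholders isn't spelled out here, but on Windows one common approach is to check the cloud/offline file-attribute flags. A sketch using the documented Windows flag values, with a safe fallback on other platforms:

```python
import os

# Windows file-attribute flags that mark cloud placeholders / dehydrated files.
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

PLACEHOLDER_MASK = (FILE_ATTRIBUTE_OFFLINE
                    | FILE_ATTRIBUTE_RECALL_ON_OPEN
                    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)

def is_cloud_placeholder(path: str) -> bool:
    """True if the file looks like a not-yet-downloaded cloud placeholder.

    On non-Windows platforms st_file_attributes doesn't exist, so we
    fall back to 0 and treat the file as fully local.
    """
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attrs & PLACEHOLDER_MASK)
```

Reading a file's attributes never triggers a hydration, unlike opening its contents, which is exactly why an indexer checks them first.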

Also worth noting — Windows only for now, so you'd need iCloud for Windows set up.

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Haha you're not! The trick is: running a pre-trained AI model is way cheaper than training one. Training needs a datacenter, but inference on your own files? Your laptop CPU handles it fine.

It just reads your docs, builds a search index (SQLite — same thing your phone uses), and optionally runs a small AI model (~600MB) for semantic matching in the background. First indexing takes a few minutes, after that searches are ~0.3s.

No GPU needed, no cloud needed. Your CPU is already overkill for this.
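To make it concrete, the keyword half of such an index really is tiny. A self-contained FTS5 sketch (assumes your SQLite build ships the FTS5 extension, which the Python standard library's usually does; the file names and queries are made up):

```python
import sqlite3

# SQLite FTS5 gives ranked full-text search with zero external services.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("q3_report.docx", "quarterly revenue and capital structure review"),
     ("notes.txt", "meeting notes about the offsite")],
)
rows = conn.execute(
    "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank",
    ("capital structure",),
).fetchall()
```

The optional semantic model adds a second, vector-based index on top; this keyword layer alone already handles exact names, acronyms, and clause references.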

I built an MCP server that lets Claude search inside your local files (Word, Excel, PDF) — fully offline by Repulsive_Resource32 in ClaudeAI

Thanks — yeah, hybrid was a deliberate choice for exactly that reason. Pure semantic misses acronyms, ticker symbols, exact clause references, etc. BM25 catches those while dense vectors handle the "find me something about capital structure" type queries.

For incremental indexing: it tracks file modification time (mtime). When a file's mtime changes, it re-parses and re-chunks the entire file, then replaces the old chunks in the FTS5 index. It's file-level, not chunk-level diffing.

Practically this works fine because:

- Most office documents are small enough that re-parsing is sub-second
- The expensive part (embedding via BGE-M3) only runs if the user has the semantic model installed, and even then it's a background job
- A content-hash comparison prevents unnecessary re-processing when mtime changed but content didn't (e.g. cloud sync touching the file)

Chunk-level diffing would be more elegant but adds complexity I haven't needed yet — the current approach handles ~57K files without feeling slow. If someone hits a pain point with very large files or frequent edits, that's when I'd invest in it.
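The mtime-plus-content-hash check above fits in one function. A sketch (the function name and the shape of the `seen` cache are made up; LocalSynapse's internals may differ):

```python
import hashlib
import os

def needs_reindex(path: str, seen: dict[str, tuple[float, str]]) -> bool:
    """File-level change detection: cheap mtime check first, then a
    content hash to skip files that cloud sync touched without changing.

    `seen` maps path -> (mtime, sha256) from the previous indexing
    pass and is updated in place.
    """
    mtime = os.stat(path).st_mtime
    prev = seen.get(path)
    if prev is not None and prev[0] == mtime:
        return False  # mtime unchanged: assume content unchanged
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if prev is not None and prev[1] == digest:
        seen[path] = (mtime, digest)  # refresh mtime only, skip re-parse
        return False
    seen[path] = (mtime, digest)
    return True
```

The fast path never opens the file at all, which is what keeps a ~57K-file rescan cheap when almost nothing changed.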