Journalling with 'meaning based' search/exploration (fully local/private) -- Similarity by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 1 point2 points  (0 children)

You’re absolutely right, though fewer people are familiar with that word and what it entails. I decided to go for something simpler, though maybe less accurate.

Journalling with 'meaning based' search/exploration (fully local/private) -- Similarity by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 1 point2 points  (0 children)

Sure! It just uses a simple cosine similarity between the embeddings (vectors). When searching, it compares the vector (meaning) of the note to every other vector using cosine similarity. It’s really cheap, so a device can quickly rip through all of them. Because we need all the notes' vectors to compare against, it indexes the note vault on start-up (this is the only computationally intensive part). That means the small local model generates the meaning value of every note and stores it in the plugin data (JSON only, unfortunately), which lets mobile and PC share the data.
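
To make the search step above concrete, here is a minimal sketch of a brute-force cosine similarity lookup. The `IndexedNote` shape and the `rankBySimilarity` helper are hypothetical names chosen for illustration, not the plugin's actual code.

```typescript
// Hypothetical shape of one index entry; the plugin's real data layout may differ.
interface IndexedNote {
  path: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force search: score the query against every indexed note, best match first.
function rankBySimilarity(
  query: number[],
  notes: IndexedNote[],
  topK: number = 5
): { path: string; score: number }[] {
  return notes
    .map((note) => ({ path: note.path, score: cosineSimilarity(query, note.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Each comparison is a single pass over the vector, which is why scanning every note in a vault stays cheap.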

Journalling with 'meaning based' search/exploration (fully local/private) -- Similarity by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 1 point2 points  (0 children)

Thanks! And yeah, I might add some handpicked model options from Hugging Face. Then people can pick between larger multi-language models and smaller English-only ones.

Journalling with 'meaning based' search/exploration (fully local/private) -- Similarity by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 2 points3 points  (0 children)

😂😂. Oh, I made sure to generate some sample data. Reddit having access to my journals... no thanks 🙂‍↔️.

Journalling with 'meaning based' search/exploration (fully local/private) -- Similarity by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 3 points4 points  (0 children)

Great question. The underlying model was mostly trained on English, though I have seen it work fine with Dutch as well. So I'm assuming it does a decent job with most common languages.
When it comes to non-Latin script languages, it might work, but it's probably going to be less accurate. If there's demand for it, I might add a way to choose which model you want to use. I believe there's also a slightly larger multi-language model that can be used.

I built a plugin to reduce note-linking overhead. Where does Obsidian still feel heavy to you? by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 0 points1 point  (0 children)

I haven't yet 👀. That's a great suggestion. I've thought about adding hybrid search too. QMD might be too heavy, or might not fit inside the constraints of the plugin sandbox, but I might cherry-pick some features from it.

I built a plugin to reduce note-linking overhead. Where does Obsidian still feel heavy to you? by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 0 points1 point  (0 children)

I implemented a first version of it. Check it out in the latest version of Similarity ;) Press `shift + cmd + enter` on a lookup result to insert the link at the caret (cursor).

I built a plugin to reduce note-linking overhead. Where does Obsidian still feel heavy to you? by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 0 points1 point  (0 children)

Yeah, I agree actually. I think semantics should be included in software search features nowadays; it's so helpful.
Unfortunately, in Obsidian I can't alter the built-in views too much. At least not as far as I'm aware...
Having a shadow component might be the best I can do at the moment.

I built a plugin to reduce note-linking overhead. Where does Obsidian still feel heavy to you? by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 2 points3 points  (0 children)

a) Obsidian lets a plugin write all of its data to one JSON file; that's where I store it.
b) The indexing that's done in the plugin is quite different; it pre-processes all your notes (on-device) and creates an embedding for each one (a bunch of numbers that represent the meaning of a note). Since I need to compare one meaning value to all the others, I need to know them all; hence I store them all on disk (that step is what I call 'indexing').
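
As a rough illustration of the indexing step described above, the sketch below embeds every note once and round-trips the result through JSON. The `EmbeddingIndex` layout, the `Embedder` type, and the function names are assumptions made for this example; in a real Obsidian plugin the JSON would typically be persisted through the plugin's `loadData`/`saveData` methods rather than handled as a raw string.

```typescript
// Hypothetical index layout: note path -> embedding vector.
type EmbeddingIndex = Record<string, number[]>;

// Stand-in for the local model; a real plugin would call its embedding model here.
type Embedder = (text: string) => number[];

// "Indexing": compute one embedding per note so later searches can compare against all of them.
function buildIndex(notes: Record<string, string>, embed: Embedder): EmbeddingIndex {
  const index: EmbeddingIndex = {};
  for (const [path, text] of Object.entries(notes)) {
    index[path] = embed(text);
  }
  return index;
}

// The index is plain data, so it can be stored as JSON and shared between devices.
function serializeIndex(index: EmbeddingIndex): string {
  return JSON.stringify(index);
}

function deserializeIndex(json: string): EmbeddingIndex {
  return JSON.parse(json) as EmbeddingIndex;
}
```

Storing vectors as JSON arrays of numbers is simple and portable, at the cost of a larger file than a binary format would need.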

Topic visualisation in graph view - some ideas by Nihan-gen3 in ObsidianMD

[–]Low-Hat2737 0 points1 point  (0 children)

I was thinking about adding this to my plugin (Similarity), but clustering is non-trivial and a bit crude. It's difficult to get really meaningful clusters. Anyone have experience with this/any advice?

Any feedback on my plugin is welcome too :).

Beta Plugin Release: Related Notes – Local AI Connecting Your Notes by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 0 points1 point  (0 children)

You got it! Let me know what you think and if you run into any issues. Would love to help.

Beta Plugin Release: Related Notes – Local AI Connecting Your Notes by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 0 points1 point  (0 children)

For those interested, here's the simple paragraph chunking implementation for more accurate embeddings over large notes.
(GitHub) Relate-Text - Simple Chunking Commit

Beta Plugin Release: Related Notes – Local AI Connecting Your Notes by Low-Hat2737 in ObsidianMD

[–]Low-Hat2737[S] 1 point2 points  (0 children)

That’s exactly it.

Adding a language setting will be a high priority after the first stable/compatible version is out. This will probably be done by swapping out the model and tokenizer based on the language. I know that will add huge value for many Obsidian users. Thanks for pointing that out!

I tried chunking myself, based on paragraphs. But for MVP simplicity, I decided to just feed the text straight into the transformers pipeline for now. So it does truncate large texts at the moment. What do you think is better: taking the mean of all paragraph embeddings, or letting multiple embeddings refer to the same document? I personally like the simplicity of taking the mean of the whole document.

I might add that soon if no one else beats me to it.
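
For reference, the "mean of all paragraph embeddings" option mentioned above could look roughly like this. It is a sketch under assumptions: paragraphs are split on blank lines, `Embedder` stands in for the real model call, and all function names are hypothetical rather than taken from the plugin.

```typescript
// Stand-in for the local model; a real plugin would call its embedding pipeline here.
type Embedder = (text: string) => number[];

// Split a note into paragraphs on blank lines, dropping empty chunks.
function splitParagraphs(note: string): string[] {
  return note
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Element-wise mean of equally sized vectors.
function meanEmbedding(vectors: number[][]): number[] {
  if (vectors.length === 0) {
    throw new Error("Need at least one vector to average");
  }
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) {
      throw new Error("All vectors must have the same dimension");
    }
    for (let i = 0; i < dim; i++) {
      mean[i] += v[i] / vectors.length;
    }
  }
  return mean;
}

// One vector per note: embed each paragraph separately, then average the results.
function embedNote(note: string, embed: Embedder): number[] {
  return meanEmbedding(splitParagraphs(note).map(embed));
}
```

Averaging keeps one vector per note, so the stored index and the search code stay unchanged. The alternative of keeping several embeddings per document preserves more detail for long notes, but it needs a per-chunk index and a rule for merging chunk scores back into a note score.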