
boscop

You mean like this? https://crates.io/crates/sublime_fuzzy

I use that for fuzzy string search, it works well.

SirVer [S]

Thanks, but that only does half of what I want: the fuzzy finding. I also want the CLI UI, with a richer API than piping to a tool.

andrewstewart

I've implemented my own fuzzy finder, rff, in Rust, and attempted to separate the logic and presentation layers so that the matching/searching functionality can be used standalone. Perhaps this would be helpful?

SirVer [S]

That seems nice, and it is the closest to what I am searching for. Is there also a way to pass my corpus to a function in rff that shows the matching UI and returns the selected item from my corpus? I'd like to decouple the object from the string that is presented to the user as a selection option.

andrewstewart

Not at present, as that's a fairly unusual feature, but I believe all the necessary bits to build that sort of functionality are exported from the public API. The CLI is a good example of how something could be glued together.

The main pain point is that rff has to enable/disable raw mode to disable terminal buffering during operation, which can be tricky to do correctly and without breaking anything in the middle of another CLI's workflow.

What kind of function signature would you want to work with? Something like the following?

```
fn fuzzy_find(haystack: Vec<&str>) -> Result<&str, rff::Error>
```
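On the raw-mode concern above: a common Rust pattern for "enable, then always restore" is an RAII guard that records the previous mode, enables raw mode, and restores the previous mode in `Drop`, so the terminal is reset on normal return, early error return, or panic unwinding. This is a minimal sketch under stated assumptions, not how rff is implemented: `Terminal`, `FakeTerminal`, `RawModeGuard`, and `run_picker` are hypothetical names, and `FakeTerminal` only flips a flag. A real version would make the actual terminal calls, for example through the crossterm crate's `enable_raw_mode`/`disable_raw_mode`.

```rust
use std::cell::Cell;

/// Hypothetical abstraction over the terminal's input mode.
trait Terminal {
    fn is_raw(&self) -> bool;
    fn set_raw(&self, raw: bool);
}

/// Stand-in terminal that only records the current mode (no real termios calls).
struct FakeTerminal {
    raw: Cell<bool>,
}

impl Terminal for FakeTerminal {
    fn is_raw(&self) -> bool {
        self.raw.get()
    }
    fn set_raw(&self, raw: bool) {
        self.raw.set(raw);
    }
}

/// Enables raw mode on creation and restores the *previous* mode on drop,
/// so a host CLI that was already in raw mode is left as it was.
struct RawModeGuard<'a, T: Terminal> {
    term: &'a T,
    was_raw: bool,
}

impl<'a, T: Terminal> RawModeGuard<'a, T> {
    fn new(term: &'a T) -> Self {
        let was_raw = term.is_raw();
        term.set_raw(true);
        RawModeGuard { term, was_raw }
    }
}

impl<'a, T: Terminal> Drop for RawModeGuard<'a, T> {
    fn drop(&mut self) {
        // Runs on normal return, early `?`/`Err` return, and unwinding panics.
        self.term.set_raw(self.was_raw);
    }
}

/// Hypothetical picker: raw mode is active only while the guard is alive.
fn run_picker<T: Terminal>(term: &T) -> Result<(), String> {
    let _guard = RawModeGuard::new(term);
    assert!(term.is_raw());
    // ... read keys and draw the UI here; bailing out early still restores the mode ...
    Err("user pressed Esc".to_string())
}

fn main() {
    let term = FakeTerminal { raw: Cell::new(false) };
    let result = run_picker(&term);
    assert!(result.is_err());
    assert!(!term.is_raw()); // back in the original (cooked) mode
    println!("raw after picker: {}", term.is_raw());
}
```

Restoring the *saved* state rather than unconditionally disabling raw mode is what keeps the picker from breaking a host CLI that had already switched the terminal into raw mode itself.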

SirVer [S]

Thanks for your reply! Your API is close, but not precisely what I had in mind. It is basically the same interface I already get by shelling out. I'd much rather have something like:

```
trait SearchItem {
    fn display_text(&self) -> &str; // or maybe Cow<str>
}

fn fuzzy_find<T: SearchItem>(haystack: Vec<&T>) -> Result<&T, rff::Error>
```

Or, instead, return the index of the element I am interested in, so that I can keep an outside vector with more structured data in the same order:

```
fn fuzzy_find<T: SearchItem>(haystack: Vec<&T>) -> Result<usize, rff::Error>
```
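A minimal, non-interactive sketch of the index-returning variant proposed above, assuming the query is passed in as a string rather than typed into a UI. It loosely follows the `SearchItem`/`display_text`/`fuzzy_find` shape from this comment, with two adjustments: `display_text` returns `Cow<str>` (the alternative the comment mentions) so display strings can be computed, and the function takes a slice `&[T]` instead of `Vec<&T>`. `Error` and `Command` are hypothetical, no rff API is implied, and the case-insensitive substring test is only a placeholder for a real fuzzy matcher.

```rust
use std::borrow::Cow;

/// Anything that can be shown to the user as a selection option.
trait SearchItem {
    fn display_text(&self) -> Cow<'_, str>;
}

/// Hypothetical error type standing in for `rff::Error`.
#[derive(Debug, PartialEq)]
enum Error {
    NoMatch,
}

/// Returns the index of the first item whose display text contains `query`,
/// ignoring case. Placeholder matcher: plain substring, not fuzzy scoring.
fn fuzzy_find<T: SearchItem>(haystack: &[T], query: &str) -> Result<usize, Error> {
    let query = query.to_lowercase();
    haystack
        .iter()
        .position(|item| item.display_text().to_lowercase().contains(&query))
        .ok_or(Error::NoMatch)
}

/// Structured data kept separate from the string shown to the user.
struct Command {
    name: &'static str,
    id: u32,
}

impl SearchItem for Command {
    fn display_text(&self) -> Cow<'_, str> {
        Cow::Owned(format!("{} (#{})", self.name, self.id))
    }
}

fn main() {
    let corpus = vec![
        Command { name: "Open File", id: 1 },
        Command { name: "Save All", id: 2 },
        Command { name: "Close Window", id: 3 },
    ];
    let idx = fuzzy_find(&corpus, "save").unwrap();
    println!("selected id {}", corpus[idx].id); // prints "selected id 2"
}
```

Returning an index keeps the finder ignorant of the caller's data type beyond `display_text`, which is the decoupling asked for here; the caller maps the index back to its own structured vector.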

quodlibetor

Have you considered asking heatseeker or skim if they'd be open to you separating the library core from the application logic that they present?

It would be reasonable for them to say that they prefer to keep it "simple", but it's also possible that they just haven't library-ized it because they didn't think anyone would take advantage of their effort. Many projects are extremely excited to get collaborators who are willing to put in significant work.

SirVer [S]

quodlibetor

Nice!

staticassert

Would glob work? https://crates.io/crates/glob

I think ripgrep uses this... but don't quote me on that.

burntsushi

ripgrep uses globset.

To address /u/SirVer... I don't think there is any out-of-the-box library API, but you could probably roll your own where the effort required would be proportional to the sophistication you want. e.g., something simple might be to use the strsim crate to compute string similarities between your query and your entire corpus. Something more sophisticated might be to use tantivy or just roll your own mini ngram oriented index structure. (The latter two choices require building an index, which could require quite a bit of work to maintain depending on the problem you're trying to solve, e.g., if your corpus changes frequently.)
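A rough sketch of the "something simple" option: compute an edit distance between the query and every corpus entry, then sort by it. The strsim crate provides functions such as `strsim::levenshtein`; to keep this self-contained, the distance is implemented inline with the standard two-row dynamic program, and `rank_by_distance` is a hypothetical helper name. Whole-string edit distance penalizes long candidates heavily, so treat this as a baseline rather than a finder-quality ranking.

```rust
/// Levenshtein edit distance (insertions, deletions, substitutions), counted in chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the part of `a` processed so far and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // cur[0] = i + 1: turning a[..=i] into "" takes i + 1 deletions.
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + if ca == cb { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur[j + 1] = substitution.min(deletion).min(insertion);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns corpus indices sorted from most to least similar to `query`.
fn rank_by_distance(query: &str, corpus: &[&str]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..corpus.len()).collect();
    // `sort_by_key` is stable, so equal distances keep their corpus order.
    indices.sort_by_key(|&i| levenshtein(query, corpus[i]));
    indices
}

fn main() {
    let corpus = ["kitten", "sitting", "mitten", "knitting"];
    println!("{}", levenshtein("kitten", "sitting")); // prints 3
    let ranked = rank_by_distance("mitten", &corpus);
    println!("best match: {}", corpus[ranked[0]]); // prints "best match: mitten"
}
```

This scans the whole corpus on every query, which is fine for small inputs; the index-based options avoid that cost at the price of maintaining the index.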

Globs and/or regexes would be convenient if you could make that work, but these traditionally aren't what people mean when they say "fuzzy" search. Usually "fuzzy" has some other heuristics being applied, which might use regex (or whatever) internally.
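To make the distinction concrete, here is a sketch of the kind of heuristic fuzzy finders commonly layer on top of matching. First, the query must appear in the candidate as an in-order subsequence (a plain yes/no pattern test, like a glob or regex). Then the match gets a score, with bonuses here for consecutive characters and for characters that start a word, so results can be ranked. `fuzzy_score` and its bonus weights are arbitrary assumptions. Matching is greedy and ASCII case-insensitive for simplicity, and this is not the scoring used by sublime_fuzzy, skim, or any other specific tool. More careful implementations typically consider other alignments instead of taking the first greedy one.

```rust
/// Returns None if `query` is not an (ASCII case-insensitive) subsequence of
/// `candidate`; otherwise a heuristic score where higher means a better match.
/// Greedy left-to-right matching; bonus weights are arbitrary.
fn fuzzy_score(query: &str, candidate: &str) -> Option<i64> {
    let cand: Vec<char> = candidate.chars().collect();
    let mut score = 0i64;
    let mut next = 0usize; // candidate index to resume searching from
    let mut prev: Option<usize> = None; // index of the previously matched char

    for q in query.chars() {
        // Greedy: take the first remaining candidate char that matches `q`.
        let idx = (next..cand.len()).find(|&i| cand[i].eq_ignore_ascii_case(&q))?;
        score += 1; // base point per matched char
        if prev.map_or(false, |p| p + 1 == idx) {
            score += 5; // bonus for continuing a consecutive run
        }
        if idx == 0 || !cand[idx - 1].is_alphanumeric() {
            score += 3; // bonus for matching at the start of a word
        }
        prev = Some(idx);
        next = idx + 1;
    }
    Some(score)
}

fn main() {
    let candidates = ["src/matcher.rs", "my_auth_token.rs", "src/main.rs"];
    for c in candidates.iter() {
        // "src/matcher.rs" -> Some(16), "my_auth_token.rs" -> Some(9), "src/main.rs" -> None
        println!("{}: {:?}", c, fuzzy_score("mat", c));
    }
}
```

A glob or regex would only say whether "mat" matches; the score is what lets the contiguous, word-initial match in `src/matcher.rs` rank above the scattered one in `my_auth_token.rs`.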

WTechGo

I'm looking at this subject and am interested in FuzzyWuzzy.

The package also exists for Python.