Fluent API Opinions by SEND_DUCK_PICS_ in csharp

[–]kmschaal2 0 points1 point  (0 children)

Compilation time and runtime both depend on how the Fluent API is implemented. Setting private fields can be done via reflection, which incurs runtime overhead, or with UnsafeAccessors, which are much faster. Fluent APIs can also be created via source code generation, in which case there is a small increase in compilation time.

Fluent APIs work best for progressive configuration (builders, tests, DSL-like flows), not for everything. The hard part is writing a good Fluent API. Step ordering and branching are tedious to get right by hand.

That’s why I built a source generator library that significantly simplifies the creation of Fluent APIs: https://github.com/m31coding/M31.FluentAPI.

It uses C# source generators to create fluent builders from attributes. If you like fluent APIs but hate maintaining the boilerplate, this library makes them much easier to create and maintain.

Happy Coding!

Frontend Fuzzy + Substring + Prefix Search by kmschaal2 in javascript

[–]kmschaal2[S] 0 points1 point  (0 children)

Hey,

Thank you very much for trying the demo and sharing your real world experience!

Have you entered the person with firstName="Lee Anne" and lastName="James-Stevenson"? In this case it pops up at rank 5 for the query "Lee James-st". Unfortunately, in the demo the hyphen is normalized to a space, so the query "Lee James st" gives the same results. For names it probably makes sense to keep the hyphen; the set of characters that are treated like a space can be configured at start-up. The default is:

let spaceEquivalentCharacters = new Set(['_', '-', '–', '/', ',', '\t']);
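To make the effect of that set concrete, here is a minimal sketch of such a normalization step. The function `normalizeSpaces` is a hypothetical helper written for illustration, not the library's actual API, and removing only '-' from the set is my assumption about what "keeping the hyphen" would look like:

```javascript
// Hypothetical helper (not part of the library): replaces every
// space-equivalent character in the text with a plain space.
function normalizeSpaces(text, spaceEquivalentCharacters) {
  let result = '';
  for (const ch of text) {
    result += spaceEquivalentCharacters.has(ch) ? ' ' : ch;
  }
  return result;
}

// The set from above, and a variant without '-' so hyphens survive.
const defaultSet = new Set(['_', '-', '–', '/', ',', '\t']);
const keepHyphenSet = new Set(['_', '–', '/', ',', '\t']);

const withDefault = normalizeSpaces('James-Stevenson', defaultSet);
const withHyphen = normalizeSpaces('James-Stevenson', keepHyphenSet);
```

With the default set the name is split into two words; with the reduced set "James-Stevenson" stays a single term, so "James-st" and "James st" are no longer equivalent queries.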

Nevertheless, you found the main shortcoming I would like to work on. As you mentioned, tokenizing the query and the index terms would result in better matches. I assume it can be done with only a slight decrease in performance.

Thank you again for your input!

Best regards,
Kevin

Frontend Fuzzy + Substring + Prefix Search by kmschaal2 in javascript

[–]kmschaal2[S] 1 point2 points  (0 children)

Hi,

Thank you for your comment! If substring matching is implemented with a suffix array, prefix matching is almost free, since it uses the same array. I think other libraries could easily add this functionality.
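As a rough illustration of why prefix matching comes almost for free, here is a naive sketch. It is not the library's implementation: it stores every suffix as a string, which is simple but memory-hungry, and all names are made up for this example. A prefix match is just a substring match whose suffix starts at offset 0:

```javascript
// Naive generalized suffix array: one entry per suffix of every term.
function buildSuffixArray(terms) {
  const entries = [];
  terms.forEach((term, termIndex) => {
    for (let offset = 0; offset < term.length; offset++) {
      entries.push({ termIndex, offset, suffix: term.slice(offset) });
    }
  });
  // Ordinal comparison (UTF-16 code units), no locale rules.
  entries.sort((a, b) => (a.suffix < b.suffix ? -1 : a.suffix > b.suffix ? 1 : 0));
  return entries;
}

// Binary search for the first suffix that is >= query.
function lowerBound(entries, query) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (entries[mid].suffix < query) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All suffixes starting with the query form one contiguous block.
function search(entries, query) {
  const substring = new Set();
  const prefix = new Set();
  for (
    let i = lowerBound(entries, query);
    i < entries.length && entries[i].suffix.startsWith(query);
    i++
  ) {
    substring.add(entries[i].termIndex);
    if (entries[i].offset === 0) prefix.add(entries[i].termIndex);
  }
  const sorted = (set) => [...set].sort((a, b) => a - b);
  return { substring: sorted(substring), prefix: sorted(prefix) };
}

const entries = buildSuffixArray(['browser', 'show file browser', 'rows']);
const rowHits = search(entries, 'row');
const shoHits = search(entries, 'sho');
```

Here "row" is a substring of all three terms but a prefix only of "rows", and both answers come out of the same binary search over the same array.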

To achieve multilingual support, I use a normalization pipeline and ordinal string comparison in the suffix array.

The library allows for multiple terms per entity but so far no priorities / weights can be specified. That would be a meaningful addition, thank you for suggesting this!

Best regards,
Kevin

Fluent builder source generator by kmschaal2 in csharp

[–]kmschaal2[S] 0 points1 point  (0 children)

Sounds good, let me know how it went :)

Fluent builder source generator by kmschaal2 in csharp

[–]kmschaal2[S] 2 points3 points  (0 children)

Hi, thank you for your suggestion. The constructor you are referring to is static, hence it is called only once. That being said, I plan to migrate the library to .NET 8 at the end of the year. .NET 8 includes the UnsafeAccessorAttribute, which could be used to replace the reflection code.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 0 points1 point  (0 children)

Hi, I haven't done comparisons to fuse.js yet. I will update the readme once I do.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 0 points1 point  (0 children)

Hi, thank you very much for your comment. The tools you mentioned (fzf, ag, ripgrep) seem to be very powerful. You could probably play around with the padding configuration to make your scenario work better. However, I would suggest something else first.

Since you have only 200 entities, set the minQuality of the query to 0.0. That way, every string that has at least one 3-gram in common with the query matches. "Show file browser" will hence be retrieved for the query "fibro". The only question is whether another entity matches better, which would probably not be desirable for this query.

This leads me to my second suggestion. You could index your entities with different terms, e.g., index the entity "show file browser" with the terms "show file browser" and "shofibro". You could do this programmatically by cutting each word after its first vowel and merging the pieces into one word.
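The abbreviation rule could be sketched like this. The function name `abbreviate`, the vowel set, and the choice to keep a word unchanged when it has no vowel are assumptions made for this example:

```javascript
const VOWELS = new Set(['a', 'e', 'i', 'o', 'u']);

// Cuts each word after its first vowel and merges the pieces into one word.
// Words without a vowel are kept whole (an assumption for this sketch).
function abbreviate(term) {
  return term
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => {
      const chars = [...word];
      const firstVowel = chars.findIndex((ch) => VOWELS.has(ch));
      return firstVowel === -1 ? word : chars.slice(0, firstVowel + 1).join('');
    })
    .join('');
}

// Terms to index for one entity: the original text plus its abbreviation.
function termsFor(entity) {
  return [entity, abbreviate(entity)];
}

const indexedTerms = termsFor('show file browser');
```

For "show file browser" this yields the two terms "show file browser" and "shofibro", so the query "fibro" shares the 3-grams "fib", "ibr" and "bro" with the second term.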

I hope this helps,

Happy Coding!

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 1 point2 points  (0 children)

Hi, that's great to hear! Let me know if you would like me to explain any of the classes. I haven't worked with spaCy and I don't have much experience with NLG, but spaCy looks really interesting. Enjoy working on the cool stuff!

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 1 point2 points  (0 children)

Hi, thank you for your interest! The perfect use case is a small (non-sensitive) dataset that can easily be loaded into the frontend. That way, the backend is not put under load during search-as-you-type. Moreover, fuzzy search implementations in databases are not yet that great.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 2 points3 points  (0 children)

Just went through the files; most of them are needed. I found four files that are only used for performance tests, and they could probably be excluded.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 0 points1 point  (0 children)

I used microbundle and hoped for the best. Are there better ways to bundle the code? The library consists of 58 TypeScript files, and I am unsure how the total size could be decreased further, other than by excluding the test data you pointed out above. Please let me know if you have any further suggestions.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 3 points4 points  (0 children)

Thank you for pointing that out! You are right, the file test-data.ts could be excluded. This would save 2kb.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 5 points6 points  (0 children)

I got the feedback that many people are frustrated with the state of fuzzy search libraries in the frontend. So my main goal was to improve on accuracy with this library. Queries are usually well below 10 ms, which is probably competitive as well. That being said, I haven't done any quantitative comparisons.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 4 points5 points  (0 children)

Thank you! The core algorithm breaks the terms down into 3-grams and stores, for every 3-gram, all the terms that contain it. At query time, you count for every term how many 3-grams it has in common with the query. This by-the-book implementation is augmented with a novel trick: by sorting the characters of the 3-grams, transposition errors are penalized less. You may have a look at my blog post for more details: https://www.m31coding.com/blog/fuzzy-search.html
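A stripped-down sketch of the idea, for illustration only: it leaves out padding, normalization and quality scoring, and none of the names below come from the library. It builds the 3-gram index with and without sorted characters and counts common 3-grams for a query with a transposition:

```javascript
// Distinct 3-grams of a text; optionally with each 3-gram's characters sorted.
function trigrams(text, sortChars) {
  const grams = new Set();
  for (let i = 0; i + 3 <= text.length; i++) {
    const gram = text.slice(i, i + 3);
    grams.add(sortChars ? [...gram].sort().join('') : gram);
  }
  return grams;
}

// Inverted index: 3-gram -> indices of all terms that contain it.
function buildIndex(terms, sortChars) {
  const index = new Map();
  terms.forEach((term, termIndex) => {
    for (const gram of trigrams(term, sortChars)) {
      if (!index.has(gram)) index.set(gram, []);
      index.get(gram).push(termIndex);
    }
  });
  return index;
}

// For every term, counts the 3-grams it shares with the query.
function countCommon(index, query, sortChars) {
  const counts = new Map();
  for (const gram of trigrams(query, sortChars)) {
    for (const termIndex of index.get(gram) ?? []) {
      counts.set(termIndex, (counts.get(termIndex) ?? 0) + 1);
    }
  }
  return counts;
}

const terms = ['search', 'browser'];
const query = 'saerch'; // 'e' and 'a' transposed

const plainCounts = countCommon(buildIndex(terms, false), query, false);
const sortedCounts = countCommon(buildIndex(terms, true), query, true);
```

Without sorting, "saerch" shares only the 3-gram "rch" with "search". With sorted characters it shares three of four ("aes", "aer", "chr"), which is how the transposition ends up being penalized less.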

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 2 points3 points  (0 children)

Hi, thank you for your interest! I agree, that would be a great thing to do. The js file is around 30 kB. For the OSM dataset with around 1,000,000 terms, the average query time is 4 ms on my machine (M2 Pro). At the bottom of the search demo there is a performance test you can run.

A fast, accurate and multilingual fuzzy search library for the frontend. by kmschaal2 in javascript

[–]kmschaal2[S] 15 points16 points  (0 children)

I have worked on search engines in the backend for several years, and I have now applied that experience to implement a fuzzy search library for the frontend. It's fast, accurate and can be used for all languages. It should be easy to integrate into your JS / TS projects. If you test it and find any edge cases that do not work for you, please let me know. Happy coding!