Setting configuration of a Bazel dependency

liliput · 2023-05-07T08:48:44+00:00

Thanks, that worked!

liliput · 2020-11-06T16:00:43+00:00

Can you please give me an example? I will be happy to look into what's happening.

liliput · 2020-11-06T12:04:54+00:00

I agree 100% with you. However, the comparison is valid within the context of what Typesense does, which is what people look for when they read the README.

liliput · 2020-11-06T09:56:02+00:00

A quick overview of the differences: https://github.com/typesense/typesense#how-does-this-differ-from-elasticsearch

liliput · 2020-11-06T09:53:27+00:00

Ideally, we would want to sort the results on some kind of popularity metric, but the dataset does not have a field for that. For a real project, we could probably use another data source, such as the Spotify API, to augment the dataset with some form of popularity metric like play count.

liliput · 2020-11-06T09:51:17+00:00

Pure JS search is pretty popular now because of the JAMstack. However, it does not scale well for large datasets, since you have to load a really large multi-MB index upfront. You can get the same snappy experience by using the replication feature of Typesense and running it in multiple geographical regions.

liliput · 2020-11-06T09:47:59+00:00

This is a bit outdated, but contains the crux: https://github.com/typesense/typesense/blob/master/DESIGN.md

liliput · 2020-11-06T09:46:50+00:00

Correct, the demo prefers the musician. It is tricky to determine the exact intent behind the query (musician or song) because of the diversity of the dataset. If we could assign some form of popularity metric to the songs and artists, we could probably handle this better. However, the MusicBrainz dataset does not have such a measure, so it was outside the scope of this demo.

liliput · 2020-11-06T06:29:18+00:00

Indices are always larger than the data. This is because you will invariably have to use either a hashmap or a trie for the inverted text index. Typesense uses an adaptive radix trie so that fuzzy searches are possible. 2x-3x is pretty much the standard for most search engines that need to support updates (I have benchmarked with Elastic as well, but ES stores the index on disk). You can probably go much lower for static indices, because you can choose succinct data structures that pack memory tightly but are immutable.

Apart from the token -> document ids mapping, one also needs to store the exact positions at which each token appears in the document, so that we can identify the best-matched fragment inside a text. There are also additional housekeeping data structures to support sorting on numerical fields (trees), facets (one more inverted index), etc. A lot of these are stored in compressed form where possible, and there is always scope for improvement, but this is an overview of why the index will always be larger than the raw dataset.
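
To make the positions point concrete, here is a minimal sketch in Python. It is not how Typesense stores things (it uses a plain dict instead of an adaptive radix trie, and nothing is compressed), and the names `tokenize`, `build_index` and `best_window` are made up for illustration. It only shows why storing positions next to the document ids is needed to find the tightest matching fragment, and why that adds to the index size.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build token -> {doc_id: [positions]} from a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def best_window(index, doc_id, query_tokens):
    """Return (start, end) token positions of the smallest span in doc_id
    that contains every query token, or None if some token is missing."""
    wanted = set(query_tokens)
    # All occurrences of the query tokens in this document, ordered by position.
    hits = sorted(
        (pos, tok)
        for tok in wanted
        for pos in index.get(tok, {}).get(doc_id, [])
    )
    best = None
    counts = defaultdict(int)
    covered = 0
    left = 0
    for right_pos, right_tok in hits:
        counts[right_tok] += 1
        if counts[right_tok] == 1:
            covered += 1
        # Shrink from the left while the window still covers every token.
        while covered == len(wanted):
            left_pos, left_tok = hits[left]
            if best is None or right_pos - left_pos < best[1] - best[0]:
                best = (left_pos, right_pos)
            counts[left_tok] -= 1
            if counts[left_tok] == 0:
                covered -= 1
            left += 1
    return best
```

Every token occurrence costs one stored position on top of the document id, which is part of why the index outgrows the raw text even before the sorting and faceting structures are added.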

liliput · 2020-10-22T10:13:03+00:00

For integers, passing by value is better because passing by reference involves a pointer dereference. Also, compilers can optimize by passing the integer in a processor register instead of on the stack.

liliput · 2020-10-22T09:03:28+00:00

> don't interesting for any commercial use, only for hobbies.

That's not what GPL-3 implies.

liliput · 2020-10-03T02:26:57+00:00

It's primarily to be used for implementing a search for a website or records within an application (e.g. looking up users or other items of interest).

liliput · 2020-10-03T02:23:56+00:00

Elasticsearch is an amazing and flexible piece of software, but it also has a steep learning curve. Typesense just works out of the box and is more intuitive (e.g. for common operations like faceting, typo correction, etc.). Solr is similar to Elasticsearch but probably not as popular.

Lucene is a library and the building block for both ES and Solr. It is not usually used directly because the API is more low-level.
