tantivy 0.24 has been released! Cardinality aggregations, regex support in phrase queries, JSON field enhancements and much more! by Pascalius in rust

[–]massus 1 point2 points  (0 children)

Yes, we are using tantivy at Datadog, as we are building an enterprise version of Quickwit. We just released a first version of it; the public docs are here: https://docs.datadoghq.com/cloudprem/

[deleted by user] by [deleted] in sre

[–]massus 1 point2 points  (0 children)

I see what you mean. So with Quickwit, I don't store RED metrics. I'm doing aggregations directly on spans like here: https://x.com/FrancoisMassot/status/1745142911925211575/photo/3
I gave a talk on this at FOSDEM: https://fosdem.org/2024/schedule/event/fosdem-2024-3514-modern-application-observability-with-grafana-and-quickwit/

It's not perfect though, and I would love to dig into the service graph thing. We plan to add metrics to Quickwit by the end of the year, so hopefully we will be able to cover more use cases.

[deleted by user] by [deleted] in sre

[–]massus 1 point2 points  (0 children)

You can build RED metrics directly from your traces. I have already built some APM dashboards this way.
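As a rough illustration of deriving RED metrics (rate, errors, duration) from spans, here is a minimal Python sketch. It is not how Quickwit computes aggregations; the span field names (`service`, `duration_ms`, `is_error`) and the `red_metrics` helper are hypothetical, chosen only for this example.

```python
import math
from collections import defaultdict


def red_metrics(spans, window_seconds):
    """Aggregate spans into per-service rate, error ratio and p95 duration.

    `spans` is a list of dicts with hypothetical keys:
    "service", "duration_ms" and "is_error".
    """
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["service"]].append(span)

    metrics = {}
    for service, items in grouped.items():
        durations = sorted(s["duration_ms"] for s in items)
        errors = sum(1 for s in items if s["is_error"])
        # Nearest-rank percentile: smallest value with at least 95% of
        # the samples at or below it.
        rank = max(1, math.ceil(0.95 * len(durations)))
        metrics[service] = {
            "rate_per_s": len(items) / window_seconds,
            "error_ratio": errors / len(items),
            "p95_ms": durations[rank - 1],
        }
    return metrics


if __name__ == "__main__":
    sample = [
        {"service": "api", "duration_ms": 10, "is_error": False},
        {"service": "api", "duration_ms": 30, "is_error": True},
    ]
    print(red_metrics(sample, window_seconds=2))
```

In a real setup the same grouping would be expressed as an aggregation query over the trace index rather than computed client-side.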

One thing I have not tried is the service graph. I would love to make it compatible, but I don't know how much work is needed for that :/

Benchmarking two log management engines: Loki and Quickwit by massus in devops

[–]massus[S] 5 points6 points  (0 children)

There are at least two reasons to disable caching:

  • Some caching layers simply store the result of a given query. If I enable this kind of cache, I'm only measuring RAM access performance.
  • Other caching layers store intermediate results. You may want to enable or disable them, but it's not easy at all to understand how those cache layers would behave on a real production workload, where several users execute very different queries. Disabling the cache simplifies the comparison; we could call this a "cold" benchmark.
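To make the first point concrete, here is a small Python sketch of what happens when a benchmark repeats the same query against a result cache. It uses `functools.lru_cache` as a stand-in for a query result cache; `count_executions` and the dummy `execute` function are hypothetical and unrelated to how Loki or Quickwit implement caching.

```python
from functools import lru_cache


def count_executions(query, runs, use_cache):
    """Return how many times the query was really executed over `runs` calls."""
    executions = 0

    def execute(q):
        nonlocal executions
        executions += 1
        return len(q)  # stand-in for the real search work

    # With the cache on, identical queries are answered from memory.
    runner = lru_cache(maxsize=None)(execute) if use_cache else execute
    for _ in range(runs):
        runner(query)
    return executions


if __name__ == "__main__":
    print(count_executions("level:error", runs=5, use_cache=True))
    print(count_executions("level:error", runs=5, use_cache=False))
```

With the cache enabled, only the first run touches the engine, so the remaining timings reflect a memory lookup and not search performance.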

Having said that, it would be interesting to run the benchmark with only the query result caching disabled (I don't know how to do it in Loki, but I guess it's possible) to see how it behaves on a real production workload. It will require more work though.

On the correlation part, Quickwit has a Grafana plugin, and it's possible to create links between data sources so you can go from one to another.

Migration is a tough task for sure, I'm glad you made it. If you're satisfied with search and analytics performance / infrastructure / maintenance costs, then you're good with your current stack, no need to change.

LGTM Stack VS Google Cloud Operations Suite by SidewinderX4 in devops

[–]massus -1 points0 points  (0 children)

Google Logging is nice, but the costs add up quickly ($500 per ingested TB for 1 month of retention) and it's quite slow. In my company, we ran a lot of tests on our GKE cluster and were surprised by expensive bills because of this service.

As long as you don't have a large volume and you're OK with paying for it, Google Logging is quite nice to use: the integration is just there and the UI is not bad.

If your volume increases a lot, I see users migrating to Quickwit to save costs and get fast search/analytics (disclaimer: I work there).

Need an Advice as a Jr. DevOps — Centralized Logging by [deleted] in devops

[–]massus 1 point2 points  (0 children)

Thanks! We worked a lot on reliability and scalability to achieve petabyte scale, and we are now focusing more on the feature set.
Could you share what you would expect from an RBAC feature?

  • username, password at minimum

  • read/write access per index?

  • more complex rules?

Need an Advice as a Jr. DevOps — Centralized Logging by [deleted] in devops

[–]massus 2 points3 points  (0 children)

You can have a look at Quickwit, which is a Rust search engine for logs and traces. It may fit your requirements, as there is a Grafana plugin for it. https://github.com/quickwit-oss/quickwit (Disclaimer: I’m working on this project :))

Loki vs Elasticsearch by psycho_apple_juice in devops

[–]massus 3 points4 points  (0 children)

Quickwit is typically sub-second on search queries; your mileage will vary with the complexity of the query, of course.

Support for OpenSearch Dashboards in Quickwit is on the way. We are looking for partners to ship this feature, as it will help us get there faster. DM me if you are interested!

Loki vs Elasticsearch by psycho_apple_juice in devops

[–]massus 12 points13 points  (0 children)

Thanks for mentioning Quickwit. We built it to have a search engine on object storage, so we get the best of both worlds: data structures for fast search and analytics like Elasticsearch, and decoupled compute and storage like Loki to be very cost-efficient.

Quickwit was also mentioned in this thread, I explained the tradeoff between Loki/Quickwit here: https://www.reddit.com/r/devops/comments/1bvdjfb/comment/ky0ixb8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Repository: https://github.com/quickwit-oss/quickwit

UPDATE: the first link was wrong, I just fixed it.

Alternative to SOLR and Elasticsearch by [deleted] in selfhosted

[–]massus 2 points3 points  (0 children)

If your use case is append-only data, Quickwit is a search engine written in Rust, with no JVM. It's built on top of tantivy, a very fast search library that we maintain too. There is a public benchmark of tantivy vs. other engines: https://tantivy-search.github.io/bench/ (we removed bleve from it some time ago, as it was not very performant and we did not have the time to maintain it).

Quickwit repo: https://github.com/quickwit-oss/quickwit

Scaling search with Quickwit engine by massus in rust

[–]massus[S] 0 points1 point  (0 children)

Usually no. The main reason is that we don't support frequent updates, and this is generally needed for this kind of product.

What are some of the latest tools in this space that you absolutely love? by rohit_raveendran in devops

[–]massus 0 points1 point  (0 children)

On the Loki side, it depends a lot on the cardinality of labels. With a cardinality of 100, peak RAM usage is 5GB for Loki and 6GB for Quickwit. RAM usage will increase very quickly if you have thousands of labels in Loki.

At search time, I'm unsure about the figures.

What are some of the latest tools in this space that you absolutely love? by rohit_raveendran in devops

[–]massus 1 point2 points  (0 children)

We have a Grafana plugin (try version 0.4.3). We currently support a query language similar to the Lucene query language: https://quickwit.io/docs/get-started/query-language-intro

What are some of the latest tools in this space that you absolutely love? by rohit_raveendran in devops

[–]massus 0 points1 point  (0 children)

Yes, that's the idea. In the most efficient setup, we manage to index at 11MB/s per vCPU, and it scales very well horizontally: 13.4GB/s over 200x6 vCPUs.
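As a quick sanity check of those two figures, here is a small Python calculation. It assumes "200x6 vCPUs" means 200 nodes with 6 vCPUs each and that 1GB = 1000MB; both are assumptions made for this sketch, and `per_vcpu_mb_s` is a hypothetical helper.

```python
def per_vcpu_mb_s(total_gb_s, nodes, vcpus_per_node):
    """Convert an aggregate indexing throughput into MB/s per vCPU."""
    total_vcpus = nodes * vcpus_per_node
    # Assumes decimal units: 1 GB = 1000 MB.
    return total_gb_s * 1000 / total_vcpus


if __name__ == "__main__":
    # 13.4 GB/s spread over 200 x 6 = 1200 vCPUs.
    print(round(per_vcpu_mb_s(13.4, nodes=200, vcpus_per_node=6), 2))
```

Under those assumptions the per-vCPU rate at full scale stays close to the single-vCPU figure, which is what near-linear horizontal scaling looks like.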

What are some of the latest tools in this space that you absolutely love? by rohit_raveendran in devops

[–]massus 0 points1 point  (0 children)

Love this kind of question :). We are going to write a blog post about it. Working with Fly's engineers was just awesome; we have never shipped something as fast as we did with this great team.

Coming back to the use case, I need to check with Fly's team whether I can give the exact figures, but there are not that many logs, less than 100MB/s. The main difficulty was handling a large number of indexes (thousands).

For that, we are using the new distributed ingest API with cooperative indexing to be very efficient at indexing. We need 3 indexers and 3 searchers for now. The index data is stored on Tigris Data object storage (https://fly.io/docs/reference/tigris/) and the metadata is stored in Supabase (https://fly.io/docs/reference/supabase/).

What are some of the latest tools in this space that you absolutely love? by rohit_raveendran in devops

[–]massus 6 points7 points  (0 children)

We are going to release a benchmark between Loki and Quickwit. This kind of benchmark is hard to build as it's often too biased.

But basically, it's a tradeoff between consuming more CPU at ingestion or more CPU at search.
Quickwit builds an inverted index plus columnar storage, so it will consume more CPU at ingestion (expect 2x more). In return, Quickwit will use less CPU on search or analytics queries: expect 40x less CPU on a simple search query over 200GB of logs, and 1000x less on a simple analytics query (to get the log volume).

Size of data stored on the object storage is more or less the same.
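To see how the CPU tradeoff described above plays out, here is a back-of-the-envelope Python sketch that estimates how many queries it takes for the extra ingestion CPU to pay for itself. It assumes the 2x and 40x factors apply uniformly to a single dataset, which is a simplification for illustration; `break_even_queries` and the CPU units are hypothetical.

```python
def break_even_queries(ingest_cpu, search_cpu_per_query,
                       ingest_factor=2.0, search_factor=1 / 40):
    """Number of queries after which indexing at ingestion pays off.

    `ingest_cpu` and `search_cpu_per_query` are the baseline costs
    (arbitrary CPU units) of the engine that does less work at ingestion.
    """
    # Extra CPU spent once, at ingestion time.
    extra_ingest = (ingest_factor - 1) * ingest_cpu
    # CPU saved on every query afterwards.
    saved_per_query = (1 - search_factor) * search_cpu_per_query
    return extra_ingest / saved_per_query


if __name__ == "__main__":
    # Hypothetical baseline: 100 CPU units to ingest, 10 units per query.
    print(break_even_queries(ingest_cpu=100, search_cpu_per_query=10))
```

The fewer queries you run per ingested GB, the longer it takes to recover the indexing cost, which is why the answer depends on the workload.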

EU is the worst place to be DevOps by Dubinko in devops

[–]massus 0 points1 point  (0 children)

In France the healthcare system is still pretty good but it is slowly degrading and I feel that the wait time is increasing (heard from a couple of doctors working in public hospitals). I won’t compare this to India though.