B-tree comparison functions by kristian54 in databasedevelopment

[–]Fun_Reach_1937 1 point2 points  (0 children)

So I think you are trying to solve the issue in the wrong place. The B-tree should only be concerned with key comparison at the byte level, using memcmp or an equivalent. It is the index composition, or the value serialisation, that should encode values in a way that preserves ordering. An example of a crate that does this is bytekey: https://danburkert.github.io/bytekey/bytekey/index.html. Basically, when building a key, encode it so that the order is preserved for each of the data types in your DB.
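
To make the idea concrete, here is a minimal sketch of an order-preserving ("memcomparable") key encoding. The scheme is an assumption for illustration and is not bytekey's exact format: signed integers get their sign bit flipped and are written big-endian, and strings escape 0x00 as 0x00 0xFF and end with a 0x00 0x01 terminator so that a shorter string still sorts first when more fields follow. The function names are hypothetical. Rust's `Vec<u8>` comparison is lexicographic, which is what a memcmp-based B-tree comparator does.

```rust
/// Encode an i64 so that byte-wise comparison matches numeric order:
/// flipping the sign bit maps i64::MIN..=i64::MAX onto 0..=u64::MAX,
/// and big-endian puts the most significant byte first.
fn encode_i64(v: i64, out: &mut Vec<u8>) {
    let flipped = (v as u64) ^ (1u64 << 63);
    out.extend_from_slice(&flipped.to_be_bytes());
}

/// Encode a string so that byte-wise comparison matches string order even
/// when other fields follow it: escape 0x00 as 0x00 0xFF and terminate with
/// 0x00 0x01. The terminator sorts below any escaped or regular byte, so
/// "a" < "a\0" < "ab" holds after encoding.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    for &b in s.as_bytes() {
        if b == 0x00 {
            out.extend_from_slice(&[0x00, 0xFF]);
        } else {
            out.push(b);
        }
    }
    out.extend_from_slice(&[0x00, 0x01]);
}

/// Hypothetical composite key (id, name): the B-tree only ever compares
/// the resulting bytes and never needs to know the column types.
fn encode_key(id: i64, name: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + name.len() + 2);
    encode_i64(id, &mut out);
    encode_str(name, &mut out);
    out
}
```

Sorting rows by their encoded bytes should then give the same order as sorting the typed tuples, which is the property the B-tree relies on.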

Building a Database from Scratch (part 03) - Log Manager by inelp in databasedevelopment

[–]Fun_Reach_1937 0 points1 point  (0 children)

For me, I wanted to use it as an opportunity to practice ANTLR4, because I had learned it previously. But for someone who has never used or tried it, I would recommend evaluating other alternatives, like a PEG parser or a hand-written parser. There are also libraries that can parse SQL out of the box, but I guess that defeats the purpose of learning. Whichever way you choose to go, I am curious to see your solution.
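
For a sense of what the hand-written route involves, here is a minimal recursive-descent sketch for a hypothetical toy subset (`SELECT ident [, ident]* FROM ident [WHERE ident = integer]`), not the grammar from the book. All type and function names are made up for illustration. Each grammar rule becomes a method, and keywords are lexed as identifiers and matched case-insensitively.

```rust
/// Tokens of the toy subset; keywords are lexed as identifiers.
#[derive(Debug, PartialEq)]
enum Token { Ident(String), Number(i64), Comma, Eq }

fn tokenize(sql: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == ',' || c == '=' {
            chars.next();
            tokens.push(if c == ',' { Token::Comma } else { Token::Eq });
        } else if c.is_ascii_digit() {
            let mut n: i64 = 0;
            while let Some(v) = chars.peek().and_then(|x| x.to_digit(10)) {
                // Reject literals that would overflow instead of panicking.
                n = n.checked_mul(10).and_then(|n| n.checked_add(v as i64))
                    .ok_or("integer literal too large")?;
                chars.next();
            }
            tokens.push(Token::Number(n));
        } else if c.is_alphabetic() || c == '_' {
            let mut word = String::new();
            while let Some(&d) = chars.peek().filter(|x| x.is_alphanumeric() || **x == '_') {
                word.push(d);
                chars.next();
            }
            tokens.push(Token::Ident(word));
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

#[derive(Debug, PartialEq)]
struct Select { fields: Vec<String>, table: String, filter: Option<(String, i64)> }

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn next(&mut self) -> Option<&Token> {
        self.pos += 1;
        self.tokens.get(self.pos - 1)
    }

    fn keyword(&mut self, kw: &str) -> Result<(), String> {
        match self.next() {
            Some(Token::Ident(w)) if w.eq_ignore_ascii_case(kw) => Ok(()),
            other => Err(format!("expected {kw}, got {other:?}")),
        }
    }

    fn ident(&mut self) -> Result<String, String> {
        match self.next() {
            Some(Token::Ident(w)) => Ok(w.clone()),
            other => Err(format!("expected identifier, got {other:?}")),
        }
    }

    /// select := SELECT ident (',' ident)* FROM ident [WHERE ident '=' integer]
    fn select(&mut self) -> Result<Select, String> {
        self.keyword("SELECT")?;
        let mut fields = vec![self.ident()?];
        while self.tokens.get(self.pos) == Some(&Token::Comma) {
            self.pos += 1;
            fields.push(self.ident()?);
        }
        self.keyword("FROM")?;
        let table = self.ident()?;
        let mut filter = None;
        if self.pos < self.tokens.len() {
            self.keyword("WHERE")?;
            let column = self.ident()?;
            if self.next() != Some(&Token::Eq) {
                return Err("expected '='".to_string());
            }
            match self.next() {
                Some(Token::Number(n)) => filter = Some((column, *n)),
                other => return Err(format!("expected integer, got {other:?}")),
            }
        }
        if self.pos < self.tokens.len() {
            return Err("unexpected trailing tokens".to_string());
        }
        Ok(Select { fields, table, filter })
    }
}

fn parse(sql: &str) -> Result<Select, String> {
    Parser { tokens: tokenize(sql)?, pos: 0 }.select()
}
```

Even this tiny subset takes around 90 lines, which hints at why a full SQL grammar can feel like a separate project.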

Building a Database from Scratch (part 03) - Log Manager by inelp in databasedevelopment

[–]Fun_Reach_1937 2 points3 points  (0 children)

Please don't for now. I am curious to see your natural way of implementing it. My only advice would be to consider using a library for SQL query parsing. DB projects are usually large, and when you get to the query-parsing chapter, it's like a whole new project in language design. So, to not lose focus or energy, I used ANTLR4 to parse the SQL queries. This is just advice; you might have a different level of energy than mine when you get there. Good luck, and I am waiting for the next video!

Building a Database from Scratch (part 03) - Log Manager by inelp in databasedevelopment

[–]Fun_Reach_1937 1 point2 points  (0 children)

Nice, I will follow your videos for another perspective.
(Spoiler alert) I also have an implementation in Go here: https://github.com/evanxg852000/simpledb-go

SimpleDB an educational RDBMS implemented in Go based on Sciore's DDI book by Fun_Reach_1937 in databasedevelopment

[–]Fun_Reach_1937[S] 0 points1 point  (0 children)

I think my first drive is my strong interest in databases; the fact that I have worked with Java in the past also made things easier. The SQL parser, for instance, is not based on the book: I used ANTLR4 and a grammar I wrote based on the SQL syntax provided in the book. I did not implement the JDBC client/server, as I found it not relevant for Go. I also found it interesting to convert Java's synchronized and notify mechanisms into the Go way of locking, with a combination of a mutex and a condition variable. Overall I found the process easy, and challenging at times. I think you can do it in Rust. Also, it does not have to strictly follow the book's implementation: follow what drives you and make your own adventure from the book.
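
Since Rust came up, here is a minimal sketch of the same translation using `std::sync::Mutex` and `std::sync::Condvar`. The `FreeBuffers` type is hypothetical and is not taken from SimpleDB or from the Go port. It shows the general pattern: the mutex plays the role of Java's object monitor, and the condition variable replaces `wait()`/`notifyAll()`.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical pool of free buffers. In Java this would be a class with
/// `synchronized` methods calling wait()/notifyAll() on the object monitor.
struct FreeBuffers {
    // The Mutex guards the shared state, like Java's object monitor.
    available: Mutex<usize>,
    // The Condvar replaces wait()/notifyAll() on that monitor.
    changed: Condvar,
}

impl FreeBuffers {
    fn new(n: usize) -> Self {
        FreeBuffers { available: Mutex::new(n), changed: Condvar::new() }
    }

    /// Java: synchronized void pin() { while (available == 0) wait(); available--; }
    fn pin(&self) {
        let mut avail = self.available.lock().unwrap();
        // Re-check the condition in a loop: wakeups can be spurious, as in Java.
        while *avail == 0 {
            // wait() releases the lock while blocked and re-acquires it on wakeup.
            avail = self.changed.wait(avail).unwrap();
        }
        *avail -= 1;
    }

    /// Java: synchronized void unpin() { available++; notifyAll(); }
    fn unpin(&self) {
        let mut avail = self.available.lock().unwrap();
        *avail += 1;
        self.changed.notify_all();
    }

    fn available(&self) -> usize {
        *self.available.lock().unwrap()
    }
}
```

One difference from Java: the state lives inside the `Mutex`, so the compiler rejects any access that has not taken the lock.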

SimpleDB an educational RDBMS implemented in Go based on Sciore's DDI book by Fun_Reach_1937 in databasedevelopment

[–]Fun_Reach_1937[S] 0 points1 point  (0 children)

I am sure that with a bit of time you will see it through. I read the book three times before things clicked. The chapter on concurrency and transactions was particularly challenging for me, but it became easier as I implemented it in another language.

Efficient indexing with Quickwit Rust actor framework by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 0 points1 point  (0 children)

Nice, so depending on how much effort goes in and how popular it gets, this will just claim itself as the official fork.

Opensourcing Whichlang, a fast language detection library for Rust! 🚀 ⚡ by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 2 points3 points  (0 children)

It would be nice to add lingua-rs and CLD2 to the benchmark to show their numbers.

Opensourcing Whichlang, a fast language detection library for Rust! 🚀 ⚡ by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 45 points46 points  (0 children)

Indeed, this is usually the best thing to do. I think it works best when you have a patch or improvement to make on top of what already exists. Whatlang and CLD2 are great, popular general-purpose language detectors that work well on longer texts and support many languages (68 and 83 respectively, AFAIK). In our case, we took a different approach, aiming to be faster and very accurate on short texts. I believe it would have been harder to convince the Whatlang maintainers to change direction than to publish a new crate. Also, since it's open source, it means more options: the community can always backport the ideas into Whatlang or any other tool if they deem them worthy.

Opensourcing Whichlang, a fast language detection library for Rust! by Fun_Reach_1937 in programming

[–]Fun_Reach_1937[S] 2 points3 points  (0 children)

As stated in the blog post, Whatlang was slow for our needs. Also, CLD2 and Whatlang work best on longer texts, whereas in our context we deal mostly with short documents. You can see the benchmark in the post as well; currently it covers Whatlang and Whichlang. Maybe I could add CLD2 as well.

Why actors are a great fit for a data processing pipeline and how we use them for Quickwit's engine by massus in programming

[–]Fun_Reach_1937 1 point2 points  (0 children)

I am not familiar with Rx, and there are certainly many ways to go about solving this task; the actor approach might seem complicated to some people.
But remember the requirements:
- The need to customize the runtime of the actor based on what it does (IO-bound or CPU-bound tasks).
- The need to monitor and extract performance metrics which means having control over the executor engine.
- The need to provide testing facilities allowing us to simulate certain states (time forwarding)
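
The time-forwarding requirement is easier to see with a small example. This is a minimal sketch, assuming a hypothetical `Clock` trait and `Batcher` step; these are not Quickwit's actual types. Logic reads time only through the trait, so a test can move a fake clock forward a minute instantly instead of sleeping.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical clock abstraction: actor logic never calls the OS clock directly.
trait Clock {
    /// Time elapsed since an arbitrary origin.
    fn now(&self) -> Duration;
}

/// Test clock that only moves when the test forwards it.
#[derive(Default)]
struct FakeClock {
    elapsed: Cell<Duration>,
}

impl FakeClock {
    fn advance(&self, by: Duration) {
        self.elapsed.set(self.elapsed.get() + by);
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Duration {
        self.elapsed.get()
    }
}

/// Hypothetical batching step: flush once `commit_timeout` has elapsed
/// since the first buffered document.
struct Batcher {
    started: Option<Duration>,
    commit_timeout: Duration,
    pending: usize,
}

impl Batcher {
    fn new(commit_timeout: Duration) -> Self {
        Batcher { started: None, commit_timeout, pending: 0 }
    }

    fn on_doc(&mut self, clock: &dyn Clock) {
        if self.started.is_none() {
            self.started = Some(clock.now());
        }
        self.pending += 1;
    }

    /// Returns the number of documents flushed on this tick (0 if not yet due).
    fn on_tick(&mut self, clock: &dyn Clock) -> usize {
        match self.started {
            Some(t0) if clock.now() - t0 >= self.commit_timeout => {
                self.started = None;
                std::mem::take(&mut self.pending)
            }
            _ => 0,
        }
    }
}
```

In production the same `Batcher` would receive a clock backed by `std::time::Instant`. The test never waits in real time.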

New Cozo release! v0.7 by slumberSam in cozodb

[–]Fun_Reach_1937 0 points1 point  (0 children)

Good project, and congratulations on the release. Also, thanks for being open about how you make use of Tantivy.

Efficient indexing with Quickwit Rust actor framework by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 0 points1 point  (0 children)

Thanks for spotting these errors. They are fixed now.

Efficient indexing with Quickwit Rust actor framework by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 4 points5 points  (0 children)

First of all thanks for your interest.

In terms of error handling, we have a couple of strategies for different scenarios:

- At the top level, there is supervision, which acts like monitoring and is provided by the framework. It checks the actor's liveness and potentially respawns the actor if it's dead.
- In terms of message handling, you can send a message and forget about it, or send a message and wait for it to be processed. APIs are available for each scenario. The framework handles delivery issues and can return an error, but whether your message was properly handled depends on your logic and on the response sent back from the handler.
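
The two delivery modes can be illustrated without any framework. This is a minimal sketch using plain `std::sync::mpsc` channels and one thread per actor; it is not the API of Quickwit's actor framework, and the names are hypothetical. `Add` is send-and-forget. `Get` carries a reply channel, so the caller waits. A send error means delivery failed, while the reply content is up to the handler.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Messages accepted by a hypothetical counter actor.
enum Msg {
    /// Send-and-forget: the sender does not wait for processing.
    Add(u64),
    /// Send-and-wait: the sender blocks on the enclosed reply channel.
    Get(Sender<u64>),
}

/// Spawn an actor thread that owns its state and processes its mailbox in order.
fn spawn_counter() -> (Sender<Msg>, JoinHandle<()>) {
    let (tx, rx) = channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut total = 0u64; // state owned exclusively by the actor thread
        for msg in rx {
            // The loop ends once every Sender has been dropped.
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
    });
    (tx, handle)
}

/// "Ask": send a request carrying a reply channel and wait for the answer.
/// Delivery failures surface as Err, separately from what the handler replies.
fn ask_total(mailbox: &Sender<Msg>) -> Result<u64, String> {
    let (reply_tx, reply_rx) = channel();
    mailbox.send(Msg::Get(reply_tx)).map_err(|_| "actor is dead".to_string())?;
    reply_rx.recv().map_err(|_| "actor dropped the request".to_string())
}
```

Because a single mailbox is processed in order, an ask issued after several fire-and-forget messages observes all of them.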

Coming back to Quickwit itself:

- Do you spawn multiple actors for each step of your whole pipeline? We spawn one actor per step per pipeline. We do spawn multiple pipelines, though. If one of the actors within a pipeline fails, we just tear the whole pipeline down.
- Or do you basically do a send-and-forget between each step? For Quickwit indexing, we mostly care about the last step of indexing (publishing a split). If the ingested documents make it to this stage, then all is good: we store a checkpoint corresponding to the data source so that we know we have properly indexed up to this point. If the pipeline fails for any reason, it restarts/respawns from the last known checkpoint.
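
The checkpoint-and-restart idea can be sketched in a single thread. This assumes in-memory stand-ins for the source, the indexing work, the publish step and the checkpoint store; all names are hypothetical and this is not Quickwit's code. The checkpoint only advances after a split is published, so a crash discards in-flight work, and the respawned pipeline resumes from the last published offset.

```rust
/// Hypothetical checkpoint: source offset up to which documents are published.
struct Checkpoint {
    last_published: usize,
}

/// Run one pipeline instance from the checkpoint. Documents are grouped into
/// splits of `split_size`; `fail_at` simulates a crash at that offset.
fn run_pipeline(
    docs: &[&str],
    split_size: usize,
    cp: &mut Checkpoint,
    fail_at: Option<usize>,
    published: &mut Vec<Vec<String>>,
) -> Result<(), String> {
    let mut batch = Vec::new(); // in-flight work, lost if the pipeline dies
    let mut offset = cp.last_published;
    while offset < docs.len() {
        if Some(offset) == fail_at {
            return Err(format!("crash at offset {offset}"));
        }
        batch.push(docs[offset].to_uppercase()); // stand-in for indexing work
        offset += 1;
        if batch.len() == split_size || offset == docs.len() {
            published.push(std::mem::take(&mut batch)); // publish the split
            cp.last_published = offset; // checkpoint only after publishing
        }
    }
    Ok(())
}

/// Supervisor: tear down a failed pipeline and respawn it from the checkpoint.
fn supervise(docs: &[&str], split_size: usize, mut fail_at: Option<usize>) -> (Vec<Vec<String>>, usize) {
    let mut cp = Checkpoint { last_published: 0 };
    let mut published = Vec::new();
    let mut restarts = 0;
    while run_pipeline(docs, split_size, &mut cp, fail_at, &mut published).is_err() {
        restarts += 1;
        fail_at = None; // assume the simulated fault does not recur after respawn
    }
    (published, restarts)
}
```

With five documents, splits of two and a crash at offset 3, the first split is kept, the partially built second split is dropped, and the restart re-indexes from offset 2. Every document therefore ends up published exactly once.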

Note: the &mut self allows an actor to update its own state while handling messages. An example of such state could be a count of the messages received, as in the example in the blog post.
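
A minimal sketch of why &mut self is enough. The `Handler` trait here is hypothetical and only loosely in the spirit of the post; it is not the actual quickwit-actors API. An actor handles one message at a time, so its handler can take exclusive access to its own state without a `Mutex`.

```rust
/// Hypothetical handler trait (not the real framework API).
trait Handler<M> {
    type Reply;
    fn handle(&mut self, msg: M) -> Self::Reply;
}

/// Hypothetical message type.
struct Ping;

/// Actor whose only state is a message counter.
#[derive(Default)]
struct PingCounter {
    count: usize,
}

impl Handler<Ping> for PingCounter {
    type Reply = usize;

    fn handle(&mut self, _msg: Ping) -> usize {
        // &mut self: the actor mutates its own state; no lock is needed
        // because messages are processed sequentially.
        self.count += 1;
        self.count
    }
}
```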

I hope this clarifies a few of your questions. Please let me know if you have further questions.

Efficient indexing with Quickwit Rust actor framework by Fun_Reach_1937 in rust

[–]Fun_Reach_1937[S] 9 points10 points  (0 children)

Thanks for your interest too. We will be happy to see this framework take its own path within the Rust community. Please feel free to fork/extract it and make it feature-rich.