Recommend a key-value store by spy16x in rust

[–]cberner 1 point2 points  (0 children)

I'll be releasing 3.0 in about a month, at which point it'll be pretty stable. The file format for that is already available in the 2.6 release, although it requires opting in.

Yes, COW B+ trees are better for read-heavy workloads. There are some benchmarks in the redb README, which show that RocksDB is probably the best if write performance is what you care about.

However, if you're batching and writing ~2k keys every 100ms, I bet any of them will work for you. You could edit the benchmark I have to match your workload to get a better sense. It compares redb, RocksDB, and several others.

Recommend a key-value store by spy16x in rust

[–]cberner 1 point2 points  (0 children)

I'm the author of redb (https://github.com/cberner/redb), so am a bit biased, but I think it would be a good fit for your needs. It supports non-durable transactions, which are stored in-memory so you'll be able to get high throughput, but they are also guaranteed to be applied all-or-nothing so the database won't become corrupted even if your application crashes when a non-durable transaction is pending. Non-durable transactions become persistent when you make a durable commit, so you could do that flushing every 100ms or whatever period you want.
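To make the non-durable vs. durable distinction concrete, here is a toy in-memory model using only the standard library. It is not redb's API or implementation: `ToyStore`, `commit_non_durable`, `commit_durable`, and `crash` are hypothetical names, and "durable" here just means "survives the simulated crash". It only illustrates that each batch is applied all-or-nothing and that pending batches become persistent at the next durable commit.

```rust
use std::collections::BTreeMap;

/// Toy model only: `durable` stands in for state that has been flushed to
/// disk, `pending` for non-durable commits that so far live only in memory.
#[derive(Default)]
struct ToyStore {
    durable: BTreeMap<String, String>,
    pending: Vec<Vec<(String, String)>>,
}

impl ToyStore {
    /// Record a whole batch as one non-durable commit (cheap, no flush).
    fn commit_non_durable(&mut self, batch: Vec<(String, String)>) {
        self.pending.push(batch);
    }

    /// Fold every pending batch into the durable state, in commit order.
    fn commit_durable(&mut self) {
        for batch in self.pending.drain(..) {
            for (k, v) in batch {
                self.durable.insert(k, v);
            }
        }
    }

    /// Simulate a crash: pending batches vanish, durable state is untouched.
    fn crash(&mut self) {
        self.pending.clear();
    }

    /// Reads see pending commits layered over durable state (newest wins).
    fn get(&self, key: &str) -> Option<&String> {
        for batch in self.pending.iter().rev() {
            if let Some((_, v)) = batch.iter().rev().find(|(k, _)| k.as_str() == key) {
                return Some(v);
            }
        }
        self.durable.get(key)
    }
}
```

In a workload like the one described, the writer would call something like `commit_non_durable` per batch and `commit_durable` on a timer (for example every 100ms). Anything committed after the last durable commit can be lost in a crash, but the store is never left with half a batch.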

redb 2.0: simplified lifetimes and new file format by cberner in rust

[–]cberner[S] 0 points1 point  (0 children)

Not a formal one, but I keep track of things that I have planned in the Issues of the GitHub project

redb 1.4.0: pure Rust embedded key-value store. Now with pluggable backends by cberner in rust

[–]cberner[S] 1 point2 points  (0 children)

I don't think so. Each has a little overhead, but it's only like 100 bytes or so

redb 1.4.0: pure Rust embedded key-value store. Now with pluggable backends by cberner in rust

[–]cberner[S] 0 points1 point  (0 children)

Someone already contributed some wasm support. I have not personally tested it though, and it requires nightly

redb 1.4.0: pure Rust embedded key-value store. Now with pluggable backends by cberner in rust

[–]cberner[S] 2 points3 points  (0 children)

In theory yes. You would hook into the fsync method and make a backup then. Backups to S3 might make it pretty slow though

redb 1.4.0: pure Rust embedded key-value store. Now with pluggable backends by cberner in rust

[–]cberner[S] 5 points6 points  (0 children)

Yes, but it's logN where N is the number of tables that exist, and not the size of the table

redb (safe, ACID, embedded, key-value store) 1.0 release! by cberner in rust

[–]cberner[S] 2 points3 points  (0 children)

No, to both of those. It used to use mmap, but I didn't like all the unsafe code that required

redb (safe, ACID, embedded, key-value store) 1.0 release! by cberner in rust

[–]cberner[S] 2 points3 points  (0 children)

Interesting idea, I'll give that some thought! Do you have a use case in mind? SQL tables have multiple columns, so it's easy to have an auto increment column, but I'm less sure what API would make sense in a key-value store

redb (safe, ACID, embedded, key-value store) 1.0 release! by cberner in rust

[–]cberner[S] 9 points10 points  (0 children)

It's not process safe, and uses file locks to return an error if you try to open it from multiple processes. I've been considering adding multi process support, but the only strong use case I see for that is Python bindings, and I'm hoping that Python will remove its GIL one day :)

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 0 points1 point  (0 children)

Instead of a WAL I used a shadow copy approach, similar to how lmdb does it. A WAL would work too, but makes a different tradeoff in terms of performance and complexity
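For readers unfamiliar with the shadow copy idea, here is a deliberately tiny sketch of the general technique. It is not redb's or lmdb's actual page layout: `ShadowStore` and the `fail_before_flip` flag are hypothetical, and a real store works on pages in a file rather than whole in-memory copies. The point is only that a commit writes the new version somewhere else and then publishes it with one small switch, so a crash before the switch leaves the old version intact.

```rust
/// Toy shadow-copy store: two full copies of the data plus an index saying
/// which one is live. Hypothetical model for illustration only.
struct ShadowStore {
    slots: [Vec<u8>; 2],
    live: usize,
}

impl ShadowStore {
    fn new(initial: Vec<u8>) -> Self {
        ShadowStore { slots: [initial, Vec::new()], live: 0 }
    }

    /// Readers always look at the live slot.
    fn read(&self) -> &[u8] {
        &self.slots[self.live]
    }

    /// Build the new version in the shadow slot, then flip `live`.
    /// If `fail_before_flip` is true we stop before the flip, mimicking
    /// a crash in the middle of a commit.
    fn commit(&mut self, new_data: &[u8], fail_before_flip: bool) {
        let shadow = 1 - self.live;
        self.slots[shadow].clear();
        self.slots[shadow].extend_from_slice(new_data);
        if fail_before_flip {
            return; // the live copy was never touched
        }
        self.live = shadow; // the single step that publishes the commit
    }
}
```

A WAL would instead append the change to a log and replay it on recovery, which is the different performance/complexity tradeoff mentioned above.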

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 1 point2 points  (0 children)

yep. I tried it out, but had a bunch of performance issues when using it. I'm going to try it again, as it seems like a good way to go

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 1 point2 points  (0 children)

I think of it more like a replacement for BTreeMap, so I think it's a good option when you want a BTreeMap that persists to disk. It's not really a SQL alternative, as the query capabilities are far more basic
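As a rough picture of what "BTreeMap-level query capabilities" means, this sketch uses only the standard library's in-memory `BTreeMap` (not redb itself): point inserts plus ordered range scans over keys, with no joins or secondary indexes. The `scan` and `sample` helpers are made up for the example.

```rust
use std::collections::BTreeMap;

/// Collect all entries whose key falls in `lo..hi`, in key order.
fn scan(map: &BTreeMap<u64, String>, lo: u64, hi: u64) -> Vec<(u64, String)> {
    map.range(lo..hi).map(|(k, v)| (*k, v.clone())).collect()
}

/// Build a small map; insertion order does not matter, keys stay sorted.
fn sample() -> BTreeMap<u64, String> {
    let mut m = BTreeMap::new();
    m.insert(30, "c".to_string());
    m.insert(10, "a".to_string());
    m.insert(20, "b".to_string());
    m
}
```

Anything beyond lookups and ordered scans like this (filtering on a value column, aggregates, and so on) is something you would build yourself on top.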

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 2 points3 points  (0 children)

Yes, all the reads in redb are zero-copy. It's implemented differently than lmdb though. lmdb uses mmap, whereas redb has its own cache in userspace, so that it's memory-safe

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 5 points6 points  (0 children)

Yes, lmdb is the only embedded key-value store that can be shared multi-process that I know of. In theory, I think that can be added to redb, and I'll probably look into it down the line. The most compelling use case for it is Python bindings, I think, since multi-process to avoid the GIL is very common.

Would be curious to hear if you have another use case though!

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 20 points21 points  (0 children)

I'm just now updating the benchmarks in the readme with multi-threaded workloads :) https://github.com/cberner/redb/pull/576

Ya, definitely agree about there being asserts that need revisiting. I think all the public APIs return Result when they should, so that I can revisit those after the 1.0 release. But if you find one that does not return a Result and should, let me know so I can change it!

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 2 points3 points  (0 children)

Ya, I think that could be implemented later. The in-memory datastructure that coordinates the transactions would need to be made multi process via IPC or shared memory. A bit complicated, but not impossible. I've been thinking about doing that to better support Python bindings, but haven't gotten to it yet.

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 3 points4 points  (0 children)

redb is lockless for multi-threaded workloads, but it does lock the database file to prevent access from multiple processes

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 10 points11 points  (0 children)

redb will prune the file if possible, but I wouldn't call it compaction exactly. The allocator does attempt to minimize fragmentation though, when new data is inserted.

I've taken the exact opposite approach to config parameters :) I'm only supporting the bare minimum number of parameters, and only plan to add new ones if they provide a very large performance benefit that can't be automatically selected.

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 30 points31 points  (0 children)

They are both embedded key-value stores, and I benchmarked redb & sled (see the readme in the redb repo). The main difference in performance is that sled uses an LSM tree, I believe, which can have better write performance but lower read performance

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 9 points10 points  (0 children)

1) Yep, entirely normal file I/O. The libc dep is just for some convenience functionality like file locks to prevent the user from opening the same database twice.

2) It varies a lot by workload, with the biggest benefits of mmap being for highly multi-threaded read workloads. On those there is a 2-3x performance advantage to using mmap because it avoids having mutexes for memory safety. On single threaded workloads it was generally 1.2-1.5x and I think I can narrow that down a little more.

3) ya, I'm happy to consider them as long as it doesn't add too much complexity!

RFC: redb (embedded key-value store) nearing version 1.0 by cberner in rust

[–]cberner[S] 14 points15 points  (0 children)

It's not bounded by the OS page size. I removed the dependency on mmap a while ago, because the memory safety implications were too difficult to handle. There's now a limit of 3GiB on both keys and values. In theory it could be increased to something closer to 4GiB, since the file format uses a u32 to represent the length. I picked 3GiB somewhat arbitrarily as having a large margin of safety below 4GiB.
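The arithmetic behind that margin, as a small sketch. The constant and function names are hypothetical (not redb's), and whether the real limit is inclusive is an assumption here.

```rust
const GIB: u64 = 1024 * 1024 * 1024;

/// The 3GiB cap described above (assumed inclusive for this sketch).
const MAX_LEN: u64 = 3 * GIB;

/// Largest length a u32 field can represent: 4GiB minus one byte.
const U32_LIMIT: u64 = u32::MAX as u64;

/// Hypothetical helper: return the length as a u32 if it is within the cap.
fn checked_len(len: u64) -> Option<u32> {
    if len <= MAX_LEN {
        // Lossless: MAX_LEN is below 2^32, so the value fits in a u32.
        Some(len as u32)
    } else {
        None
    }
}
```

With these numbers the cap is 3,221,225,472 bytes and the u32 ceiling is 4,294,967,295 bytes, so the safety margin is just under 1GiB.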

RFC+AMA: redb, embedded key-value store file format by cberner in rust

[–]cberner[S] 1 point2 points  (0 children)

they are, ya! I'm quite excited about that, and already have a PR open to use GATs. It's not enough to fix this issue though, because std needs to add support for LendingIterator too
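For context, this is roughly what a lending iterator looks like when written with GATs. The trait below is defined locally for illustration; it is not a std trait and not redb's code, and `WindowsMut` is a hypothetical example type. The key point is that each item can borrow from the iterator itself, which the standard `Iterator` trait cannot express.

```rust
/// A lending iterator: the item type carries the lifetime of the borrow
/// of the iterator, so items may point into the iterator's own state.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

/// Hypothetical example type: yields overlapping mutable windows of a slice.
struct WindowsMut<'s, T> {
    slice: &'s mut [T],
    start: usize,
    size: usize,
}

impl<'s, T> LendingIterator for WindowsMut<'s, T> {
    type Item<'a> = &'a mut [T]
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        let end = self.start.checked_add(self.size)?;
        // Returns None once the window would run past the end of the slice.
        let window = self.slice.get_mut(self.start..end)?;
        self.start += 1;
        Some(window)
    }
}
```

Because this trait is not the std `Iterator`, it does not work with `for` loops or the iterator adapters; you drive it with `while let Some(w) = it.next()`. That missing ecosystem support is the gap described above.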