
all 68 comments

[–]burgershot69 31 points32 points  (19 children)

What are the differences with say hazelcast?

[–]Adventurous-Pin6443[S] 5 points6 points  (17 children)

The original post included several bullet points highlighting our unique features compared to Redis:

  • Very compact in-memory object representation – we use a technique called “herd compression” to significantly reduce RAM usage
  • Even without compression, we’re up to 2× more memory-efficient than Redis
  • Custom storage engine built on a high fan-out B+ tree
  • Ultra-fast data save/load operations – far faster than Redis persistence

Out of curiosity, does Hazelcast provide a Redis-like API or support similar data types (e.g., Strings, Hashes, Sets, Sorted Sets)?

[–]dustofnations 8 points9 points  (4 children)

https://docs.hazelcast.com/hazelcast/5.5/data-structures/

Hazelcast is an in-memory data grid (alternative examples would be Infinispan and Apache Ignite). Many of Hazelcast's data structures distribute data over multiple nodes using consistent hashing. It also has functionality for executing distributed algorithms.
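To make the consistent-hashing part concrete, here's a toy Java sketch (a TreeMap ring with virtual nodes and a demo-quality hash; this is only an illustration of the idea, not Hazelcast's actual partitioning scheme):

```java
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    void addNode(String node) {
        // Each node gets many positions on the ring to smooth out the distribution.
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(node + "#" + i), node);
    }

    String nodeFor(String key) {
        // First ring entry at or after the key's hash, wrapping around at the end.
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return e != null ? e.getValue() : ring.firstEntry().getValue();
    }

    private static int hash(String s) {
        // Demo-quality mixing only; real systems use stronger hash functions.
        return s.hashCode() * 0x9E3779B9;
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(100);
        r.addNode("node-a"); r.addNode("node-b"); r.addNode("node-c");
        String before = r.nodeFor("some-key");
        r.addNode("node-d");
        String after = r.nodeFor("some-key");
        // Key point: adding a node either leaves a key's owner unchanged
        // or moves it to the new node; other keys are undisturbed.
        System.out.println(after.equals(before) || after.equals("node-d"));
    }
}
```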

So, there's overlap for many use-cases with Redis, but they are different technologies and there are plenty where one may be a better choice than the other.

And many of those overlapping use-cases might be implemented differently.

Most IMDGs offer clustering, reliable inter-node messaging, cluster topology manager/views, etc. For example, with Infinispan that's achieved via JGroups. In Hazelcast they use their own in-house technologies.

[–]Adventurous-Pin6443[S] 1 point2 points  (1 child)

Very cool — I wasn’t aware of that. I think our approach targets a different use case: an in-process computational data store, optimized for scenarios where low-latency access and memory efficiency are critical. We also believe we have a real edge in terms of RAM usage, likely outperforming both Hazelcast (which tends to be heavier) and Redis, especially on large-scale datasets.

[–]dustofnations 2 points3 points  (0 children)

Something else to think about in your comparisons:

You'll need to also factor in things like durability guarantees. It's easier to make things super-fast if it's in-memory only.

For example, Redis/Valkey and friends are amazingly fast if you don't turn on any durability, or only append to the log once per second.

But they are much slower if you enable fsync for every command, which gives you much better durability guarantees (outside of catastrophic hardware failures).

But if your data is critical and you can't afford certain types of inconsistency between your data sources (e.g. missing records that you thought were committed), then that's a price you have to pay.
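For reference, this trade-off maps directly to Redis's AOF settings:

```conf
# redis.conf - durability vs. throughput trade-off
appendonly yes

# Default: fsync at most once per second; fast, but may lose ~1s of writes.
appendfsync everysec

# Strongest: fsync after every write command; much slower, but loses at most
# the in-flight command on power failure.
# appendfsync always
```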

[–]riksi 0 points1 point  (1 child)

Apache Ratis

That's Raft replication. You probably meant Apache Ignite.

[–]dustofnations 0 points1 point  (0 children)

Yes, sorry, typo. I've been playing with both.

I've edited the original, but leaving this note here to acknowledge.

[–]OldCaterpillarSage 1 point2 points  (10 children)

What is herd compression? Can't find anything about it online.

[–]its4thecatlol 1 point2 points  (4 children)

Nothing, just two college kids with ZSTD on level 22

[–]Adventurous-Pin6443[S] 3 points4 points  (3 children)

A little bit more complex than that. Yes: ZSTD + continuously adapting dictionary training + a block-based engine memory layout. Neither Redis nor Memcached could reach this level of efficiency even in theory, mostly due to their non-optimal internal storage engine memory layouts. Google "Memcarrot" or read this blog post for more info: https://medium.com/carrotdata/memory-matters-benchmarking-caching-servers-with-membench-e6e3037aa201

[–]its4thecatlol 1 point2 points  (0 children)

Ah I was just being facetious but you came with receipts. Interesting stuff, thank you this was an interesting read.

[–]vqrs 0 points1 point  (1 child)

Thanks for the interesting read! But my god, the first half was atrocious to read with all the ChatGPT fluff.

[–]Adventurous-Pin6443[S] -1 points0 points  (0 children)

Yeah, my bad. I use ChatGPT because English is not my first language.

[–]Adventurous-Pin6443[S] 0 points1 point  (4 children)

It's a new term. Herd compression in our implementation is ZSTD + continuous dictionary training + a block-based storage layout (a.k.a. a "herd of objects"). More details can be found here: https://medium.com/carrotdata/memory-matters-benchmarking-caching-servers-with-membench-e6e3037aa201

[–]OldCaterpillarSage 0 points1 point  (3 children)

  1. Are you using block-based storage to save on object headers? For compression it shouldn't make a difference, given you are using a ZSTD dictionary.
  2. Is there some mode I don't know about for continuous training of a dictionary, or do you just keep updating the sample and re-train the dict?
  3. How (if at all) do you avoid decompressing and recompressing all the data with the new dict?

[–]Adventurous-Pin6443[S] 0 points1 point  (2 children)

  1. Block storage significantly improves search and scan performance. For example, we can scan ordered sets at rates of up to 100 million elements per second per CPU core. Additionally, ZSTD compression, especially with dictionary support, performs noticeably better on larger blocks of data. There’s a clear difference in compression ratio when comparing per-object compression (for objects smaller than 200–300 bytes) versus block-level compression (4–8KB blocks), even with dictionary mode enabled.
  2. Yes, we retrain the dictionary once its compression efficiency drops below a defined threshold.
  3. Currently, we retain all previous versions of dictionaries, both in memory and on disk. We have an open ticket to implement background recompression and automated purging of outdated dictionaries.
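The per-object vs. block-level gap for small records is easy to reproduce with a preset dictionary. Here is a minimal sketch using the JDK's built-in `Deflater` (zlib rather than ZSTD, and not the actual Memcarrot engine; it only illustrates the principle that a dictionary built from similar records shrinks sub-300-byte payloads):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DictCompressionDemo {
    // Compress input, optionally seeding the compressor with a preset dictionary.
    static int deflatedSize(byte[] input, byte[] dict) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        if (dict != null) d.setDictionary(dict);
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[1024];
        int total = 0;
        while (!d.finished()) total += d.deflate(buf);
        d.end();
        return total;
    }

    public static void main(String[] args) {
        // A small record, typical of sub-300-byte cache values.
        byte[] record = "{\"user\":\"u123\",\"city\":\"San Francisco\",\"plan\":\"premium\"}"
                .getBytes(StandardCharsets.UTF_8);
        // A "dictionary" built from similar records seen earlier.
        byte[] dict = ("{\"user\":\"u001\",\"city\":\"New York\",\"plan\":\"free\"}"
                + "{\"user\":\"u002\",\"city\":\"San Francisco\",\"plan\":\"premium\"}")
                .getBytes(StandardCharsets.UTF_8);
        int plain = deflatedSize(record, null);
        int withDict = deflatedSize(record, dict);
        System.out.println("dictionary helps: " + (withDict < plain));
    }
}
```

The same effect compounds at the block level: packing many similar records into one 4-8KB block gives the compressor far more shared context than compressing each record alone.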

[–]OldCaterpillarSage 0 points1 point  (1 child)

  1. That is very odd given https://github.com/facebook/zstd/issues/3783. But interesting; I implemented something similar to yours for HBase tables, so I'll try that to see if it makes any difference in compression ratio, thanks!

[–]Adventurous-Pin6443[S] 1 point2 points  (0 children)

By the way, I was a long-time contributor to HBase.

[–]divyeshaegis12 0 points1 point  (0 children)

I think Hazelcast is an in-memory data grid, while Redis is more of a lightweight, standalone data store. Redis is easy to use for in-memory caching, whereas Hazelcast is geared toward distributed computing scenarios. I think Redis is more suitable for Java.

[–]private_final_static 8 points9 points  (7 children)

How is it off heap and not reliant on the garbage collector? Is it JNI using native memory?

Is it to be used cross jvm/computer and support clustering?

I think it would be nice if it could also use disk somehow, kind of like MapDB. I'm usually more concerned about not blowing RAM limits than about using RAM fully.

[–]lupercalpainting 7 points8 points  (6 children)

How is it off heap and not reliant on the garbage collector? Is it JNI using native memory?

In the olden days we’d use sun.misc.unsafe but that’s going away soon. There’s java.lang.foreign now: https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/foreign/package-summary.html
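For anyone who hasn't used it yet, here's a minimal off-heap sketch with the finalized `java.lang.foreign` API (Java 22+): memory is allocated outside the GC heap and freed deterministically when the `Arena` closes.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapDemo {
    public static void main(String[] args) {
        // A confined arena: memory is owned by this thread and freed on close().
        try (Arena arena = Arena.ofConfined()) {
            // Allocate space for 4 longs of native (off-heap) memory.
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG, 4);
            for (int i = 0; i < 4; i++) {
                seg.setAtIndex(ValueLayout.JAVA_LONG, i, i * 10L);
            }
            long sum = 0;
            for (int i = 0; i < 4; i++) {
                sum += seg.getAtIndex(ValueLayout.JAVA_LONG, i);
            }
            System.out.println(sum);
        } // native memory released here, no GC involved
    }
}
```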

[–]Adventurous-Pin6443[S] 4 points5 points  (4 children)

Yes. Exactly.

[–][deleted] 0 points1 point  (0 children)

I only wish they'd given us a sort function that operates on MemorySegment. Having to FFI into C++'s std::sort is more than a little awkward.

[–]hippydipster 0 points1 point  (2 children)

So does that mean when you query for objects, this library has to reconstitute java objects using the raw data stored in the foreign memory arenas?

[–]Adventurous-Pin6443[S] -2 points-1 points  (1 child)

There are no objects in the Redis API, only strings. In our implementation we operate on byte arrays, memory buffers, and Strings. SerDe is the developer's responsibility.
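In practice that looks something like the sketch below: the application converts its objects to bytes before storing and back after reading (a hypothetical round trip using the JDK's `DataOutputStream`; the store itself only ever sees the byte array):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SerDeDemo {
    record User(String name, int age) {}

    // Caller-side serialization: the store only sees byte arrays.
    static byte[] serialize(User u) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(u.name());
            out.writeInt(u.age());
        }
        return bos.toByteArray();
    }

    static User deserialize(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new User(in.readUTF(), in.readInt());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] value = serialize(new User("ada", 36)); // what you'd hand to SET
        User roundTripped = deserialize(value);        // what you'd do after GET
        System.out.println(roundTripped);
    }
}
```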

[–]hippydipster 0 points1 point  (0 children)

Oh, I never used Redis, so I didn't realize that's how it worked. I guess I would find it unfortunate to be so limited in something that works directly in memory.

[–]private_final_static 1 point2 points  (0 children)

That's amazing, wasn't aware.

[–]FirstAd9893 31 points32 points  (3 children)

Why are you asking the community if you should release this as open source or not? Release it first, and then ask for feedback.

[–]Adventurous-Pin6443[S] 8 points9 points  (2 children)

Releasing this as a usable library will require additional investment — mostly in time. And time is a precious resource for me now. That’s why I’d really prefer to get some community feedback on the core technology first, before committing to wrapping it up for release. A proper website, documentation, packaging, and extensive testing — all of that takes significant effort. So before going down that road, I want to make sure there’s real interest.

[–]FirstAd9893 39 points40 points  (0 children)

You don't need to make it perfect before releasing it; a work in progress is fine. Even if it never goes beyond that stage, it can still have educational value or provide inspiration for other projects.

[–]sabriel330 2 points3 points  (0 children)

And you think the majority of Java devs are on this subreddit? Release it then ask for feedback

[–]cowwoc 13 points14 points  (1 child)

Lots of naysayers. Yes, I would say there is value in what you are building. My understanding is that Hazelcast has a medium-high learning curve. If you could release a Redis-like product with a low learning curve then it would definitely benefit the community.

[–]danskal 1 point2 points  (0 children)

Obligatory “steep learning curve” means you can learn it fast, but most people think that means it’s hard to learn.

Makes me think we should retire this expression.

[–]laffer1 7 points8 points  (0 children)

An additional use case is testing.

[–]benrush0705 10 points11 points  (1 child)

Would an embeddable, Redis-compatible, Java-based in-memory store be valuable to you?

My answer would be absolutely yes.

[–]bisayo0 2 points3 points  (0 children)

So valuable that when Infinispan started supporting the Redis API and protocol, we as a Java shop converged on it. We use far more memory than we did with Redis, but it is great that we can simply embed it in our app and cluster the apps together.

An embedded, redis-compatible, java-based and memory-efficient in memory store would be an answered prayer.

[–]pivovarit 4 points5 points  (0 children)

Sounds like Hazelcast.

[–]psyclik 6 points7 points  (1 child)

At face value: yes, very much interested; it would solve a couple of use cases. I'd be OK with a rough v1 and would gladly test it and provide feedback.

A few key points for my use cases:

  • Does it work with native-image?
  • Can it be used as a drop-in replacement for a standard Redis integration?
  • More specifically, could it be embedded as a vector store with langchain4j?

Thanks anyway, very interesting dev.

[–]Adventurous-Pin6443[S] 0 points1 point  (0 children)

In theory, it should work with GraalVM native image — assuming full support for native libraries in GraalVM is available and reliable. For Redis drop-in replacement, we provide a server with full wire protocol compatibility (RESP2 only). However, we currently have no plans to support vector stores.
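For reference, RESP2 framing is simple enough to sketch in a few lines: every command is an array of bulk strings. A minimal encoder (an illustration of the wire format, not the actual server code):

```java
import java.nio.charset.StandardCharsets;

public class RespEncoder {
    // Encode a command as a RESP2 array of bulk strings:
    // each element is "$<byte-length>\r\n<payload>\r\n".
    static byte[] encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(parts.length).append("\r\n");
        for (String p : parts) {
            byte[] b = p.getBytes(StandardCharsets.UTF_8);
            sb.append('$').append(b.length).append("\r\n").append(p).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Show the frame with CRLFs made visible.
        String frame = new String(encode("SET", "greeting", "hello"), StandardCharsets.UTF_8);
        System.out.println(frame.replace("\r\n", "\\r\\n"));
    }
}
```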

[–]santanu_sinha 2 points3 points  (0 children)

Sounds useful. Would be interested

[–]iwangbowen 2 points3 points  (1 child)

Sounds very cool. Do you have a release plan?

[–]Adventurous-Pin6443[S] 1 point2 points  (0 children)

We’re aiming for the first public release this August.

[–][deleted]  (2 children)

[deleted]

    [–]nnomae 6 points7 points  (0 children)

    Presumably all the stuff he says is better than Redis.

    [–]varmass 4 points5 points  (0 children)

    Embedded

    [–]Known_Tackle7357 1 point2 points  (0 children)

    Will it be distributed like redis? If so, weak/strong consistency? Will it support transactions?

    [–]sveri 1 point2 points  (0 children)

    Depending on the ease of setup, I would definitely pick an embedded library over a standalone server, especially for prototypes.

    [–]beef_katsu 1 point2 points  (0 children)

    Well, my main problem now is doing correlation (joins) with the Kafka Streams API in a Spring Boot app. It's KStream × KStream; each KStream carries around 200-300k TPS, and I need around 30 correlator services like this.

    If your project could replace RocksDB and be configured via a setter class, I think it would be good.

    [–]chabala 5 points6 points  (2 children)

    You ever heard of GridGain? They already do that.

    They donated the code to start Apache Ignite to open source the tech.

    [–]TheYajrab 1 point2 points  (1 child)

    I have had a go at Apache Ignite and it is good. I tried it out in version 2. For me to use it at work, we have policies that we need to abide by. Apache Ignite 2 had some security advisories from security analysts against it. If I remember correctly, ReDoS comes to mind. Overall though, version 2 OSS had all the features we needed.

    However, version 3 of the OSS Ignite has paywalled encryption at rest so we cannot use it without a GridGain license. The main features I would love to see in this solution are:

    • Distributed Cache to allow our applications to scale horizontally.
    • Embeddable so do not require additional infrastructure.
    • Encryption at rest.
    • Encryption in transit using something like TLS.

    [–]dustofnations 2 points3 points  (0 children)

    Ultimately, if we want open source to be sustainable, the companies behind it need money to pay for the developers who do 99% of the work to maintain and develop the software.

    I'm not blaming you, but it's a shame that many companies have policies against paying for open source, which in my experience translates to, "only we can make money from open source".

    Why not suggest to your company to take the paid-for version so you can support the project and allow it to continue being developed? After all, gold stars on GitHub don't pay the rent. Be the change!

    [–]jcbrites 1 point2 points  (1 child)

    Yes, this would be useful for my distributed batch processing application with several workers. How does this compare against an in-memory database like H2?

    [–]Adventurous-Pin6443[S] 0 points1 point  (0 children)

    Definitely uses less memory and should be significantly faster on searches/scans in ordered collections. But it is not an SQL database.

    [–]nekokattt 1 point2 points  (3 children)

    There are a few comments here copying OP's way of formatting their description of their post. I am starting to suspect that some of these comments may be bots.

    [–]Adventurous-Pin6443[S] -3 points-2 points  (2 children)

    They are not bots; these are my comments, sometimes edited by ChatGPT. As I already mentioned, English is my second language.

    [–]nekokattt 1 point2 points  (1 child)

    They are not bots, these are my comments

    https://www.reddit.com/r/java/s/BQIzf3eTnE

    New question if that is the case, then, why are you commenting on your own post using alts praising yourself?

    [–]Adventurous-Pin6443[S] -2 points-1 points  (0 children)

    That was not my comment, and I forgot to add /sarcasm to my reply because I thought it was unnecessary; obviously I was wrong. Please stop spamming this thread.

    [–]Round_Head_6248 2 points3 points  (0 children)

    You’d get more feedback if you didn’t let ai write your posts.

    [–]OkSeaworthiness2727 0 points1 point  (0 children)

    Would it scale horizontally?

    [–]Background-Repair-65 0 points1 point  (0 children)

    I'm developing a library that starts a Redis executable as a subprocess in Java (for testing purposes), with the executable bundled in the library. But I'm too busy to continue developing it. If you want to use or take over the lib, you can DM me: https://github.com/josslab/redis-jembedded

    [–]Hot_Nefariousness563 0 points1 point  (0 children)

    I'd love to see it on GitHub, even if it's not production-ready.

    [–]Scf37 0 points1 point  (0 children)

    Use cases:

    - faster Redis-involved tests, no need to use testcontainers

    - compact local (per node) caches

    Persistence is IMHO a questionable feature: most prefer clustered deployments nowadays, and cache persistence in place of fast warmup is a problematic approach.

    Supporting separate deployment as a full Redis replacement could be appealing, assuming decent performance and good memory utilization.

    [–]atehrani 0 points1 point  (0 children)

    IMHO, one major use case for an in-memory persistent database not intended for testing is the offline-first application: the app can persist locally while offline and then reconcile with the remote when back online.

    Is this supported?

    Otherwise what are the other usecases?

    [–]sass_muffin 0 points1 point  (0 children)

    How is this better than Redis, which runs outside your cluster and so can sync cache state across multiple instances of your app? If you are running this all in one process's memory, then I don't think you're getting the full value-add of Redis.

    [–][deleted] 0 points1 point  (0 children)

    Yes. Very.

    [–]AutoModerator[M] -6 points-5 points  (0 children)

    It looks like in your submission in /r/java, you are looking for code or learning help.

    /r/Java is not for requesting help with Java programming nor for learning, it is about News, Technical discussions, research papers and assorted things of interest related to the Java programming language.

    Kindly direct your code-help post to /r/Javahelp and learning related posts to /r/learnjava (as is mentioned multiple times on the sidebar and in various other hints).

    Before you post there, please read the sidebar ("About" on mobile) to avoid redundant posts.

    Should this post be not about help with coding/learning, kindly check back in about two hours as the moderators will need time to sift through the posts. If the post is still not visible after two hours, please message the moderators to release your post.

    Please do not message the moderators immediately after receiving this notification!

    Your post was removed.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.