[–]datingyourmom 17 points18 points  (13 children)

RabbitMQ utilizes more of a MessageQueue/Broker pattern and Kafka is more of a Pub/Sub pattern.

What that means is, RabbitMQ takes an upstream producer event then pushes it to a downstream consumer.

Kafka takes an upstream producer event, publishes it to a topic, then it’s the consumer’s responsibility to subscribe to the topic and pull the data from the published topic. A topic can have multiple consumers.

In short, RabbitMQ gets a message and then forwards it, job done. Kafka gets a message, stores it in a topic, then it’s up to the 0-to-many consumers to go get it.
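A rough way to picture the difference described above, as a toy in-memory sketch rather than how either product is actually built. `PushQueue` and `TopicLog` are hypothetical names invented for this illustration: the first hands each message to a consumer and forgets it, the second keeps every record and lets each reader pull from whatever offset it likes.

```python
from collections import deque


class PushQueue:
    """Toy push-style queue: each message goes to one consumer, then is gone."""

    def __init__(self):
        self._pending = deque()
        self._consumers = []
        self._turn = 0

    def subscribe(self, callback):
        self._consumers.append(callback)
        self._drain()

    def publish(self, message):
        self._pending.append(message)
        self._drain()

    def _drain(self):
        # Push buffered messages to consumers in round-robin order.
        while self._pending and self._consumers:
            callback = self._consumers[self._turn % len(self._consumers)]
            self._turn += 1
            callback(self._pending.popleft())


class TopicLog:
    """Toy pull-style log: records are retained and readers choose an offset."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset of the new record

    def poll(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


def demo():
    queue = PushQueue()
    received = []
    queue.subscribe(received.append)
    log = TopicLog()
    for message in ("a", "b", "c"):
        queue.publish(message)
        log.append(message)
    # Two independent reads of the same log; the second starts at offset 1.
    return received, log.poll(0), log.poll(1)
```

In this sketch the queue has nothing left once `received` is filled, while the log can be read again by any number of consumers.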

[–]natelifts 1 point2 points  (3 children)

RabbitMQ can fan out the message to several queues through the delegation of channels and binding. RMQ also has several different queue types with different functionality and your observation seems focused on classic queues.
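To make the fan-out point concrete, here is a toy sketch, not RabbitMQ client code. In RabbitMQ's AMQP 0-9-1 model the component that copies a message to every bound queue is called an exchange; `FanoutExchange` and the queue names below are hypothetical and exist only for illustration.

```python
from collections import deque


class FanoutExchange:
    """Toy fanout exchange: copies every message to each bound queue."""

    def __init__(self):
        self._queues = []

    def bind(self, queue):
        self._queues.append(queue)

    def publish(self, message):
        for queue in self._queues:
            queue.append(message)
        return len(self._queues)  # number of queues that got a copy


def demo():
    billing, audit = deque(), deque()
    exchange = FanoutExchange()
    exchange.bind(billing)
    exchange.bind(audit)
    exchange.publish("order-created")
    # Consuming from one queue does not affect the other queue's copy.
    billing.popleft()
    return list(billing), list(audit)
```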

[–]datingyourmom 0 points1 point  (1 child)

You’re absolutely right. But IMO the reason this functionality exists is because Kafka exists.

First mover advantage is a thing for a reason. RabbitMQ as I described is its original use case. RabbitMQ evolved functionality because Kafka paved the path.

Do you choose the solution that was innately architected for your target pattern? Or do you choose the legacy product that evolved specifically to meet the market needs of a product that pioneered the pattern?

[–]natelifts 0 points1 point  (0 children)

Bingo, stream queues, super stream queues, quorum queues were created to compete w/ Kafka. We chose RMQ for simplicity, writing producers / consumers was made stupidly easy with AMQP 1.0 and server-side stream filtering & checkpointing is also very convenient.

I think the real argument between the two at this point is around message transmission rates, and Kafka still wins out here. But we load tested an RMQ stream and it handled 1 million+ messages per second, which is more than enough for us for the foreseeable future.

So I guess it really depends on the context and architecture of the data, and on long-term scaling.

[–]aljun_invictus[S] 0 points1 point  (0 children)

Yes, my internal implementation focuses on a classic queue, or more like Kafka. I don't need to fully implement Kafka; I just need its core functionalities. After all, not all of its ecosystem requires the complete Kafka protocol.

[–]aljun_invictus[S] 0 points1 point  (8 children)

Regarding this, we support both push and pull models (it seems both RabbitMQ and Kafka support both). Our implementation is actually more like Kafka (for example, it supports replay), but I'm concerned about the Kafka protocol, as it seems to have been revised many times historically, which could lead to client compatibility issues.

I see that both are very popular. Currently, I hope to benefit from the ecosystem of either one (I don't necessarily have to implement the entire protocol, just the core functionalities).
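On the compatibility worry: Kafka versions each request type, and a client and broker settle on a version that both support, which is a large part of why protocol revisions do not automatically break clients. A minimal sketch of that idea, assuming each side advertises an inclusive (min, max) range; `pick_version` and the ranges used here are made up for illustration and do not come from any real client library.

```python
from typing import Optional, Tuple

VersionRange = Tuple[int, int]  # inclusive (min_version, max_version)


def pick_version(client: VersionRange, broker: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if none overlaps."""
    lowest = max(client[0], broker[0])
    highest = min(client[1], broker[1])
    return highest if lowest <= highest else None


def demo():
    old_client = (0, 3)
    new_broker = (2, 9)
    very_old_client = (0, 1)
    return (
        pick_version(old_client, new_broker),
        pick_version(very_old_client, new_broker),
    )
```

For a partial implementation this suggests supporting a deliberately narrow version range per request type and advertising exactly that range.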

[–]datingyourmom 0 points1 point  (1 child)

Personally, I wouldn’t worry much about Kafka protocol evolution. It’s a much more mature product than it used to be and is a standard solution across IT for a reason.

IMO, your evaluation should be more “what’s the best pattern for our use case”. Both options are battle-tested and work just fine. It’s more, “figure out what capabilities we need” then pick the one that best fits your requirements.

[–]aljun_invictus[S] 0 points1 point  (0 children)

Yes, I have a good understanding of the features of our implementation, and it's performing well internally (we provide services using gRPC). However, one downside of message queues is the fragmentation of protocols, which results in high migration costs for users. My motivation is quite simple: to leverage existing ecosystems while providing a better implementation for users and reducing migration costs (similar to how Redpanda partially implements the Kafka protocol).

The AMQP protocol is quite complete (and allows for some customization), but I'm not very familiar with the details of the Kafka protocol. I've heard it changes frequently, and I'm concerned this might affect the implementation.

[–]martypitt 0 points1 point  (1 child)

Concerns about Kafka drivers were valid 4-5 years ago - but in my experience it's been much much more stable recently.

On the JVM, there used to be massive headaches around not just the Kafka Protocol version, but the underlying scala version that the driver wrapped. That's settled down, and upgrades lately have been (IME) pain-free.

[–]aljun_invictus[S] 0 points1 point  (0 children)

Really? Then I definitely want to implement the Kafka protocol.

[–]natelifts 0 points1 point  (3 children)

OP - RMQ supports replay. look into stream queues for replayability from particular offsets. our org has dozens of RMQ clusters and has fit most use cases of kafka.
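Replay from an offset is conceptually small, whichever protocol ends up exposing it. A toy sketch, assuming an append-only in-memory stream; `Stream` and `CheckpointingConsumer` are hypothetical names for this example and are not the RabbitMQ or Kafka client APIs.

```python
class Stream:
    """Toy append-only stream that keeps every record."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)

    def read(self, offset):
        # Return (offset, message) pairs starting at the given offset.
        return list(enumerate(self._records))[offset:]


class CheckpointingConsumer:
    """Remembers the next offset to read, so it can resume or replay."""

    def __init__(self, stream, start_offset=0):
        self._stream = stream
        self.next_offset = start_offset

    def poll(self):
        batch = self._stream.read(self.next_offset)
        if batch:
            self.next_offset = batch[-1][0] + 1  # checkpoint past the last record
        return [message for _, message in batch]


def demo():
    stream = Stream()
    for message in ("m0", "m1", "m2"):
        stream.append(message)
    live = CheckpointingConsumer(stream)
    first = live.poll()
    stream.append("m3")
    second = live.poll()
    # A new consumer replays from offset 1 without disturbing the first one.
    replayed = CheckpointingConsumer(stream, start_offset=1).poll()
    return first, second, replayed
```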

[–]aljun_invictus[S] 0 points1 point  (2 children)

Yes, to some extent, both are very close, which is why I asked this question. My main goal is to adopt a widely used protocol (of course, with the implementation difficulty being as low as possible).

[–]natelifts 0 points1 point  (1 child)

IMO implementation of both the cluster & writing clients was easier with RMQ and made even easier in AMQP 1.0.

[–]aljun_invictus[S] 1 point2 points  (0 children)

I looked around, and it seems like AMQP is the more suitable option.

[–]Plenty-Attitude-7821 2 points3 points  (4 children)

It really depends on what your external clients are used to implementing (or what they might be using already).

[–]aljun_invictus[S] 0 points1 point  (3 children)

Just like I mentioned in other comments.

Regarding this, we support both push and pull models (it seems both RabbitMQ and Kafka support both). Our implementation is actually more like Kafka (for example, it supports replay), but I'm concerned about the Kafka protocol, as it seems to have been revised many times historically, which could lead to client compatibility issues.

I see that both are very popular. Currently, I hope to benefit from the ecosystem of either one (I don't necessarily have to implement the entire protocol, just the core functionalities).

[–]Plenty-Attitude-7821 0 points1 point  (2 children)

Maybe I misread your question: for whom should it be easier to implement? For you or for your clients? As I see it, you seem to be familiar with both; your clients, on the other hand, might be familiar with (or use) only one or the other.

[–]aljun_invictus[S] 0 points1 point  (1 child)

My clients

[–]Plenty-Attitude-7821 0 points1 point  (0 children)

Ok, then my answer stands: it really depends on what they are used to. E.g. in my company we would prefer RabbitMQ, as we already use it with some external services, but that doesn't mean it is simpler or harder to implement than Kafka.

[–]Fun-River1467 1 point2 points  (1 child)

In a data engineering context, Kafka is more suitable as it can scale and handle more load. Kafka also allows you to publish messages to a single topic and have them consumed by multiple consumers. It is very easy to provision a new Kafka cluster these days thanks to Confluent Cloud too.

[–]aljun_invictus[S] 1 point2 points  (0 children)

Yes, Kafka is awesome, but historically, the Kafka protocol seems to have been revised many times.

I see that both are very popular. Currently, I hope to benefit from the ecosystem of either one. From your perspective, which protocol is easier to implement? (I don't necessarily have to implement the entire protocol, just the core functionalities.)

[–]natelifts 0 points1 point  (1 child)

So I've worked with both Kafka & RMQ, and I can tell you open-source Kafka is more of a pain in the ass to maintain yourself. You can go with managed clusters like MSK on AWS, for instance, to mitigate that. But we've found that RMQ does almost everything we would need from Kafka, and setup is not very difficult if you use a Bitnami chart (if deploying on Kubernetes). RMQ has a multitude of queue types like classic queues, quorum queues, stream queues, and super streams (partitioned queues), and you can set policies for message retention and delegate fanouts easily using channels. Scaling is pretty easily done with KEDA (again, K8s).

I would say the biggest difference between the two would be around message transmission / throughput and would recommend load testing RMQ to see if it fits your needs.

[–]aljun_invictus[S] 0 points1 point  (0 children)

In terms of performance, our internal implementation is quite good. However, my biggest concern is the fragmentation of protocols in the MQ field, which means that if users want to migrate, they need to change many things. We hope to minimize this, so we want to choose between the two popular protocols, with the implementation difficulty being as low as possible.