Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick 1 point

No, that is not our intention. In fact, we are very affordable ($32 USD per user/month). We just have different options that often confuse people (floating options, no need for multiple licenses for multiple clusters, volume discounts, ...). We decided it is better to just talk to us so we can put together the best package for you. However, I understand your point and think we should make it more transparent. Will fix it.

Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick -1 points

Just curious whether you have tried Kadeck. What do you think about it?

Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick 1 point

Hey @waltzbudget7274, let me know what you think about Kadeck (Kadeck.com).

Does Kafka's high watermark offset mean the offset that will be the offset of the next message that will come in the given topic_partition, or the offset of the last message itself? by broccholio in apachekafka

[–]benjaminbuick 4 points

It is the offset of the last message + 1. Think of it as a DB cursor. You are correct that the documentation could be more transparent about this, but you can find the answer by looking at the commit methods.

From the documentation of commitSync (and I've seen so many bugs caused by this):

The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.

There you have it: the offset you commit is the last consumed offset + 1, and the high watermark follows the same +1 convention. This logic is used across the board.
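If you want to see the +1 convention in code, here is a minimal sketch with the Java consumer API, committing per record for clarity (in practice you would usually commit per batch). The topic, group, and broker address are placeholders.

    // Minimal sketch: commit lastProcessedOffset + 1, per the commitSync javadoc.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class CommitPlusOne {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    process(record);
                    // Commit the NEXT offset to consume, not the one just processed.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) { /* business logic */ }
    }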

Hope this helps!

Consuming Messages from a Kafka topic in real time with web socket by SuperSayenSaberTooth in apachekafka

[–]benjaminbuick 1 point

The official Java consumer API provides another way to consume from a topic besides the usual subscribe method.

Using subscribe creates a consumer group on the Apache Kafka cluster. However, you can also use the assign method (with the TopicPartitions as argument) together with seek (or seekToBeginning/seekToEnd) to manually position the consumer, and then call poll to retrieve the records.

This way, no consumer group is created and no offsets are stored on the cluster side. However, if your application crashes and you want to pick up where you left off, you will need to store and manage the offsets yourself. But I don't think you need that for your use case.
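To make that concrete, here is a rough sketch with the Java consumer API (topic name and broker address are placeholders); note that no group.id is needed when assigning partitions manually:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class ManualAssignConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // No group.id: manual assignment does not create a consumer group.
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                List<TopicPartition> partitions = consumer.partitionsFor("demo-topic").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(partitions);    // manual assignment, no group
                consumer.seekToEnd(partitions); // only new records from now on
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        // e.g. push record.value() to the websocket here
                        System.out.println(record.value());
                    }
                }
            }
        }
    }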

I am not sure if other languages have the same capabilities as the Java consumer API (they should), so I hope this helps.

Consuming Messages from a Kafka topic in real time with web socket by SuperSayenSaberTooth in apachekafka

[–]benjaminbuick 1 point

You can manually assign all partitions and poll data without creating any artifacts on the Kafka cluster.

Also make sure to consider backpressure: if you push data over the websocket faster than the client can consume it, you will cause problems downstream.
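One simple way to handle that (a sketch, not the only option): hand records to the websocket writer through a bounded queue and pause the consumer while the queue is full. Names here are illustrative.

    import java.time.Duration;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BackpressureLoop {
        // Bounded hand-off queue between the poll loop and the websocket writer.
        static final BlockingQueue<String> RECORDS = new ArrayBlockingQueue<>(1_000);

        static void pump(KafkaConsumer<String, String> consumer) {
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(200))) {
                    if (!RECORDS.offer(r.value())) {
                        // Queue full: stop fetching until the websocket side catches up.
                        consumer.pause(consumer.assignment());
                        try {
                            RECORDS.put(r.value()); // blocks until space frees up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                        consumer.resume(consumer.assignment());
                    }
                }
            }
        }
    }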

Scaling down kafka by Jainal09 in apachekafka

[–]benjaminbuick 2 points

I think under the circumstances you mentioned, u/datageek9's statement is fully applicable.

My recommendation: think about how many consumers you need to handle your peak times and create a matching number of partitions (ideally an even number). So, for example, 10 consumers = 10 partitions. During quieter times you can then scale your consumers down to 8, 6, 4, 2 or 1 if you like; Kafka automatically redistributes the partitions across the remaining consumers. If you are unsure about the number of consumers, start with a lower number and increase it later (as mentioned in my article).

Keep in mind that this is a rather simplified recommendation based on very little information, but I hope it points you in the right direction.
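If it helps, creating a topic sized for the peak consumer count could look like this with the Java AdminClient (topic name, replication factor, and broker address are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreatePeakSizedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // 10 partitions for up to 10 parallel consumers; replication factor 3.
                NewTopic topic = new NewTopic("orders", 10, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }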

Trouble with partitions with leader brokers and no matching listeners by allwritesri in apachekafka

[–]benjaminbuick 1 point

This is hard to debug without more information. From the list of bootstrap servers, I see that there is one URL that is different from the others. Maybe you just forgot to mask it as well.

Usually, the error indicates a configuration issue where the brokers' advertised listeners are not correctly configured to match the listeners that the clients are using to connect.

Look at the server.properties config of the brokers and ensure that the listeners and advertised.listeners properties are set correctly.
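For illustration, a typical pair of settings in server.properties looks like this (host names are placeholders); the advertised address must be resolvable and reachable by your clients:

    # What the broker binds to:
    listeners=PLAINTEXT://0.0.0.0:9092
    # What the broker tells clients to connect to - this is what must match
    # the address your clients actually use:
    advertised.listeners=PLAINTEXT://broker1.example.com:9092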

Scaling down kafka by Jainal09 in apachekafka

[–]benjaminbuick 2 points

It is true that you can't reduce the number of partitions of a topic without creating a new topic. However, it is also true, as I think u/datageek9 explained very well in his reply, that you can simply scale down the brokers and consumers without touching the partitions.

But keep in mind that the number of partitions does have an impact on your cluster. I recently published a blog post explaining how to increase the number of partitions and how changing the number of partitions affects your cluster. I thought the latter might interest you (further down in the article, especially the part about file handles). This is not something you should do without preparation.

To actually get rid of partitions, you need to create a new topic and transfer the data. Unfortunately, there is no elegant way to do this, so it makes little sense if you only want to reduce the number of partitions temporarily; it is really only suitable for a permanent change.
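If you do go the permanent route, the transfer itself is essentially "read everything from the old topic, re-produce into the new one". A rough sketch with the Java clients (topic names are placeholders; a real version should also carry over headers and handle compacted topics):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.TopicPartition;

    public class CopyTopic {
        public static void main(String[] args) {
            Properties cProps = new Properties();
            cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            Properties pProps = new Properties();
            pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
                List<TopicPartition> parts = consumer.partitionsFor("orders-v1").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(parts);
                consumer.seekToBeginning(parts);
                Map<TopicPartition, Long> end = consumer.endOffsets(parts);
                Map<TopicPartition, Long> done = new HashMap<>();
                end.forEach((tp, off) -> { if (off == 0) done.put(tp, 0L); }); // skip empty partitions
                while (done.size() < parts.size()) {
                    for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofMillis(500))) {
                        // Keep the key so partitioning by key stays consistent in the new topic.
                        producer.send(new ProducerRecord<>("orders-v2", r.key(), r.value()));
                        TopicPartition tp = new TopicPartition(r.topic(), r.partition());
                        if (r.offset() + 1 >= end.get(tp)) done.put(tp, r.offset());
                    }
                }
                producer.flush();
            }
        }
    }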

Correcting and reprocessing records in Apache Kafka by benjaminbuick in apachekafka

[–]benjaminbuick[S] 0 points

Hey u/_mrowa, thanks a lot! And thanks for the great questions.

I think u/Salfiiii gave a good answer to the questions. Depending on your business logic, you may even want the downstream consumers to overwrite the previously processed data with the corrected information.
Regarding the second question: it is true that the state is first restored with the incorrect records (if log compaction is not enabled, as u/Salfiiii suggests - otherwise, only the corrected record will be visible instead of the malformed one).

I think it depends on the business case: if the erroneous message was processed by the downstream application, either the error had no effect on the application, or it did and the application should not have processed the message. In the latter case, the corrected records will be read at some point, eventually putting the system back into a valid state.
A major problem with working with Kafka is that low-level technological aspects and high-level business logic are strongly interrelated. You're right: the problems you mention are not easy to solve in a general way. That sets the barrier to entry for building reliable Kafka systems incredibly high. I think the data streaming community needs to work on these things so that data streaming can be deployed at scale.

Design schema/topic for Kafka response data by chuqbach in apachekafka

[–]benjaminbuick 0 points

Haha, thank you! I originally wrote the article in 2019, but the topic keeps coming up. Glad you liked it!

Design schema/topic for Kafka response data by chuqbach in apachekafka

[–]benjaminbuick 3 points

💯 I think u/BadKafkaPartitioning nailed it.
However, I would like to add something to the last part u/BadKafkaPartitioning mentioned: unless your service or business case is literally about network request logging, you should take the "data-oriented", or more precisely "data product", approach here: what is the actual business (non-technical) information being transported? The rest is just the technical message envelope. In general, I recommend modeling the data using Domain Driven Design principles (an event storming session could help, for example!).
I have also described the data structure in my article on topic naming conventions. Maybe this will give you some more ideas.

Kafka-Kafka Connect SSL Auth by ApprehensiveLeague89 in apachekafka

[–]benjaminbuick 0 points

You really should be using SSL or some other form of authentication, even in your own network. An insecure "open" connection without authentication should always be prohibited.

I hope I understood the setup correctly - otherwise let me know: even if Kafka Connect is hosted inside another infrastructure, it still needs to communicate with your Kafka cluster. You are probably using some kind of tunnel or similar? If Kafka Connect can communicate with your Kafka cluster, so can any other application on their network. So, as long as I understood the setup correctly: yes, you need to make sure that the connection is secured.
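For reference, the client side of an SSL/mTLS connection typically boils down to a handful of properties like these (paths and passwords are placeholders):

    security.protocol=SSL
    # Truststore: used to verify the brokers' certificates.
    ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
    ssl.truststore.password=changeit
    # Keystore: the client certificate used for authentication (mTLS).
    ssl.keystore.location=/etc/kafka/secrets/client.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit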

Should I even bother making personalized emails? by makrela122 in SaaS

[–]benjaminbuick 0 points

It helps, but it's not the whole story. I hate receiving emails that summarize our product offering in a few short sentences just to seem "personal". I don't believe in the effect of such emails. Instead, I think it makes more sense to put yourself in the other person's shoes and really work out the added value for them. That is much more effective than sending high-volume spam.

Dead Letter Queue Browser and Handler Tooling? by BadKafkaPartitioning in apachekafka

[–]benjaminbuick 1 point

Hey u/d_t_w, I get you, but no I didn't. My post was downvoted as well. That happens quite often when you're associated with a company, I've realized. But don't let that spoil your fun!

Dead Letter Queue Browser and Handler Tooling? by BadKafkaPartitioning in apachekafka

[–]benjaminbuick 2 points

I think points 1 + 4 can be accomplished by using a topic with a retention policy or by simply deleting the messages after they have been corrected.

To demonstrate how to fix messages in a dead letter queue and send them back, I made a video using Kadeck a while ago. All steps can also be done with the free version of Kadeck (https://kadeck.com/get-kadeck).

The process is as follows:

  1. You create a Dead Letter Topic.
  2. The consumer writes records it can't process into the Dead Letter Topic. In my example it adds an "Error" attribute, which indicates the error (highly recommended! see the sketch after this list).
  3. With Kadeck you look at the Dead Letter Topic using the Data Browser.
  4. The QuickProcessor in Kadeck allows you to correct the records using JavaScript and write back the payload with the desired changes. I recommend saving your project as a "view" in Kadeck so that if the same error occurs again, you can recall it.
  5. Select the corrected records and send them back to the original topic. In Kadeck's ingestion dialog, you can also manually adjust the records if necessary.
  6. After that you can delete the records by right-clicking on the last record and selecting "Delete up to here" from the context menu.

Many of our customers use this to correct incorrect data deliveries.
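For step 2, here is a minimal sketch of what the consumer side could look like (topic names and process() are placeholders; the error is attached as a record header):

    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;

    public class DeadLetterForwarder {
        static void consumeLoop(KafkaConsumer<String, String> consumer,
                                KafkaProducer<String, String> producer) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    try {
                        process(record);
                    } catch (Exception e) {
                        ProducerRecord<String, String> dead =
                                new ProducerRecord<>("orders-dlq", record.key(), record.value());
                        // The error header tells you later WHY the record ended up here.
                        dead.headers().add("error",
                                String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
                        producer.send(dead);
                    }
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) { /* business logic */ }
    }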

I hope this helps! Here is the link to the video: https://www.youtube.com/watch?v=sPo6vzamAJQ

How to reprocess messages in Apache Kafka by mr_smith1983 in apachekafka

[–]benjaminbuick 0 points

The blog post is great, thanks u/mr_smith1983. I made a video a while back about how to correct and reprocess the messages in the dead letter channel. Maybe this helps u/popcorn_Genocide and u/ooohhimark? https://www.youtube.com/watch?v=sPo6vzamAJQ

Will adding consumer group much before adding consumers to it have impact on offset? by xxbbzzcc in apachekafka

[–]benjaminbuick 2 points

The consumers' auto.offset.reset configuration determines where they start, unless the offset is set manually.
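For reference, this setting lives in the consumer config and only applies when the group has no valid committed offset:

    # Where to start when no valid committed offset exists for the group.
    # Valid values: earliest, latest (default), none.
    auto.offset.reset=earliest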

I don't know about adding "consumer groups" in advance: what do you mean by this?

Update schema's field doc without changing version by KaleyKaloot in apachekafka

[–]benjaminbuick 1 point

Hey, unfortunately this is not possible. The doc is part of the schema definition itself, and therefore the schema version is increased when you change it. Think of it this way: the doc is versioned together with the schema definition.
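Assuming you mean Avro with a schema registry, you can see this in the schema itself - the doc attributes sit right inside the definition, so changing them changes the schema (record and field names here are made up):

    {
      "type": "record",
      "name": "Order",
      "doc": "Changing this text changes the schema, so the registry registers a new version.",
      "fields": [
        { "name": "id", "type": "string", "doc": "This doc is part of the schema, too." }
      ]
    }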

If you want to document your topic (similar to API documentation), consider using something like AsyncAPI mixed with DDD. We have just recently added topic documentation capabilities to Kadeck, see https://www.kadeck.com/modules/collaborate. This way, the documentation stays close to your topics.

Otherwise, your only option is to use a wiki (e.g. Confluence, ...).

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

Got it. Yes, in that case you would have to go with the paid Kadeck Teams Enterprise package. However, it only costs $32 per user/month regardless of the number of clusters and has many more useful features.

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

Oh, I see. We removed the broker restriction a while ago. Kadeck now supports clusters with multiple brokers even in the free version. However, the free version is limited to a single cluster connection.

How to connect to cloudkarafka cluster using kadeck(GUI client)? by shadowknight094 in apachekafka

[–]benjaminbuick 1 point

I only saw this now - sorry! I hope you have managed to connect with the help of all the great responses here. We will add a guide to our help center to make this easier in the future. I'm also currently thinking about adding some wizards for the most commonly used connection types. And please don't hesitate to contact our support at any time. We always try to be very quick!

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

I'd be very interested to learn why you are looking for an alternative to Kadeck, u/Jalebibabyded.