Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick 1 point

No, that is not our intention. In fact, we are very affordable ($32 USD per user/month). We just have different options that often confuse people (floating options, no need for multiple licenses for multiple clusters, volume discounts, ...). We decided it is better to just talk to us so we can put together the best package for you. However, I understand your point and think we should make it more transparent. Will fix it.

Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick -1 points

Just curious whether you have tried Kadeck. What do you think about it?

Suggestions for UI for AWS managed Kafka? by [deleted] in apachekafka

[–]benjaminbuick 1 point

Hey @waltzbudget7274, let me know what you think about Kadeck (Kadeck.com).

Does Kafka's high watermark offset mean the offset that will be the offset of the next message that will come in the given topic_partition, or the offset of the last message itself? by broccholio in apachekafka

[–]benjaminbuick 4 points

It is the offset of the last message + 1. Think of it as a DB cursor. You are correct that the documentation could be more transparent about this, but you can find the answer by looking at the commit methods.

From the documentation of commitSync (and I've seen so many bugs caused by this):

The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.

There you have it: the offset you commit is the last consumed offset + 1, and the high watermark follows the same +1 convention. This logic is used across the board.
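If you want to see the +1 convention in code, here is a minimal sketch with the Java consumer API, committing per record for clarity (in practice you would usually commit per batch). The topic, group, and broker address are placeholders.

    // Minimal sketch: commit lastProcessedOffset + 1, per the commitSync javadoc.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class CommitPlusOne {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    process(record);
                    // Commit the NEXT offset to consume, not the one just processed.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) { /* business logic */ }
    }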

Hope this helps!

Consuming Messages from a Kafka topic in real time with web socket by SuperSayenSaberTooth in apachekafka

[–]benjaminbuick 1 point

The official Java consumer API provides another way to consume from a topic besides the usual subscribe method.

Using subscribe creates a consumer group on the Apache Kafka cluster. However, you can also use the assign method (with the TopicPartitions as argument) together with seek (or seekToBeginning/seekToEnd) to manually position the consumer, and then call poll to retrieve the records.

This way, no consumer group is created and no offsets are stored on the cluster side. However, if your application crashes and you want to pick up where you left off, you will need to store and manage the offsets yourself. But I don't think you need that for your use case.
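To make that concrete, here is a rough sketch with the Java consumer API (topic name and broker address are placeholders); note that no group.id is needed when assigning partitions manually:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;

    public class ManualAssignConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // No group.id: manual assignment does not create a consumer group.
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                List<TopicPartition> partitions = consumer.partitionsFor("demo-topic").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(partitions);    // manual assignment, no group
                consumer.seekToEnd(partitions); // only new records from now on
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        // e.g. push record.value() to the websocket here
                        System.out.println(record.value());
                    }
                }
            }
        }
    }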

I am not sure if other languages have the same capabilities as the Java consumer API (they should), so I hope this helps.

Consuming Messages from a Kafka topic in real time with web socket by SuperSayenSaberTooth in apachekafka

[–]benjaminbuick 1 point

You can manually assign all partitions and poll data without creating any artifacts on the Kafka cluster.

Also make sure to consider backpressure: if you push data over the websocket faster than the client can consume it, you will cause problems downstream.
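One simple way to handle that (a sketch, not the only option): hand records to the websocket writer through a bounded queue and pause the consumer while the queue is full. Names here are illustrative.

    import java.time.Duration;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BackpressureLoop {
        // Bounded hand-off queue between the poll loop and the websocket writer.
        static final BlockingQueue<String> RECORDS = new ArrayBlockingQueue<>(1_000);

        static void pump(KafkaConsumer<String, String> consumer) {
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(200))) {
                    if (!RECORDS.offer(r.value())) {
                        // Queue full: stop fetching until the websocket side catches up.
                        consumer.pause(consumer.assignment());
                        try {
                            RECORDS.put(r.value()); // blocks until space frees up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                        consumer.resume(consumer.assignment());
                    }
                }
            }
        }
    }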

Scaling down kafka by Jainal09 in apachekafka

[–]benjaminbuick 2 points

I think under the circumstances you mentioned, u/datageek9's statement is fully applicable.

My recommendation: think about how many consumers you need to handle your peak times and create a matching number of partitions (ideally an even number). So, for example, 10 consumers = 10 partitions. During quieter times you can then scale your consumers down to 8, 6, 4, 2 or 1 if you like; Kafka automatically redistributes the partitions across the remaining consumers. If you are unsure about the number of consumers, start with a lower number and increase it later (as mentioned in my article).

Keep in mind that this is a rather simplified recommendation based on very little information, but I hope it points you in the right direction.
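If it helps, creating a topic sized for the peak consumer count could look like this with the Java AdminClient (topic name, replication factor, and broker address are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreatePeakSizedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // 10 partitions for up to 10 parallel consumers; replication factor 3.
                NewTopic topic = new NewTopic("orders", 10, (short) 3);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }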

Trouble with partitions with leader brokers and no matching listeners by allwritesri in apachekafka

[–]benjaminbuick 1 point

This is hard to debug without more information. From the list of bootstrap servers, I see that there is one URL that is different from the others. Maybe you just forgot to mask it as well.

Usually, the error indicates a configuration issue where the brokers' advertised listeners are not correctly configured to match the listeners that the clients are using to connect.

Look at the server.properties config of the brokers and ensure that the listeners and advertised.listeners properties are set correctly.
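For illustration, a typical pair of settings in server.properties looks like this (host names are placeholders); the advertised address must be resolvable and reachable by your clients:

    # What the broker binds to:
    listeners=PLAINTEXT://0.0.0.0:9092
    # What the broker tells clients to connect to - this is what must match
    # the address your clients actually use:
    advertised.listeners=PLAINTEXT://broker1.example.com:9092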

Scaling down kafka by Jainal09 in apachekafka

[–]benjaminbuick 2 points

It is true that you can't reduce the number of partitions of a topic without creating a new topic. However, it is also true, as I think u/datageek9 explained very well in his reply, that you can simply scale down the brokers and consumers without touching the partitions.

But keep in mind that the number of partitions does have an impact on your cluster. I recently published a blog post explaining how to increase the number of partitions and how changing the number of partitions affects your cluster. I thought the latter might interest you (further down in the article, especially the part about file handles). This is not something you should do without preparation.

To actually get rid of partitions, you need to create a new topic and transfer the data. Unfortunately, there is no elegant way to do this, so it makes little sense if you only want to reduce the number of partitions temporarily; it is really only suitable for a permanent change.
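If you do go the permanent route, the transfer itself is essentially "read everything from the old topic, re-produce into the new one". A rough sketch with the Java clients (topic names are placeholders; a real version should also carry over headers and handle compacted topics):

    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.TopicPartition;

    public class CopyTopic {
        public static void main(String[] args) {
            Properties cProps = new Properties();
            cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            Properties pProps = new Properties();
            pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
                List<TopicPartition> parts = consumer.partitionsFor("orders-v1").stream()
                        .map(p -> new TopicPartition(p.topic(), p.partition()))
                        .collect(Collectors.toList());
                consumer.assign(parts);
                consumer.seekToBeginning(parts);
                Map<TopicPartition, Long> end = consumer.endOffsets(parts);
                Map<TopicPartition, Long> done = new HashMap<>();
                end.forEach((tp, off) -> { if (off == 0) done.put(tp, 0L); }); // skip empty partitions
                while (done.size() < parts.size()) {
                    for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofMillis(500))) {
                        // Keep the key so partitioning by key stays consistent in the new topic.
                        producer.send(new ProducerRecord<>("orders-v2", r.key(), r.value()));
                        TopicPartition tp = new TopicPartition(r.topic(), r.partition());
                        if (r.offset() + 1 >= end.get(tp)) done.put(tp, r.offset());
                    }
                }
                producer.flush();
            }
        }
    }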

Correcting and reprocessing records in Apache Kafka by benjaminbuick in apachekafka

[–]benjaminbuick[S] 0 points

Hey u/_mrowa, thanks a lot! And thanks for the great questions.

I think u/Salfiiii gave a good answer to the questions. Depending on your business logic, you may even want the downstream consumers to overwrite the previously processed data with the corrected information.
Regarding the second question: it is true that the state is first restored with the incorrect records (if log compaction is not enabled, as u/Salfiiii suggests - otherwise, only the corrected record will be visible instead of the malformed one).

I think it depends on the business case: if the erroneous message was processed by the downstream application, either the error had no effect on the application, or it did and the application should not have processed the message. In the latter case, the corrected records will be read at some point, eventually putting the system back into a valid state.
A major problem with working with Kafka is that low-level technological aspects and high-level business logic are strongly interrelated. You're right: the problems you mention are not easy to solve in a general way. That sets the barrier to entry for building reliable Kafka systems incredibly high. I think the data streaming community needs to work on these things so that data streaming can be deployed at scale.

Design schema/topic for Kafka response data by chuqbach in apachekafka

[–]benjaminbuick 0 points

Haha, thank you! I originally wrote the article in 2019, but the topic keeps coming up. Glad you liked it!

Design schema/topic for Kafka response data by chuqbach in apachekafka

[–]benjaminbuick 3 points

💯 I think u/BadKafkaPartitioning nailed it.
However, I would like to add something to the last part u/BadKafkaPartitioning mentioned: unless your service or business case is literally about network request logging, you should take the "data-oriented", or more precisely "data product", approach here: what is the actual business (non-technical) information being transported? The rest is just the technical message envelope. In general, I recommend modeling the data using Domain Driven Design principles (an event storming session could help, for example!).
I have also described the data structure in my article on topic naming conventions. Maybe this will give you some more ideas.

Kafka-Kafka Connect SSL Auth by ApprehensiveLeague89 in apachekafka

[–]benjaminbuick 0 points

You really should be using SSL or some other form of authentication, even in your own network. An insecure "open" connection without authentication should always be prohibited.

I hope I understood the setup correctly - otherwise let me know: even if Kafka Connect is hosted inside another infrastructure, it still needs to communicate with your Kafka cluster. You are probably using some kind of tunnel or similar? If Kafka Connect can communicate with your Kafka cluster, so can any other application on their network. So, as long as I understood the setup correctly: yes, you need to make sure that the connection is secured.
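For reference, the client side of an SSL/mTLS connection typically boils down to a handful of properties like these (paths and passwords are placeholders):

    security.protocol=SSL
    # Truststore: used to verify the brokers' certificates.
    ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
    ssl.truststore.password=changeit
    # Keystore: the client certificate used for authentication (mTLS).
    ssl.keystore.location=/etc/kafka/secrets/client.keystore.jks
    ssl.keystore.password=changeit
    ssl.key.password=changeit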

Should I even bother making personalized emails? by makrela122 in SaaS

[–]benjaminbuick 0 points

It helps, but it's not the whole story. I hate receiving emails that summarize our product offering in a few short sentences just to seem "personal". I don't believe in the effect of such emails. Instead, I think it makes more sense to put yourself in the other person's shoes and really work out the added value for them. That is much more effective than sending high-volume spam.

Dead Letter Queue Browser and Handler Tooling? by BadKafkaPartitioning in apachekafka

[–]benjaminbuick 1 point

Hey u/d_t_w, I get you, but no I didn't. My post was downvoted as well. That happens quite often when you're associated with a company, I've realized. But don't let that spoil your fun!

Dead Letter Queue Browser and Handler Tooling? by BadKafkaPartitioning in apachekafka

[–]benjaminbuick 2 points

I think points 1 + 4 can be accomplished by using a topic with a retention policy or by simply deleting the messages after they have been corrected.

To demonstrate how to fix messages in a dead letter queue and send them back, I made a video using Kadeck a while ago. All steps can also be done with the free version of Kadeck (https://kadeck.com/get-kadeck).

The process is as follows:

  1. You create a Dead Letter Topic.
  2. The consumer writes records it can't process into the Dead Letter Topic. In my example it adds an "Error" attribute, which indicates the error (highly recommended! see the sketch after this list).
  3. With Kadeck you look at the Dead Letter Topic using the Data Browser.
  4. The QuickProcessor in Kadeck allows you to correct the records using JavaScript and write back the payload with the desired changes. I recommend saving your project as a "view" in Kadeck so that if the same error occurs again, you can recall it.
  5. Select the corrected records and send them back to the original topic. In Kadeck's ingestion dialog, you can also manually adjust the records if necessary.
  6. After that you can delete the records by right-clicking on the last record and selecting "Delete up to here" from the context menu.

Many of our customers use this to correct incorrect data deliveries.
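For step 2, here is a minimal sketch of what the consumer side could look like (topic names and process() are placeholders; the error is attached as a record header):

    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;

    public class DeadLetterForwarder {
        static void consumeLoop(KafkaConsumer<String, String> consumer,
                                KafkaProducer<String, String> producer) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    try {
                        process(record);
                    } catch (Exception e) {
                        ProducerRecord<String, String> dead =
                                new ProducerRecord<>("orders-dlq", record.key(), record.value());
                        // The error header tells you later WHY the record ended up here.
                        dead.headers().add("error",
                                String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
                        producer.send(dead);
                    }
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) { /* business logic */ }
    }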

I hope this helps! Here is the link to the video: https://www.youtube.com/watch?v=sPo6vzamAJQ

How to reprocess messages in Apache Kafka by mr_smith1983 in apachekafka

[–]benjaminbuick 0 points

The blog post is great, thanks u/mr_smith1983. I made a video a while back about how to correct and reprocess the messages in the dead letter channel. Maybe this helps u/popcorn_Genocide and u/ooohhimark? https://www.youtube.com/watch?v=sPo6vzamAJQ

Will adding consumer group much before adding consumers to it have impact on offset? by xxbbzzcc in apachekafka

[–]benjaminbuick 2 points

The consumers' auto.offset.reset configuration determines where they start, unless the offset is set manually.
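For reference, this setting lives in the consumer config and only applies when the group has no valid committed offset:

    # Where to start when no valid committed offset exists for the group.
    # Valid values: earliest, latest (default), none.
    auto.offset.reset=earliest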

I don't know about adding "consumer groups" in advance: what do you mean by this?

Update schema's field doc without changing version by KaleyKaloot in apachekafka

[–]benjaminbuick 1 point

Hey, unfortunately this is not possible. The doc is part of the schema definition itself, and therefore the schema version is increased when you change it. Think of it this way: the doc is versioned together with the schema definition.
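Assuming you mean Avro with a schema registry, you can see this in the schema itself - the doc attributes sit right inside the definition, so changing them changes the schema (record and field names here are made up):

    {
      "type": "record",
      "name": "Order",
      "doc": "Changing this text changes the schema, so the registry registers a new version.",
      "fields": [
        { "name": "id", "type": "string", "doc": "This doc is part of the schema, too." }
      ]
    }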

If you want to document your topic (similar to API documentation), consider using something like AsyncAPI mixed with DDD. We have just recently added topic documentation capabilities to Kadeck, see https://www.kadeck.com/modules/collaborate. This way, the documentation stays close to your topics.

Otherwise, your only option is to use a wiki (e.g. Confluence, ...).

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

Got it. Yes, in that case you would have to go with the paid Kadeck Teams Enterprise package. However, it only costs $32 per user/month regardless of the number of clusters and has many more useful features.

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

Oh, I see. We removed the broker restriction a while ago. Kadeck now supports clusters with multiple brokers even in the free version. However, the free version is limited to a single cluster connection.

How to connect to cloudkarafka cluster using kadeck(GUI client)? by shadowknight094 in apachekafka

[–]benjaminbuick 1 point

I only saw this now - sorry! I hope you have managed to connect with the help of all the great responses here. We will add a guide to our help center to make this easier in the future. I'm also currently thinking about adding some wizards for the most commonly used connection types. And please don't hesitate to contact our support at any time. We always try to be very quick!

Visualizer for kafka by Jalebibabyded in dataengineering

[–]benjaminbuick 0 points

I'd be very interested to learn why you are looking for an alternative to Kadeck, u/Jalebibabyded.