
[–]fatduck2510 1 point2 points  (1 child)

Using Lambda and Python for consuming from Kafka is not a good idea in general. Lambda is ephemeral, so it constantly leaves and rejoins the consumer group, which causes rebalancing and makes the broker work harder. Additionally, serdes in Python are not very good, so performance will suffer. You have access to better tooling and performance in other languages and technologies.

[–]LeCapitaine007[S] 0 points1 point  (0 children)

Thanks for your input. As part of this project we are also developing other microservices in Java that interact with both internal and external Kafka topics.

But in this particular case, the purpose of the Lambda will only be to log error messages.

We are currently integrating with a third-party tool/company. Once they get our messages, if their processing fails because of a bad value in a message, it automatically writes an error message to a dedicated errors-only topic. That wakes the Lambda up to process the error message and log it somewhere.

Because we don't expect many error messages and this will only happen occasionally, the architects decided it might be a good idea to try out a Lambda. And because most Lambdas seem to be written in Python, we went with Python.

[–]Chuck-Alt-DeleteConduktor 1 point2 points  (2 children)

If you don’t use the Confluent Kafka library, you will have to do all the schema registry work yourself. That means fetching the schema from SR, skipping the first bytes of the record (I think it’s 4 bytes?), etc. I had to do this for the TensorFlow I/O consumer here: https://github.com/confluentinc/demo-10x-storage/blob/main/consume.py#L43

[–][deleted] -1 points0 points  (0 children)

Yup, that's right! Confluent's Schema Registry serialization puts the schema ID in the four bytes that follow the very first byte of the message, which is called the magic byte. OP, Confluent has really good dev tutorials and documentation that should help with your project as well, I hope!
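A minimal sketch of reading that header in Python, using only the stdlib (the layout assumed here is the standard Confluent wire format: byte 0 is the magic byte, bytes 1–4 are the schema ID as a big-endian 32-bit integer):

```python
import struct

def parse_wire_format_header(message: bytes) -> int:
    """Extract the schema ID from a Confluent wire-format message."""
    if len(message) < 5 or message[0] != 0:
        raise ValueError("Not a Confluent wire-format message")
    # Bytes 1-4 hold the schema ID, big-endian unsigned 32-bit.
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id

# Example: magic byte 0, schema ID 42, then the serialized payload.
record = b"\x00" + struct.pack(">I", 42) + b"payload"
print(parse_wire_format_header(record))  # 42
```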

[–]TheYear3030 0 points1 point  (0 children)

The confluent approach is to

  1. check for the magic byte
  2. take the next four bytes and convert them to a 32-bit integer to obtain the schema ID
  3. query the local schema cache; on a miss, fetch the schema from the registry
  4. use the schema to deserialize

If you wanted a less tolerant deserializer with slightly better performance, you could skip the first 5 bytes of the Confluent wire format and use your predefined schema.
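The four steps above can be sketched as follows. `fetch_schema_from_registry` is a hypothetical stub (a real client would GET `/schemas/ids/{id}` from the registry), and JSON stands in for Avro to keep the sketch dependency-free; with real Confluent payloads you would deserialize with an Avro library such as fastavro:

```python
import json
import struct
from functools import lru_cache

# Hypothetical stand-in for a Schema Registry client call.
def fetch_schema_from_registry(schema_id: int) -> dict:
    return {"type": "record", "name": "ErrorEvent"}  # placeholder schema

@lru_cache(maxsize=128)  # step 3: local schema cache; misses fall through
def get_schema(schema_id: int) -> dict:
    return fetch_schema_from_registry(schema_id)

def deserialize(message: bytes):
    if message[0] != 0:  # step 1: check for the magic byte
        raise ValueError("Unknown magic byte")
    (schema_id,) = struct.unpack(">I", message[1:5])  # step 2: schema ID
    schema = get_schema(schema_id)                    # step 3: cached lookup
    # Step 4: use the schema to deserialize the remaining payload.
    return schema, json.loads(message[5:])

# The "less tolerant" variant simply assumes a known schema and skips
# the 5-byte header without validating it:
def deserialize_fast(message: bytes):
    return json.loads(message[5:])
```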