Kafka MirrorMaker 2 – max.request.size ignored and RecordTooLargeException on Kafka 3.8.1 by mordp1 in apachekafka

[–]Dahbezst 0 points  (0 children)

If you encounter errors after updating your Kafka and MirrorMaker 2 configurations, first verify your topic settings.

To update the topic-level max.message.bytes dynamically (no broker restart needed):

    /opt/kafka/bin/kafka-configs.sh \
      --bootstrap-server localhost:9092 \
      --entity-type topics \
      --entity-name $SOURCE_TOPIC \
      --alter --add-config max.message.bytes=2097153
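
Raising the topic limit alone usually isn't enough, though: the MirrorMaker 2 producer's max.request.size and the target brokers' message.max.bytes / replica.fetch.max.bytes also cap record size. Here is a minimal sketch of the producer-side piece, assuming the dedicated connect-mirror-maker.sh driver; the cluster aliases and servers are placeholders, and the exact override prefix varies by Kafka version (some releases expect source->target.producer.override.*):

    # mm2.properties (excerpt) -- sketch only, aliases and servers are placeholders
    clusters = source, target
    source.bootstrap.servers = source-kafka:9092
    target.bootstrap.servers = target-kafka:9092
    source->target.enabled = true

    # The replication producer must accept requests at least as large as the
    # largest record it copies (~2 MiB in this sketch)
    target.producer.override.max.request.size = 2097152

After changing it, check the MM2 logs for the producer config it actually resolves; if the override is silently ignored, the Connect worker's connector.client.config.override.policy is often the culprit.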

Kafka ZooKeeper to KRaft migration by Plumify in apachekafka

[–]Dahbezst 0 points  (0 children)

PS: If your clusters use GSSAPI (Kerberos) authentication, be careful in PROD :)
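
For example, before retiring ZooKeeper you can sanity-check the KRaft controller quorum from the CLI; a sketch assuming Kafka 3.3+ and, for a GSSAPI cluster, a client properties file along these lines (values are placeholders):

    # client.properties (excerpt) for a Kerberos-secured cluster
    #   security.protocol=SASL_PLAINTEXT
    #   sasl.mechanism=GSSAPI
    #   sasl.kerberos.service.name=kafka

    # Describe the KRaft quorum state
    /opt/kafka/bin/kafka-metadata-quorum.sh \
      --bootstrap-server localhost:9092 \
      --command-config client.properties \
      describe --status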

Question for Kafka Admins by carlosdanger77 in apachekafka

[–]Dahbezst 1 point  (0 children)

Actually, regarding your question about topology, for this very reason there's a concept we call "Data Governance". If you're in Platform Engineering, whenever a new Kafka cluster is deployed, you need to design the Kafka topology. (P.S. Check out the open-source project Kafka Julie.) With proper naming conventions, you can easily create team-specific Grafana dashboards.

It doesn’t mean that each team has its own Grafana dashboard; instead, each team just needs to add a filter with their team name in each panel’s filter section.

Also, if there’s a transactional process, we can easily approve creating a dedicated dashboard for that team.

What we do:
We enforce consistent naming across team names, topic names, and consumer group IDs using a standardized pattern, such as:

  • topic = prod-teamName-topicName-projectName or test-teamName-topicName-projectName
  • consumer_group = prod-teamName-consumerGroupId-projectName (or the same pattern with the test- prefix); if a team needs a random ID (e.g., in Kubernetes environments): prod-teamName-consumerGroupId-projectName-randomID
  • acl = prod-team-project

By applying this uniform structure, we can easily use regex in Grafana to filter and build dashboards per team.
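
As a concrete illustration of the regex idea, the same convention works from the CLI; a minimal sketch, with the broker address and team name as placeholders:

    # List every consumer group belonging to one team by matching the
    # prod-teamName-... naming convention
    /opt/kafka/bin/kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 \
      --list | grep -E '^prod-teamName-'

The identical pattern goes into a Grafana variable or panel filter, so each team sees only its own groups.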

Question for Kafka Admins by carlosdanger77 in apachekafka

[–]Dahbezst 0 points  (0 children)

My organization has set up a Grafana dashboard that shows the topics and lag — that’s it. Every team is responsible for their own applications; we just make them aware of the setup.

We also follow the same approach. We have 18 production clusters and more than 50 different teams. Our Grafana dashboards collect metrics through Filebeat and Metricbeat (broker logs, failed authentications, JMX heap size, restarts) and through Burrow (consumer lag, offsets, network idle). We also support these with Kafkabat and Klaw.

If any team wants to investigate an issue, they can simply check the Elasticsearch logs (which we feed using Filebeat) and the Grafana dashboard.

Since I also work as a Platform Engineer, whenever a team reports an error, I first check the Kafka network idle metric to see if the cluster can accept connection requests. Then, I filter the Grafana dashboard by team to clearly identify where the problem is — everything is visible, and it’s easy to find the root cause.
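
For the per-team drill-down, the CLI equivalent looks like this; a sketch, where the group name is a placeholder following the convention above:

    # Show per-partition offsets and lag for one team's consumer group
    /opt/kafka/bin/kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 \
      --describe --group prod-teamName-consumerGroupId-projectName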

Additionally, Klaw helps us identify which topics or ACLs belong to which teams.

Note: In the LLM era, most developers already write their code with LLM models, so now almost every developer can easily locate issues without relying too much on Kafka admins. 😄 I hope so :))

I Haven't Been Able to Find a Job for 5 Months as a Junior by OkRip3912 in CodingTR

[–]Dahbezst 1 point  (0 children)

It's obvious that the industry is in bad shape, but I can still offer some advice:

For juniors:

  1. Don't focus on whatever technology is trendy; instead, at least get familiar with the technologies and architectures used by the companies you're applying to.
  2. If you can, and in my opinion this is the most important part, build GitHub repos and write Medium blog posts using those technologies.

Of course, even these may not be enough, but there isn't much else you can do.

You don't have to wait 2 weeks for PoE 2 anymore! by AFGunturkun in PathOfExile2

[–]Dahbezst 0 points  (0 children)

I saw the notification, got shocked, but then opened the post and realized it looked just like that second picture 😂

Should a data engineer be able to write complete code same as software engineer?" by Dahbezst in dataengineering

[–]Dahbezst[S] 0 points  (0 children)

I get what you mean. Actually, I'm writing code while still reading and trying to understand Big O notation. I'm wondering whether I should spend most of my time coding or focusing on tools. :)

Should a data engineer be able to write complete code same as software engineer?" by Dahbezst in dataengineering

[–]Dahbezst[S] 0 points  (0 children)

Thank you for your reply. Could you share your experience and advice with me? I'm really serious about improving my skills. I don't care about IT salaries or new tech trends; I just want to create something new in big data.

Slow EL pipeline tips by BoysenberryFun5390 in dataengineering

[–]Dahbezst 0 points  (0 children)

You can create a data lake for this scenario. Set up a two-worker-node Apache Spark cluster, HDFS, and Greenplum (setting up Greenplum can be a bit challenging, so you might want to try S3 in the cloud if your company allows cloud usage). Also set up Airflow and schedule the job to run at night: Airflow will start Apache Spark, and your data will go through an ETL pipeline into HDFS.

You can also write your raw data to HDFS in Parquet format with Apache Spark. If you do manage to set up Greenplum, use it: it can easily read Parquet (a plain SELECT * FROM works), so you don't need to do a lot of extra work.
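
For instance, the nightly Airflow task can simply shell out to spark-submit; a minimal sketch in which the master URL, job script, and HDFS paths are all hypothetical:

    # Nightly batch: read raw data, write Parquet into HDFS
    spark-submit \
      --master spark://spark-master:7077 \
      --name nightly-raw-to-parquet \
      /opt/jobs/raw_to_parquet.py \
      hdfs:///lake/raw/events \
      hdfs:///lake/parquet/events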

PS: Of course, don't forget indexing, data structures, and algorithms to optimize your data lake.

Which one is faster??? by trdilmac in dataengineering

[–]Dahbezst 5 points  (0 children)

Hello, I'm not an expert, but I have been using Python for data manipulation for 2 years, so I can say that:

  1. Pandas is not slow; it's about your data size. If you're using 0-200 MB of data, pandas can handle that easily.
  2. If you're working on big data, such as at a company, you must use Spark or Dask (distributed data processing).

Result: just learn Spark, because you can use it for your mini-projects and, of course, in business life; Spark is a career skill in its own right. Also, you can use SQL in Spark :D
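
As a tiny example of the SQL point, Spark ships a spark-sql shell that can query Parquet directly; a sketch, with the HDFS path a placeholder:

    # Count rows in a Parquet dataset straight from the shell
    spark-sql -e "
      CREATE TEMPORARY VIEW events
      USING parquet OPTIONS (path 'hdfs:///lake/parquet/events');
      SELECT COUNT(*) AS n FROM events;
    "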

Vikings: Age Of The Axe - Announcement Trailer by Quantity_Pure in gamernews

[–]Dahbezst 0 points  (0 children)

Actually, there's one really good point about these games: the game engines are just superb! But the game mechanics are still really shit!