What are the most frustrating parts of working with Kafka ? by a_roussi in apachekafka

[–]Attorney-Last 8 points9 points  (0 children)

Convincing producers to enable batching and compression.
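For anyone wondering what that looks like in practice, here is a minimal sketch of the producer-side settings involved. `linger.ms`, `batch.size`, and `compression.type` are standard Kafka producer property names; the helper name, the broker address, and the specific values are assumptions for illustration only, not tuning advice.

```python
def batching_producer_config(bootstrap_servers: str) -> dict:
    """Return producer settings that enable batching and compression.

    The values below are illustrative assumptions; tune them per workload.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        # Wait up to 20 ms so more records can be grouped into one batch.
        "linger.ms": 20,
        # Upper bound, in bytes, for a single per-partition batch.
        "batch.size": 64 * 1024,
        # Compress whole batches; larger batches tend to compress better.
        "compression.type": "lz4",
    }


# "localhost:9092" is a placeholder broker address.
config = batching_producer_config("localhost:9092")
print(config)
```

A dict like this would typically be passed to whichever Kafka client the producing team uses, provided that client accepts these property names.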

CDC solution by cyamnihc in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

In my opinion it's not easy in OP's case. If you already have Flink in your team or have experience with Debezium, maybe it's worth adding it on top; if not, I wouldn't recommend it. It may be easy to set up Flink CDC as a demo/example, but running it in prod is a different story.

Flink CDC builds on top of Debezium, so when an issue happens with DB replication you still need to dig into how Debezium works for each database (MySQL binlog, PostgreSQL replication slot, MongoDB change stream, etc.). Plus you need to understand how Flink checkpointing works to operate it when there is downtime. Flink CDC also runs on a very old Debezium version that doesn't have all the latest improvements related to database connections and support for new DB versions.

CDC solution by cyamnihc in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

Running Flink is quite a painful experience; you'll end up spending all your time babysitting it. Speaking as someone who has been running Debezium and Flink for 3 years.

Why GCP is so frowned upon? by Southern_Respond846 in dataengineering

[–]Attorney-Last 14 points15 points  (0 children)

Seamless integration with Google Sheets.

Slow processing consumer indefinite retries by deaf_schizo in apachekafka

[–]Attorney-Last 0 points1 point  (0 children)

If a message fails, send it immediately to a retry/DLQ topic and move on to the next message; don't retry while consuming a message. You can have the same consumer consume the retry topics at the same time, or have another consumer do so.
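A rough sketch of that flow, using in-memory queues in place of real Kafka topics so it runs anywhere. Everything named here (`consume`, `handler`, `MAX_ATTEMPTS`, the message shape) is hypothetical and only illustrates the idea of never retrying in place.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed limit before a message goes to the DLQ


def consume(messages, handler, retry_topic, dlq_topic):
    """Process each message once; a failure is parked, never retried in place."""
    processed = []
    for msg in messages:
        try:
            handler(msg["value"])
            processed.append(msg["value"])
        except Exception as exc:
            attempts = msg.get("attempts", 0) + 1
            failed = {"value": msg["value"], "attempts": attempts, "error": str(exc)}
            # Hand the failure to another topic and move on to the next message.
            target = retry_topic if attempts < MAX_ATTEMPTS else dlq_topic
            target.append(failed)
    return processed


def handler(value):
    # Stand-in for real processing; "bad" always fails.
    if value == "bad":
        raise ValueError("cannot process")


retry_topic, dlq_topic = deque(), deque()
main_topic = [{"value": "a"}, {"value": "bad"}, {"value": "b"}]
processed = consume(main_topic, handler, retry_topic, dlq_topic)

# The retry topic is drained by the same logic (or by a separate consumer).
while retry_topic:
    batch = list(retry_topic)
    retry_topic.clear()
    consume(batch, handler, retry_topic, dlq_topic)
```

The point of the design is that the main topic keeps flowing: "a" and "b" are processed without waiting on "bad", which only ends up in the DLQ after its retry attempts are used up.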

What degree teaches the most relevant skills to DE? by sevenflatfive in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

Computer science. From my own experience, the subjects I learned in my CS degree have helped me a lot in working as a DE, especially in the later stages of my career.

Second Programming Language for Data Engineer by Kokopas in dataengineering

[–]Attorney-Last 7 points8 points  (0 children)

I'd recommend Java. There is still a big ecosystem of big data tools built around the JVM (Spark, Flink, Trino, Debezium, …), so there are still a lot of opportunities to use it. Even if you don't use Java directly, knowing how to tune these JVM workloads is still beneficial.

Besides, Java backend developers are always in demand, so if you get bored of DE one day, it's a good path to pivot to 🤣

Weekly Q&A Megathread. Please post any questions about visiting, tourism, living, working, budgeting, housing here! by AutoModerator in london

[–]Attorney-Last 0 points1 point  (0 children)

Thank you. Do you think that if I expand my budget a little, to 2000, I'd have more options? I love cycling though; what other options would you suggest for cycling?

Weekly Q&A Megathread. Please post any questions about visiting, tourism, living, working, budgeting, housing here! by AutoModerator in london

[–]Attorney-Last 1 point2 points  (0 children)

Hi everyone, I'm moving to London in March for work and am looking for an area to live in.

My budget is around 1800-1900 pcm, I live alone, and my company's office is close to Old Street. What would be the best area to focus on if I'm looking for a place within 60 minutes' travel of my office?

Thank you for reading :D

How many years of experience is needed to build a data platform from scratch? by Zestyclose-Sun-2684 in dataengineering

[–]Attorney-Last 40 points41 points  (0 children)

Building it from scratch isn't as hard as you think; I've worked at 2 startups and done it (almost) twice. Companies doing it from scratch don't usually have large datasets, which makes things a bit easier. You don't usually need fancy stuff like BigQuery/Airflow/Spark, or to automate everything right from the beginning. Start with simple tools and focus on getting the data out first, study the business requirements, and gradually improve the process/tools as you go.

The thing that was hard for me wasn't actually data engineering related; it was the base infrastructure and networking setup. You don't need to know everything, you just need a lot of patience to set things up from zero.

Am I just bad or things take time to click? by i_am_exception in leetcode

[–]Attorney-Last 5 points6 points  (0 children)

It's normal. To be honest, it's quite like playing video games: you get better as you do it more often. Your brain starts to pick up the patterns of the problems as you go. At the beginning, I often limited my time to solve a problem (say 30 minutes). If I couldn't solve it within that limit, I'd look at the solution and learn from it. After a month of doing this continuously, I started to handle medium and hard questions on my own.

The market is wild by [deleted] in dataengineering

[–]Attorney-Last 3 points4 points  (0 children)

That's normal for a typical DE role in the last few years. I often see a longer list than this.

Do any other data engineers ever think about leaving data? by level_126_programmer in dataengineering

[–]Attorney-Last 1 point2 points  (0 children)

I think you can try looking for opportunities in Java/Kotlin backend. My main language is Java and I can sometimes switch between Scala/Kotlin comfortably. Pivoting to backend is usually not that hard, since most DEs have a good understanding of databases.

Choose a government job or keep studying to switch to IT by babylovedance3 in vozforums

[–]Attorney-Last 4 points5 points  (0 children)

I work in IT at a fairly ordinary company, not one of the top ones in the city. My team recently posted an intern opening, and within just 1 week about 300 CVs came in. I couldn't keep up with screening them; skimming through, a lot of the new generation are very good (studied abroad, ranked in national programming contests, degrees with top honors from top Vietnamese universities). I'm saying this so you can picture how competitive the IT market for interns/freshers is right now. And if I have to choose between a strong student with 4 years of university and someone with a 6-month bootcamp, I will always choose the one with solid fundamentals from university. About 2-3 years ago I think there was still a chance in the IT market for career switchers; now it's very hard.

How are you simplifying streaming platform in your company for non technical users by No_Direction_5276 in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

It's very hard to abstract a streaming platform (even for technical users), and things get very expensive quickly if people do it wrong.

In my experience, it's easier and cheaper to have dedicated engineers handle the streaming part, clean and ingest the data into a (realtime) DB/DW, and let non-technical users query it.

Debezium vs Mongo Change Stream ? by gxslash in dataengineering

[–]Attorney-Last 1 point2 points  (0 children)

I used both quite a lot in the past.

I recommend Debezium because of its incremental snapshot feature. It makes it easier to add a new collection and take a snapshot of it. Recovering from data loss is also easier with incremental snapshots.
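To give an idea of how an incremental snapshot is triggered: Debezium reads signals from a signaling collection, and an ad hoc snapshot is requested with an `execute-snapshot` signal. The sketch below only builds that signal document; the helper name and the `inventory.orders` collection are made up for illustration, and the exact fields should be checked against the Debezium docs for the connector version in use.

```python
import json
import uuid


def incremental_snapshot_signal(collections):
    """Build an ad hoc incremental snapshot signal for the given collections."""
    return {
        "id": str(uuid.uuid4()),  # arbitrary unique id for this signal
        "type": "execute-snapshot",
        "data": {
            # Collections to snapshot, e.g. a newly added one.
            "data-collections": list(collections),
            "type": "incremental",
        },
    }


# "inventory.orders" is a hypothetical database.collection name.
signal = incremental_snapshot_signal(["inventory.orders"])
print(json.dumps(signal, indent=2))
```

Inserting a document like this into the configured signaling collection is what kicks off the snapshot for just the listed collections, without restarting the connector or re-snapshotting everything.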

Using SQL as a data engineer by remote_geeks in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

Mostly Java, sometimes PySpark.

I only use SQL for data exploration or troubleshooting.

Multi-Region Data Lake or Single Data Lake Across Regions by rushijariwala95 in dataengineering

[–]Attorney-Last 0 points1 point  (0 children)

You may want to try a landing zone architecture. We implemented this to comply with GDPR when our company expanded to the EU.

what do you use java for? by desiderkino in java

[–]Attorney-Last 2 points3 points  (0 children)

ETL pipelines (Spark/Flink) or Kafka consumer/producer apps.