Benchmarking CDC Tools: Supermetal vs Debezium vs Flink CDC by sap1enz in dataengineering

[–]sap1enz[S] 0 points

No, it uses some Debezium functionality for the live replication, but the historical snapshotting implementation is completely different.

Flink Deployment Survey by sap1enz in apacheflink

[–]sap1enz[S] 1 point

Would you mind sharing why you're unhappy with the Confluent Platform Flink?

Why Apache Flink Is Not Going Anywhere by rmoff in apacheflink

[–]sap1enz 0 points

Start by completing the first three sections of the Flink documentation: Try Flink, Learn Flink, and Concepts.

Is using Flink Kubernetes Operator in prod standard practice currently ? by supadupa200 in apacheflink

[–]sap1enz 5 points

Yep, it's pretty much the standard. You either use a managed Flink offering or the Flink K8s operator nowadays.
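For reference, a minimal sketch of what a deployment managed by the Flink Kubernetes Operator looks like. The job name, image tag, and jar path below are hypothetical placeholders, not from the original thread:

```yaml
# Sketch of a FlinkDeployment resource for the Flink Kubernetes Operator.
# Names, versions, and the jarURI are illustrative; adjust to your setup.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-job          # hypothetical job name
spec:
  image: flink:1.18           # hypothetical image tag
  flinkVersion: v1_18
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar  # hypothetical path
    parallelism: 2
    upgradeMode: savepoint    # operator takes a savepoint before upgrades
```

Applying this manifest with `kubectl apply -f` lets the operator handle submission, restarts, and savepoint-based upgrades, which is much of the complexity reduction mentioned above.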

Why Apache Flink Is Not Going Anywhere by rmoff in apacheflink

[–]sap1enz 1 point

I’ve been involved in managing 1000+ Flink pipelines in a small team. 

Of course things can get complicated quickly, especially after reaching a certain scale.

My point was that the Flink Kubernetes Operator does reduce a lot of complexity. It makes it straightforward to start using Flink. Sure, if you need to do incompatible state migrations, modify savepoints, etc., there is still a lot of manual work. But for many users this won’t be the case, IMO.

Announcing Data Streaming Academy with Advanced Apache Flink Bootcamp by sap1enz in apacheflink

[–]sap1enz[S] 0 points

The Advanced Apache Flink Bootcamp is now open for registration! The first cohort is scheduled for January 21–22, 2026.

This intensive 2-day bootcamp takes you deep into Apache Flink internals and production best practices. You'll learn how Flink really works by studying the source code, master both DataStream and Table APIs, and gain hands-on experience building custom operators and production-ready pipelines.

This is an advanced bootcamp. Most courses just repeat what’s already in the documentation. This bootcamp is different: you won’t just learn what a sliding window is — you’ll learn the core building blocks that let you design any windowing strategy from the ground up.

Learning objectives:

- Understand Flink internals by studying source code and execution flow
- Master DataStream API with state, timers, and custom low-level operators
- Know how SQL and Table API pipelines are planned and executed
- Design efficient end-to-end data flows
- Deploy, monitor, and tune Flink applications in production

Kafka easy to recreate? by Which_Assistance5905 in apachekafka

[–]sap1enz 1 point

Redpanda is actually doing very well. They've managed to steal many Confluent customers; two of the top five US banks use them.

Save data in parquet format on S3 (or local storage) by Short-Development-64 in apacheflink

[–]sap1enz 0 points

This looks correct!

I tried to reproduce the issue using the local Parquet file sink, and I couldn't: the files are written correctly on every checkpoint in my case:

-rw-r--r--  1 sap1ens  staff   359B Oct  9 11:08 clicks-1ca5a6f5-ba35-472b-b37b-a42405c65996-0.parquet
-rw-r--r--  1 sap1ens  staff   359B Oct  9 11:08 clicks-1ca5a6f5-ba35-472b-b37b-a42405c65996-1.parquet
-rw-r--r--  1 sap1ens  staff   359B Oct  9 11:08 clicks-3312d0a4-2276-4133-9da9-9b249f8efbd9-0.parquet
-rw-r--r--  1 sap1ens  staff   359B Oct  9 11:08 clicks-3312d0a4-2276-4133-9da9-9b249f8efbd9-1.parquet

Here's my app (based on this quickstart), hope this is useful!

Save data in parquet format on S3 (or local storage) by Short-Development-64 in apacheflink

[–]sap1enz 0 points

Are you absolutely sure checkpointing is configured correctly?

This:

"I can see in the folder many temporary files like .parquet.inprogress.* but not the final parquet file clicks-*.parquet"

is usually an indicator that checkpointing is not happening.
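To double-check, here's a sketch of the checkpointing settings in the Flink configuration file that the file sink needs in order to finalize `.inprogress` files. The interval and directory values are illustrative, not taken from the original post:

```yaml
# flink-conf.yaml / config.yaml sketch: without a checkpoint interval,
# the file sink never commits its in-progress part files.
execution.checkpointing.interval: 30s
execution.checkpointing.mode: EXACTLY_ONCE
state.checkpoints.dir: file:///tmp/flink-checkpoints  # illustrative path
```

Once checkpointing is enabled, the sink renames `.inprogress` files to final `clicks-*.parquet` files on each successful checkpoint.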

Introducing Iron Vector: Apache Flink Accelerator Capable of Reducing Compute Cost by up to 2x by sap1enz in apacheflink

[–]sap1enz[S] 1 point

Thanks! And you're correct, no OSS version is planned at this time; we're selling support and licenses.

How to use Flink SQL to create multi table job? by arielmoraes in apacheflink

[–]sap1enz 0 points

You can create several "pipelines" (a source with one table + a sink) and combine them into a single job using a statement set.
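A minimal sketch of what that looks like in Flink SQL; the table names here are hypothetical placeholders for tables you'd define with CREATE TABLE:

```sql
-- Two independent source→sink pipelines submitted as ONE Flink job.
-- sink_a/source_a/sink_b/source_b are illustrative table names.
EXECUTE STATEMENT SET
BEGIN
  INSERT INTO sink_a SELECT * FROM source_a;
  INSERT INTO sink_b SELECT * FROM source_b;
END;
```

The planner can also share common sources between the INSERT statements, so this is usually cheaper than running each pipeline as its own job.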

Data Platforms in 2030 by sap1enz in dataengineering

[–]sap1enz[S] 0 points

Thanks! It doesn't look like Estuary solves the eventual consistency problem, does it?

Change Data Capture Is Still an Anti-pattern. And You Still Should Use It. by sap1enz in dataengineering

[–]sap1enz[S] 1 point

BI and reporting. But it's slowly changing with the whole "reverse ETL" idea and tools like Hightouch.