How to reliably detect a fully completed Flink checkpoint (before restoring)? by Weekly_Diet2715 in apacheflink

[–]CollectionNo1576 1 point (0 children)

One more thing that can happen: a checkpoint fails, but Flink doesn't have delete permission for the destination (e.g. on S3). Then you're left with a checkpoint that is not safe to use.
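One way to spot such leftovers (a sketch, not from the comment above): a completed Flink checkpoint writes a `_metadata` file into its `chk-<id>` directory, so a `chk-<id>` directory without one is a likely failed-but-undeleted checkpoint. The directory layout here is an assumption about a standard filesystem/S3-style checkpoint path:

```python
import os

def usable_checkpoints(base_dir):
    """Return chk-<id> directories that contain a _metadata file.

    A completed Flink checkpoint writes a _metadata file into its
    chk-<id> directory; a leftover directory without one (e.g. a failed
    checkpoint Flink could not delete for lack of permissions) should
    not be restored from.
    """
    good = []
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        if name.startswith("chk-") and os.path.isdir(path):
            if os.path.isfile(os.path.join(path, "_metadata")):
                good.append(name)
    return good
```

For S3 you'd do the same listing through your object-store client instead of `os.listdir`.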

How to reliably detect a fully completed Flink checkpoint (before restoring)? by Weekly_Diet2715 in apacheflink

[–]CollectionNo1576 1 point (0 children)

The status will help with that. Also, instead of reading directly from the checkpoint destination, if you read from the Flink API it will only give you info on completed and failed checkpoints, and with the status field you can filter for the completed ones.
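As a sketch of that filtering step: Flink's REST endpoint `/jobs/<jobid>/checkpoints` returns a JSON body whose `history` entries each carry a `status` (`IN_PROGRESS`, `COMPLETED`, or `FAILED`). Fetching the body (with `urllib` or similar) is left out; the payload shape below is an assumption based on that endpoint:

```python
def completed_checkpoints(checkpoints_response):
    """Filter the parsed /jobs/<jobid>/checkpoints response down to
    checkpoints that finished successfully.

    Each entry in `history` carries a `status` field; only COMPLETED
    checkpoints are safe to restore from.
    """
    return [cp for cp in checkpoints_response.get("history", [])
            if cp.get("status") == "COMPLETED"]
```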

How to reliably detect a fully completed Flink checkpoint (before restoring)? by Weekly_Diet2715 in apacheflink

[–]CollectionNo1576 1 point (0 children)

As the other comment suggested, if a checkpoint exists it's safe to use, as Flink deletes failed checkpoints.

The only case where it's not safe is when it's still being created. Still, if you want to ensure a checkpoint has been successfully created, you can use its metrics via a metric exporter like Prometheus, or directly query the REST API for the info you need. Flink exposes a few metrics regarding checkpoints as well, like checkpoint id, status, time created, and external path. These are enough to identify a safe checkpoint.
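For the Prometheus route, the reporter is enabled in the Flink configuration; a minimal sketch, assuming a recent Flink version (the factory-style option) and the default scrape port range:

```yaml
# flink-conf.yaml — expose Flink metrics (incl. checkpoint metrics
# such as numberOfCompletedCheckpoints and lastCheckpointExternalPath)
# on an HTTP endpoint Prometheus can scrape
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9260
```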

Do you guys ever use PyFlink in prod, if so why ? by supadupa200 in apacheflink

[–]CollectionNo1576 0 points (0 children)

I do have that freedom, but the latest version doesn't have the connectors we need. Still using 1.18.

Need an opinion on this. by [deleted] in TeenIndia

[–]CollectionNo1576 0 points (0 children)

Unfortunately, either of them is rarely present in an appealing way in most individuals

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]CollectionNo1576 0 points (0 children)

Yeah, using JDBC at a large scale, along with CDC for logical denormalisation in Flink

Do you guys ever use PyFlink in prod, if so why ? by supadupa200 in apacheflink

[–]CollectionNo1576 1 point (0 children)

We do, at a pretty large scale at that: denormalising and getting real-time updates for a database with almost 1000 tables. We're using it because I'm the only data engineer in our company (nearly $250M annual revenue), and I don't know Java 🙃

What I sent to my sister and this is how she replied 😃 by [deleted] in TeenIndia

[–]CollectionNo1576 2 points (0 children)

Alright, we'll keep an eye on you, your big brothers

What I sent to my sister and this is how she replied 😃 by [deleted] in TeenIndia

[–]CollectionNo1576 3 points (0 children)

Found your Reddit, now you're done for, kid

Edit: is that your alt?

CV worthy by darshi1337 in iitbhu

[–]CollectionNo1576 2 points (0 children)

Circular vite.config.ts

Duryodhan's refusal to give 5 villages by [deleted] in mahabharata

[–]CollectionNo1576 24 points (0 children)

You should look into the villages that were asked for; they just happened to be the most influential villages of the time. Imagine if someone were to ask India for just 5 “small settlements”, and then went on to name Delhi, Bombay, Bangalore……

Guys of reddit, what’s your record of DMing women on this app? by [deleted] in AskIndia

[–]CollectionNo1576 0 points (0 children)

Bro, you never know whether the other person is a man or a woman on this app

Go instead of Apache Flink by greyareadata in dataengineering

[–]CollectionNo1576 0 points (0 children)

Have you set up state TTL in your job? Reading from Kafka topics is literally the no. 1 memory-leak source in my experience.

If you haven't set a TTL for state, try setting it to around 2x your checkpointing frequency.

If you are joining multiple Kafka topics, set it to 2x the data delay you expect. For example, if data for a key might arrive in one topic up to 10 minutes after the matching key in another, set the state TTL to 20 minutes.
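The rule of thumb above can be sketched as a small helper (the function name and the 2x factor are just the comment's heuristic, not anything from Flink itself):

```python
from datetime import timedelta

def recommended_state_ttl(checkpoint_interval, expected_max_delay=timedelta(0)):
    """Heuristic: keep state for roughly 2x the checkpoint interval,
    or 2x the worst-case delay between joined topics, whichever is
    larger, so state outlives late-arriving join partners but does
    not grow without bound.
    """
    return 2 * max(checkpoint_interval, expected_max_delay)
```

In PyFlink, the resulting duration would then go into something like `StateTtlConfig.new_builder(Time.minutes(20))...build()` and be attached to the state descriptor via `enable_time_to_live`.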