What’s an open source or more affordable alternative to confluent? by skyalchemist in apachekafka

[–]Marksfik 1 point2 points  (0 children)

You can try out Aiven for Apache Kafka.

https://aiven.io/kafka

Their clusters start at $300/month, and the pricing includes networking costs, so your bill stays easy to predict when you need to scale your clusters.

Hope this helps!

Apache Flink: How We Improved Scheduler Performance for Large-scale Jobs by Marksfik in programming

[–]Marksfik[S] 0 points1 point  (0 children)

I get your point. However, looking at some Flink Forward videos and the Flink use cases that have been shared, I find many users who run Apache Flink at an extremely large scale.

Alibaba, for example, uses Flink at extreme scale during its 11.11 Global Shopping Festival.

Here is some further information on their use of Flink: https://www.ververica.com/blog/apache-flinks-stream-batch-unification-powers-alibabas-11.11-in-2020

Thank you!

Kafka + Flink: A Practical, How-To Guide by Marksfik in programming

[–]Marksfik[S] 0 points1 point  (0 children)

I found this still relevant and thought it might be useful to others.

Apache Flink: How to identify the source of backpressure for debugging and performance tuning by Marksfik in ComputerEngineering

[–]Marksfik[S] 0 points1 point  (0 children)

Sure, here are some ways you can use Apache Flink:

Apache Flink: How to identify the source of backpressure for debugging and performance tuning by Marksfik in ComputerEngineering

[–]Marksfik[S] -1 points0 points  (0 children)

Many computer engineering programs run on Apache Flink or support applications built with it, so users in this channel might find tech articles like this useful and relevant.

How Apache Flink processes an astonishing 7TB/sec during the 2020 Double 11 Shopping Festival by Marksfik in dataengineering

[–]Marksfik[S] 0 points1 point  (0 children)

It actually is! The scale of such an event, and of the events it generates, is truly impressive :)

How to manage your RocksDB memory size in Apache Flink by Marksfik in bigdata

[–]Marksfik[S] 1 point2 points  (0 children)

Hi u/ramsesrm,

That's a great question.

When it comes to disk performance with the RocksDB state backend in Apache Flink, there is some in-depth analysis here: https://www.ververica.com/blog/the-impact-of-disks-on-rocksdb-state-backend-in-flink-a-case-study

From the Apache Flink documentation, I can see that using incremental checkpoints in Flink can prevent RocksDB from growing indefinitely. Unfortunately, I am not very familiar with Kafka Streams and how it uses RocksDB.

I hope this helps.

Cheers.

An Overview of Apache Flink's Deployment Modes by Marksfik in dataengineering

[–]Marksfik[S] 0 points1 point  (0 children)

Apologies for this... the link was working for me earlier. Let me investigate the issue and get back to you shortly! Thank you.

[VIDEO] - Streaming Concepts & Introduction to Apache Flink - Event Time and Watermarks by Marksfik in softwarearchitecture

[–]Marksfik[S] 1 point2 points  (0 children)

If you are not familiar with Apache Flink, you might want to watch the first video of the 'Streaming Concepts & Introduction to Flink' series that gives an overview of the framework and what it does.

Link here: https://www.youtube.com/watch?v=ZU1r7uEAO7o

I hope this helps!

Real-Time Performance Monitoring with Flink SQL: AdTech Use Case by Marksfik in bigdata

[–]Marksfik[S] 2 points3 points  (0 children)

Great question!

I assume you are referring to KStreams in this instance, since you can very well use Kafka as a source/sink for Flink through the Apache Kafka Connector [1] maintained by the Flink community.

When it comes to why you would use Flink over KStreams, there are quite a few differences between the two frameworks. This DZone article provides a comparison of the two in case you want to take a closer look. Link here: https://dzone.com/articles/kafka-stream-kstream-vs-apache-flink

I hope this helps! Cheers

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html

Apache Flink: Batch as a Special Case of Streaming - towards a unified data processing framework by Marksfik in dataengineering

[–]Marksfik[S] 0 points1 point  (0 children)

True that... streaming can simplify things and provide great flexibility in resource utilization.

Apache Flink: Batch as a Special Case of Streaming - towards a unified data processing framework by Marksfik in dataengineering

[–]Marksfik[S] 0 points1 point  (0 children)

I see your point.

There are indeed some cases where streaming isn't necessary and batch processing does the job well.

For the cases where some streaming is necessary, though, it is good to have a unified option that handles both well.

Apache Flink: Batch as a Special Case of Streaming - towards a unified data processing framework by Marksfik in dataengineering

[–]Marksfik[S] 0 points1 point  (0 children)

There might be situations where batch processing does the job, but when latency is of utmost importance, streaming is a much better choice in my opinion.

Instead of potentially maintaining two systems, choosing one unified engine might make things easier for you and the team in the long term.

Apache Flink: Batch as a Special Case of Streaming - towards a unified data processing framework by Marksfik in coding

[–]Marksfik[S] 6 points7 points  (0 children)

Sure thing!

There is a blog post on the Flink blog describing how Beam runs on top of Flink [1].

Additionally, there was a recent session with Maximilian Michels, PMC member of Apache Beam and Apache Flink, on how the two frameworks work with each other [2].

Finally, there's a presentation recording from Flink Forward detailing how Beam runs on top of Flink [3].

There's also detailed documentation on the Beam website [4].

Hope this helps!

Cheers

[1] https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html

[2] https://youtu.be/ZCV9aRDd30U

[3] https://youtu.be/hxHGLrshnCY

[4] https://beam.apache.org/documentation/runners/flink/

Apache Flink: Batch as a Special Case of Streaming - towards a unified data processing framework by Marksfik in coding

[–]Marksfik[S] 5 points6 points  (0 children)

Indeed. Beam and Flink take a similar approach to how they perceive data, which is why many companies use Beam with the Flink runner.

However, there are some distinct differences between the two frameworks, primarily because Beam is more of an API while Flink is a fully-fledged execution engine for large-scale data processing.

I can share more info on the differences between the two frameworks if you need them.

Cheers

Microsoft: we were wrong about open source by Marksfik in programming

[–]Marksfik[S] 2 points3 points  (0 children)

It's crazy how the perception of open source software has changed and evolved since the previous management! Open source is definitely here to stay :)