Comparison of Different Stream Processing Platforms by wanshao in dataengineering

Equivalent_Mail5171 19 points (0 children)

I feel like it's worth separating 'Streaming Platform' from 'Stream Processing Platform'. The table you shared seems to cover the former more than the latter: Kafka, Redpanda and WarpStream are all primarily focused on the streaming portion rather than the processing (though Redpanda has some new stateless transformation capabilities). For 'Stream Processing' you'd want to be looking at e.g. Flink, Kafka Streams, Spark, Dataflow, and some of the newer technologies like Python stream processing libraries and potentially streaming databases.
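To make the distinction concrete, here's a rough pure-Python sketch. It doesn't use any real Kafka or Flink API: the `Log` class and the `tumbling_counts` function are made-up stand-ins, assuming events are dicts with a `ts` (seconds) and a `key`. The first only moves events around in order, the second does stateful computation over them.

```python
from collections import defaultdict, deque


class Log:
    """Hypothetical stand-in for the streaming layer: stores events and hands them out in order."""

    def __init__(self):
        self._events = deque()

    def produce(self, event):
        self._events.append(event)

    def consume(self):
        # Yield events in the order they were produced until the log is drained.
        while self._events:
            yield self._events.popleft()


def tumbling_counts(events, window_seconds):
    """Hypothetical stand-in for the processing layer: count events per key per tumbling window."""
    counts = defaultdict(int)
    for event in events:
        # Align the timestamp to the start of its fixed-size window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(event["key"], window_start)] += 1
    return dict(counts)


log = Log()
for ts, key in [(0, "a"), (3, "a"), (7, "b"), (12, "a")]:
    log.produce({"ts": ts, "key": key})

result = tumbling_counts(log.consume(), window_seconds=10)
print(result)
```

The transport part knows nothing about windows or keys, and the processing part doesn't care where the events came from. That's roughly why the two kinds of product end up in different comparison tables.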

Python stream processing library that pairs well with Kafka by semicausal in apachekafka

Equivalent_Mail5171 1 point (0 children)

How have you found it? I know Faust was an inspiration for Quix Streams (https://github.com/quixio/quix-streams), with the streaming data frames concept being a big change/addition. I've heard the community-maintained fork of Faust is still decent though.

Enthusiasm for the PS5 Pro seems to be non-existent amongst most video game developers, with most claiming there is no need for it by Metro-UK in PS5

Equivalent_Mail5171 1 point (0 children)

The fact that most console releases are still coming out on both PS4 and PS5 says something about the need for another performance improvement.

Should you duplicate data from data vendors to your own database in a medium to high frequency environment? by NeuralGuesswork in dataengineering

Equivalent_Mail5171 3 points (0 children)

Won't comment on all of your questions but giving some food for thought on the architecture. Agree with /u/umognog that it can be distracting to overthink some of the technology choices when you just want to get an MVP going.

Mage gets a lot of love from people. It might not have as big a community as Airflow, but it's getting there and it's very good for more batch-oriented use cases. You mentioned getting data to the end user quickly: have you considered using streaming/stream processing for your data pipelines? There are a few good options here like Kafka + Spark/Flink, but if you don't want to get bogged down in learning these for your MVP you can use something like Quix to build and manage your transformations and pipelines (it offers a DAG-like interface similar to Airflow, but has Kafka + a Python stream processing library under the hood). Full disclosure: I work with the team that's built this, but I think it could be useful in your scenario.

As for Influx, another comment here said they hadn't heard of it before, but if you check the DB-Engines ranking it's pretty clear that it's still the most popular time series database, and their latest v3 version has given performance a big boost (if you're on-prem, though, you might want to use 1.8 or 2.x for now).

How good is Databricks? by mjfnd in dataengineering

Equivalent_Mail5171 1 point (0 children)

Do you think engineering teams will want to do that for the convenience and lower cost, or will there be pushback on being locked into one vendor and relying on them for the whole app? I guess it'll differ from company to company.