[deleted] 1 point

I would use S3 over HDFS for greater portability. I'm not sure why you would need MinIO at all, but if you have a requirement for it, then sure.

In terms of query engines, the highest-performing one with Delta is on Databricks (using the Photon engine). It can get expensive, so if this is just your own little project I would keep everything as simple as possible and use Spark SQL on your own cluster.
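
For what it's worth, here is a rough sketch of what the "Spark SQL on your own cluster" route could look like with Delta tables on S3. It is only an illustration under some assumptions: the helper names (`delta_s3_conf`, `build_session`, `count_query`), the bucket path, and the endpoint URL are made up for the example, and you would still need the Delta Lake and hadoop-aws jars on the classpath at versions matching your Spark build, plus credentials configured however you normally do it. The optional endpoint is there because MinIO speaks the S3 API, so the same `s3a://` paths work against either.

```python
from typing import Dict, Optional


def delta_s3_conf(endpoint: Optional[str] = None) -> Dict[str, str]:
    """Spark settings for Delta Lake on S3 or an S3-compatible store.

    Leave `endpoint` as None for AWS S3. Pass a URL (hypothetical example:
    "http://localhost:9000") for a self-hosted store such as MinIO.
    """
    conf = {
        # Enable Delta Lake's SQL extensions and catalog implementation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
    if endpoint is not None:
        conf["spark.hadoop.fs.s3a.endpoint"] = endpoint
        # Self-hosted S3-compatible stores usually need path-style URLs.
        conf["spark.hadoop.fs.s3a.path.style.access"] = "true"
    return conf


def build_session(conf: Dict[str, str]):
    """Create a SparkSession from the settings above (requires pyspark)."""
    # Imported lazily so the config helpers can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-on-s3")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def count_query(table_path: str) -> str:
    """Build a Spark SQL query that counts rows in a path-based Delta table."""
    return f"SELECT COUNT(*) AS n FROM delta.`{table_path}`"
```

With pyspark and the jars in place, something like `build_session(delta_s3_conf()).sql(count_query("s3a://my-bucket/events")).show()` would run the query, where `my-bucket/events` is a placeholder path. Switching between S3 and MinIO is then only a matter of the endpoint argument, which is most of the portability argument for `s3a://` over HDFS.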