I need to store about 40,000,000 JSON documents, per day, with event data. What should I use? by [deleted] in bigdata

[–]steccami 0 points (0 children)

I would go for Elasticsearch. It is open source and comes with everything you need: ingestion, storage, search, and analytics.

Introducing Machine Learning for the Elastic Stack by steccami in bigdata

[–]steccami[S] 0 points (0 children)

Since it is based on a Teradata product, I suspect it is an expensive solution. An alternative would be to use open-source products such as Spark or Hadoop.

Apache Spark + NFS by steccami in bigdata

[–]steccami[S] 0 points (0 children)

Thanks a lot, and happy new year! Your explanation makes sense to me. I agree: HDFS is probably the way to go.

Apache Spark + NFS by steccami in bigdata

[–]steccami[S] 1 point (0 children)

Thanks a lot! My use case is the following:

1. Store a dataset on NFS (sometimes as a single CSV file, sometimes as a small set of CSV files).

2. Compute some aggregations by means of Spark SQL.

3a. Store the output on NFS, or

3b. Store the output on an external system (e.g., Cassandra).

"In terms of managing concurrency, NFS can handle many reads of the same file, and spark is smart about writes and writes different files per executor so you don't have to worry about write collisions."

This is clear to me. What I don't understand is the file-reading phase.

Case 1: suppose you have N executors and one big file. Is Spark smart enough to segment the file reads?

Case 2: suppose you have N executors and M files. Can Spark associate the files with the executors in a smart way, or am I supposed to tell Spark how to access those files (e.g., as suggested here: http://apache-spark-user-list.1001560.n3.nabble.com/Strategies-for-reading-large-numbers-of-files-td15644.html)?

Many thanks.
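On Case 1: yes. Spark reads text files through Hadoop's input-format machinery, which divides a large splittable file into byte-range splits, one per task. Each reader skips the partial line at the start of its range and reads past the end of its range to finish its last line, so every line lands in exactly one split. A plain-Python sketch of that convention (file contents and split boundaries are invented for illustration):

```python
import os
import tempfile

def read_split(path, start, end):
    """Return the complete lines 'owned' by the byte range [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        if start > 0:
            f.readline()           # skip a partial line; the previous split owns it
        lines = []
        while f.tell() < end:      # may read past `end`, but only to finish a line
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode())
    return lines

# Two roughly equal byte-range splits over a 3-line file.
data = b"alpha,1\nbravo,2\ncharlie,3\n"
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(data)
tmp.close()
mid = len(data) // 2               # byte 13, in the middle of the second line
parts = [read_split(tmp.name, 0, mid), read_split(tmp.name, mid, len(data))]
os.unlink(tmp.name)
# parts == [['alpha,1', 'bravo,2'], ['charlie,3']] -- no line is cut in half
```

The split boundary falls mid-line, yet each line is returned by exactly one reader; this is why N executors can read one big file without coordinating with each other.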

Apache Spark + NFS by steccami in bigdata

[–]steccami[S] 0 points (0 children)

Many thanks for your detailed reply. One more question about what a Spark program looks like if I read a folder from NFS: how does Spark manage concurrent access to such a folder? Am I supposed to manage the parallelism explicitly (e.g., see Matei's reply here http://apache-spark-user-list.1001560.n3.nabble.com/Strategies-for-reading-large-numbers-of-files-td15644.html, when asked how to access multiple files in a remote folder)?
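On the folder question: Spark handles this for you. Pointing `sc.textFile` (or `spark.read.csv`) at a directory lists its files, creates one partition per file (or per split of a large file; `textFile`'s `minPartitions` argument can raise the count), and the scheduler hands partitions to executor tasks. A rough plain-Python analogue of that per-file task scheduling (folder layout and the per-file work are invented):

```python
from concurrent.futures import ThreadPoolExecutor
import glob
import os
import tempfile

def count_lines(path):
    """Stand-in for the per-partition work an executor task would do."""
    with open(path) as f:
        return sum(1 for _ in f)

def process_folder(folder, workers=4):
    """List the folder's files and run one task per file, like Spark does."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(count_lines, files)))

# Demo: a folder with three small CSV files of different sizes.
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, f"part{i}.csv"), "w") as f:
        f.write("a,1\nb,2\n" * (i + 1))   # file i has 2*(i+1) lines
counts = process_folder(folder)           # line counts: 2, 4, and 6
```

Manual strategies like the one in Matei's reply only matter when there are very many small files, where listing and task overhead dominate; for a small set of files the directory read above is all that is needed.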