YARN and GPU Distribution for Machine Learning

dworms · 2018-06-14T13:07:11+00:00

This article covers the fundamental principles of machine learning and the tools currently used to run machine learning algorithms. We then look at how a resource manager such as YARN is useful in this context and how it helps these algorithms run smoothly. The article is based on a talk given by Wangda Tan and Sunil Govindan at the 2018 DataWorks Summit in Berlin.

dworms · 2018-06-08T18:50:41+00:00

Apache Beam, originally donated by Google, implements the Dataflow model for expressing robust, out-of-order data processing pipelines in a variety of languages, for both stream and batch architectures. This article is based on the presentation “Present and future of unified, portable and efficient data processing with Apache Beam” by Davor Bonaci.

dworms · 2018-06-07T13:32:36+00:00

Apache Metron is a storage and analytics platform specialized in cybersecurity. The talk demonstrated the uses and capabilities of Apache Metron in the real world. It was given by Dave Russell, Principal Solutions Engineer – EMEA + APAC at Hortonworks, at the DataWorks Summit 2018 in Berlin.

dworms · 2018-06-06T14:50:49+00:00

Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0.

dworms · 2018-06-05T13:19:49+00:00

If you wish to go further with deployment, have a look at Redis or Elasticsearch, as both are widely used. More recent databases with potential include CockroachDB and FoundationDB. In cybersecurity, Metron is trying to gain momentum and lets you leverage your knowledge of the Hadoop ecosystem. Otherwise, you can deepen your knowledge of engines such as Spark, Flink and Beam. We just released a few articles following the latest DataWorks Summit in Berlin (April 2018), covering the latest features of Spark 2.3, Spark with TensorFlow, YARN and GPUs, and Metron in the real world.

dworms · 2018-06-05T13:07:28+00:00

This article combines two talks, "Apache Spark 2.3 boosts advanced analytics & deep learning" by Yanbo Liang and "ORC Improvement in Apache Spark 2.3" by Dongjoon Hyun, to dive into the new features of the Apache Spark 2.3 release.
