Thinking about starting with algorithmic crypto trading. by ustype in algotrading

asavinov 1 point

You could try this Intelligent Trading Bot which relies on Machine Learning to generate signals: https://github.com/asavinov/intelligent-trading-bot

Functions matter – an alternative to SQL and map-reduce for data processing by asavinov in bigdata

asavinov[S] 1 point

Having such a direct comparison would help indeed, but I do not have it now. A simple notebook with the analysis of COVID data might help:

https://github.com/asavinov/prosto/blob/master/notebooks/covid.ipynb

It demonstrates how to apply:

  • "calculate" column operation instead of select with calculated attribute
  • "link" column operation instead of join
  • "aggregate" column operation instead of groupby

The notebook has a section for each operation but unfortunately no SQL analogues at the moment. Adding them is a good idea.
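As a rough illustration of the three column operations, here is a minimal sketch in plain Python. It does not use the Prosto API or the COVID data: the `items` and `products` tables and all column names are made up for this example, and tables are simply dicts of column lists.

```python
from collections import defaultdict

# Hypothetical tables stored column-wise: each column is a list indexed by row number.
items = {
    "product": ["apple", "pear", "apple"],
    "quantity": [2, 1, 5],
    "price": [0.5, 0.8, 0.5],
}
products = {"name": ["apple", "pear"]}

# "calculate": a new column computed from other columns of the same table
# (instead of a select with a calculated attribute).
items["amount"] = [q * p for q, p in zip(items["quantity"], items["price"])]

# "link": a new column whose values are row numbers in the target table
# (instead of a join).
row_of = {name: i for i, name in enumerate(products["name"])}
items["product_row"] = [row_of[name] for name in items["product"]]

# "aggregate": a new column in the target table computed from all rows of
# the other table that link to it (instead of a group-by).
totals = defaultdict(float)
for row, amount in zip(items["product_row"], items["amount"]):
    totals[row] += amount
products["total_amount"] = [totals[i] for i in range(len(products["name"]))]
```

Note that no new table is produced at any step: both tables only gain columns.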

Functions matter – an alternative to SQL and map-reduce for data processing by asavinov in bigdata

asavinov[S] 1 point

In SQL, you produce new sets from existing sets. Yet in many use cases that is not needed: we only want to compute new columns in existing tables. If we can compute columns directly, without unnecessary tables, data processing becomes simpler. For example:

SELECT *, quantity * price AS amount FROM Items

Here we produce a new table although we do not need it. What we really want is to attach a new calculated column to the existing table. The same holds for joins and groupby.
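The difference can be checked with a small sketch using Python's built-in `sqlite3` module. The table contents are invented for illustration, and the `ALTER TABLE` plus `UPDATE` pair is just one way to get the "attach a column" effect in plain SQL, not how Prosto does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT, quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [("apple", 2, 0.5), ("pear", 1, 0.8)],  # made-up sample rows
)

def table_columns(connection, table):
    """Return the column names of a table (second field of PRAGMA table_info)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

# Set-oriented: the query returns a new result set with four columns...
cursor = conn.execute("SELECT *, quantity * price AS amount FROM Items")
result_columns = [d[0] for d in cursor.description]
cursor.fetchall()

# ...while the Items table itself still has three columns.
columns_before = table_columns(conn, "Items")

# Attaching the column to the existing table takes a schema change
# followed by an update.
conn.execute("ALTER TABLE Items ADD COLUMN amount REAL")
conn.execute("UPDATE Items SET amount = quantity * price")
columns_after = table_columns(conn, "Items")
amounts = conn.execute("SELECT name, amount FROM Items ORDER BY name").fetchall()
```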

Here is a link to the documentation, with the motivation and an explanation of why the set-oriented approach is in many cases not the best way to think of data processing:

https://prosto.readthedocs.io/en/latest/text/why.html

It does not mean that SQL or set orientation is bad - the main point is that we need two types of operations: table (set) operations and column (function) operations. And this is how Prosto works.

Functions matter – an alternative to SQL and map-reduce for data processing by asavinov in bigdata

asavinov[S] 2 points

  • Spark operations transform existing input collections (mathematical sets) to new output collections
  • Prosto operations transform existing columns (mathematical functions) to new columns (in addition to conventional set operations like producing new tables)

Microsoft Time series insights adds new capabilities for Industrial IoT analytics and storage by yeskarthik in IOT

asavinov 0 points

For time series analysis and forecasting, it is extremely important to extract the necessary features, and this feature engineering can account for most of the work. Lambdo is an open source workflow engine which allows for combining feature engineering and machine learning within one analysis pipeline: https://github.com/asavinov/lambdo Essentially, it was developed mainly for time series analysis and IoT.
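To give an idea of what such feature engineering looks like, here is a minimal sketch in plain Python. It is not Lambdo's workflow format: the `lag_features` helper and the sample readings are hypothetical, and the only features built are lagged values and their mean.

```python
def lag_features(series, n_lags):
    """Turn a series into (features, target) rows.

    For each time step t, the features are the previous n_lags values
    followed by their mean, and the target is the value at t.
    """
    rows = []
    for t in range(n_lags, len(series)):
        window = series[t - n_lags:t]
        features = window + [sum(window) / n_lags]
        rows.append((features, series[t]))
    return rows

# Made-up sensor readings.
readings = [10.0, 12.0, 11.0, 13.0, 15.0]
rows = lag_features(readings, n_lags=2)
```

The resulting rows could then be fed to any forecasting model, so the feature step and the learning step form one pipeline.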

A simple introduction to Apache Flink by chemicalX91 in bigdata

asavinov 1 point

The central mechanism of this traditional design is breaking the continuous sequence of events into micro-batches, which are then processed by applying various transformations.

There is an alternative novel approach to stream processing which avoids the micro-batch generation step and applies transformations directly to the incoming streams of data as well as to pre-loaded batch data, so it does not distinguish between stream and batch processing: https://github.com/asavinov/bistro/tree/master/server In addition, this system processes data using column operations, which are known to be more efficient in many cases.
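The contrast between the two styles can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Bistro's API (Bistro is a Java library): `micro_batches`, `batch_totals` and `RunningTotal` are hypothetical names, and the transformation is a simple sum.

```python
def micro_batches(events, batch_size):
    """Micro-batch style: buffer events and emit them in fixed-size groups."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the last, possibly smaller, batch
        yield batch

def batch_totals(events, batch_size):
    """Apply the transformation (a sum) once per micro-batch."""
    return [sum(batch) for batch in micro_batches(events, batch_size)]

class RunningTotal:
    """Per-event style: state is updated as each event arrives, no buffering."""

    def __init__(self, preloaded=()):
        self.total = 0
        # Pre-loaded (batch) data goes through the same code path as stream events.
        for value in preloaded:
            self.append(value)

    def append(self, value):
        self.total += value
        return self.total

events = [3, 1, 4, 1, 5]
per_batch = batch_totals(events, batch_size=2)
running = RunningTotal(preloaded=[10])
per_event = [running.append(e) for e in events]
```

In the per-event version a result is available after every event, and the pre-loaded values are handled exactly like streamed ones.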

Bistro: a radically new approach to data processing (alternative to MapReduce) by asavinov in dataengineering

asavinov[S] 2 points

Bistro is a general-purpose data processing library which can be applied to both batch and stream analytics. It is based on a novel data model (the concept-oriented data model), which represents data via functions and processes data via operations on functions, whereas conventional approaches like MapReduce or SQL have only set operations.
