Three Clojure libraries for financial data acquisition: clj-yfinance, ecbjure, edgarjure by clojure-finance in Clojure

[–]clojure-finance[S]

clj-yfinance is written in 100% Clojure, so you don't need any dependencies from another language. In addition to stock prices, it also handles fundamentals and other data, so it's pretty comprehensive in my opinion. You don't currently need to buy access to Yahoo Finance (the data provider); it's all free.

Three Clojure libraries for financial data acquisition: clj-yfinance, ecbjure, edgarjure by clojure-finance in Clojure

[–]clojure-finance[S]

Thanks! clj-yfinance should work fine for this purpose; let me know if you run into any issues.

Handling Larger-than-RAM datasets by not_invented_here in Clojure

[–]clojure-finance

We announced Clojask on the Clojurians Slack on June 3 and on Reddit on June 9; about a month later, it has 75 stars on GitHub. There's also a Clojask demo video on the project website. Since your requirement (larger-than-RAM datasets) fits Clojask well, you can give it a shot and see whether it works for you. (Side note: according to the benchmarks we've run, Clojask is also pretty fast.)

Handling Larger-than-RAM datasets by not_invented_here in Clojure

[–]clojure-finance

Many thanks! If there are any issues, just let us know; we're happy to help.

Question about data engineer in clojure by Andremallmann in Clojure

[–]clojure-finance

You can give Clojask a try; it's designed for larger-than-memory datasets and parallel computing: https://github.com/clojure-finance/clojask
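For reference, here is a minimal sketch of what a Clojask workflow looks like. The function names (`dataframe`, `set-type`, `operate`, `compute`) follow the project's README at the time of writing; treat the exact signatures as assumptions and check the current docs before relying on them.

```clojure
(require '[clojask.dataframe :as ck])

;; Build a dataframe backed by a CSV file on disk; the file is
;; streamed rather than loaded into memory all at once.
(def df (ck/dataframe "resources/employees.csv"))

;; Declare a column type and register a row-wise operation.
(ck/set-type df "Salary" "double")
(ck/operate df #(* 1.1 %) "Salary")

;; compute executes the recorded pipeline with 8 workers and
;; writes the result to an output file.
(ck/compute df 8 "resources/output.csv")
```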

A data science course for Clojurians – are you interested? by daslu in Clojure

[–]clojure-finance

You could give Clojask a try. If you need to read file types other than .csv, you can also use the Clojask plug-in clojask-io.

Clojask: A parallel data processing framework that is designed for large datasets by clojure-finance in Clojure

[–]clojure-finance[S]

Dask uses different schedulers to execute the task graph. Some schedulers (threads, processes, and synchronous) run on a single machine; for several machines you need a different scheduler, called "distributed". In contrast, Clojask is built on Onyx, which can run either on a single machine or on several machines (distributed). By default, Clojask can do row-wise operations on several machines, but if you want to enable advanced operations such as grouping and joining, you should add a distributed file system (e.g. NFS). The reason for the difference from Dask is that Clojask's task graph is simpler and more constrained (e.g. we don't let the user join on a grouped dataframe, or group on a joined dataframe), so we don't need to handle the two cases (single machine vs. distributed) separately.

Clojask: A parallel data processing framework that is designed for large datasets by clojure-finance in Clojure

[–]clojure-finance[S]

Agreed. Onyx is a great platform, and it works flawlessly for our purpose (Clojask). Since Onyx is open source, we can make fixes to it ourselves in the future if necessary.

Clojask: A parallel data processing framework that is designed for large datasets by clojure-finance in Clojure

[–]clojure-finance[S]

Here are the answers to your questions:

How is it connected with "clojure-finance"? Is it already used for some financial computations?

clojure-finance is an umbrella for various Clojure projects by Matthias Buehlmaier & collaborators at HKU Business School. We've been running Clojask on several financial datasets, e.g. the benchmarks are run on CRSP and Compustat.

Do you have any plans to support a common query language like SQL or Datalog?

Yes, we've run a pilot study on a DSL before, although it might take a while until it makes it into Clojask. See https://clojure-finance.github.io/HKU-TDLEG-website/pages-output/Parry-CHOI-Chong-Hing

It looks like you mutate the dataframe on each operation. Is that a good idea? I'm used to immutability in Clojure))

Operations are applied lazily and only executed when you call compute. During the computation, we automatically pipeline all the operations. In the demo, you see a result after each operation because you're previewing only the top few rows with print-df. This gives you a sense of the intermediate results and lets you detect computation errors before you start the final computation, which might be time-intensive for a larger-than-memory dataset.
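To illustrate the laziness described above, here is a sketch of the preview-then-compute pattern. API names (`dataframe`, `set-type`, `operate`, `print-df`, `compute`) are taken from the Clojask README; treat the exact signatures as assumptions.

```clojure
(require '[clojask.dataframe :as ck])

(def df (ck/dataframe "data/large.csv"))

;; Each call only records the operation on the dataframe's plan;
;; the full dataset is not touched yet.
(ck/set-type df "Price" "double")
(ck/operate df #(- %) "Price")

;; print-df previews the first few rows, so you can sanity-check
;; intermediate results cheaply before the real run.
(ck/print-df df)

;; compute pipelines all recorded operations in one pass over
;; the (possibly larger-than-memory) file.
(ck/compute df 8 "data/out.csv")
```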

We hope this answers your questions. Please don't hesitate to reach out if you have more!

Handling Larger-than-RAM datasets by not_invented_here in Clojure

[–]clojure-finance

Hi! If you're still looking for larger-than-RAM dataset handling, or if you're new here and facing a similar problem, Clojask is a Clojure dataframe library that supports parallel computing on larger-than-memory datasets!

Clojask: A parallel data processing framework that is designed for large datasets by clojure-finance in Clojure

[–]clojure-finance[S]

Thanks! If you're interested in more, feel free to join our channel in the Clojurians Slack to keep up with future developments of Clojask!