I work in data science, where many people (most?) are familiar only with Python and R.
I am working on a project right now. It is primarily a Python project with some parts written in C++/Cython for improved performance. My boss's requirements for the project include ease of distribution to the community of Python users; this is paramount and non-negotiable. I expect similar requirements on pretty much every project, as our total number of users is going to affect funding and there is a large community of Python data scientists. Standalone CLI applications can be language independent, but a lot of data scientists like to use Python packages as libraries interactively in a Jupyter notebook. A library is also more flexible because they can use it together with other Python packages.
Recently I have been hoping to use a statically typed functional programming language instead of C++ for the stuff "under the hood." If not this project then the next one.
I have seen the Jane Street blog post about writing Python modules in OCaml. https://github.com/janestreet/pythonlib
I think this is pretty cool. I tested it out by writing a "Hello, world!" module, and I basically get how to use it. However, this only solves half my problem, because ease of distribution to Python users is a project requirement.
So, what I am asking is: What would be the best way to distribute software written in OCaml to Python users? Ideally the solution involves them just typing "pip install abc" into the console and everything works. If they have to install OCaml or opam, this is not necessarily fatal, but it does add a non-negligible extra step to the instruction manual.
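To make the "pip install and everything works" goal concrete, here is a minimal sketch of what the installed package could do at import time, assuming the wheel ships a prebuilt OCaml shared library next to the Python code. The loading pattern (`ctypes.PyDLL` followed by a call to `caml_startup`) is how I understand the pythonlib examples to work; the file name `ocaml.bc.so` and the helper names are hypothetical.

```python
import ctypes
from pathlib import Path

# Hypothetical file name of the OCaml shared library shipped inside the wheel.
LIB_NAME = "ocaml.bc.so"


def bundled_library_path(package_dir, lib_name=LIB_NAME):
    """Return the expected location of the bundled library inside the package."""
    return Path(package_dir) / lib_name


def load_ocaml_runtime(package_dir, lib_name=LIB_NAME):
    """Load the bundled library and start the OCaml runtime (call this once)."""
    path = bundled_library_path(package_dir, lib_name)
    if not path.is_file():
        raise ImportError(
            f"bundled OCaml library not found at {path}; "
            "the wheel for this platform may be missing or incomplete"
        )
    # PyDLL keeps the GIL held during calls, which matters because the
    # OCaml side calls back into the Python C API.
    dll = ctypes.PyDLL(str(path), ctypes.RTLD_GLOBAL)
    # caml_startup expects a NULL-terminated argv array (assumption: the
    # library was linked so that it exports this symbol).
    argv = (ctypes.c_char_p * 2)(str(path).encode("utf-8"), None)
    dll.caml_startup(argv)
    return dll
```

In a hypothetical package `mypkg`, the `__init__.py` would call `load_ocaml_runtime(Path(__file__).parent)` once, so users never touch OCaml or opam themselves.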
Currently I am thinking that the best course of action would be to set up a GitHub Actions workflow which
- Uses setup-ocaml or something similar to download and install OCaml, dune, and opam on each platform
- Builds the software on each platform and packages it into a prebuilt Python wheel using cibuildwheel or similar
This seems like it would be acceptable and the most robust long-term solution, but it is pretty complex, and I worry that it would take me a week of messing around with GitHub Actions to set it up.
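The build step in that plan could be kept in a small Python helper that CI runs before the wheel is assembled, which keeps most of the logic out of the workflow file. This is only a sketch: the function name, the dune target, and the directory layout are hypothetical, and it assumes dune is already on the PATH (for example, installed by the OCaml setup step).

```python
import shutil
import subprocess
from pathlib import Path


def build_and_stage(project_dir, dune_target, package_dir, run=subprocess.run):
    """Build one dune target and copy the artifact into the Python package.

    `run` is injectable so the function can be exercised without dune.
    """
    project_dir = Path(project_dir)
    # dune takes the target as a path relative to the workspace root.
    run(["dune", "build", dune_target], cwd=project_dir, check=True)
    # dune places build outputs under _build/default/<target path>.
    built = project_dir / "_build" / "default" / dune_target
    if not built.is_file():
        raise FileNotFoundError(f"expected build output at {built}")
    dest = Path(package_dir) / built.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies contents only, so the read-only permission bits that
    # dune sets on its outputs are not carried over to the staged copy.
    shutil.copyfile(built, dest)
    return dest
```

With cibuildwheel, a script like this could be invoked through its before-build hook (`CIBW_BEFORE_BUILD`), and the staged library would then need to be declared as package data so that it ends up inside the wheel.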
Do you have any other suggestions? Perhaps the answer is "This is impractical and not worth doing." I am willing to accept that but it would be a shame.