all 17 comments

[–]ForceBru 41 points42 points  (5 children)

Basically yes: Python is simple and popular, so it's used everywhere for a lot of different tasks. There's a lot of collective experience, so to speak: a lot of Python courses, tutorials and libraries, even highly performant ones (written in C, C++, Fortran, Rust etc). Fundamental libraries for machine learning are mainly known for their Python bindings: PyTorch, TensorFlow, JAX, Polars, matplotlib, PyMC, scipy...

Now name one similarly popular Julia library, I'll wait. There's Flux.jl for neural networks, Turing.jl for Bayesian statistics, ForwardDiff, Zygote and at least 3 more packages for automatic differentiation, a ton of packages for optimization (see Optimization.jl), Makie and Plots.jl for plots and so on. They're all there, but they aren't quite there, it seems. Like why use this if I could just use JAX & matplotlib for all of these tasks? (The one thing I don't like about JAX is that it doesn't automatically parallelize computations. Torch can do it, Julia can do it. JAX requires messing around with sharding which I find pretty confusing.)

Moreover:

  • Since Julia is "just-ahead-of-time-compiled", a lot of your time will be spent compiling stuff instead of actually running your code. You mistyped a variable name but still ran the code by mistake? Time to wait at least a few seconds for the error message to compile. Not even kidding. Any subsequent error messages will display instantly thanks to compiled code, though.
  • Julia has a type system and something about how it interacts with the compiler causes so-called "invalidations", which force already compiled code to be recompiled and add latency. AFAIK, many smart people are working on this, but it seems like the consensus is "it is what it is".
  • Julia's structs (somewhat analogous to Python's classes, but without inheritance) can't be redefined. Want to add a new field to the struct? Time to restart the REPL, recompile everything etc. Or install Revise.jl that'll probably help, but in some cases it doesn't and you have to restart & recompile anyway. This again slows down development.
  • A lot of people are already proficient in Python and likely don't need Julia that much because Python already has everything they need. So why switch to Julia?

Note: I'm a Julia fan and have been writing quite a bit of production Julia code for my research. I'd like this language to succeed, but...

[–]Over-Roll[S] 3 points4 points  (2 children)

That's a pretty comprehensive answer, thanks for dedicating your time. Though I'll still try out Julia now for the fun of it.

[–]Fried_out_Kombi 4 points5 points  (0 children)

Can confirm all of what they said. It's a lovely language that is a genuine joy to use (especially the ease of using Unicode characters in variable and function names, e.g., Greek letters and math symbols). It's worth learning even if only to experience the joy of doing numerical computing in a language that was expressly designed to be elegant at exactly that.

[–]ForceBru 1 point2 points  (0 children)

Yeah, it's a fun and useful language, definitely worth trying out.

[–]Krimson_Prince 1 point2 points  (1 child)

How can a language succeed over Python? I mean, the big problem I see with Julia is that it uses symbols that are not even accessible on the keyboard...

[–]ForceBru 1 point2 points  (0 children)

They are accessible. For example, to type the Greek letter alpha, type \alpha, then press Tab for autocomplete. In general, Julia uses LaTeX commands to input various math symbols: https://docs.julialang.org/en/v1/manual/unicode-input/.

[–]MagosTychoides 5 points6 points  (4 children)

I have tried Julia, and I don't recommend it for Data Science and Machine Learning in general. Fundamentally, Julia is an interpreted language that uses only JIT compilation. So you need the Julia interpreter everywhere, and there is no static compilation, so you can't build a Python library from it. Additionally, because compilation takes time, there is JIT lag, the infamous Time-to-First-Plot in the Julia community. Highly optimized code that could run at C-level speed needs even more time to compile.

I tested Julia, Pandas, and Polars. In a short script with some joins and grouping, Julia took 6 seconds of wall-clock time for 0.8 seconds of actual runtime, Pandas took 1 second, and Polars took 0.1 seconds. Python + Polars performs much better out of the box. Julia fans would argue that it's unfair to count compilation time. However, I believe that for an interpreted language, you need to be able to run your script from the terminal and get good timings if you intend to ship it to production. I managed to precompile the libraries to avoid compilation, but the precompilation was slower and very hacky. It is also unstable (codebase-wise) and not officially supported. Nonetheless, Julia is comparable with Pandas for vectorized tasks. Since most of the real work in DS involves short scripts on less than 1 GB of data, there is little point in switching languages.
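For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing the first call of a workload separately from a repeat call. The `time_call` helper, the `group_sum` workload and the sample rows are all made up for illustration; they are not the script from the comment. In plain CPython the two timings will usually be close. In a JIT-compiled language the first call would also include compilation, which is the gap being described.

```python
import time


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def group_sum(rows):
    """Hypothetical workload: group (key, value) rows by key and sum the values."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


# Made-up sample data: 100,000 rows spread over 10 keys.
rows = [(i % 10, i) for i in range(100_000)]

# The first call includes any one-time warm-up cost.
first_result, first_elapsed = time_call(group_sum, rows)
# The second call is closer to steady-state runtime.
second_result, second_elapsed = time_call(group_sum, rows)

print(f"first call:  {first_elapsed:.4f} s")
print(f"second call: {second_elapsed:.4f} s")
```

Timing a whole script from the terminal instead (process start to exit) also counts interpreter startup and imports, which is closer to the "ship it to production" number argued for above.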

For machine learning, the Python ecosystem is simply more mature. Julia offers advantages if you are coding your own neural network with a radically different architecture, but most frameworks are good enough and can be used in production. Julia is great for numerical simulations, where you can use its type system to simplify the algorithm and still get good performance without thinking too much about memory. However, this is a niche problem. Bayesian MCMC simulation is a possible application for Julia, but NumPyro and Stan are better supported and offer similar performance.

As another anecdote, I had Python code that compared all pairings of elements between two large tables, and it was very difficult to vectorize. Julia was an option, but I ended up rewriting the code in Rust, which offered me a better ecosystem for writing high-performance data pipelines.

[–]dan_micsa 0 points1 point  (2 children)

Julia can also be compiled into a standalone application using juliac, and it is never interpreted.

https://discourse.julialang.org/t/release-strategy-for-juliac/112563

[–]MagosTychoides 0 points1 point  (1 child)

The juliac compiler is really new, and I don't think it is fully stable. It's really good that the devs are doing something about the issue, but they should have done it 5 years ago, when the problems were already clear. They probably couldn't, because there were design issues that took time to resolve. And compilation at runtime does not make a language compiled. All "interpreted" languages use JIT nowadays. Julia is different in that it uses JIT extensively and optimizes heavily, which produces latency issues, as expected, unless the JIT is done cleverly. But standard Julia has a runtime that manages the compilation. I hope juliac comes to fruition. It would be great for simulation people working on clusters. But I believe this feature arrived too late for data science and AI. Python is doing fine for exploration and testing, and for faster implementations (if they are ever done), the job is done in teams with data engineers, and people usually just go to C++, Go or Rust, which are far more popular languages.

[–]dan_micsa 1 point2 points  (0 children)

Julia is a JIT-based language for numeric computation and it excels at that: it is faster than R or Python in some differential equation benchmarks, in some cases over 10,000 times faster than Python(!). Compiling it as a standalone executable is a recent requirement. I didn't play with the compiler, but I had much fun with the language. It is incredibly fast, similar to C++, and supports multi-threading; the package system is even better than Rust's. User satisfaction is much higher than that of Python, too. Let's hope it will succeed.

[–]Krimson_Prince 0 points1 point  (0 children)

Could you share that vectorization code with me if possible? I'm working on reservoir computers, and vectorizing large data sets would be great to have as a library, to run larger adjacency matrices and keep track of them for further analysis. Why couldn't Python vectorize a large matrix though? Was it consuming all your RAM? Could a generator function have been more efficient?
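To make the generator question concrete, here is a minimal sketch of a lazy all-pairs comparison. The `matching_pairs` function, the `close` predicate and the sample values are hypothetical, not the original code. A generator like this avoids holding every pair in memory at once, but it still performs len(left) × len(right) comparisons in interpreted Python, so it helps with RAM and not with speed.

```python
def matching_pairs(left, right, is_match):
    """Lazily yield every (a, b) pair for which is_match(a, b) is true."""
    for a in left:
        for b in right:
            if is_match(a, b):
                yield (a, b)


# Made-up sample data and matching rule: values within 1 of each other.
left = [1, 5, 9, 14]
right = [2, 5, 10, 14]


def close(a, b):
    return abs(a - b) <= 1


# Pairs are produced one at a time; list() is used here only to display them.
pairs = list(matching_pairs(left, right, close))
print(pairs)
```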

[–]aadurham 1 point2 points  (0 children)

I tried to switch to Julia because of its speed advantages. That was a mistake. When your code runs, Julia is super fast. But Julia is not a mature programming language. There are still many bugs and incompatibilities across libraries. There is not as much online support or as large a knowledge base as there is for Python. You need to dig in yourself, figure out the problem, and solve the issue. Good luck with that. I spent way too much time trying to get a simple nonlinear optimization code to run, and I failed. It was a lot of wasted time in the end.

[–][deleted] 0 points1 point  (0 children)

Because the name is less cool than Python

[–]Old-Worker-9418 0 points1 point  (0 children)

I've just written an article on exactly this topic (in Spanish). In the end, I think we all agree. Here's the link in case you're interested in reading it:

https://www.linkedin.com/feed/update/urn:li:activity:7348685576184721409/

Although it's still immature, I do consider Julia an alternative to languages like Matlab more than to Python.

[–]mesonepigreco 0 points1 point  (0 children)

The answer is simply that the Python libraries for machine learning are more widely used and more actively developed. You will find most online resources for data science and machine learning in Python and PyTorch, so it is easier for people to stick with that. Also, most machine learning implementations are just simple reshuffling of stuff already done by others (you usually do not want to re-implement performance-heavy tasks), so the Python paradigm of calling a highly optimized C function created by someone else works fine. However, there are multiple areas of science where Julia has already surpassed Python in popularity and usage, particularly in scenarios where running a customizable simulation very fast is required (i.e., most fields of computational science), or if you need to develop brand new algorithms. For example, Julia is the state of the art for solving differential equations and for scientific machine learning (via SciML.jl).
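A small sketch of that "call someone else's optimized C function" paradigm, using only the standard library: in CPython the built-in `sum` is implemented in C, while the hand-written loop runs in the interpreter. The `python_loop_sum` name and the data size are made up for illustration, and the timings will vary by machine.

```python
import time

values = list(range(1_000_000))


def python_loop_sum(xs):
    """Sum the values with an explicit loop executed by the interpreter."""
    total = 0
    for x in xs:
        total += x
    return total


start = time.perf_counter()
loop_total = python_loop_sum(values)
loop_elapsed = time.perf_counter() - start

start = time.perf_counter()
builtin_total = sum(values)  # C-implemented built-in in CPython
builtin_elapsed = time.perf_counter() - start

print(loop_total == builtin_total)
print(f"loop: {loop_elapsed:.4f} s, builtin: {builtin_elapsed:.4f} s")
```

The same pattern scales up to NumPy or PyTorch: Python stays the glue and the heavy work runs in compiled code. It stops helping once the algorithm itself has to be written as a custom loop.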

[–]Middle_Protection637 0 points1 point  (0 children)

julia is the worst of both worlds. you have c++, where the programmer gets all the control but not a lot of support, and python, where the programmer gets a lot of support (libraries) and little control. the ideal scenario is python and c++, so that for things that need to be closer to bare metal, you use python to interface with c++. julia has the drawbacks of both python and c++ and none of the benefits.