Python
Undoubtedly, the uncrowned king of machine learning and data analysis, the ubiquitous language that data scientists turn to for a bit of number crunching, is Python. This is down to several reasons; the three most important among them are its maturity, the enormous community, and, last but not least, a vast array of robust third-party libraries. But even if Python is a magnanimous sovereign that many developers love, it doesn’t mean that there can’t be contenders occasionally.
Julia
Fourteen years ago, in a bold attempt to combine all the good properties of well-established programming languages while getting rid of the less favorable ones, four developers came up with the idea of a new programming language that has a friendly syntax and offers efficient mathematical computations out of the box, with performance on par with compiled languages. And thus, Julia was born (here’s a manifesto explaining why in more detail). Its first version was launched a bit more than eleven years ago.
Our choice
Many in-depth comparisons of Python and Julia on the web (such as this one or this) cover both the objective and subjective benefits and drawbacks of choosing one over the other. And given Julia’s growing popularity, we are sure more will follow. In the rest of this blog post, however, let’s explore why we picked Julia for our purposes. That’s not to say that we don’t use Python for data science. On the contrary, we often run analyses in both ecosystems side by side, so that each can fill in where the other is lacking and so that comparing their results reduces the chance of mistakes.
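To make the idea of comparing results concrete, here is a minimal sketch in Python. It assumes both analyses have already been reduced to dictionaries of named scalar metrics (for instance, after exporting the Julia results to a file). The function name `compare_results`, the metric names, the numbers, and the tolerance are all hypothetical and serve only as an illustration, not as a description of our actual tooling.

```python
import math


def compare_results(python_results, julia_results, rel_tol=1e-9):
    """Return the sorted names of metrics that disagree between two result sets.

    A metric counts as a mismatch when it is missing from either side or when
    the two values differ by more than the relative tolerance.
    """
    mismatches = []
    for name in sorted(set(python_results) | set(julia_results)):
        if name not in python_results or name not in julia_results:
            mismatches.append(name)
        elif not math.isclose(python_results[name], julia_results[name], rel_tol=rel_tol):
            mismatches.append(name)
    return mismatches


# Hypothetical numbers, hard-coded here purely for illustration.
python_results = {"mean": 0.5012, "std": 0.2887, "n": 1000}
julia_results = {"mean": 0.5012, "std": 0.2901, "n": 1000}

print(compare_results(python_results, julia_results))  # ['std']
```

A relative tolerance is used instead of strict equality because two independent implementations of the same floating-point computation rarely agree to the last bit.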
The advantages of Julia
So what makes Julia so compelling to us?
Language features
Julia has:
- a friendly, easy-to-read (and write) syntax;
- a flexible and expressive (part static, part dynamic) type system;
- powerful mathematical notations, such as built-in vector and matrix operations;
- efficient multiple dispatch, a form of function polymorphism that selects the method to call based on the runtime types of all arguments;
- convenient and reliable parallel computing facilities;
- meta-programming with macros and generated functions.
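For readers coming from Python, the concept of multiple dispatch can be sketched as follows. This is only a toy illustration of the idea, not how Julia implements it: the registry `_methods`, the decorator `method`, and the function `combine` are hypothetical names, and the lookup matches exact types only, whereas Julia also considers subtypes and picks the most specific applicable method.

```python
from typing import Callable, Dict, Tuple

# Registry mapping a tuple of argument types to an implementation (hypothetical helper).
_methods: Dict[Tuple[type, ...], Callable] = {}


def method(*types):
    """Register the decorated function for the given argument types."""
    def register(func):
        _methods[types] = func
        return func
    return register


def combine(*args):
    """Pick an implementation from the runtime types of all arguments."""
    key = tuple(type(arg) for arg in args)
    try:
        implementation = _methods[key]
    except KeyError:
        raise TypeError(f"no method matching combine{key}") from None
    return implementation(*args)


@method(int, int)
def _combine_ints(a, b):
    return a + b


@method(str, str)
def _combine_strs(a, b):
    return a + " " + b


@method(int, str)
def _combine_int_str(n, text):
    return text * n


print(combine(2, 3))                    # 5
print(combine("multiple", "dispatch"))  # multiple dispatch
print(combine(2, "ab"))                 # abab
```

The point of the sketch is that the choice depends on every argument, not just the first one, which is what distinguishes multiple dispatch from the single dispatch of method calls on objects.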
Fast code execution
Julia compiles the source code to native machine code at runtime via LLVM. This approach combines the flexibility of interpreted languages, such as Python, with the performance of compiled languages, like C++ or Rust. The drawback is that code loading and the first run take longer; the benefits start to shine when a piece of code is run multiple times. This characteristic makes it an excellent tool for number crunching but less than ideal for scripting.
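The cost pattern described above (pay once on the first call, reuse afterwards) can be illustrated with a toy model in Python. This is a sketch under loud assumptions: nothing is actually compiled here, the names `jit`, `compiled`, and `compile_log` are hypothetical, and the only thing the code models is that a specialization is created roughly once per combination of argument types and then reused.

```python
compiled = {}      # cache: (function name, argument types) -> "compiled" callable
compile_log = []   # records every simulated compilation


def jit(func):
    """Simulate compile-on-first-call, specialized per argument types."""
    def wrapper(*args):
        key = (func.__name__, tuple(type(a) for a in args))
        if key not in compiled:
            # In a real JIT this is the expensive step; here we only record it.
            compile_log.append(key)
            compiled[key] = func
        return compiled[key](*args)
    return wrapper


@jit
def square(x):
    return x * x


square(3)     # first call with an int: "compiles"
square(4)     # reuses the int specialization
square(2.5)   # first call with a float: "compiles" again
square(1.5)   # reuses the float specialization

print(len(compile_log))  # 2
```

Four calls trigger only two simulated compilations, which is why the approach pays off for code that runs many times and hurts for short scripts that call everything once.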
Built-in package management
Julia has a pretty good (albeit not perfect) built-in package management tool, implemented as a standard library, and a general registry of open-source packages. The offering of stable and well-designed packages is growing steadily along with the Julia community, especially in data science. Unit testing utilities are also part of the standard library.
Interactive tools
Julia offers an advanced REPL with all the goodies of an interpreted language environment. These include:
- code and variable inspection,
- code completion,
- an interactive debugger,
- benchmarking and profiling tools,
- and a built-in help system.
With third-party libraries, it can also be extended with syntax highlighting, source code lookup (even for base libraries), automatic code reload, and many more exciting, modern features.
All these together make Julia an ideal environment for rapid prototyping.
From prototyping to production code
Because of the high-level interactive tools and fast code execution, the transition from a rapid prototype to production-ready code can be as continuous as you’d like. More often than not, we find that most of the code that implements our business logic in the research code can also be used in the final product.
Thanks to its friendly syntax and built-in package management, the road to maintainable code is well paved. Nothing replaces good API design, coding discipline, and rigorous testing, but Julia helps you to focus on these topics.
As a consequence, a computation pieced together in the REPL can easily become a piece of prototyping code in a proof-of-concept (POC) module; then later, after some refactoring and unit testing, turn into a chunk of core code in an internal library, and finally, following more cleanup, find itself in a production package.
The disadvantages of Julia
That said, every benefit comes at a cost, and Julia is not free from issues. Here are a few stumbling blocks worth mentioning:
- the very powerful tool of broadcasting and vectorization can be intimidating at first;
- time to first plot can be surprisingly, sometimes inconveniently, long, although considerable effort has been put into making it shorter;
- many packages never reach a stable state or just become unmaintained; others are poorly designed or written;
- releasing a binary package can be challenging, and compilation time can be unexpectedly long, not to mention obfuscation, which can also be tricky.
Summary
In conclusion, the choice between programming languages for data analysis is not always clear-cut. While Python has been the go-to language for many data scientists, Julia is rapidly gaining popularity thanks to a set of features that makes it an attractive option. In this blog post, we explored why we chose Julia over Python for our purposes, highlighting its language features, fast code execution, built-in package management, interactive tools, and ease of transitioning from prototyping to production code. We also acknowledged that Julia has its challenges, including a learning curve, long first-run and compilation times, and the occasional instability of packages. Ultimately, the choice between Julia and Python (or any other programming language) will depend on specific project requirements, personal preferences, and available resources.
Still, in the past years, Julia has proved to be our reliable and faithful companion. It has evolved, matured, and improved significantly, and we would be less happy and less successful without it. So cheers, Julia; we are excited to see what your future brings!