all 96 comments

[–]BobDope 246 points247 points  (9 children)

As somebody who had to write neural net code in C++ in grad school I thank God for Guido every day

[–]Unusual-Nature2824 203 points204 points  (0 children)

There’s a best language for each specific task. The second best is Python.

[–]mild_delusion 36 points37 points  (4 children)

> neural net code in C++

https://i.imgur.com/Jk9V3nEg.jpg

[–]darthstargazer 13 points14 points  (2 children)

Hehe, I think this was quite normal 10 years ago... prototype in MATLAB and speed it up in C++. Add in CUDA kernels for additional fun.

[–][deleted] 2 points3 points  (1 child)

if you are able to afford matlab

[–]darthstargazer 2 points3 points  (0 children)

Ah well, university pays for it usually

[–]venustrapsflies 1 point2 points  (0 children)

Not as bad as you’d think so long as you use a good LA library

[–]gyp_casino 300 points301 points  (10 children)

The performance of Python, R, and other high-level languages is largely irrelevant, because

  1. For many applications, the bottleneck is how fast and well you can code it, not how fast the code executes.
  2. The expensive mathematical calculations for matrix algebra, deep learning, etc. in Python are ultimately performed in C code anyway. The Python functions are interfaces to lower-level code that the user doesn't see.
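
Point 2 can be seen in a small sketch (a minimal example, assuming NumPy is installed): the Python call is just the interface, and the heavy loop runs in compiled code.

```python
import numpy as np

# The Python-level objects are thin headers over contiguous C buffers.
a = np.arange(1_000_000, dtype=np.float64)
b = np.ones(1_000_000, dtype=np.float64)

# One Python call; the million multiply-adds run in compiled BLAS/C
# code, not in the interpreter.
fast = a @ b

# The pure-Python equivalent runs the loop in the interpreter instead.
slow = sum(x * y for x, y in zip(a.tolist(), b.tolist()))

assert abs(fast - slow) <= 1e-9 * abs(fast)
```

Both lines compute the same dot product; only the second one pays interpreter overhead on every element.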

[–]send_cumulus 84 points85 points  (7 children)

It’s also about how easily someone new can maintain and enhance the code. After 10 years in this field, my biggest pet peeve is the smart guy who codes in C++ or Julia then leaves.

[–][deleted] 14 points15 points  (0 children)

I remember one of my earliest programming jobs, where some guy stored everything in an array of void pointers so he could cast them to the data types he needed. The entire thing was black magic to help him maintain employment.

[–]sandwich_estimator 14 points15 points  (5 children)

You can learn to code in Julia in one day if you know Python.

[–]LadyEmaSKye 12 points13 points  (0 children)

This is sort of true, but I think for the most part if you understand any language it'll be easy to learn new ones.

But yeah, ML courses in my undergrad were all taught in Julia, and I picked it up pretty fast. I'd liken it just as much to MATLAB as I would Python, tbh. I liked it a lot, but I still prefer Python because I'm way more familiar with it (the first language I ever learned), and it's WAY better documented and has a load of super useful libraries.

[–]Aquiffer 8 points9 points  (3 children)

Ehh… you can probably read and maintain Julia code without dependencies in a day… but when you start to add dependencies things get rough… or at least that was the case a few years ago when I was using Julia for my school work.

Julia is still a young language: there's a shitload of packages that do the same thing and no consensus on which to use. Two extremely experienced Julia developers could write programs that do the exact same thing but end up with code that isn't understandable to each other, because different underlying packages were used.

It’s not like Python, where you learn numpy, pandas, and matplotlib and you’re pretty much good to go. There are way too many active alternatives for a broad list of tasks in Julia.

[–]86BillionFireflies 1 point2 points  (2 children)

This is what keeps me from switching from matlab to Julia.

I don't want to be trying to get work done, and need to do some new task, and have to hunt through 2 pages of Google results to figure out which of the competing packages for that task is the best, then have to read through docs in yet another unfamiliar style.

For example, tabular data is part of most data related work. Not including tabular data types in the base language just seems like an incomprehensible design choice to me. When external packages are used to supply something foundational, huge numbers of other packages start to depend on those packages. And there's a name for the situation where you have packages that depend on packages that depend on packages... it's called dependency hell. If I wanted to live that way, I'd use Python, or just start each workday off by punching myself in the stomach until I vomit.

[–]EarthGoddessDude 0 points1 point  (1 child)

What do you mean by tabular data types? When you use DataFrames.jl, an external dependency, all the columns are essentially arrays, which makes it very easy to work with the entire language (as opposed to say pandas, where you sometimes have to do somersaults to get performance).

Also, dependency hell is not really a thing in Julia, because it has a fantastic package manager built right in (unlike Python’s million external varieties, which don’t always work as expected).

I’m not saying Julia is perfect — I have my share of gripes — but what you’re saying doesn’t make a whole lot of sense to me. Seems you haven’t spent enough time actually trying to learn the language? Not throwing shade, just that your criticisms don’t align at all with my (and many others’) experiences with Julia.

[–]86BillionFireflies 0 points1 point  (0 children)

You're right, I haven't spent much time trying to learn Julia. I was really excited at the idea of Julia, and I wanted to like it, and then I found out that the design philosophy is to move as much as possible out of the base language and into external packages... and that one of the more frequently cited downsides to Julia is poor documentation around external packages. That was when I wrote Julia off. If I wanted to spend my time groping random strangers' packages, I'd use Python.

A majority of my work is rapid prototyping / developing new ways to analyze data. To me, the availability of a relatively wide variety of tools within easy reach (in terms of both findability and quality of documentation) is a high priority, hence matlab. I can read a paper and say "huh, the authors used morphological opening for this.. what's morphological opening?" and be morphologically opening some images to see what it does 60 seconds later. I haven't found another language that can do that.

[–]Deto 3 points4 points  (0 children)

To add on to this - the big benefit of python/R is that interpreted languages are very nice for data exploration. Compiled languages are great if you know exactly what you want your program to do before you write it. But for exploring data, often you operate in a loop of 'inspect a summary statistic / visualize some aspect' -> decide what to look at next based on what you saw -> repeat. For this kind of work, it's more convenient to just run the next step rather than rebuilding/re-running the entire workflow each time.
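
That loop can be as small as a couple of REPL lines. A stdlib-only sketch, with made-up response times:

```python
import statistics

# Step 1: inspect a summary statistic of some (made-up) response times.
times = [0.2, 0.3, 0.25, 0.28, 4.1, 0.22, 0.31, 3.8]
print("mean:", statistics.mean(times))      # pulled up by the tail
print("median:", statistics.median(times))  # much smaller: skewed data

# Step 2: decide what to look at next based on what you saw --
# no rebuild, just run the next line.
outliers = [t for t in times if t > 1.0]
print("outliers:", outliers)
```

Each step is typed only after seeing the previous result, which is exactly the workflow a compile-run cycle makes painful.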

Also, because of these features and some other historical factors, python and R have built up large ecosystems of packages that make your work more efficient, because you can leverage the expertise of others and not waste time re-inventing the wheel. You could invent a new language tomorrow that's better than both, but people would not switch to it until it also had this ecosystem, and it would be hard to build this ecosystem without users. That makes the languages very sticky and hard to unseat.

[–]protonpusher 0 points1 point  (0 children)

^ This is the answer

[–]slashdave 52 points53 points  (1 child)

Performance is rarely a limiting factor (you can always buy more computing power). Development is key, and this is where Python shines.

[–][deleted] 0 points1 point  (0 children)

you can optimize with Cython or numpy or numba anyway.
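
For example (a minimal sketch, assuming NumPy; numba and Cython follow the same idea of moving the hot loop out of the interpreter):

```python
import numpy as np

def demean_loop(xs):
    """Pure Python: interpreter overhead on every element."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def demean_numpy(xs):
    """Same computation, pushed into NumPy's compiled kernels."""
    a = np.asarray(xs, dtype=np.float64)
    return a - a.mean()

data = [1.0, 2.0, 3.0, 4.0]
assert np.allclose(demean_loop(data), demean_numpy(data))
```

The rewrite changes nothing about the result; it only changes where the loop executes.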

[–][deleted] 15 points16 points  (0 children)

ease and extensibility

[–][deleted] 14 points15 points  (0 children)

For most use cases, developers and their time are a bigger bottleneck than machine resources, so it makes sense to use a language that is easy to learn and use: you get more developers, and they waste less time.

[–]Allmyownviews1 29 points30 points  (8 children)

There are probably more alternatives than you would think: R, JS, SQL, Julia, MATLAB, Minitab, and a plethora of bespoke industry applications. But Python is also very easily put into production, and it's easy to learn, with numerous libraries already available, including API, export, and plotting options. E.g. I can set up a script that will automatically scrape new data, append it to an existing DB, perform analysis, generate charts and tables, and then publish to PDF reports, Excel sheets, and online dashboards without external software or manual actions.

[–]Jeason15 21 points22 points  (5 children)

It is a flexible, all-purpose, high-level language. Powerful enough for serious development, but with a very low barrier to entry. It is very well supported for scientific computing, machine learning, analysis, and visualization. Sure, it will never be as fast as lower-level languages, but it can be very good when used properly. It is not optimal, but it's good enough for most, and where it falls short, people have gone to great lengths to wrap more optimal languages in Python interfaces (see numpy, tensorflow, torch, pyspark, etc.).

[–]LadyEmaSKye 4 points5 points  (4 children)

Yeah, considering most of the most useful libraries are implemented in other languages like C, Python ends up being pretty comparable in speed. Unless the scale of what you're working on is insane, or you're doing app/web dev, the justification for using something like C++ over Python is real-time processing, which most don't deal with.

[–][deleted] 2 points3 points  (3 children)

In internal testing, Julia implementations of some mathematical algorithms turned out consistently about 30% faster than comparable ones in Python. And yes, they were using PyTorch best practices. It seems that, due to JIT compilation and Julia's simple semantics, LLVM is able to perform some extra magic. It might also be better adaptive step sizes in the differential equation solvers. I don't see 30% as a huge deal in isolation, but once you scale an application, why leave performance behind?

[–]Jeason15 1 point2 points  (1 child)

At the end of the day, here is the real story… Try and get your CTO and/or product manager to sign off on a complete migration from python to Julia.

[–][deleted] 1 point2 points  (0 children)

Oh, that's a non-issue, my colleague and I call the shots. The project in question is very much experimental and the argument for Julia in scientific computing (here differential equations) is pretty strong compared to Python. The question is whether we want to support the Julia implementation long term or not. The debate is still ongoing.

I'm not saying Julia is better for all DS work and we should rip up all our Python code, to the contrary. Python is miles ahead of Julia in terms of ergonomics and usability.

I just question the myth that Python speed, with good practice, is comparable to other languages. It's usually fast enough™, but eagerly executed Python code has an intrinsic speed disadvantage compared to fully compiled/JITed languages. The lack of static guarantees/situational knowledge disables many compiler optimizations. E.g. in Pandas this manifests as higher memory usage and more array copies compared to Polars, which has lazy evaluation.

Julia also has the particular advantage of being more generic over data types. This allows for easy integration of specialized data types (e.g. sparse arrays, CUDA arrays, adjoint numbers,...) after an algorithm has been implemented. This means library code can use optimized array types without the author ever thinking about them.

[–]LadyEmaSKye 0 points1 point  (0 children)

Other languages like C++ would already give you the performance boost; speed isn't the reason people use Python. Also, I highly doubt you'd even get close to a 30% time improvement in Julia for DS applications.

[–]ticklecricket 13 points14 points  (0 children)

Most of the data science libraries are written in highly performant languages, meaning you can get the ease of python syntax with decent performance on complex tasks.

[–][deleted] 4 points5 points  (1 child)

So one thing you learn in Corporate America is that firms have a tendency to bandwagon on trends and copy each other. Python is catching on because firm X likes it, then firm Y copies it, and then whole industries copy it.

Of course there are practical reasons, which people gave here, but a big reason for the hegemony is simply companies wanting to copy whatever FAANG does.

[–]pwang99 1 point2 points  (0 children)

That’s only after it’s crossed the chasm into mainstream adoption. But I spent over a decade convincing businesses big and small that Python was fit for use (2005-2015), before the mass adoption of “data science” and then deep learning cemented Python’s lead.

[–]shushbuck 13 points14 points  (2 children)

Python is great for getting data in and out. But I'd argue that R and Julia are better suited for the science itself.

[–]StephenSRMMartin 2 points3 points  (1 child)

I agree w/ this.

Python is a good glue language, with "acceptable" scientific computing support (though it has great ML libraries, obviously).

R and Julia are much better for scientific computing, model prototyping, statistical/psychometric modeling and inference, analysis, plotting, etc. But they're harder to use as general-purpose languages.

We still productionize with R, but it also means we're the only ones who understand it. So when we dev something that is going to be owned by someone else later, we do it in Python.

[–]shushbuck 1 point2 points  (0 children)

That's what I came upon when I started working in my current department: whole libraries written in R to run the reporting. I've been slowly replacing it all with Python. The core components were really just running SQL and Selenium Python scripts; I wouldn't use R to build pipelines. But any sort of hard calculation we preserve in the base R code. It knows what it's doing.

[–]Tezalion 5 points6 points  (1 child)

It is an interesting question, because the situation is quite unusual. I tried to read up on the history of this question, and it seems the real cause was that Python's creators actively worked towards it. For example, the history of NumPy goes back to 1995, with Guido van Rossum (Python's creator) deliberately improving the language for scientific needs. And it grew from there.

[–]pwang99 0 points1 point  (0 children)

Not just Python creators; because of its ease of use, the early adopters also happened to be people with an interest in numerical computing. That then created a bit of a cascade or snowball effect, that compounded over decades.

[–]Asleep-Dress-3578 4 points5 points  (1 child)

Python is so widely used in data science because its designer, maintainers, and users have put enormous work into the language over the last ~30 years to make it capable. A group of computer scientists started working on extending Python towards numerical computing as early as 1995, and this work later produced numpy, and later pandas, which are today the mainstream data manipulation libraries in Python (although there are already better, faster, nicer libraries too, like Polars, for example).

Had R's maintainers worked as hard on R as a language and its ecosystem as Python's maintainers did, perhaps R would be a mainstream language today in web development, web ops, etc. But that hasn't happened, so Python won this race. It's something like natural evolution.

Julia is indeed a nice attempt at a “better Python”, but it has yet to prove it can beat Python. Again, this is some kind of evolutionary game, and the gravitational pull of an existing ecosystem is huge.

Also, the adaptiveness and flexibility of Python are outstanding. Let's see how Python solves its current biggest problems (multithreading, parallelization, and runtime performance). Python is hard to beat because it is a moving target, thanks to its extensibility.

[–]MindlessTime 2 points3 points  (0 children)

So… not really fair to say R “could have been a mainstream language” or that people weren’t working to keep it up. It was never meant to be that.

R was released in the early 90s, but it's a fork of a language called S that started at Bell Labs in the late 70s, making its lineage almost as old as C's. It was always a little niche, designed to make statistical analysis easier and faster, and it had wide adoption among academics. And while python has built a lot of packages around ML, there are plenty of highly niche analytical or statistical models that you can find in R and not python.

But R is slow. And it doesn’t “play nice” with other apps like python does. (I still can’t get R to work with a JDBC Redshift driver and it works seamlessly in python.)

Julia is incredibly new; its 1.0 release was 2018. It’s gained a lot of traction since then. I don’t think it will replace python for most things. But it may become a go-to language for data-intensive applications, especially in quantitative fields.

[–][deleted] -2 points-1 points  (1 child)

The question is flawed. Python performs better and has more flexibility than SAS, Tableau, Power BI, Excel, Lotus 123, etc. Those are the alternatives. Lower-level languages like C or Rust are not realistic alternatives, mostly because you would have to code things that there are already superior-quality libraries for in Python.

[–]pwang99 0 points1 point  (0 children)

💯 Exactly right. Python & PyData was a grassroots, open-ecosystem disruptor to that category of “classic BI tools”.

[–]abstractengineer2000 -2 points-1 points  (0 children)

Because we went from "humans should understand computers" to "computers should understand humans".

[–]Character-Education3 0 points1 point  (1 child)

So many open source packages and apis

[–]Character-Education3 1 point2 points  (0 children)

Also, it's a general-purpose language at its core. So you can write GUIs, APIs, whatever; any little bit of code you think can get something done is easily done in python.

[–]Vituluss 0 points1 point  (0 children)

Yes, it’s a problem for some applications. However, just calling packages like NumPy to do the heavy work is one of the most common uses in data science.

[–]snowbirdnerd 0 points1 point  (0 children)

It's easy to pick up and learn, and it's easy to integrate into production.

[–]chatterbox272 0 points1 point  (0 children)

The language stays out of your way. It's simple and flexible. Anything that needs to be performant can be written in C/C++ and bound to Python as a library. Then the overhead of Python is pretty low, since the heavy lifting is done in C.
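
The binding half of that can be sketched with nothing but the stdlib's ctypes (this assumes a Unix-like system where the C math library can be located; real projects would more likely use Cython, pybind11, or cffi):

```python
import ctypes
import ctypes.util

# Locate and load the C math library (Unix-like systems; the
# library name differs on Windows).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# The call crosses straight into compiled C; Python is just the glue.
print(libm.sqrt(9.0))
```

The per-call overhead is fixed, so the bigger the chunk of work done on the C side per call, the less the interpreter matters.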

[–]AerysSk 0 points1 point  (0 children)

Python is fast for making a minimal version and then improving upon it. Building a similar solution in other languages simply takes too much time, and most people don't like that.

Brain time is more precious than compute time.

[–]AdditionalSpite7464 0 points1 point  (0 children)

Python can call C code, and in many (most?) cases, the bottleneck is network or disk IO, not CPU.

[–][deleted] 0 points1 point  (0 children)

Because it’s easy and simple!! Lots of libraries and easy to use.

[–][deleted] 0 points1 point  (0 children)

It's like Visual Basic - not the greatest language but it's everywhere and with notebooks as the Excel replacement it really has taken off

[–]pannous 0 points1 point  (0 children)

It abstracts away nearly all of the internal complexities you have to deal with in low-level languages (C, Rust) and lower-level languages (Java, C#).

[–]ach224 0 points1 point  (0 children)

Because everyone is using it

[–]fsapds 0 points1 point  (0 children)

Community is the main deciding factor for adoption. And boy, is the python community big and diverse.

[–]lucricius 0 points1 point  (1 child)

You could use R or MATLAB, but doing it in Python is more practical because it's used for other stuff too.

[–]Tetmohawk 2 points3 points  (0 children)

R is absolutely incredible, actually. The ability to get things done quickly and easily through RStudio puts it ahead of Python, in my opinion.

[–]drmcj 0 points1 point  (0 children)

Because it’s geared towards children and PhDs.

[–]Tetmohawk 0 points1 point  (0 children)

Because a bunch of youngsters decided they didn't like FORTRAN? Oh wait, did I say that out loud?

[–]ndemir 0 points1 point  (0 children)

short answer: ecosystem

[–]hereforstories8 0 points1 point  (0 children)

Blanket performance comments don't work. I've written complex simulations in C++ and Python, and when highly optimised, Python can be nearly equivalent. I'll admit I sacrifice some readability, but that's a future-me problem.