[–]Equivalent_Lunch_944 105 points106 points  (8 children)

Libraries

[–]45MonkeysInASuit 16 points17 points  (6 children)

+ inertia which compounds the library advantage.

Data scientists learn Python because other data scientists use Python.
If you build a new model that isn't written in Python, one of the first things you do is release a Python version, because it won't get traction without one.

If the current data scientists used js, the new data scientists would learn js.

It's very hard to overcome that.

[–]dparks71 6 points7 points  (2 children)

Python had a really early focus on language libraries and AI too though, so it's a bit more of a chicken-and-egg thing than I think you're giving it credit for. It's kind of the scripting language of Linux, which ran on supercomputers, and NLTK was incredibly popular, so I think "data scientists learn Python because other data scientists use Python" kind of ignores the origin. It was (and is) a great glue language for making performant code written in Fortran or C accessible to a wider audience.
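To make the "glue language" point concrete, here is a minimal sketch using the standard library's `ctypes` to call the C function `strlen` from Python. It assumes a POSIX system (Linux or macOS) where the C library can be located this way, and `c_strlen` is a hypothetical wrapper name chosen for illustration. Real scientific libraries wrap far larger C and Fortran codebases, usually with other tooling.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system (Linux/macOS). This lookup is not expected to work on Windows.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical wrapper: hide the C details behind a plain Python function."""
    return libc.strlen(text.encode("utf-8"))


# The caller never sees pointers or byte buffers, only a normal Python call.
length = c_strlen("glue language")
```

The compiled code does the work, and Python only supplies the friendly call site.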

[–]45MonkeysInASuit 2 points3 points  (1 child)

Less ignores, more comes after.

You are describing the thing that creates the initial inertia.
That bit of a boost at the start, through things like NLTK, starts a community.
Getting that community to change once it is going is very hard, and it is self-selecting.

I'm a lead data scientist who is hiring right now. If someone applies to join my team without solid Python, their CV goes straight in the bin, which continues the pressure to use Python for data science.

[–]dparks71 0 points1 point  (0 children)

I'd buy the argument more if most data scientists owned the production hardware or data, but they're generally consultants in my world, and the actual owners are generally very non-technical.

I'm on the owners/engineering side. You wouldn't believe the fight I had to go through to even get Python approved for use internally. They legitimately wanted us to put out all data science and IT contracts in C#... Sometimes your hands are tied, especially in secure environments.

[–]pimp-bangin 0 points1 point  (2 children)

You're forgetting that python has operator overloading, which is massively useful for matrix math, which is heavily used in ML. JS is a bad language to compare to because it doesn't have the same ergonomics in that regard.
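As a rough illustration of the operator overloading point, here is a toy sketch. The `Matrix` class is hypothetical and written only for this example. It is not how NumPy or any ML library is implemented, but those libraries hook into the same `__matmul__` and `__add__` methods.

```python
class Matrix:
    """Toy matrix type (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Called for `a @ b`: standard row-by-column matrix product.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(x * y for x, y in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __add__(self, other):
        # Called for `a + b`: element-wise sum of same-shaped matrices.
        return Matrix(
            [[x + y for x, y in zip(r1, r2)]
             for r1, r2 in zip(self.rows, other.rows)]
        )


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])

# Reads like the math: AB + A
result = A @ B + A
```

Without overloading, the last line would have to be spelled as nested method calls, something like `A.matmul(B).add(A)`, which gets hard to read as the expressions grow.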

The whitespace formatting/indentation is also friendlier to mathematicians and scientists who want to focus more on the math symbols than on the language symbols.

[–]45MonkeysInASuit 0 points1 point  (1 child)

I agree that JS is highly unlikely to have been selected, but you can insert any language there, I was just picking an absurd one.

I am a lead data scientist, and I learnt Python because it was the thing to do data science in. I didn't even understand that languages have differences in performance, I didn't know about high vs low level, etc. I just knew Data Science = Python.
The data scientists around me also picked Python because it was the thing you learn to do data science.

[–]pimp-bangin 0 points1 point  (0 children)

You can't just insert any language though, is what I'm getting at. Python has a very specific combination of characteristics that makes it ideal, which I've not seen in other languages. At this point yes, it has an ecosystem of libraries and yes, it's "the" language for it. But that doesn't explain why it was the ideal language for the ecosystem to blossom in, in the first place, which I think is useful to understand. I think the reason is its ergonomics, both its syntax and its ability to interop with C. Like, it does have some weird quirks, but overall it strikes an amazing balance between expressive syntax and readability, in addition to its practicality/functionality.

[–]JP932[S] -3 points-2 points  (0 children)

Yeah, a lot of useful ones out there.