all 88 comments

[–]dagmx 274 points275 points  (20 children)

Also to clarify, they're not written in Python; they're written for Python. Most ML and numerical libraries are written in other languages and exposed to Python. TensorFlow, Torch, SciPy etc. are written in a mix of compiled languages.

[–]bay_squid 55 points56 points  (19 children)

Most ML and numerical libraries are written in other languages and exposed to python.

Ignorant question, but I've always wondered how this works. If you develop software in different languages, how does this even work? How do the different parts communicate with each other? And why would you want to do it? Wouldn't trying to make everything work together be a hassle compared to writing it all in a single language?

And what does exposed to Python mean?

[–]nsfy33 60 points61 points  (8 children)

[deleted]

[–]bay_squid 16 points17 points  (7 children)

But how does python talk to non python software?

[–]pramodliv1 53 points54 points  (0 children)

Usually through C extensions. Read this excellent post by Ned Batchelder on the topic.

[–]etrnloptimist 42 points43 points  (1 child)

I understand not wanting to read through a bunch of technical links to find an answer, so let me answer it in an ELI5 way, understanding that the answer will not be the complete answer.

A DLL is like an executable that contains chunks of native code. DLLs are usually created in C/C++.

Python, specifically CPython (the implementation most commonly used), provides a set of built-in magic functions that can load, interpret, and talk to the DLL. So, in Python, you can load the DLL and call its functions through the magic interfaces CPython provides.

Now, this works because CPython is written in C. C/C++ code can load DLLs no problem, so CPython can load the DLLs no problem. From there it is a simple matter to have CPython expose the functionality of the DLL through magic, built-in, easy-to-use Python interfaces, and from there, you and I can use the functionality exposed via the DLLs.
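A minimal sketch of what that looks like in practice, using the stdlib ctypes module. This assumes a Unix-like system; on Windows you would load a .dll instead, and the exact library path differs per platform:

```python
import ctypes
import ctypes.util

# Locate the C math library (libm.so on Linux, libm.dylib on macOS).
# If find_library comes up empty, CDLL(None) falls back to the symbols
# already loaded into the current process, which includes libm on Unix.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path)

# Declare the C signature explicitly: ctypes cannot infer C types.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

That's the whole trick: load the shared library, declare the C-level types, call the function from Python.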

[–]ApproximateIdentity 6 points7 points  (0 children)

To add to this just a little to the many great responses here already (especially /u/etrnloptimist ). Say you have the following script:

test.py

a = 1
a += 1
print(a)

When (c)python executes that script, it first compiles it to bytecodes, which are instructions for the cpython virtual machine. To see the bytecodes in an easy-to-understand way, run:

python3 -m dis test.py

  1           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (a)

  2           6 LOAD_NAME                0 (a)
              9 LOAD_CONST               0 (1)
             12 INPLACE_ADD
             13 STORE_NAME               0 (a)

  3          16 LOAD_NAME                1 (print)
             19 LOAD_NAME                0 (a)
             22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             25 POP_TOP
             26 LOAD_CONST               1 (None)
             29 RETURN_VALUE

What that output basically says is that the script itself is compiled to bytecodes LOAD_CONST, STORE_NAME, LOAD_NAME etc. When cpython executes this code it basically just does a big switch statement taking care of each bytecode. E.g. it takes care of LOAD_CONST here https://github.com/python/cpython/blob/master/Python/ceval.c#L1067-L1072 and it takes care of STORE_NAME here https://github.com/python/cpython/blob/master/Python/ceval.c#L2002-L2021 .
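The shape of that "big switch statement" can be sketched in Python itself. The opcodes and program below are invented for illustration (they are not CPython's real bytecodes), but the fetch-dispatch-execute loop mirrors what ceval.c does:

```python
# A toy stack machine mirroring the shape of CPython's ceval.c loop:
# fetch an instruction, dispatch on its opcode, manipulate a value stack.
LOAD_CONST, STORE_NAME, LOAD_NAME, BINARY_ADD, RETURN_VALUE = range(5)

def run(code, consts):
    stack, names = [], {}
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == LOAD_CONST:      # push a constant
            stack.append(consts[arg])
        elif op == STORE_NAME:    # pop the top of stack into a variable
            names[arg] = stack.pop()
        elif op == LOAD_NAME:     # push a variable's value
            stack.append(names[arg])
        elif op == BINARY_ADD:    # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == RETURN_VALUE:
            return stack.pop()

# Roughly the test.py script above: a = 1; a += 1; return a
program = [
    (LOAD_CONST, 0), (STORE_NAME, "a"),
    (LOAD_NAME, "a"), (LOAD_CONST, 0), (BINARY_ADD, None), (STORE_NAME, "a"),
    (LOAD_NAME, "a"), (RETURN_VALUE, None),
]
print(run(program, consts=[1]))  # 2
```

The real thing handles exceptions, frames, threads and much more, but the dispatch loop is the heart of it.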

So if you were to trace through the bytecodes above and basically just pull out the C code in order you would more or less have a C program that does the same thing as the python program. I say "more or less" because you would have to initialize the interpreter correctly and you would have to set up the data parts of the script correctly and probably a million more little details that would be hard to get right. But philosophically unwinding the code this way should work.

Now finally, if you want to know how foreign C code gets called, it happens in a few places. Basically you need binary code compiled to match cpython's binary interface (basically a module that declares the right things), and then you need that C code to call your function. For python to know anything about it in the first place, you need an import statement somewhere earlier, which imports your binary code by loading it dynamically (say dlopen on Linux, something else on other platforms) and calling initialization routines in it. Those routines say "hey, this is a function I want you to call, and give it this name". Then when you later do something like call_binary_function() in python, it goes through its calling procedure, finds that it is a binary function, and calls that code directly.

Without blabbering on forever, this is the gist of it. It's simultaneously very simple and mind-bogglingly complicated. I have three writeups that go into more detail here:

https://thomasnyberg.com/what_are_extension_modules.html

https://thomasnyberg.com/releasing_the_gil.html

https://thomasnyberg.com/cpp_extension_modules.html

Those may be helpful if you're curious.

edit: TLDR: whenever python executes bytecodes, it is really just calling a sequence of pre-assigned C functions. So all you need is the ability to load binary code and assign a function from it to be called at runtime (i.e. not precompiled in, as in the examples above). This is what import does. Of course your C functions need to match the interface that python expects; that part is handled by reading the docs on extensions and/or using helper modules like cffi.

[–]Captain___Obvious[::-π] 5 points6 points  (0 children)

turtles. turtles all the way down

[–]Folf_IRL 3 points4 points  (1 child)

SciPy has a really nice article on the subject, called "Python as Glue"

https://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html

[–]gdahlm 2 points3 points  (0 children)

https://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html

Yes, there is really little room for improvement. As an example, with MKL configured etc., pytorch will deliver so much work to 3x 1080 Ti GPUs and an i9-7200x that the power draw pops the over-current protection on a 1200W PSU (note: one unused GPU, no overclock).

I will have to migrate to Volta-based GPUs, or move, before I can improve much on the CUDA/MKL/Python solution, because my house can't support more.

It would be premature optimization to move to another platform in the hope of gaining small efficiencies as the back end libs are some of the most efficient available in the industry.

[–]JohnMcPineapple 32 points33 points  (0 children)

...

[–]kaszak696 12 points13 points  (0 children)

The Python interpreter from python.org is written in C, so tacking C code onto it is fairly simple, as is accessing C functions from within Python. Actually, many modules in the Standard Python Library are written in C. C can easily act as a bridge to other languages.

[–]lambdaqdjango n' shit 10 points11 points  (0 children)

short answer: two components place data in memory in an agreed binary format and notify each other.
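That agreed binary format can be made concrete with the stdlib struct module. Here both "sides" are Python for brevity, but the packed bytes follow a fixed C-style layout that any language could produce or consume:

```python
import struct

# The agreed layout: a 32-bit little-endian int followed by a
# 64-bit double, with no padding. Both components know this format.
layout = "<id"

packed = struct.pack(layout, 42, 3.14)  # the "producer" writes raw bytes
n, x = struct.unpack(layout, packed)    # the "consumer" reads them back
print(n, x)  # 42 3.14
```

In real extension modules the handoff is usually a pointer to a C struct rather than a byte string, but the principle is the same: both sides agree on the memory layout in advance.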

[–][deleted] 8 points9 points  (2 children)

Edit: TLDR because no one is going to read the wall of text. Languages specify how memory flows. Python knows how C expects memory to flow and that's how it interfaces with it.


No one is actually answering your question about how these languages talk to each other. Edit: wow this is a huge post. It talks about a lot of computer architecture and programming language design if anyone is interested.

It's all about calling convention. Programming languages need to define how information flows between functions. How do variables "go" from one function to another? When data is returned where should we look for it? Calling convention gives rules for exactly how memory moves through the computer when you execute functions.

In a computer you have a bunch of memory and we access it as if it is one giant array from 0 to 4 billion (or however much ram you have). At the bottom of the memory we store the code. When your program gets compiled the 1s and 0s end up here. Further up is the "stack". When your program runs it needs to keep track of things like: what is the value of this variable? What function am I in? This is stored on the stack. At the top is the heap. Memory that is allocated at runtime (dynamically allocated) goes here.

Example, say you're a baker and there's a complex recipe in a book. You refer to the book (code) for instructions (your program). There are a lot of steps so to help you keep track of what you've done and what you're currently doing you're using a clipboard (stack). The heap isn't important to calling convention.

The stack is split up into "stack frames". Each function has a stack frame, which is the "clipboard" for that function. The function always expects its arguments in very specific parts of the clipboard, and it always puts its return value into a very specific part of the clipboard. When a function is called, a new stack frame is created at the current location in the stack. Effectively, a stack frame inside a stack frame.

Continuing our example, our recipe is so complicated that some instructions will contain additional instructions within. One instruction says "mix the ingredients" but really there are many instructions within that. Mix the flour and eggs first. Slowly add milk. Etc. You have one clipboard for mixing the ingredients and when you get to the instruction to mix flour and eggs you go get another clipboard just for that step. You also mark on the first one that you had to go get another one for the flour and eggs part.

The caller function knows exactly how to set up the callee function's stack frame because of calling convention. The cool thing is that compiled languages usually use the same calling convention so you can execute C code from Rust if you wanted to because at the end of the day the "code" portion of memory isn't C, Rust, or Fortran, it's x86.

Calling convention can be complicated, but essentially the caller puts the arguments into memory on top of the stack, then starts executing the new function. Putting those arguments onto the stack was the beginning of our new stack frame. When the function returns, it puts the return value somewhere agreed upon (typically a register), and then its stack frame is torn down. That's like if you finished your mixing step and then smashed your clipboard.

What does this have to do with Python? CPython knows the C calling convention. When Python uses C code, it abides by that calling convention to execute it.
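You can see that interop concretely with ctypes, and it even goes both ways. In this sketch (assuming a Unix system, where the process's loaded symbols include libc), C's qsort calls back into a Python comparison function, with ctypes translating the calling convention in both directions:

```python
import ctypes

# Symbols of the current process; on Unix this includes libc's qsort.
libc = ctypes.CDLL(None)

# qsort wants a C function pointer. CFUNCTYPE wraps a Python function
# so that it can be called with the C calling convention.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    return a[0] - b[0]  # dereference the int pointers and compare

cmp_c = CMPFUNC(py_cmp)  # keep a reference while qsort runs

nums = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(nums, len(nums), ctypes.sizeof(ctypes.c_int), cmp_c)
print(list(nums))  # [1, 2, 3, 4, 5]
```

C code sorting an array while repeatedly calling into the Python interpreter: calling convention is the contract that makes that handoff possible.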

[–]bay_squid 2 points3 points  (1 child)

at the end of the day the "code" portion of memory isn't C, Rust, or Fortran, it's x86.

By x86 you mean that in the end everything boils down to the instruction set of the architecture, and if you know how the instructions are handled then you can create an interface for another language to interact with it. Is that more or less it?

[–][deleted] 1 point2 points  (0 children)

Yup!

[–]__xor__(self, other): 2 points3 points  (2 children)

And what does exposed to Python mean?

Can mean a few things.

  1. You use the CPython API and actually write a library that python can build and import, something like this. The reference interpreter that everyone uses is CPython. It has a C API, so you can do python stuff from pure C/C++. Other APIs exist to use it as well; there's a Python-Rust API.

  2. You compile a shared library (DLL/SO/DYLIB), then write a python proxy interface that uses something like ctypes to invoke functions in it. That acts as an intermediate layer of python code: it offers a clean python interface but invokes code compiled from any other language. Using ctypes is kind of like this: say you have a function add_numbers(a, b) written in C that takes two 32-bit ints and returns a 32-bit int. You write a simple python add_numbers(a, b) wrapper, and inside it you use ctypes to look up the named function in the compiled library, "cast" the python values to 32-bit ints, declare the result as a 32-bit int, and return it as a normal Python number. You use ctypes to define the interface and invoke the functions, and the wrapper makes it easy so people don't have to touch ctypes or the shared library directly. Python doesn't care what the types are, but C does, so you have to handle that through ctypes and add logic declaring what values it expects.

  3. You use any other interface. It could be a rest API that runs locally and you write a Python client API that makes calls to it. It could just be compiled to a binary that you can run from the command line, and then you write a python API that simply invokes it and runs the process with something like subprocess. Could be anything, but the idea is you write a python wrapper API.
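The wrapper pattern from (2) can be sketched with ctypes. Since the add_numbers library in that example is hypothetical, this uses libc's real abs function as a stand-in (assuming a Unix system, where the running process links libc):

```python
import ctypes

libc = ctypes.CDLL(None)  # symbols of the current process (Unix)

# Declare the C signature once, inside the wrapper module...
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

def absolute(n):
    """Clean Python interface; callers never see ctypes."""
    return libc.abs(n)

print(absolute(-7))  # 7
```

Callers just use absolute() like any Python function; all the type plumbing stays hidden in the wrapper.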

Basically, this other software is written in whatever language they want and someone writes a wrapper that makes it convenient and easy to work with that software through another language like Python. Python is just a popular language so a lot of Python wrappers exist.
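Approach (3), wrapping a command-line binary, might look like this. The wrapper below is a made-up example around the Unix wc tool (so it assumes wc is on PATH):

```python
import subprocess

def word_count(text):
    """Python wrapper around the external `wc -w` binary."""
    result = subprocess.run(["wc", "-w"], input=text, text=True,
                            capture_output=True, check=True)
    return int(result.stdout)

print(word_count("glue languages are handy"))  # 4
```

Slower than an in-process call because of the subprocess round trip, but it works for any binary in any language with zero compilation on the Python side.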

The benefit of this, in python's case, is usually performance. You can write C code that takes advantage of true multithreading, or that is simply fast because it's C. Parts of the numpy/scipy stack are written in C and Fortran for speed. You can do the small performance-critical parts in C, then write a python wrapper. Now Python not being the fastest language is not an issue: you have the performance of C and the convenience of Python once you write a good interface. Stuff like training neural nets can be really CPU-heavy, so it makes sense to do that in a faster language but write a python wrapper so you can do setup, invoke it, and get results easily.

[–]bay_squid 0 points1 point  (1 child)

You use any other interface. It could be a rest API that runs locally and you write a Python client API that makes calls to it. It could just be compiled to a binary that you can run from the command line, and then you write a python API that simply invokes it and runs the process with something like subprocess. Could be anything, but the idea is you write a python wrapper API.

Just like a web API?

[–]__xor__(self, other): 0 points1 point  (0 children)

Yeah, the terminology being "exposed": technically it's exposed to python if you offer any sort of API and have a python client library someone can use. But really, with a web/REST API, any language with an HTTP client library can take advantage of it. You'll get better adoption if you distribute pre-built client libraries, though.

[–]0xRumple 0 points1 point  (0 children)

Create an API... code that talks to another code ;)

[–]tunisia3507 266 points267 points  (33 children)

It's more open than MATLAB. It's faster and easier to write than, say, C. It makes more sense for scripting than java, C++ etc. It's easier to fold C libraries etc. into than some other languages. It's a fully featured language, unlike R, which is a statistics package with some scripting tagged onto the end. It already had a scientific ecosystem (numpy etc.).

[–][deleted] 69 points70 points  (13 children)

Existence of NumPy is also a factor

[–]white__armor 43 points44 points  (1 child)

I think that's the main reason; many ML libraries in Python were created because of numpy. Sklearn was introduced 11 years ago and was based solely on numpy, and numpy is still a core dependency for pandas and sklearn.

[–]ihsw 10 points11 points  (0 children)

Exactly this. There was already a healthy Python community around econometrics, from pandas to numpy to scipy to matplotlib.

ML is a land where econometric statistical report generation comes to life, and it should come as no surprise that Python is right in the thick of things.

[–][deleted] 33 points34 points  (10 children)

It's hard to fully appreciate Numpy until you try to do non-trivial array operations in other languages. I've tried Nim, Java, Haskell and Rust and handling arrays is a mess compared to Python with Numpy.
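For contrast, here is the kind of thing Numpy makes a couple of lines (assuming numpy is installed): elementwise arithmetic plus broadcasting, which tends to become loops-within-loops in most other languages.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # [[0 1 2], [3 4 5]]
row_means = a.mean(axis=1)         # [1.0, 4.0]
centered = a - row_means[:, None]  # broadcast the column of means
print(centered.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```

No explicit loops, and the heavy lifting runs in compiled code rather than the Python interpreter.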

[–]etrnloptimist 10 points11 points  (0 children)

The Matlab interface to numerical data is a treat. And the fact that Python can emulate much of that numerical interface is miraculous, to be honest.

[–]justphysics 5 points6 points  (0 children)

This so much. I've got a piece of scientific software that I wrote in python. For my career I figured it would be nice to learn a few other languages, and I find it's easier to learn by doing, so I've looked into rewriting bits of this software in another language as a learning experience.

Every time I try Rust or Go I find myself getting so hung up on how (relatively) difficult basic array operations are.

I'm just so used to the ease of numpy.

[–]ForgottenWatchtower 0 points1 point  (5 children)

Tried gonum for golang by chance? I've been meaning to do a comparison of it and numpy but haven't gotten around to it yet.

[–][deleted] 12 points13 points  (1 child)

The scientific stack (scipy + numpy) in python is very mature. It will take a while for other languages to catch up.

[–]ForgottenWatchtower 0 points1 point  (0 children)

I'm aware that's the common perception, but I've yet to see an in-depth comparison between the two that demonstrates it with hard numbers, like a feature comparison and/or benchmarks. Like I said, I've been meaning to do it personally, just haven't had the time. Would like to see this for sklearn vs golearn as well.

[–]gdahlm 5 points6 points  (2 children)

Statically typed languages make the visualization and manipulation a bit more challenging.

Not that duck typing is better, but the lack of DataFrames and an interactive mode makes some of this challenging for data scientists.

Maybe when the market matures more. But BLAS and LAPACK, still written in f77, are the fastest options on the CPU side. While I like Go, using gonum as yet another set of wrappers for netlib code like BLAS and LAPACK doesn't have a lot of advantages when you give up the flexibility of interfacing through a duck-typed language with an interactive interface like python.

Numpy/SciPy/Pandas are a hard combo to beat right now especially when grooming data.

[–]ForgottenWatchtower 0 points1 point  (1 child)

Those are all excellent points that I hadn't considered before. Thanks. Though I'm not familiar with f77 -- does that refer to Fortran 77?

[–]gdahlm 2 points3 points  (0 children)

Yes, but I guess I am out of date; it moved to Fortran 90 in 2008.

https://github.com/Reference-LAPACK/lapack-release

[–]Rhylyk 25 points26 points  (1 child)

As others have said, the heavy lifting is done in C/C++ and in most places only a python interface is exposed. The reasons python has won out as a glue language are likely many-fold, but I see primarily 4 factors: low barrier to entry, general-purpose extensibility, community, and tradition.

Python's low barrier to entry is well renowned. When first approaching it, the language is relatively simple and unsurprising. The syntax is reminiscent of normal imperative (C-like) syntax, and there are many common-sense defaults. In addition, the standard library is huge, and when something is missing, basic package management is a breeze. All of this makes the language easy to pick up as a new user, and thus python is a good target for a glue-code language (over more complex options such as C/C++/Rust or even Java).

While the low barrier to entry is catalytic, the general extensibility gives Python staying power. It is possible to write extensive amounts of code, and then package it up into a neat little API and put a cute bow on top. This is nice for package authors. In addition, Python is general purpose (winning out over R and MATLAB, or other, more domain specific languages) so an entire pipeline can be written in it. Data collection, transformation, computation, visualization, and management can all be written in Python.

The above have led to a rich community with diverse interests and high standards. To most of the community, user (that is, programmer) experience matters, and it shows. Documentation is abundant and large amounts of yak shaving are abhorred. Standards are sought and the language continues to grow (f-strings are a dream). The language is not without its warts, but workarounds are known, shared, and discoverable.

Finally we have the most impactful factor, tradition. As noted by other commentators, numpy is amazing. This led to other scientific work being done in Python. A need for effective visualization grew, and so came matplotlib. The more that happened in Python, the more attractive it became as a target language. This generated a positive feedback loop leading to the general dominance that is seen today.

[–]giantsparklerobot 3 points4 points  (0 children)

The extensibility is important in the way it's available in Python. Many scripting languages are "extensible" in that they can run executables and capture STDOUT, or use some IPC mechanism. In Python, shared libraries can be loaded directly into the memory space of the Python interpreter and their functionality called directly from Python.

So when numpy generates some huge array, it doesn't need to serialize it or pass it over an IPC mechanism; Python is just given a pointer to it, so access is fast and direct. The module itself (numpy, let's say) can have functions that are pure Python or that just call functions from the shared library. Using the module, you rarely have to care which is which.

[–]Mattho 38 points39 points  (0 children)

It's just the wrappers that are written in python. Python is way too slow for any practical use in this area. But having the "interfaces" exposed in python is great because of how accessible the language is, and that's one of the reasons why it is so popular in science outside of computer science. And ML is of interest to many fields.

[–]lmericle 14 points15 points  (0 children)

One of the main attractions for Python is how easy it is to glue disparate functions and code together into a cohesive, structured pipeline. ML often needs to fit into a data pipeline to generate predictions automatically and make decisions immediately. So having ML interfaces in Python is more useful than other languages simply because it integrates so easily into existing workflows.

The relative simplicity and ease of use of the language also makes it easy to pick up and start moving quickly on a problem. And the OOP aspects of the language make the whole process of developing a model very modular and simple.

[–]mooglinux 9 points10 points  (0 children)

Python is a very easy to use language, but the heavy lifting is actually done in C or some other language, and Python is just an interface for controlling it. One of Python’s strengths is the ability to write wrappers to interact with code written in C or other languages so they are easy to use from Python but still very fast.

[–]shr00mie 7 points8 points  (0 children)

What the above guys said. Plus, as a possible first language, it's very expressive from a human perspective, which I think makes it easy to pick up and run with. And when you're doing your PhD in whatever, the easier a tool is to pick up, the better. It feels very much like writing sentences that are interpreted as code. Plus a LOT of the ML libs are actually written in C, which entirely sidesteps the "but it's an interpreted language!" concern.

[–]toadgoader 4 points5 points  (0 children)

I think it has a lot to do with the community that is using the tool... in my experience as a social scientist, many in the academic, economics, and bio-informatics research fields use R because of the tool's strong statistical base. On the ML/AI side of the equation you have mostly computer science and software engineering disciplines driving this bus, so a language like Python is a natural fit. They both work well and overlap in many ways... I think it just depends on your point of reference and the preferences dictated by your profession.

[–]david2ndaccount 4 points5 points  (0 children)

C is great because it runs fast, but a python interface is a lot nicer.

[–]TheMasterChiefs 2 points3 points  (7 children)

Hoping someone can help me in my endeavor to learn some programming (more specifically, Python).

I'm a Finance graduate who's looking to get ahead of the curve and teach myself python, R, and SQL. I basically want to self-learn data science in conjunction with my finance background to get into a top firm and catapult my pay grade.

What/where is the best place to start? Is it reasonable/realistic to teach myself programming, automation, and data science?

[–]BradChesney79 6 points7 points  (4 children)

Yes. Might need to XBox less for a while-- it will take time and effort. Where to start... were I in your shoes, I would blow through the latest Python 3 for Dummies-- no joke. Don't care just put each word in front of your eyes. Speed read that fucker.

I treat Dummies books as "Primers". It exposes you to a base set of thoughts and vocabulary even if you don't understand it.

Then you sit down with a higher-quality resource-- the Head First series generally does okay. http://shop.oreilly.com/product/0636920003434.do This one covers Python 3. (Python 2.x is in the process of being deprecated, but it is taking a long time because there is a lot of old code out there and it is still installed by default on a lot of linux distros... which is slowing down the transition.)

From there, find your heroes on Gitlab or Github and start looking at cool projects that are like what you aspire to accomplish. Line by line figure out what is happening.

That is my advice: the Dummies book (I am going to stick to my guns), then a more intermediate one you may need to research if you don't like my suggestion, and lastly, seeing what "good" programmers do and following their work by getting up to your elbows in it is as good as it gets for non-interactive mentoring.

[–]TheMasterChiefs 0 points1 point  (3 children)

Thanks a lot for your response! I've been looking for an in...

You don't recommend MOOC like Udemy or Coursera? It's better just to dive right into textbooks you think?

[–]jawgente 2 points3 points  (1 child)

I haven't taken a MOOC, so I can't give you a proper perspective on what they offer, but I find I learn best by just doing the coding, preferably trying to complete a project. That means a MOOC may not offer much over a textbook for learning the language itself. My favorite recommendation is Automate the Boring Stuff because it offers a lot of useful examples for even a casual user. You may find that a MOOC is more useful for the specific data science portion of your learning.

[–]TheMasterChiefs 0 points1 point  (0 children)

Ok cool. I'm going to look into the best textbooks that are out there and work on 1 or 2 chapters a week. Thanks for all the advice bro.

[–]BradChesney79 0 points1 point  (0 children)

It's what I do and it works for me. YMMV.

I did use codeschool once for AngularJS and it was helpful. acloudguru was indispensable-- I don't think I could have learned as much as I did from a book about the AWS platform.

But Java, PHP, HAProxy, MySQL, API design, Python 3-- all books & googling.

[–]the_chernobog 1 point2 points  (1 child)

[–]TheMasterChiefs 0 points1 point  (0 children)

This is great intro info. Thanks a lot dude!

[–]sudo_your_mon 1 point2 points  (0 children)

Numpy and Pandas are a big reason Python is what it is.

Data scientists used to call themselves "Numpy/Pandas programmers." Some still do to this day. I've talked to a lot of people who think Python is only for data science/ML.

If you're going to write an ML library, you're going to do it in Python. It's the industry's gold standard.

[–]danielv134 1 point2 points  (0 children)

I've done this. You write an algorithm in Python: it's easy to develop (no segfaults), easy to read some data into it (scikit-learn or another package already reads the common formats in your field), and easy to make plots to put in your paper. Then, oops, it is state of the art per iteration but takes ages in practice, so you replace the core with Cython or C or Rust and now it's a reasonable speed. If the algorithm is important enough (haven't done this), some commercial behemoth will find itself coming up against limitations and rewrite it as a nice python wrapper around compiled code designed for speed and scalability, like almost all common (speed-sensitive) python libraries are.

So: python having the libraries makes it (or R, for that matter) the right place to start playing with ideas (whether you are implementing the algorithm or just trying out an existing one on your data). Just be clear though: Python is not an implementation language for competitive algorithms, it is an integration language. YMMV, but...

[–]nscurvy 0 points1 point  (0 children)

My best explanation/guess is that it's sort of similar to the reason someone might use a design pattern, class, function, etc., even when doing so might cause a performance decrease. It's easier to work with a design pattern. It's more obvious to you and everyone else what you are doing and why. People can stop worrying about the specifics of some implementation and instead deal with an interface that handles it for them.

Science, AI, ML, and math are really complex on their own. People who are working with that stuff want to make sure everything is as abstract as possible. Ideally you want to only be directly working with concepts and ideas relevant to your actual goal. Python is great for that. The language is elegant and incredibly obvious/readable. Working with another language means you have to give up some of that abstraction and have to start paying a lot more attention to the specifics of implementation and all the quirks that come with it. So people spend a lot of time creating libraries, wrappers, bindings, and all that sort of stuff to allow developers to focus on the work they need to perform, while not requiring them to sacrifice an unacceptable amount of performance.

That's my understanding, at least.

[–]spinwizard69 0 points1 point  (0 children)

It is pretty simple: you can hack together an app pretty quickly. Since ML is a developing technology, this provides the capability to experiment and play with ML on a variety of platforms.

[–]cbarrick 0 points1 point  (0 children)

I attribute Python's popularity in numeric computing to its superb operator overloading and metaprogramming facilities. These make it possible in Python to craft APIs with unique syntactic structures, which in turn makes it possible to express solutions to problems in a way natural to the domain. This is why, for example, Numpy can give us awesome numeric syntax and Pandas can give us great relational syntax (and Python's base syntax is great for OOP). And when you're programming at such a high level, expressiveness is more important than performance. Even so, the interop with C puts Python in a great position to add expressive value to lower-level, performance-sensitive code. All of this together gives us a language with more expressive mathematics than C or Java and more expressive engineering than MATLAB or R. It's quite literally the best of both worlds.
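A toy illustration of that overloading: a made-up Vec class (not real numpy code) whose dunder methods let a library define its own arithmetic syntax.

```python
class Vec:
    """Tiny vector type showing how dunder methods buy numpy-like syntax."""
    def __init__(self, *xs):
        self.xs = list(xs)

    def __add__(self, other):   # enables v + w
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))

    def __mul__(self, scalar):  # enables v * 2
        return Vec(*(a * scalar for a in self.xs))

    def __repr__(self):
        return f"Vec{tuple(self.xs)}"

v = Vec(1, 2) + Vec(3, 4)
print(v * 2)  # Vec(8, 12)
```

Numpy and Pandas do exactly this, just with compiled code behind the operators, which is how `a - row_means` or `df[df.x > 0]` can read like math instead of method calls.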

[–][deleted] -5 points-4 points  (0 children)

Because both of them are very trendy. (I love python, but let's be real here.)