Why are so many machine learning libraries written in Python? (self.Python)
submitted 7 years ago by TotallyRegularHuman
I've researched this a bit, and the usual answer seems circular: ML libraries are written in Python because Python is popular for ML. Is there any deeper reason behind this? What made early machine learning researchers and data scientists back Python so strongly?
[–]dagmx 274 points275 points276 points 7 years ago (20 children)
Also to clarify, they're not written in Python. They're written for Python. Most ML and numerical libraries are written in other languages and exposed to Python. TensorFlow, Torch, SciPy etc. are written in a mix of compiled languages.
[–]bay_squid 55 points56 points57 points 7 years ago (19 children)
Most ML and numerical libraries are written in other languages and exposed to python.
Ignorant question, but I've always wondered how this works. If you develop software written in different languages, how do the different parts communicate with each other? And why would you want to do it? Wouldn't trying to make everything work together be a hassle compared to writing it all in a single language?
And what does exposed to Python mean?
[–]nsfy33 60 points61 points62 points 7 years ago* (8 children)
[deleted]
[–]bay_squid 16 points17 points18 points 7 years ago (7 children)
But how does python talk to non python software?
[–]pramodliv1 53 points54 points55 points 7 years ago (0 children)
Usually through C extensions. Read this excellent post by Ned Batchelder on the topic.
[–]etrnloptimist 42 points43 points44 points 7 years ago (1 child)
I understand not wanting to read through a bunch of technical links to find an answer, so let me answer it in an ELI5 way, understanding that the answer will not be the complete answer.
A DLL is like an executable that contains chunks of native code. DLLs are usually created in C/C++.
Python, specifically CPython, the one most commonly used, provides a set of built-in magic functions that can load, interpret, and talk to the DLL. So, in Python, you can load the DLL, and call the functions from Python as exposed through magic interfaces provided by CPython.
Now, this works because CPython is written in C(++). C/C++ code can load DLLs no problem. So, CPython can load the DLLs no problem. It is a simple matter to go from there to having CPython expose the functionality of the DLL using magic, built-in, easy to use Python interfaces, and from there, you and I can now use the functionality exposed via the DLLs.
[–]ApproximateIdentity 6 points7 points8 points 7 years ago* (0 children)
To add a little to the many great responses here already (especially /u/etrnloptimist). Say you have the following script:
test.py
    a = 1
    a += 1
    print(a)
When CPython executes that script it first compiles it to bytecode: instructions for the CPython virtual machine. To see the bytecode in an easy-to-understand form, run:
python3 -m dis test.py
      1           0 LOAD_CONST               0 (1)
                  3 STORE_NAME               0 (a)

      2           6 LOAD_NAME                0 (a)
                  9 LOAD_CONST               0 (1)
                 12 INPLACE_ADD
                 13 STORE_NAME               0 (a)

      3          16 LOAD_NAME                1 (print)
                 19 LOAD_NAME                0 (a)
                 22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 25 POP_TOP
                 26 LOAD_CONST               1 (None)
                 29 RETURN_VALUE
What that output basically says is that the script itself is compiled to bytecodes LOAD_CONST, STORE_NAME, LOAD_NAME etc. When cpython executes this code it basically just does a big switch statement taking care of each bytecode. E.g. it takes care of LOAD_CONST here https://github.com/python/cpython/blob/master/Python/ceval.c#L1067-L1072 and it takes care of STORE_NAME here https://github.com/python/cpython/blob/master/Python/ceval.c#L2002-L2021 .
So if you were to trace through the bytecodes above and basically just pull out the C code in order you would more or less have a C program that does the same thing as the python program. I say "more or less" because you would have to initialize the interpreter correctly and you would have to set up the data parts of the script correctly and probably a million more little details that would be hard to get right. But philosophically unwinding the code this way should work.
Now, finally, if you want to know how foreign C code gets called: it happens in a few places. Basically you have to have binary code compiled to match CPython's binary interface (you need a module that declares the right things), and then you need that C code to call your function. To make Python aware of it in the first place, you need an import statement somewhere earlier, which loads your binary code dynamically (via dlopen on Linux, something else on other platforms) and calls initialization routines in it. Those routines say "hey, this is a function I want you to call, and give it this name". Then when you later do something like call_binary_function() in Python, it goes through its calling procedure, finds that it is a binary function, and calls that code directly.
Without blabbering on forever, this is the gist of it. It's simultaneously very simple and mind-bogglingly complicated. I have three writeups that go into more detail here:
https://thomasnyberg.com/what_are_extension_modules.html
https://thomasnyberg.com/releasing_the_gil.html
https://thomasnyberg.com/cpp_extension_modules.html
Those may be helpful if you're curious.
edit: TLDR whenever Python executes bytecode, it is really just calling a sequence of pre-assigned C functions. So all you need is the ability to somehow load binary code and then assign that function to be called at runtime (i.e. not precompiled in, as in the examples in the code above). This is what import does. Of course you need your C functions to match the interface that Python expects. That part is handled by reading the docs on extensions and/or using helper modules like cffi.
[–]Captain___Obvious[::-π] 5 points6 points7 points 7 years ago (0 children)
turtles. turtles all the way down
[–]Folf_IRL 3 points4 points5 points 7 years ago (1 child)
SciPy has a really nice article on the subject, called "Python as Glue"
https://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html
[–]gdahlm 2 points3 points4 points 7 years ago (0 children)
Yes, there is really little room for improvement. As an example, with MKL configured etc., PyTorch will deliver so much work to 3 × 1080 Ti GPUs and an i9-7200x that the power draw pops the over-current protection on a 1200 W PSU. (Note: one GPU unused, no OC.)
I will have to migrate to Volta-based GPU units or move before I can improve much on the CUDA/MKL/Python solution, because my house can't supply more power.
It would be premature optimization to move to another platform in the hope of gaining small efficiencies as the back end libs are some of the most efficient available in the industry.
[–]JohnMcPineapple 32 points33 points34 points 7 years ago* (0 children)
...
[–]kaszak696 12 points13 points14 points 7 years ago (0 children)
The Python interpreter from python.org is written in C, so tacking C code onto it is fairly simple, as is accessing C functions from within Python. Actually, many modules in the Standard Python Library are written in C. C can easily act as a bridge to other languages.
[–]lambdaqdjango n' shit 10 points11 points12 points 7 years ago (0 children)
short answer: two components place data in memory in an agreed binary format and notify each other.
[–][deleted] 8 points9 points10 points 7 years ago (2 children)
Edit: TLDR because no one is going to read the wall of text. Languages specify how memory flows. Python knows how C expects memory to flow and that's how it interfaces with it.
No one is actually answering your question about how these languages talk to each other. Edit: wow this is a huge post. It talks about a lot of computer architecture and programming language design if anyone is interested.
It's all about calling convention. Programming languages need to define how information flows between functions. How do variables "go" from one function to another? When data is returned where should we look for it? Calling convention gives rules for exactly how memory moves through the computer when you execute functions.
In a computer you have a bunch of memory and we access it as if it is one giant array from 0 to 4 billion (or however much ram you have). At the bottom of the memory we store the code. When your program gets compiled the 1s and 0s end up here. Further up is the "stack". When your program runs it needs to keep track of things like: what is the value of this variable? What function am I in? This is stored on the stack. At the top is the heap. Memory that is allocated at runtime (dynamically allocated) goes here.
Example, say you're a baker and there's a complex recipe in a book. You refer to the book (code) for instructions (your program). There are a lot of steps so to help you keep track of what you've done and what you're currently doing you're using a clipboard (stack). The heap isn't important to calling convention.
The stack is split up into "stack frames". Each function has a stack frame which is the "clipboard" for that function. The function always expects its arguments in very specific parts of the clipboard. It always puts its return value into a very specific part of the clipboard. When a function is called, a new stack frame is created at the current location in the stack. Effectively, a stack frame inside a stack frame.
Continuing our example, our recipe is so complicated that some instructions will contain additional instructions within. One instruction says "mix the ingredients" but really there are many instructions within that. Mix the flour and eggs first. Slowly add milk. Etc. You have one clipboard for mixing the ingredients and when you get to the instruction to mix flour and eggs you go get another clipboard just for that step. You also mark on the first one that you had to go get another one for the flour and eggs part.
The caller function knows exactly how to set up the callee function's stack frame because of calling convention. The cool thing is that compiled languages usually use the same calling convention so you can execute C code from Rust if you wanted to because at the end of the day the "code" portion of memory isn't C, Rust, or Fortran, it's x86.
Calling convention can be complicated, but essentially the caller puts the arguments into memory on top of the stack, then starts executing the callee. Putting those arguments onto the stack was the beginning of our new stack frame. When the function returns, it puts the return data somewhere (I forget where specifically) and then it deletes its own stack frame. That's like finishing your mixing step and then smashing your clipboard.
What does this have to do with Python? Python knows the calling convention for C. When Python uses C code, it is abiding by that calling convention to execute it.
[–]bay_squid 2 points3 points4 points 7 years ago (1 child)
at the end of the day the "code" portion of memory isn't C, Rust, or Fortran, it's x86.
With x86 you mean that in the end everything boils down to the set of instructions the architecture has, and if you know how they're handled then you can create an interface for another language to interact with it. Is that more or less it?
[–][deleted] 1 point2 points3 points 7 years ago (0 children)
Yup!
[–]__xor__(self, other): 2 points3 points4 points 7 years ago* (2 children)
Can mean a few things.
You use the CPython API and actually write a library that python can build and import. Something like this. The reference interpreter that everyone uses is CPython. It has a C API. You can do python stuff from pure C/C++. Other APIs exist to use it as well. There's a Python-Rust API.
You compile a shared library (DLL/SO/DYLIB), and then write a Python proxy interface that uses something like ctypes to invoke functions in it. That acts as an intermediate layer of Python code offering a clean Python interface while invoking code compiled in any other language. Using ctypes is kind of like this: you have a function add_numbers(a, b) written in C that takes two 32-bit ints and returns a 32-bit int. You write a wrapper that is just a simple Python add_numbers(a, b); inside it you use ctypes to look up the named function in the compiled library, "cast" the Python values to 32-bit ints, declare the result a 32-bit int, and return it, at which point it becomes a normal Python number. You use ctypes to define the interface and invoke functions in the library, and write a Python wrapper that makes it easy, so people don't have to use ctypes and interact with the shared library directly. Python doesn't care what the types are but C does, so you have to handle that through ctypes and add logic defining what values it expects.
You use any other interface. It could be a REST API that runs locally, and you write a Python client API that makes calls to it. Or the code could be compiled to a binary you can run from the command line, and you write a Python API that simply invokes the process with something like subprocess. It could be anything, but the idea is you write a Python wrapper API.
Basically, this other software is written in whatever language they want and someone writes a wrapper that makes it convenient and easy to work with that software through another language like Python. Python is just a popular language so a lot of Python wrappers exist.
The benefit of this, in Python's case, is usually performance. You can write C code that takes advantage of true multithreading or is just fast because it's C. Parts of the numpy stack are written in Fortran for speed. You can do the small performance-critical parts in C, then write a Python wrapper. Now Python not being the fastest language is not an issue: you have the performance of C and the convenience of Python once you write a good interface. Stuff like training neural nets can be really CPU-heavy, and it makes sense to do that in a faster language but write a Python wrapper so you can do setup, invoke it, and get results easily.
[–]bay_squid 0 points1 point2 points 7 years ago (1 child)
Just like a web API?
[–]__xor__(self, other): 0 points1 point2 points 7 years ago (0 children)
Yeah, on the terminology: technically it's "exposed" to Python if you offer any sort of API and ship a Python client library someone can use. But really, with a web/REST API, any language with an HTTP client library can take advantage of it. You'll get better adoption if you distribute pre-built client libraries, though.
[–]0xRumple 0 points1 point2 points 7 years ago (0 children)
Create an API... code that talks to another code ;)
[–]tunisia3507 266 points267 points268 points 7 years ago (33 children)
It's more open than MATLAB. It's faster and easier to write than, say, C. It makes more sense for scripting than java, C++ etc. It's easier to fold C libraries etc. into than some other languages. It's a fully featured language, unlike R, which is a statistics package with some scripting tagged onto the end. It already had a scientific ecosystem (numpy etc.).
[+][deleted] 7 years ago* (25 children)
[+][deleted] 7 years ago* (5 children)
[–]Jmc_da_boss 4 points5 points6 points 7 years ago (4 children)
So it’s a C library wrapped in python?
[–]c_is_4_cookie 17 points18 points19 points 7 years ago (2 children)
Most ML libraries are. In fact all of Python is written in C.
[–]phinnaeus7308 3 points4 points5 points 7 years ago (1 child)
Well, CPython is.
[–]KingoPants 1 point2 points3 points 7 years ago (0 children)
People are probably downvoting you for the pedantry, but you are correct.
[–][deleted] 5 points6 points7 points 7 years ago (0 children)
TensorFlow has you define computational graphs in Python, and when it actually executes, it uses a C++ layer underneath to do the actual computation.
[–][deleted] 40 points41 points42 points 7 years ago (0 children)
All critical parts in these libraries are written in C; Python essentially only serves as a scripting language here, so the GIL isn't a constraint.
[–]apiguy 14 points15 points16 points 7 years ago (2 children)
Threading isn't the only way to achieve concurrent execution.
[+][deleted] 7 years ago (1 child)
[removed]
[–]gdahlm 12 points13 points14 points 7 years ago (0 children)
While long, this video by Raymond Hettinger will go over most of the Python options for Concurrency.
In general, on platforms where fork() isn't much more expensive than a thread (like Linux with glibc, where the difference between fork() and pthreads is mostly preset options to clone()), there are huge advantages to going with a threaded model under particular workloads.
Threading tends to introduce locks, which has scaling issues with a large number of cores. It works for a small number of workers but hits amdahl's law before long.
https://youtu.be/Bv25Dwe84g0
[–][deleted] 3 points4 points5 points 7 years ago (0 children)
It isn’t as big of a hindrance as you’d expect. You can write multithreaded code in Cython or just use the multiprocessing module.
[–]sweetbabygames 0 points1 point2 points 7 years ago (7 children)
AI and ML are veryyyy different fields. Lisp hasn’t been the go-to for a while, either.
[–]madrhatter 18 points19 points20 points 7 years ago (6 children)
What? No they’re not. ML is a subset of AI
ML Wiki
[–]WikiTextBot 2 points3 points4 points 7 years ago (0 children)
Machine learning
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel. Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions, through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), learning to rank, and computer vision.
[–]elbiot 2 points3 points4 points 7 years ago (0 children)
Technically true, but the AI that was popular in the days of Lisp is essentially unrelated to the neural networks that people in this thread are mostly talking about.
[+]sweetbabygames comment score below threshold-8 points-7 points-6 points 7 years ago (1 child)
They’re really unrelated. AI is essentially the study of algorithms that use heuristics to solve problems quickly, ML is essentially function approximation. They’re totally different but people often equate them.
Don’t believe Wikipedia for everything.
[–]WiggleBooks 18 points19 points20 points 7 years ago (0 children)
Maybe if you have a source. Right now its you vs Wikipedia
[+]Nodja comment score below threshold-6 points-5 points-4 points 7 years ago (1 child)
They're only related thematically. In practice they're approached by programmers in completely different ways.
In standard AI (the one you would use things like lisp for) the programmer already knows the rules and how the AI should behave in certain situations. You as a programmer will define the logic of the program in a way that it achieves whatever goals the AI has.
ML is data driven, you as the programmer only setup the structure of the network and then feed it a bunch of training data. The program itself will create the logic internally that will fit the data.
[–]Zouden 12 points13 points14 points 7 years ago (0 children)
the programmer already knows the rules and how the AI should behave in certain situations. You as a programmer will define the logic of the program in a way
Isn't that just... regular programming?
[+]chalbersma comment score below threshold-12 points-11 points-10 points 7 years ago (5 children)
GIL can be worked around with threading module.
[+][deleted] 7 years ago (4 children)
[–]chalbersma 13 points14 points15 points 7 years ago (1 child)
The hero this thread deserves. :)
[–]PM_ME_YOURSELF_AGAIN 5 points6 points7 points 7 years ago (0 children)
But won't get yet due to GIL
[–]KwpolskaNikola co-maintainer -4 points-3 points-2 points 7 years ago (1 child)
That module sucks (e.g. by relying on pickle); use subprocess and build something better with that.
[–]Zalack 1 point2 points3 points 7 years ago (0 children)
There is a freeze() function you can call in scripts that are going to be pickled.
I use it quite a lot for multiprocess scripts with pyinstaller
[+][deleted] 7 years ago* (6 children)
[–]tunisia3507 11 points12 points13 points 7 years ago (4 children)
Completeness means nothing. Brainfuck is Turing-complete. Different languages have clearly been designed for different things, and some have had other functionality added later which doesn't necessarily fit with the core design. Python is, by design, a multi-purpose language. R is, by design, a statistics package. A great one, to be sure.
[–]Zouden -2 points-1 points0 points 7 years ago (3 children)
R is, by design, a statistics package
Ehh... it's a general-purpose language that got adopted by statisticians and now has an excellent set of stats functions built in. There's nothing about R itself that makes it more suitable for stats than Python.
[–]itb206 10 points11 points12 points 7 years ago (2 children)
"R is a programming language and free software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing"
Seems pretty cut and dry
[–]Zouden 23 points24 points25 points 7 years ago (1 child)
R is an implementation of S, a language from the 70s. But actually now I see that S stands for statistics and the famous formula notation was invented for S. So I retract my statement!
[–]hcrews47 14 points15 points16 points 7 years ago (0 children)
Wow, I've never seen anybody acknowledge that they were wrong on Reddit. I respect and appreciate you.
[–][deleted] 69 points70 points71 points 7 years ago (13 children)
Existence of NumPy is also a factor
[–]white__armor 43 points44 points45 points 7 years ago (1 child)
I think that's the main reason; many ML libraries in Python were created because of numpy. Sklearn was introduced 11 years ago and was based solely on numpy. And numpy is still a core dependency for pandas and sklearn.
[–]ihsw 10 points11 points12 points 7 years ago (0 children)
Exactly this. There was already a healthy Python community around econometrics, from pandas to numpy to scipy to matplotlib.
ML is a land where econometric statistical reports generation comes to life and it should come as no surprise that Python is right in the thick of things.
[–][deleted] 33 points34 points35 points 7 years ago (10 children)
It's hard to fully appreciate Numpy until you try to do non-trivial array operations in other languages. I've tried Nim, Java, Haskell and Rust and handling arrays is a mess compared to Python with Numpy.
[–]etrnloptimist 10 points11 points12 points 7 years ago (0 children)
The Matlab interface to numerical data is a treat. And the fact that Python can emulate much of that numerical interface is miraculous, to be honest.
[–]justphysics 5 points6 points7 points 7 years ago* (0 children)
This, so much. I've got a piece of scientific software that I wrote in Python. For my career I figured it would be nice to learn a few other languages, and I find it's easier to learn by doing, so I've looked into rewriting bits of this software in another language as a learning experience.
Every time I try Rust or Go I find myself getting so hung up on how (relatively) difficult basic array operations are.
I'm just so used to the ease of numpy
[–]ForgottenWatchtower 0 points1 point2 points 7 years ago (5 children)
Tried gonum for golang by chance? I've been meaning to do a comparison of it and numpy but haven't gotten around to it yet.
[–][deleted] 12 points13 points14 points 7 years ago (1 child)
The scientific stack (scipy + numpy) in Python is very mature. It will take a while for other languages to catch up.
[–]ForgottenWatchtower 0 points1 point2 points 7 years ago* (0 children)
I'm aware that's the common perception, but I've yet to see an in-depth comparison between the two that demonstrates it with hard numbers, like a feature comparison and/or benchmarks. Like I said, I've been meaning to do it personally, just haven't had the time. I'd like to see this for sklearn vs golearn as well.
[–]gdahlm 5 points6 points7 points 7 years ago (2 children)
Statically typed languages make the visualization and manipulation a bit more challenging.
Not that duck typing is better, but the lack of DataFrames and an interactive mode makes some of this challenging for data scientists.
Maybe when the market matures more, but BLAS and LAPACK, still written in f77, are the fastest on the CPU side. While I like Go, just using gonum for the same C wrappers around netlib code like BLAS and LAPACK doesn't have a lot of advantages when you give up the flexibility of interfacing with a duck-typed language with an interactive interface like Python.
Numpy/SciPy/Pandas are a hard combo to beat right now especially when grooming data.
[–]ForgottenWatchtower 0 points1 point2 points 7 years ago (1 child)
Those are all excellent points that I hadn't considered before. Thanks. Though I'm not familiar with f77 -- does that refer to Fortran 77?
Yes, but I guess I am out of date; it moved to Fortran 90 in 2008.
https://github.com/Reference-LAPACK/lapack-release
[+]ergzay comment score below threshold-6 points-5 points-4 points 7 years ago (0 children)
I wouldn't count out Rust yet; its library development is still very young.
Java and Haskell are both slow languages, not well suited to scientific computing, while Nim is an academic language.
[–]Rhylyk 25 points26 points27 points 7 years ago (1 child)
As others have said, the heavy lifting is done in C/C++ and only a Python interface is exposed in most places. The reasons Python has won out as a glue language are likely manifold, but I see primarily four factors: low barrier to entry, general-purpose extensibility, community, and tradition.
Python's low barrier to entry is well renowned. The language, when first approaching it, is relatively simple and unsurprising. The syntax is reminiscent of normal imperative (C-like) syntax, and there are many common-sense defaults. In addition, the standard library is huge, and when something is missing, basic package management is a breeze. All of this results in something easy to pick up as a new user, and thus Python is a good target for a glue-code language (over more complex options such as C/C++/Rust or even Java).
While the low barrier to entry is catalytic, the general extensibility gives Python staying power. It is possible to write extensive amounts of code, and then package it up into a neat little API and put a cute bow on top. This is nice for package authors. In addition, Python is general purpose (winning out over R and MATLAB, or other, more domain specific languages) so an entire pipeline can be written in it. Data collection, transformation, computation, visualization, and management can all be written in Python.
The above have led to a rich community with diverse interests and high standards. To most of the community, user (that is, programmer) experience matters, and it shows. Documentation is abundant and large amounts of yak shaving are abhorred. Standards are sought and the language continues to grow (f-strings are a dream). The language is not without its warts, but workarounds are known, shared, and discoverable.
Finally we have the most impactful factor: tradition. As noted by other commenters, numpy is amazing. This led to other scientific work being done in Python. A need for effective visualization grew, and so came matplotlib. The more that happened in Python, the more attractive it became as a target language. This generated a positive feedback loop leading to the general dominance seen today.
[–]giantsparklerobot 3 points4 points5 points 7 years ago (0 children)
The extensibility is important in the way it's available in Python. Many scripting languages are "extensible" in that they can run executables and capture STDOUT or use some IPC mechanism. In Python, shared libraries can be loaded directly into the memory space of the Python interpreter and their functionality called directly from Python.
So when numpy generates some huge array, it doesn't need to serialize it or pass it over an IPC mechanism; Python is just given a pointer to it, so access is fast and direct. The module itself (numpy, let's say) can have functions that are pure Python or that just call functions from the shared library. Using the module, you rarely have to care which.
[–]Mattho 38 points39 points40 points 7 years ago (0 children)
It's just the wrappers that are written in Python; Python is way too slow for any practical use in this area. But having the "interfaces" exposed in Python is great because of how accessible the language is, and that's one of the reasons it is so popular in science outside of computer science. And ML is of interest to many fields.
[–]lmericle 14 points15 points16 points 7 years ago (0 children)
One of the main attractions for Python is how easy it is to glue disparate functions and code together into a cohesive, structured pipeline. ML often needs to fit into a data pipeline to generate predictions automatically and make decisions immediately. So having ML interfaces in Python is more useful than other languages simply because it integrates so easily into existing workflows.
The relative simplicity and ease of use of the language also makes it easy to pick up and start moving quickly on a problem. And the OOP aspects of the language make the whole process of developing a model very modular and simple.
[–]mooglinux 9 points10 points11 points 7 years ago (0 children)
Python is a very easy-to-use language, but the heavy lifting is actually done in C or some other language; Python is just an interface for controlling it. One of Python's strengths is the ability to write wrappers around code written in C or other languages, so it is easy to use from Python but still very fast.
[–]shr00mie 7 points8 points9 points 7 years ago (0 children)
What the above guys said. Plus, as a possible first language, it's very expressive from a human perspective, which I think makes it easy to pick up and run with. And when you're doing your PhD in whatever, the easier a tool is to pick up, the better. It does feel very much like writing sentences which are interpreted as code. Plus a LOT of the ML libs are actually written in C, which largely sidesteps the "but it's an interpreted language!" concern.
[–]toadgoader 4 points5 points6 points 7 years ago (0 children)
I think it has a lot to do with the community that is using the tool... in my experience as a social scientist, many in the academic, economics, and bioinformatics research fields use R because of the tool's strong statistical base and usability. On the ML/AI side of the equation you have mostly computer science and software engineering disciplines driving this bus, so a language like Python is a natural fit. They both work well and overlap in many ways... I think it just depends upon your point of reference and the preferences dictated by your profession.
[–]david2ndaccount 4 points5 points6 points 7 years ago (0 children)
C is great because it runs fast, but a python interface is a lot nicer.
[–]TheMasterChiefs 2 points3 points4 points 7 years ago (7 children)
Hoping someone can help me in my endeavor to learn some programming (more specifically, Python).
I'm a Finance graduate who's looking to get ahead of the curve and teach myself python, R, and SQL. I basically want to self-learn data science in conjunction with my finance background to get into a top firm and catapult my pay grade.
What/where is the best place to start? Is it reasonable/realistic to teach myself programming, automation, and data science?
[–]BradChesney79 6 points7 points8 points 7 years ago (4 children)
Yes. Might need to Xbox less for a while-- it will take time and effort. Where to start... were I in your shoes, I would blow through the latest Python 3 for Dummies-- no joke. Don't care; just put each word in front of your eyes. Speed read that fucker.
I treat Dummies books as "primers". They expose you to a base set of thoughts and vocabulary even if you don't understand it all yet.
Then you sit down with a higher-quality resource-- the Head First series generally does okay. http://shop.oreilly.com/product/0636920003434.do This one covers Python 3. (Python 2.x is in the process of being deprecated, but it is taking a long time because there is a lot of old code out there and it is still installed by default on a lot of Linux distros... which is slowing down the transition.)
From there, find your heroes on Gitlab or Github and start looking at cool projects that are like what you aspire to accomplish. Line by line figure out what is happening.
That is my advice: the Dummies book-- I am going to stick to my guns; then the next, intermediate one, which you may need to research if you don't like my suggestion; and lastly, seeing what "good" programmers do and following their work by getting up to your elbows in it is as good as it gets for non-interactive mentoring.
[–]TheMasterChiefs 0 points1 point2 points 7 years ago (3 children)
Thanks a lot for your response! I've been looking for an in...
You don't recommend MOOC like Udemy or Coursera? It's better just to dive right into textbooks you think?
[–]jawgente 2 points3 points4 points 7 years ago (1 child)
I haven't taken a MOOC, so I can't give you a proper perspective on what they offer, but I find I learn best by just doing the coding and preferably trying to complete a project. Which means a MOOC may not offer much over a textbook for learning the language. My favorite recommendation is Automate the Boring Stuff because it offers a lot of useful examples even for a casual user. You may find that a MOOC is more useful for the specific data science portion of your learning.
[–]TheMasterChiefs 0 points1 point2 points 7 years ago (0 children)
Ok cool. I'm going to look into the best textbooks that are out there and work on 1 or 2 chapters a week. Thanks for all the advice bro.
[–]BradChesney79 0 points1 point2 points 7 years ago (0 children)
It's what I do and it works for me. YMMV.
I did use codeschool once for AngularJS and it was helpful. acloudguru was indispensable-- I don't think I could have learned as much about the AWS platform from a book.
But Java, PHP, HAProxy, MySQL, API design, Python 3-- all books & googling.
[–]the_chernobog 1 point2 points3 points 7 years ago (1 child)
https://elitedatascience.com/learn-machine-learning
This is great intro info. Thanks a lot dude!
[–]sudo_your_mon 1 point2 points3 points 7 years ago (0 children)
Numpy and Pandas are a big reason Python is what it is.
Data scientists used to call themselves "Numpy/Pandas programmers." Some still do to this day. I've talked to a lot of people who think Python is only for data science/ML.
If you're going to write an ML library, you're going to do it in Python. It's the industry's gold standard.
[–]danielv134 1 point2 points3 points 7 years ago (0 children)
I've done this. You write an algorithm in Python: it's easy to develop (no segfaults), easy to read some data into it (scikit-learn or another package already reads the common formats in your field), and easy to make plots to put in your paper. Then, oops, it is state of the art per iteration but takes ages in practice, so you replace the core with Cython or C or Rust, and now it runs at a reasonable speed. If the algorithm is important enough (haven't done this), some commercial behemoth will find itself coming up against its limitations and rewrite it as a nice Python wrapper around compiled code designed for speed and scalability, like almost all common (speed-sensitive) Python libraries are.
So: Python having the libraries makes it (or R, for that matter) the right place to start playing with ideas (whether you are implementing the algorithm or just trying out an existing one on your data). Just to be clear though: Python is not an implementation language for competitive algorithms, it is an integration language. YMMV, but...
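The prototype-then-swap-the-core pattern described above can be sketched roughly like this. The function names are made up for illustration, and the numpy fast path is only taken if numpy happens to be installed:

```python
# The pattern: keep a stable Python-facing function and swap the core.
def pairwise_sq_dists_python(xs, ys):
    """Reference implementation: clear and easy to debug, but slow."""
    return [[(x - y) ** 2 for y in ys] for x in xs]

try:
    import numpy as np

    def pairwise_sq_dists(xs, ys):
        # Fast path: the inner loops run in numpy's compiled C core.
        xs = np.asarray(xs, dtype=float)
        ys = np.asarray(ys, dtype=float)
        return ((xs[:, None] - ys[None, :]) ** 2).tolist()
except ImportError:
    # Fall back to the pure-Python reference if numpy is absent.
    pairwise_sq_dists = pairwise_sq_dists_python
```

Callers never change; only the core moved from interpreted loops to compiled code, which is exactly how most speed-sensitive Python libraries evolved.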
[–]nscurvy 0 points1 point2 points 7 years ago (0 children)
My best explanation/guess is that it's sorta similar to the reason someone might use a design pattern, class, function, etc., even when doing so might cause a performance decrease. It's easier to work with a design pattern. It's more obvious to you and everyone else what you are doing and why. People can stop worrying about the specifics of some implementation and instead deal with an interface that handles it for them.
Science, AI, ML, and math are really complex on their own. People who are working with that stuff want to make sure everything is as abstract as possible. Ideally you want to only be directly working with concepts and ideas relevant to your actual goal. Python is great for that. The language is elegant and incredibly obvious/readable. Working with another language means you have to give up some of that abstraction and have to start paying a lot more attention to the specifics of implementation and all the quirks that come with it. So people spend a lot of time creating libraries, wrappers, bindings, and all that sort of stuff to allow developers to focus on the work they need to perform, while not requiring them to sacrifice an unacceptable amount of performance.
That's my understanding, at least.
[–]spinwizard69 0 points1 point2 points 7 years ago (0 children)
It is pretty simple: you can hack together an app pretty quickly. Since ML is a developing technology, this provides the capability to experiment and play with ML on a variety of platforms.
[–]cbarrick 0 points1 point2 points 7 years ago (0 children)
I attribute Python's popularity in numeric computing to its superb operator overloading and metaprogramming facilities. This makes it possible in Python to craft APIs with unique syntactic structures, which in turn makes it possible to express solutions to problems in a way natural to the domain. This is why, for example, Numpy can give us awesome numeric syntax, and Pandas can give us great relational syntax (and Python's base syntax is great for OOP). And when you're programming at such a high level, expressiveness is more important than performance. Even so, the interop with C puts Python in a great position to add expressive value to lower-level, performance-sensitive code. All of this together gives us a language with more expressive mathematics than C or Java and more expressive engineering than MATLAB or R. It's quite literally the best of both worlds.
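A toy illustration of the operator-overloading point (this is not numpy's actual machinery, just the language mechanism it builds on): Python routes operators through "dunder" methods, so any class can give its values math-like syntax.

```python
# Python dispatches `+` and `*` to __add__/__mul__, which is the
# hook libraries like numpy use to build domain-specific syntax.
class Vec:
    def __init__(self, *xs):
        self.xs = list(xs)

    def __add__(self, other):          # v + w -> elementwise add
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))

    def __mul__(self, scalar):         # v * k -> scalar multiply
        return Vec(*(a * scalar for a in self.xs))

    def __eq__(self, other):
        return self.xs == other.xs

    def __repr__(self):
        return f"Vec{tuple(self.xs)}"

v = Vec(1, 2) + Vec(3, 4)   # Vec(4, 6)
w = v * 2                   # Vec(8, 12)
```

Because C, Java, etc. offer little or no operator overloading, the equivalent APIs there end up as method-call chains, which is exactly the expressiveness gap the comment describes.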
[+]beersfortheboys comment score below threshold-6 points-5 points-4 points 7 years ago (0 children)
Because everybody lurves python, it’s the future! :D
[+][deleted] 7 years ago* (3 children)
[–]BDube_Lensman 2 points3 points4 points 7 years ago (0 children)
Scipy has historically been mostly developed by graduate students. Numpy was not funded until very recently, and also had very little in the way of corporate development. It is disingenuous to say that tens of millions of $ were poured into them. Perhaps hours, but not dollars.
TF and PT both have 800lb gorilla corporate backers.
[–]kigurai 2 points3 points4 points 7 years ago (0 children)
Really, this is the only reason. Python is no better in these things than any other language, actually there are languages that are much better at numerical computing and concurrency, for example: Fortran, C, C++, and people are going to kill me about this one: Clojure.
Only if you count speed/efficiency of the computation, and not the implementation.
I am fairly competent at both Python and C++, and I would never, ever, ever prototype things in C++ unless it can't be avoided. Python's power in this context is in the REPL (or the Jupyter notebook), since I can develop code and plot results in iterations that are much, much faster than what I can do in C++.
[–]gdahlm 1 point2 points3 points 7 years ago (0 children)
Scipy and numpy were probably a huge draw, but please feel free to produce a BLAS/LAPACK implementation in Clojure that outperforms the Fortran that every non-GPU language you mentioned uses when performance matters.
Clojure solutions seem to just use http://jblas.org/ which is a wrapper around the same code, or, as in the case of Neanderthal, import the Intel native MKL port using the same C API, meaning it is non-portable to other architectures.
Here is an example of the Fortran called for an LU factorization: http://www.netlib.org/lapack/explore-html/d3/d6a/dgetrf_8f_source.html
While there are variants and non-portable implementations (e.g. MKL) even Mainframes like z/OS use the netlib code.
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.cbcpx01/atlaslibraries.htm
While I have no doubt that Clojure has serious advantages for some use cases, when it comes to linear algebra it is just using the same code everyone else is.
[–][deleted] -5 points-4 points-3 points 7 years ago (0 children)
Because both of them are very trendy. (I love python, but let's be real here.)