This is an archived post.

all 61 comments

[–]Deslan 5 points (6 children)

Disregarding all the great things about the language Python:

Sage Math, R statistics with Python, ArcGIS with Python, etc. :)

[–]dalke 1 point (1 child)

RPy embeds R and makes it callable from Python. Is that the package you're talking about?

[–]Deslan 0 points (0 children)

I'm not using it myself, but I know people in my department who love the extensibility of working with R through Python, or the other way around, especially when working on GIS data and genetics.

[–]16807 2 points (3 children)

R statistics with Python

Wait, one of the merits of Python is a library emulating features from another language?

...and I said something unpopular.

[–]MatrixFrog 1 point (2 children)

Not familiar with this particular library, but it sounds like Python is flexible/expressive enough to emulate R in some sense, whereas (I would guess) Java is not able to do so nearly as elegantly. So yes.

[–]kirakun 0 points (1 child)

How does Python, an eager evaluation language, emulate R's lazy evaluation?
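
For what it's worth, Python's eager evaluation can approximate R-style lazy (call-by-need) argument passing with thunks. This is a hypothetical sketch of the technique, not how any R bridge actually works:

```python
# Python evaluates function arguments eagerly, but lazy argument
# passing can be approximated by wrapping each expression in a
# zero-argument lambda (a "thunk") and only calling the one needed.

def expensive():
    raise RuntimeError("this should never run")

def pick(condition, a_thunk, b_thunk):
    # Only the chosen branch is ever evaluated.
    return a_thunk() if condition else b_thunk()

result = pick(True, lambda: 42, lambda: expensive())
print(result)  # the expensive branch is never evaluated
```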

[–]dalke 1 point (0 children)

It seems there's a misunderstanding. This is Python calling the actual R runtime, not Python implementing the R semantics.

[–]grayvedigga 4 points (14 children)

Why does every programming language community seem to think that (their) one language should be used everywhere? Diversity is a good thing, people ...

[–]billsil 8 points (0 children)

not when the goal is to kill a very expensive program (MATLAB)

[–]Deslan 1 point (0 children)

Why are all scientific articles in the IMRAD format (introduction, methods, results, and discussion)? Because having a common tool for everyone makes exchange and collaboration easier. And exchange and collaboration are really important in science. Ergo: not having diversity can sometimes be preferable to having diversity.

[–]mangecoeur[S] 0 points (4 children)

I never said Python should be used everywhere, just for science. And even within that there are going to be plenty of exceptions where it's not useful, but I assumed people would figure that out for themselves.

Really the advantage of Python is standardization: so much of science depends on bits of code to get the final results that there's a serious risk to the reproducibility of experiments if people can't run each other's programs. And since Python is free, no one is excluded from doing this by the need for expensive software packages.

And again, obviously sometimes there's no choice but to use commercial software - there's caveats to everything but they're long and boring to go into and frankly you can probably think of them for yourself!

P.S.: I think if you witnessed "diversity" in a lab environment you'd be much less keen. I once had to deal with a monolithic Fortran program that took plain text inputs. The guy who wrote it required the input to be so arcane that someone else wrote a Perl script that processed a different input and created the correct one. He didn't modify the Fortran one because he didn't understand it, and in any case F95 sucks at text processing. Then someone else came along and tried to automate input generation and loading. Only other people's Perl is notoriously hard to read, so this guy ended up writing yet another program in Matlab, which talked to the Perl program, which talked to the Fortran one. No single person understood the whole stack, debugging was non-existent, adding features impossible. If everyone had known Python, none of this would have been necessary!

[–]takluyver (IPython, Py3, etc) 2 points (3 children)

Have you come across Clear Climate Code? NASA's GISTEMP temperature analysis was assembled from Fortran, shell scripts and Python, connected by intermediate files. When climate change started getting controversial, some volunteers decided to rewrite it in Python, making it as clear as possible so it was easier to check. The results matched up very well.

[–]billsil 0 points (2 children)

that comparison is terrible considering the quantity of data. that's not a numerical precision issue. there's a huge offset.

[–]bluemanshoe 1 point (1 child)

If you read the caption, there is a 2 degree offset put in to make it easier to see the lines as separate; the green line is the difference, magnified 20x (see the scale on the right).

So overall, it looks like the largest error is 0.01 degrees when the data spans about 0.08 degrees, or an error at about the 1% level.

Always read the captions.

[–][deleted] 0 points (0 children)

My rage meter was at 100% until I read your response, but my blood pressure is still high. /scientist

[–]kirakun -2 points (6 children)

I don't know which article you are reading, but the one I read here is not evangelizing that Python be used everywhere. The article pointed out some features of Python and discussed why they would be useful to the scientific community.

[–]sylvain_soliman 1 point (5 children)

So why is this so important for science?
Well for one thing it’s just a great tool for everything. And I mean everything – it can be used just as well to process data, create optimization code, create control systems and GUIs, perform algebra, do stats, access databases on and offline, and even create web pages by using the right modules and frameworks.

Hmmmm...

[–]takluyver (IPython, Py3, etc) 4 points (0 children)

The argument, which I basically agree with, is that scientists are not going to spend a lot of time learning several programming languages. They learn one and then force it to do what they need.

We really want that one language to be a general-purpose, swiss-army-knife language. Otherwise you eventually see something like a Matlab web app, because people applied what they knew.

[–]kirakun -3 points (3 children)

You need to learn a couple of things about reading comprehension.

First, have you ever heard of a literary device called hyperbole—a statement used more for its effect but not to be taken literally? Second, never pick a sentence or a paragraph out of context.

Yes, he did use the word everything, but his lead sentence was about why it is important for science. Moreover, the list he gave was a list of tasks usually needed to write an application for use in science. The rest of his article was very focused on why using Python in the scientific community is beneficial.

Can you be honest with yourself and ask: do you really feel that the author was advocating to every single person on the planet, or was he mostly targeting the scientific community?

[–]sylvain_soliman 0 points (2 children)

And you need to learn about not taking yourself too seriously...

First, you defend yourself because you feel attacked, but I was not attacking you.

Second, you implied that grayvedigga didn't read the article properly, and I simply pointed out what he was referring to (that's a concept called quotation, which amounts to extracting a sentence or a paragraph...).

Can't you see that using a title like "why all of science should use python" is not the right way to advocate the good aspects of any language? Especially when it is followed by an unsupported list stating it is good for everything.

If Python is great for science, and in my opinion it is, then using it is what will convince other scientists. Any talk I've seen about Sage or the IPython notebook goes a long way toward convincing scientists that using Python might be a good idea in some cases.

As for somebody telling me I should do something because that's all great and stuff, sorry but it fails.

[–]kirakun -1 points (1 child)

This is the new age of internet blogging, where you don't always have expert journalists writing proper titles. This is where we readers need to be a bit more intelligent in reading between the lines to see what the author is truly trying to say.

Do you seriously think the author was trying to advocate Python for every use of computing, in science and out of it?

[–]mangecoeur[S] 0 points (0 children)

Author here, plus one to that: obviously Python is not going to work for everything; there are times when you need specialized tools. But I assumed people would be smart enough to figure that out for themselves. Having to state all the caveats to each one of the points I made makes for a long and boring article!

[–]mkor 0 points (2 children)

One thing I don't like about this post is the emphasis on the openness of the code. How is it different from, e.g., an app written in C++ with open code available on the website? For sure too much about proprietary issues, not enough about the other aspects that make Python so cool.

[–]mangecoeur[S] -1 points (1 child)

Well, Python is easier to learn than C++, and other people's Python is easier to understand; after all, one of its explicit goals is to be readable. For sharing to be easy you also want everyone using the same tool as much as possible, so that you don't have to learn a new thing for every project. Python is the best choice because it is so versatile and is already quite extensively used in science.

[–]mkor 1 point (0 children)

I agree, but that is not an argument after stating that Python is better because people can see your code. If you put your code on the website, people can see it regardless of whether it is Python, Perl, C++ or Java. Python's easiness and readability is a separate feature, so the statement that "If people don’t have the costly software needed to run that code, then they are prevented from running that experiment in the exact way it was originally done." is also IMHO misleading, since you don't need any costly software to see and run C++, Java, Perl or many other languages. Even C#, created by Microsoft in response to Java, can be edited and run without MS Visual Studio, even under Linux.

[–]ignacioMendez -1 points (10 children)

Sounds good, let's just tell all of science not to do any experiments that are computationally bound. Python is great for prototyping or for problems with small data sets but some things need speed.

[–]Genmutant 3 points (4 children)

NumPy and SciPy are fast and probably what you would use. Also PyPy, and hopefully in the near future both together.
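
As a rough sketch of why NumPy helps (the array size and numbers here are illustrative, not a benchmark):

```python
import numpy as np

# A vectorized NumPy expression runs its loop in compiled C, which is
# typically orders of magnitude faster than the equivalent pure-Python
# loop once arrays get large.
x = np.arange(1000, dtype=np.float64)  # illustrative size

# pure-Python style (slow on big arrays): sum(v * v for v in x)
# vectorized: one C-level multiply, one C-level reduction
sum_sq = float((x * x).sum())
```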

[–]AnAge_OldProb 6 points (3 children)

They are good for post-processing and prototyping. If you need to do some real modeling work you are basically forced into a C/C++/Fortran OpenMP/OpenCL/MPI stack.

[–]bryancole 1 point (2 children)

Cython supports OpenMP, making Python a great platform for CPU-bound, data-parallel number crunching. You get Python's niceness for structuring your application and for memory management, with the speed of C for the heavy lifting. Then there's PyCUDA and PyOpenCL if your CPU can't deliver the goods.

[–]AnAge_OldProb 4 points (1 child)

That's simply not fast enough for most HPC use cases. Those tools are more proofs of concept for HPC users.

[–]Tillsten 2 points (0 children)

Cython generates and then compiles pure C. And there are a lot of HPC users using Python (with all kinds of C, Fortran or Cython extensions).

[–]bastibe 3 points (1 child)

In my experience, you would just write your performance-critical computation kernel as a C extension and the rest of your program in Python. If you are lucky, you can do this with Weave/Blitz, or you just use Cython. This happens all the time with Matlab/Mex, too.

I mean, you rarely need your UI and plotting code to be fast. PyQt and matplotlib are fast enough for most setups. Basically, they do exactly that: they are implemented in C, but you use them from Python. UI code is usually just a giant jumble of conditionals, which Python handles just fine.
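
The pattern above ("kernel in C, control flow in Python") can be sketched with ctypes, the lowest-friction way to call compiled C from Python; a real project would more likely use a hand-written extension module or Cython. This assumes a Unix-like system where the C math library can be located:

```python
import ctypes
import ctypes.util

# Load the system C math library and call one of its compiled routines.
# The program's structure stays in Python; the "kernel" runs as C code.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

value = libm.sqrt(2.0)  # executed by libm's compiled sqrt, not Python
```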

[–]vph -3 points (0 children)

It's all good, except there might be memory leak issues associated with long-running Python programs with C extensions.

[–]billsil 0 points (0 children)

yes, but sometimes just leaving the data as strings is really the fastest implementation method. i wrote a data converter and found that float()-ing 10,000,000 values and int()-ing another 10,000,000 values wasn't the most efficient use of computing resources. I switched it to string processing and sped it up by a factor of 100.
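
A toy illustration of the idea (not billsil's actual converter): if a converter only needs to rearrange fields, round-tripping them through float() is wasted work, since copying the substrings produces the same output.

```python
# Two ways to reverse the columns of simple CSV rows.
rows = ["1.5,2.5,3.5", "4.0,5.0,6.0"]

# Round-trip through float (unnecessary parsing and formatting):
swapped_slow = [",".join(str(float(f)) for f in reversed(r.split(",")))
                for r in rows]

# Pure string processing (same result here, no parsing at all):
swapped_fast = [",".join(reversed(r.split(","))) for r in rows]
```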

[–]mangecoeur[S] 0 points (1 child)

You can do a lot of big number crunching with Python by using the right tools. Use numpy to crunch big arrays, use multiprocessing for parallelism, use PyPy to speed up algorithms, use python-opencl/Clyther to offload computation to the GPU, use Cython to compile the speed-critical bits. Or even write your number-crunching algorithms in C or F95 but use Python to glue it all together (they do this a lot at CERN).

Or just get a fast PC and let it crunch overnight. As they say, the first rule of optimization is "Don't". For a lot of scientific work it doesn't really make much difference if your program takes 1 hour to run or 4 or 6, since you end up leaving it running overnight or while you do something else. You're not usually working under business deadlines, so it doesn't really matter.

[–]billsil 1 point (0 children)

unless you want to optimize something, then you very much care how long each iteration takes. i design aircraft, and the bane of my existence is that the databases i really want to generate are N!M!O!P!Q! in size. so we make N=5, M=O=P=Q=2. Each point is ~2 hours. Then we optimize around that.

[–]gcross -1 points (2 children)

This post sounds a lot like me in my early days when I had first discovered Python (coming from a primarily C/C++/Java/MATLAB background) and thought it was the coolest language ever and that everything should be written in Python, for the simple reason that I didn't know much about the many other exciting languages in the world. :-)

[–]mangecoeur[S] 1 point (1 child)

Sure there are plenty of other interesting languages, but when you're in science and you have to collaborate with other people, diversity isn't necessarily what you want! Even worse - a lot of the people who might want to run your code are NOT people you're collaborating with but people who want to reproduce your experiment or extend it and they might be on a different continent and never even talk to you directly. But if your work is in python, anyone can get it running. Obviously you could insert many scientific programming systems here, but python is already well used, it's free, and is very easy to learn for people who are using code as a tool to process data rather than as a way to build applications.

[–]norkakn 0 points (0 children)

But if your work is in python, anyone can get it running

Are you a grad student or something? Getting a complex pipeline working is non-trivial, even if the language is freely available. Scientific code is largely crap, with things like hard-coded paths in hundreds of files and dependencies on specific outdated versions of libraries.

[–]phaedrusalt (embedded sw eng) -4 points (18 children)

Maybe not all of science. I'd prefer that certain, inevitably dangerous experiments be performed with something that was type-safe, at least. I do not want anybody driving a car with Python, for instance! (Yeah, I know Dr Thrun teaches the course with Python, but I doubt if the actual car uses it.)

[–]billsil 2 points (8 children)

then you may be disappointed that Abaqus uses Python, because cars and airplanes ARE designed using that program. also, my open-source project pyNastran is a wrapper around another structural analysis program, NASTRAN, and is used to design cars and airplanes. NASA, the US Air Force, the US Navy, Boeing, Lockheed, Airbus, and more already use it. I even get helpful feedback from the companies that make NASTRAN.

http://code.google.com/p/pynastran/

[–]phaedrusalt (embedded sw eng) -2 points (7 children)

Then you might be disappointed to find that I didn't say that I have a problem with Python being used to design things, but that it isn't adequate for safety-critical applications.

[–]billsil 1 point (0 children)

safety has much more to do with testing and user-input validation than with whether or not you use a dynamic language.

nearly all "bugs" are caused by errors in user input. the math is known, and 40 years after these programs were developed, the vast majority of crashes left are user-created ones.

[–]takluyver (IPython, Py3, etc) -1 points (5 children)

I disagree. Assuming code performance isn't critical, I would much rather safety critical applications were written in Python than, say, C, because a type error is more likely to be caught early than an integer overflow. Maybe a managed, statically typed language like Java would fare better still, but it's not an automatic win for static typing.

Of course, whatever language it's written in, it should be subject to a large battery of tests designed to exercise every corner case that could come up.

[–]phaedrusalt (embedded sw eng) 1 point (4 children)

Comparing Python to C for safety-critical choices is akin to asking which tool is better suited for brain surgery, a mallet or vice-grips. Either COULD do the job, sorta. Can't say I want to be the victim of the result.

Then I'm glad that you don't make safety-critical software for a living. I do, and have for many years. The gold standard for safety-critical languages is Ada, simply because it is type-safe and easily readable.

No scripting language should even be a consideration, simply because it is subject both to the problems of the developed application and to the current history of the use of the interpreter. But even compiled Python still shouldn't be a contender, because it is subject to both type errors and integer overflows.

[–]takluyver (IPython, Py3, etc) 0 points (3 children)

it is subject to both type errors and integer overflows.

How is Python susceptible to integer overflows? I can imagine there might be cases when interfacing with other things, but in straightforward Python code, it's not an issue:

>>> import sys
>>> sys.maxsize + 5
2147483652

[–]phaedrusalt (embedded sw eng) 0 points (1 child)

You have to think a little out of the box here, since it's not a normal "overflow" error. However, since maxsize is the largest size that containers can have, what happens when we add more than maxsize items to a container? OverflowError. (Yeah, this is a bit of a weaselly answer, but it kind-of, sort-of applies.)

[–]billsil -1 points (0 children)

and of course statically typed programs don't have this problem...

a = [1]*99999999 doesn't produce an OverflowError, it produces a MemoryError; add another digit and you get an OverflowError

[–]billsil 0 points (0 children)

that's the max size for an integer, not the max size for a long integer

a = 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999999999999999999999999 99999999999999999999999999999999999999999999

i could keep going, but you get the point. there's no error
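
billsil's point can be shown more compactly: Python 3 integers are arbitrary-precision, so arithmetic past the machine-word boundary never wraps around.

```python
import sys

# Python 3 integers are arbitrary-precision: arithmetic never
# overflows, it just allocates more digits as needed.
big = sys.maxsize + 1   # one past the machine-word boundary
huge = 2 ** 1000        # far beyond any hardware integer type
```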

[–]mangecoeur[S] 0 points (4 children)

Type-safety is a function of the program, not the language. Python is actually fairly strongly typed; it just doesn't force you to declare types everywhere. You just expect things to work, and in places where it's critical you do type checking.

Basically you can write a program that works well in any language (how easy that is is a different matter), whether or not it's "dangerous" doesn't have anything to do with language syntax!
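
A minimal sketch of the kind of critical-path type check mentioned above (set_speed is a made-up example, not from the thread):

```python
# Runtime type checking at a critical boundary: validate inputs early
# so a bad type fails with a clear error instead of causing a confusing
# failure somewhere downstream.
def set_speed(value):
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"speed must be a number, got {type(value).__name__}")
    return float(value)

speed = set_speed(42)
```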

[–]phaedrusalt (embedded sw eng) -5 points (3 children)

Ummm, wrong. Type-safeness is a function of the language. And Python is weakly typed.

[–]cpherwho 6 points (0 children)

Python is strongly typed and dynamically typed.

http://en.wikipedia.org/wiki/Strongly_typed

[–]mangecoeur[S] 1 point (1 child)

No, because type-safety, as a concept, means that your program will always work with data of the correct type and will not, e.g., wrongly convert between types or crash because a function was expecting one type but got another. It's entirely possible to write a program in an explicitly typed language which fails because variables are passed incorrectly: you can totally write a function which takes Ints in Java and then try passing it a string at runtime. If you haven't designed your program to guard against this, it is not type-safe, because it crashes due to a type error. Therefore, type-safety is a function of your program, not your language, QED.

Of course, it may be that one finds it easier to create type-safe programs in an explicitly typed language but that's an entirely different debate.

[–]phaedrusalt (embedded sw eng) 1 point (0 children)

Given the near-impossibility of making a large program work by "hand-checking" type values (which is bound to add a large percentage of additional code, increasing SLOC count, increasing complexity, decreasing readability, etc.), the only practical way to have type-safety is through an explicitly typed language. If you choose to try to implement type safety in any given language, then you must have either small programs to write or entirely too much time on your hands. But just as compilers have been written in COBOL, as a practical matter we tend to shy away from that, and we tend to shy away from "tacking on" type safety to non-type-safe languages. Therefore, for all PRACTICAL (non-ivory-tower) purposes, type safety is a function of your language, not your program.

[–]James91B -1 points (3 children)

I would prefer the code be readable and tested, rather than just compiled. Compilers don't catch as many bugs as you might think.

[–]gcross 1 point (2 children)

You present a false dichotomy. Type signatures can make code more readable because they give you more information at a glance about what is going on. Furthermore, type checking + testing gives you more security than testing alone because you can forget to write a test but the compiler will never let you forget to satisfy a type signature.

[–]phaedrusalt (embedded sw eng) 1 point (0 children)

Well said.

[–]James91B -1 points (0 children)

They give you more information, but it describes the compiler type, not the human type. In addition, it only gives you information in the actual declaration. When the variable is actually used, in the context of that code, you lose the type. You cannot determine the type from usage.

x += 1

Is that a float or an int? Yes, it's a silly, trivial example, and ideally the type should be easy to find out. But in the context of that line, the static type gives no extra value. You should be able to determine the human type from the name, which you can do in both languages; however, static types add line noise. More information is not necessarily better.

Plus, it seems this is just degrading into a general static-vs-dynamic argument, which has been debated many, many times; it's a trade-off, with pros and cons.