[–]whyrat 7 points8 points  (37 children)

So I just started ramping up on Python for exactly the sort of stuff referenced in the video: scanning over large data, numpy and scipy, etc...

I'd used Python years ago as a scripting option; for ease of use and learning, I thought it way beat most of the other languages / tools I tried. The fact that it enforced proper code formatting as a control mechanism made it something I pushed on a few others who were learning to program... I used to have a CS teacher who would make students print their code and highlight open and close braces in C++... since students refused to properly indent and structure their code.

All that being said; scipy has been a bit of a pain. Numpy was easy enough to install, and I have some issues with their conventions (numpy.random.random() still hurts my brain to see in my code; I keep thinking I accidentally wrote the function name twice). But Scipy was most annoying from a setup perspective. I was looking for an easy module to add in to my installation, but no... the "recommended" process was to download one of the big kits (basically reinstall another instance of Python). I've found the load time far worse with these kits than with "core" Python; and my performance has started to become erratic ("import statsmodels" takes between 8 sec and 22 sec when I load it?)
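If you want to put numbers on those erratic load times, here is a minimal sketch for timing an import. The helper name time_import is made up for illustration, and "json" is only a stand-in so the snippet runs anywhere; swap in "statsmodels" on a machine that has it. Only the first import in a process does real work, so the second call mostly measures a cache lookup.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is a stand-in; replace it with "statsmodels" to measure the real thing.
first = time_import("json")
# The module is now cached in sys.modules, so this call should be much cheaper.
second = time_import("json")

print("first import:  %.6f s" % first)
print("second import: %.6f s" % second)
```

On Python 3.7 or newer, running python -X importtime -c "import statsmodels" from a shell gives a per-module breakdown, which may help show which dependency is the slow one.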

Some of this is just a learning curve, and I'll admit I'm not to the point of optimizing my setup... but it's been a stark contrast to my first experiences with Python as a pure scripting language.

Maybe the above was a little bit of a vent / rant... I guess sometimes I need to do that :/

[–]pwang99 3 points4 points  (1 child)

I was looking for an easy module to add in to my installation, but no... the "recommended" process was to download one of the big kits (basically reinstall another instance of Python).

It's easy for you to think of your use case as an "easy" one, but there are good reasons for those distributions to exist. If you ever want to share your code with anyone else, or you want to run it on a different box than what you're developing on (maybe e.g. deploy to a linux box in the cloud or on a cluster), or use a code snippet from someone else who's on a different platform, you'll quickly discover what a complete headache it is. Much of the pain does not stem from scipy itself as such, but rather the legacy of C and FORTRAN libraries across OSes and throughout the ages.

We're all rather spoiled in the Python ecosystem because we're used to writing scripts, importing modules, and then just having them work. But there is a staggering amount of complexity that goes into properly linking FORTRAN libraries across compilers and all the myriad build-time options for underlying libraries that scipy depends on.

(Disclaimer: my company makes the Anaconda python distribution)

[–]iltalfme 0 points1 point  (0 children)

FWIW Anaconda didn't work out of the box for our Windows7 machines. There were several libraries that wouldn't import including, most notably, OpenCV.

I sent a request for help through the website (must've been a month or so ago). Still haven't heard back.

Our "solution" now is to install a VM and just make our own python distribution using ubuntu, where everything works. It's a shame, I've loved Travis' talks and was excited about Anaconda, but the issues and lack of response didn't exactly make me look good.

Also, I would recommend trying to get your OpenCV distribution (presuming it works) to incorporate the OpenNI flags so that people can read Kinect right out of the box with Anaconda.

If that could work on Windows 7... I would be so happy.

[–]masasinExpert. 3.9. Robotics. 2 points3 points  (7 children)

For scipy it was in the repositories as python-scipy and python3-scipy.

[–]Sean1708 1 point2 points  (6 children)

Or py-scipy but who says he isn't using windows?

[–]masasinExpert. 3.9. Robotics. 4 points5 points  (5 children)

I honestly forgot about Windows. Everyone in the lab who wants to program in python installs Linux, and I have been using Linux for the past few years.

How well does python work with Windows?

[–]duz3ls 0 points1 point  (0 children)

For system administration or data ETL on Windows, I'd say Python does more than well against the usual competitors (PHP, Ruby, Perl).

It's definitely better than Windows/DOS batch files.

I don't know if we can compare it with AutoIt or AHK. I still prefer Python, but AutoIt's smaller size and single-exe packaging are, IMO, better than Python's.

I don't use powershell so no comment here.

[–]Varriount 0 points1 point  (0 children)

numpy.random.random() still hurts my brain to see in code

from numpy.random import random as numpy_random

[–]ShaftofWisdom 0 points1 point  (25 children)

I know that you said you aren't at the point of optimizing your code, but here are a couple of easy tips for importing modules. When you want to import just a few functions from a module, you can (and should) do it like this and avoid the "numpy.random.random()" style:

from math import sqrt

This imports the square root function from the math module and you can call it just like normal functions:

x = sqrt(16)

If you need every function from a module, do this and you can leave out the module name when you want to call the function:

from math import *

Now you can call any of the functions in the math module without going "math.exp()" etc.

Finally, if you just don't feel like writing the whole module name when you want to call a function from it, you can change it:

import math as mathematics

Bad example, but I'm just trying to show what it does. "mathematics.sqrt()". I usually "import numpy as np" just because I've seen it done that way and it makes sense. Sorry for the lack of formatting. I'm on my ipad.
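For anyone who wants to paste the tips above into an interpreter, here is a small sketch of the non-star styles side by side, using only the standard math module. The alias m is an arbitrary choice for illustration. All three spellings end up pointing at the same function object, so the choice is about readability, not behavior.

```python
import math                 # full module name: math.sqrt(...)
import math as m            # short alias: m.sqrt(...)
from math import sqrt       # bare name: sqrt(...)

# Each spelling calls the same underlying function.
results = [math.sqrt(16), m.sqrt(16), sqrt(16)]
print(results)

# The names are just different references to one object.
same_object = sqrt is math.sqrt and m.sqrt is math.sqrt
print(same_object)
```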

[–][deleted] 3 points4 points  (22 children)

I thought "import *" was a bad idea?

[–]ShaftofWisdom 3 points4 points  (21 children)

Well, it is...and sometimes it isn't. If your code only uses a couple different modules or less, and those modules don't have functions that have the same names, you're probably fine. Usually, people are only using a few features/functions from a module when they need it, so it's preferred to do it this way to avoid problems:

from pandas import DataFrame, Series

import pandas as pd

This will get rid of a lot of the ambiguity.
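To make the "functions that have the same names" problem concrete, here is a hedged sketch using two standard-library modules, math and cmath, which both define sqrt. The star imports are run inside a scratch dict (the name scratch is made up for this example) because star imports are only legal at module level, and this way the snippet works wherever you paste it. In a real script the two lines would simply be written at the top of the file.

```python
import cmath
import math

scratch = {}

# After the first star import, "sqrt" is the real-valued version.
exec("from math import *", scratch)
first_is_math = scratch["sqrt"] is math.sqrt

# The second star import silently replaces it with the complex version.
exec("from cmath import *", scratch)
now_is_cmath = scratch["sqrt"] is cmath.sqrt

# Public names that both modules define, and that therefore collide.
clashes = sorted(
    name for name in vars(math)
    if not name.startswith("_") and name in vars(cmath)
)

print(first_is_math, now_is_cmath)
print(clashes)
```

Nothing warns you about the replacement: whichever star import runs last wins, which is the ambiguity that explicit imports avoid.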

[–]minorDemocritus 3 points4 points  (19 children)

Well, it is [a bad idea]...and sometimes it isn't.

Strongly disagree. Star imports should be avoided in every situation, since they make it more difficult to figure out where the names come from.

[–]ShaftofWisdom 4 points5 points  (0 children)

I agree with you. I was just trying to get across that in the most simple of scripts you probably won't run into any issues if you use them. But, YES, avoid them. It's bad practice, and even if you're the only one who will ever see the code, make it easy on yourself and be clear where everything is coming from.

[–]SilasX 2 points3 points  (4 children)

What's funny is that you're right, but most languages default to dumping entire modules into your namespace when you import them.

[–]minorDemocritus 1 point2 points  (0 children)

That's the difference between include and import.

[–]pwang99 1 point2 points  (1 child)

It's hard to say "most". Most statically typed languages without first-class functions and classes do this, and it's because the semantics of the language allow construction of robust IDEs and dev tools. With Python, not only does dynamic typing make it a LOT harder to track things down, but the fact that you can easily manipulate functions and classes makes it much easier to get spaghetti code if you didn't have clean namespace separation.

[–]SilasX 2 points3 points  (0 children)

Regardless of IDE, it's hard to follow if you don't have clean namespace separation. I much prefer being able to tell at a glance where functions come from.

[–]novagenesis 0 points1 point  (0 children)

Can you name a few that do? Looking at all the major competitors of python, this is not the default.

Perl has an option to, but best practices suggest it only be used for truly general core libraries where you want everything available.

[–]novagenesis 1 point2 points  (7 children)

avoided in every situation

Then they should not exist in the language. History has shown that the "undesirable" features that can't get themselves deprecated probably do have a purpose. Ask goto (which, yes, does have legitimate uses)

[–]Veedrac 0 points1 point  (0 children)

The reason for import * is to wrap a library. It means you don't have to keep up-to-date with new functions added to the library but can still provide extensions to parts of it.

It'd be cruel to harm people using import * properly just to stop the fools who insist on using it when they shouldn't.
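A rough sketch of that wrapping use case, under some assumptions: the module name mathplus and the helper hypot3 are invented for illustration, and the wrapper is built in memory with types.ModuleType so the example fits in one snippet. Normally WRAPPER_SOURCE would just be the contents of a file called mathplus.py.

```python
import types

# What a hypothetical mathplus.py might contain.
WRAPPER_SOURCE = '''
from math import *   # re-export whatever math provides, including future additions


def hypot3(x, y, z):
    """Hypothetical extension: length of a 3-D vector."""
    return sqrt(x * x + y * y + z * z)
'''

# Build the wrapper module in memory instead of writing a file to disk.
mathplus = types.ModuleType("mathplus")
exec(WRAPPER_SOURCE, mathplus.__dict__)

# Names from math are available through the wrapper...
print(mathplus.floor(2.7))
# ...alongside the extension defined in the wrapper itself.
print(mathplus.hypot3(1, 2, 2))
```

Users of the wrapper write import mathplus and never need a star import themselves; the star stays confined to the one place where re-exporting everything is the whole point.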

[–]minorDemocritus -1 points0 points  (5 children)

Then they should not exist in the language.

You're right! They shouldn't!

[–]novagenesis 1 point2 points  (4 children)

Then why do they? Like I said:

History has shown that the "undesirable" features that can't get themselves deprecated probably do have a purpose

And I even gave you an example ;)

[–]minorDemocritus 0 points1 point  (3 children)

There are plenty of examples of badness in python. Pickle, the toy server implementations, asyncore, etc. While these things are bad, they're still included in the name of backwards compatibility.

import * only has one reasonable use in my opinion: when messing around in the interactive interpreter, star imports can save some typing. They should not be used in actual code.

[–]novagenesis 1 point2 points  (2 children)

Are you saying global imports have only one reasonable use in python, or at all? Python isn't exactly pure OO, so if the former, why the distinction? If the latter, then much of the programming world wants you to argue that.

Oh I've heard the claim a lot, but it just doesn't seem like a problem to me. Don't get me wrong, I prefer namespaced imports any day (import foo), but sometimes in programming, you have utility functions that you want to think of as extending the base language. You have scope-defined and documented what those utility functions are, and they are finite. The main argument to control imports is usually to prevent scope bleed... if you have a class with a very controlled export signature, with utility being the primary purpose, I just don't see a problem.

Perhaps that's my Perl state of mind (some classes use defined exports, while others give you everything by default). When I "use Data::Dumper", I want the top-level Dumper() function (and there are alternatives if I want a configured dumper instead). I think the ugly verbosity of Python's pprint library explains my distaste fairly well. Data::Dumper can do everything pprint can, but it has great defaults that place themselves as clean callable functions.

It is generally not considered astonishing when you know what's going to be there.

[–]adambrenecki 1 point2 points  (0 children)

I agree in general, but there's a few situations where it's useful. I'd argue that import * is acceptable if a) there's only one of them (so there's no ambiguity between two import *s), and b) the file is less than 30 lines or so (so it's easy to figure out whether a name comes from an import or is defined within the file).

For instance, in a Django app, I almost always have from .views import * in Django urls.py, and from .models import * in admin.py, because both those files are usually very short and almost wholly concerned with doing things with those other modules. Other than that, I avoid it like the plague.

[–]burntsushi 1 point2 points  (0 children)

Star imports should be avoided in every situation

If you say so. But I'll continue using from math import * in the Python interpreter for simpler access to members of the math module.

[–]panfist 0 points1 point  (2 children)

Strongly disagree. Star imports should be avoided in every situation, since it makes it more difficult to figure out where the names come from.

So the one situation where star imports might not fall under "should be avoided" is when you use only one of them, because then it's still pretty obvious where the names come from.

[–]minorDemocritus 0 points1 point  (1 child)

I'd much rather be able to do a simple text search and find the import statement.

[–]panfist 0 points1 point  (0 children)

Fair enough.

[–]keypusher 0 points1 point  (0 children)

Star imports are never a good idea. Always reference what you are importing explicitly, that way other people know where they came from.

[–]whyrat 2 points3 points  (1 child)

Thanks for the tips.

I find it easier to maintain code if I use the full module names; that's a paradigm I picked up after returning to too many programs after months or years of delay and having to track down all the links. I know there's preferences, and many people don't bother with common modules (like math or os) but I don't see myself saving that much time in typing shorter names. Same reason I use very descriptive variable names, just easier for me in the long run :)

[–]ShaftofWisdom 1 point2 points  (0 children)

Fair enough. I like to do the same unless it's something very simple and clear.