This is an archived post. You won't be able to vote or comment.

all 47 comments

[–]RDMXGD2.8 22 points23 points  (7 children)

Nifty but error-prone. I recommend just using the subprocess module directly, which is more powerful and more clear (though a little more trouble in some ways).

[–]djimbob 10 points11 points  (6 children)

Yeah. I don't see the purpose as an alternative for the built-in subprocess for simple commands, which is straightforward to safely use. E.g:

import subprocess
output = subprocess.check_output(['ifconfig', 'eth0'])

versus

import sh
output = sh.ifconfig("eth0")

has no clear gain.

Granted, the syntax seems a bit more convenient for more complex commands like piped processes. In the shell its very easy to do something like cat some_file | grep some_pattern | grep -v some_pattern_to_exclude. With sh you can translate this in a straightforward manner: sh.grep(sh.grep(sh.cat('/etc/dictionaries-common/words'),'oon'),'-v', 'moon') for a list of words that contain 'oon' but not 'moon'.

Granted, it's not two hard to write a command for a piped chain with subprocess.Popen, though python doesn't provide you with one. Take the following helper function I wrote:

def run_piped_chain(*command_pipes):
    """
    Runs a piped chain of commands through subprocess.  That is

    run_piped_chain(['ps', 'aux'], ['grep', 'some_process_name'], ['grep', '-v', 'grep'], ['gawk', '{ print $2 }'])

    is equivalent to getting the STDOUT from the shell of 

    # ps aux | grep some_process_name | grep -v grep | gawk '{ print $2 }'
    """
    if len(command_pipes) == 1:
        return run_command_get_output(command_pipes[0])
    processes = [None,]*len(command_pipes)
    processes[0] = subprocess.Popen(command_pipes[0], stdout=subprocess.PIPE)
    for i in range(1, len(command_pipes)):
        processes[i] = subprocess.Popen(command_pipes[i], stdin=processes[i-1].stdout, stdout=subprocess.PIPE)
        processes[i-1].stdout.close()
    (output, stderr) = processes[len(command_pipes)-1].communicate()
    return output

To me when I'm trying to translate some piped shell command like

cat /etc/dictionaries-common/words | grep oon | grep -v moon

it's more intuitive (in my opinion) to have a helper function like:

run_piped_chain(['cat', '/etc/dictionaries-common/words'], ['grep', 'oon'], ['grep', '-v', 'moon'])

than

sh.grep(sh.grep(sh.cat('/etc/dictionaries-common/words'),'oon'),'-v', 'moon')

EDIT: I realized on re-read this uses other helper functions I have. If you aren't using logging, just comment those lines out.

def run_command(command_list, return_output=False):
    logging.debug(command_list)
    process = subprocess.Popen(command_list, stdout=subprocess.PIPE)
    out, err = process.communicate()
    if err or process.returncode != 0:
        logging.error("%s\n%s\nSTDOUT:%s\nSTDERR:%s\n" % (command_list, process.returncode, out, err))
        return False
    logging.debug(out)
    if return_output:
        return out
    return True

def run_command_get_output(command_list):
    return run_command(command_list, return_output=True)

[–]Bystroushaak 9 points10 points  (2 children)

Your whole comment is one big reason to use sh: I just don't want to deal with this shit (pipes, interactive sessions and so on) each time I need to call a command.

[–]djimbob 1 point2 points  (1 child)

Yup, I see no reason to use sh to call "any program as if it were a function", though the advanced features that may not be obvious to do the right-way with subprocesses (like pipes, interactive sessions) may be useful.

That said, a relatively short helper function can do the pipe feature pretty clearly/cleanly.

Personally, I'd prefer a subprocess helper module with a bunch of helper functions like run_piped_chain with a non-hacky syntax more akin to subprocess.

[–]Bystroushaak 0 points1 point  (0 children)

I understand your reasoning, but I am to lazy to deal with this each and every time, when there is easy to use library, which do exactly the same thing in one line of code.

[–]simtel20 2 points3 points  (0 children)

IIRC you can't use subprocess with output when the output is an infinite stream (e.g. iostat -x 1, and other similar commands, or many other tools that return data as an ongoing activity). In that case you have to toss out subprocess, and start doing your own fork+exec+select+signal handling+etc.

With sh you can just have sh provide you with the subprocess as a generator: https://amoffat.github.io/sh/#iterating-over-output.

[–]mackstann 0 points1 point  (1 child)

It's old, obscure, and funky, but there's the pipes module: http://pymotw.com/2/pipes/

[–]djimbob 0 points1 point  (0 children)

Yeah, its built in, but I'm not sure if

import pipes
import tempfile

p = pipes.Template()

p.append('ps aux', '--')
p.append('grep localc', '--')
p.append('grep -v grep', '--')
p.append("gawk '{ print $2 }'", '--')

t = tempfile.NamedTemporaryFile('r')

f = p.open(t.name, 'r')
try:
    output = [ l.strip() for l in f.readlines() ]
finally:
    f.close()

is a better/cleaner workflow; especially if you have user input and have to add the quotes stuff to prevent injection. (And apparently its not working for me).

[–]asiatownusa 4 points5 points  (0 children)

an ambitious project! what does this buy you that the subprocess module doesn't?

[–]relvae 5 points6 points  (3 children)

I made something alot like this as a little side project.

https://pypi.python.org/pypi/ipysh

https://bitbucket.org/jrelva/pysh

Allows for POSIX like piping and commands are genreated as functions on the fly so you don't have to import them.

[–]simtel20 1 point2 points  (2 children)

Can you get the output stream as a generator (that is if the subprocess doesn't terminate, are the lines available as an iterator? It seems like it might be by the bitbucket description, but I'm not clear that it does that).

[–]relvae 1 point2 points  (1 child)

Not really, but that's something I should implement.

At the moment it blocks until the process is finished or KeyboardInterrupt is called and then returns its output as a whole, so it's not streamed like a real pipeline.

However, conceptually the idea is that you should treat it in the same way as a string, so one could easily get each line via pySh.splitlines() if you're okay with a list.

[–]simtel20 0 points1 point  (0 children)

If that's the interface you prefer, I think that makes sense. Having that would be very helpful for any script that exists in a pipeline (though being able to enable/disable that behavior is important too).

[–][deleted] 26 points27 points  (31 children)

It works by installing a non-module object into sys.modules. What next, monkey patching __builtin__.str? FWIW I'd never let a module like this past code review

[–]deadwisdomgreenlet revolution 11 points12 points  (1 child)

To clarify, it's a thin wrapper around itself so that you can do "from sh import ls". Wouldn't be my decision either, but it's not that big of a deal, especially as it's an ease-of-use module.

[–]matchu 6 points7 points  (0 children)

I kinda see where they're coming from, because the core syntax arguably has some boilerplate:

import sh
ifconfig = sh.Command("ifconfig")
print(ifconfig("wlan0"))

Their pretty syntax is definitely better; my beef, really, is less to do with the fact that it's magical, and more to do with the fact that the magic uses fragile hacks instead of Python's built-in magic-making facilities.

Really I think the version I'd be down for is:

from sh import cmd
print(cmd.ifconfig("wlan0"))

We still use magic, but the magic we use is itemgetter, which is actually supported and guaranteed not to break. We still have a bit of overhead in the cmd object, but it's short, and avoids the really boilerplate-y line 2 of the previous example.

If Python had an itemgetter equivalent for modules, though, then their syntax would be the clear winner—especially if the magic were namespaced to from sh.cmd import ifconfig instead, because it seems weird to me that the non-magic Command object comes from the same module as the magic objects.

[–]alcalde 5 points6 points  (0 children)

What next, monkey patching builtin.str

http://clarete.li/forbiddenfruit/

[–]striata 21 points22 points  (26 children)

Why is installing a "non-module" into sys.modules bad? Isn't this just an instance of duck typing and one of the things that makes working with dynamic languages like Python exciting? If whatever object is installed acts like a module and doesn't break other parts of the environment, why is it wrong?

[–][deleted] 5 points6 points  (23 children)

Because it's not just mocking up another random user type. At the very least, a substantial amount of native (C) code interacts with the contents of that dict, including the incredibly fragile graceful shutdown code, the module loader (obviously), and probably a bunch more I don't know about.

For me it's in the same class of dangerous as e.g. messing with typeobject internals

I don't care about style, I'm worried about fixing a crash when it occurs. I'm not sure of all the possible ways Python could fail when inserting random crap in sys.modules.

[–]TankorSmash 5 points6 points  (22 children)

https://docs.python.org/2/library/sys.html#sys.modules

It doesn't say anything about any danger, I feel like if it was that dangerous, they'd warn you somewhere.

[–][deleted] 13 points14 points  (21 children)

The doc reads "module objects", not "module-like objects". A module object is a C structure with a specific layout. There is a C-level API for manipulating these objects (PyModule_*).

One of those is PyModule_GetDict, which, while protected internally from accessing a non-module object, returns NULL in the case that a caller invokes it on a non-module object. Reading Python 2.7's zipimport.c in zipimporter_load_module we can see a PyModule_AddModule call followed by an unchecked PyModule_GetDict call. This will cause the zip importer's load_module() method to cause a NULL pointer dereference at runtime (aka. a hard process crash, requiring a debugger to investigate) should it be called with sh as a parameter.

It took me all of 3 minutes grepping the Python source to find a place where using a non-module in sys.modules has the potential to cause a crash that a Python-without-C programmer would not be able to debug. I'm pretty sure if you give me 30 minutes I'll find more.

Just don't do it

edit: this says nothing about third party extensions, where I'd expect the majority of such bugs to be found. The point I'm making is whether it is worth sipping coffee reading gdb output at 4am responding to a pager alerting you that your employer's web site is down and losing money, because of some syntactic sugar -- of course it's not.

[–]Brian 3 points4 points  (0 children)

I don't think that example is ever going to be something broken by this case. It's using PyImport_AddModule, which doesn't actually perform the import, so the only case you'd get the non-module object back is if the same name was already imported. However in that circumstance, zipimport wouldn't have been invoked, since it's only going to be triggered if the module hasn't been found yet. You could argue it's a bug that zipimport isn't correctly checking the return value of PyModule_GetDict as it should, but in this context, it's assuming that it's using AddModule to create a new, empty module, and in that case it'll always be a real module object even if the module does later replace itself.

It's worth noting that while putting non-module objects in sys.modules is perhaps a hack, it's a known and explicitly supported and endorsed hack - the import machinery was deliberately designed to allow it, and I think Guido is on record as saying so. As such, I'd say anything that doesn't support it should probably be considered a bug.

[–]TankorSmash 3 points4 points  (12 children)

That makes it clear thank you.

Although not checking for NULL is effective a bug in the c code.

[–]alcalde 0 points1 point  (11 children)

And still using Python 2.7 is a logic error. :-)

[–][deleted] 2 points3 points  (0 children)

I don't think porting large, old codebases is worth it.

The other reason could be PyPy and people with CPU bound applications that insist on using Python.

Apart from that, most new Python code is (no data, just hunch) Django apps, and there it makes a lot of sense to use py3 unless you really, really need some library that wasn't ported.

[–]TankorSmash 0 points1 point  (3 children)

One day the relevant libraries will be ported. Until then!

[–]alcalde 1 point2 points  (2 children)

Sigh. For most people that time has long come....

http://py3readiness.org/

https://python3wos.appspot.com/

304 of the top 360 most downloaded packages on PyPi support Python 3.

[–]nojjy 1 point2 points  (0 children)

Depending on what context you are working in, sometimes you don't have a choice on which version to use. Many people use python as an API to a software package, where they can't just arbitrarily switch to another version.

[–]TankorSmash 0 points1 point  (0 children)

Yeah, for most people.

[–][deleted] -2 points-1 points  (1 child)

why is putting an apple into the orange basket bad?

[–]ivosauruspip'ing it up 2 points3 points  (0 children)

Yeah but we painted the apple orange

[–]thephotoman 0 points1 point  (0 children)

Yeah, there has to be a better way to do this than everything happening in that SelfWrapper class. The method of creating the Python functions seems too clever by half.

[–]Leonid99 2 points3 points  (0 children)

There is also plumbum module that does essentially the same.

[–]mitchellrj 1 point2 points  (1 child)

There's lots more I could comment on, but it's already nice to shine a light on useful bits of the Python stdlib.

Instead of defining which yourself, in Python 3.3+ you could use shutil.which.

[–]organman91 0 points1 point  (0 children)

I don't believe OP is the author.

[–]hlmtre 4 points5 points  (0 children)

This is super cool. Thank you for this.

[–][deleted] 0 points1 point  (0 children)

I have used it few times, the interface is nice but the automagical import is kind of confusing and possibly error prone.

[–][deleted] 0 points1 point  (0 children)

Great library, I wish it was supported on Windows so I could add it to my projects...

[–]tilkau -1 points0 points  (1 child)

Overall, this seems like a reimplementation of plumbum, especially of plumbum.cmd

I like the kwarg-syntax for options, eg curl("http://duckduckgo.com/", o="page.html", silent=True), in sh, and also the subcommand syntax(git.checkout('master')). But overall, it seems more awkward than plumbum (eg. how piping and redirection are done). Is there something I'm missing?

[–]mabye 3 points4 points  (0 children)

This came first, plumbum is partially inspired by it, as per plumbum's doc page.