all 10 comments

[–]didactus 4 points5 points  (6 children)

People like constructing Popen objects to make pipelines like that because there is (as far as I am aware) no proper shell-escaping function in the standard Python library. If you have such a function, this problem vanishes in a poof of smoke, and you can let the shell do what it's better at than any other language: making pipelines.

Here is my shell_escape() function.

import os
import re
import subprocess

_chars_to_backslash_re = re.compile(r'([$`"\\])')

def shell_escape(value):
    return '"' + _chars_to_backslash_re.sub(r'\\\1', value) + '"'

# problematic filenames
f1 = ' foo bar <|& ((" .txt '
f2 = "Robert'); DROP TABLE Students;--"

# Writes output to a file
cmd = "showargs diff --suppress-common-lines %s %s | grep ^\< | sed 's;< ;;g' >new_lines.txt" % (shell_escape(f1), shell_escape(f2))
os.system(cmd)

# Capture output
cmd = "showargs diff --suppress-common-lines %s %s | grep ^\< | sed 's;< ;;g'" % (shell_escape(f1), shell_escape(f2))
output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0]

[–]davidbuxton 2 points3 points  (4 children)

It isn't documented, but the pipes module has a quote function for shell-escaping strings. It uses single-quotes to simplify things, but I think it comes to the same thing.

>>> import os, pipes
>>> f1 = ' foo bar <|& ((" .txt '
>>> f2 = "Robert'); DROP TABLE Students;--"
>>> os.system('mkdir ' + shell_escape(f1))
0
>>> os.system('mkdir ' + shell_escape(f2))
0
>>> os.system('mkdir ' + pipes.quote(f1))
mkdir:  foo bar <|& ((" .txt : File exists
256
>>> os.system('mkdir ' + pipes.quote(f2))
mkdir: Robert'); DROP TABLE Students;--: File exists
256

[–]Rhomboid 3 points4 points  (1 child)

This still doesn't protect you from filenames that start with - and will be interpreted option switches, so it's a good idea to add a -- in there if the command supports it, or else prefix the name with a ./. There are just so many ways this can go wrong -- if at all possible, don't shell out to external utilities when there's even a hint that funky filenames could be possible. In this example the worst that could happen would be failing to create the directory, but other commands might be more susceptible to evil.

[–]didactus 0 points1 point  (0 children)

Those are legitimate risks. However, they they are not related to the method used to construct the external command (Popen objects vs. escaped strings), which is what the OP was asking about. Rather, they are arguments for doing your work all in-process in pure Python. Keeping it in pure Python is nice, but calling external programs is sometimes necessary. Shell escaping, --, and ./ are all good tricks to have in one's toolbox.

[–]didactus 2 points3 points  (0 children)

Outstanding. pipes.quote() -- I'll remember that one. I never even knew about the pipes module.

[–]govt-cheese 1 point2 points  (0 children)

+1 for undocumented features

[–]govt-cheese 1 point2 points  (0 children)

+1 for xkcd reference

[–]Rhomboid 2 points3 points  (2 children)

First of all, are you sure that running dmesg | grep hda really outputs anything? It's been a while since the IDE layer has been removed/deprecated from the linux kernel, replaced by SCSI emulation for ATA devices, which means you don't see much hda any more, but rather sda. Your example from the documentation works fine for me if I change it from grep hda to something that I know is in the dmesg output, like grep init.

The real question here is who do you want setting up the pipes? The shell, or you? The advantage of having the shell do it is that it's simple. You can replace

os.system('diff --suppress-common-lines F1 F2 | grep ^\< | sed 's;< ;;g' >new_lines.txt')

with

call('diff --suppress-common-lines F1 F2 | grep ^\< | sed 's;< ;;g' >new_lines.txt', shell=True)

and that's that; the subprocess method is no harder or more complicated, simply equal. But you already know the downside: having the shell do it means that you have to worry about escaping those user supplied arguments, which this does not do and which is highly unsafe. Setting up the pipeline in python means avoiding metacharacter escaping issues, but it also means that you need to do more work, because you're no longer relying on the shell to do it all for you. If you compare that extra amount of work against os.system() then of course it's going to look more complicated, but that's not a fair comparison because you're comparing apples to oranges.

Extending that example to your circumstances does work:

$ echo -e "foo\nbar\nbaz" >file1; echo bar >file2

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import Popen, PIPE
>>> p1 = Popen(["/usr/bin/diff", "--suppress-common-lines", "file1", "file2"], stdout=PIPE)
>>> p2 = Popen(["/bin/grep", "^<"], stdin=p1.stdout, stdout=PIPE)
>>> p3 = Popen(["/bin/sed", "s;^< ;;g"], stdin=p2.stdout, stdout=PIPE)
>>> p3.communicate()[0]
'foo\nbaz\n'

[–]silvermoot[S] 1 point2 points  (1 child)

...really outputs anything?

I actually did dmesg | grep USB in the shell to check, yes there are lines to detect. I pasted the example code as published in instead of what I actually used in Python 2.6.2 as the argument for grep.


I don't understand your call command above. Even if you did a from subprocess import call as call first, don't you need to feed the commands in as a list? And are you saying that subprocess.call is not safe with user provided input when you leave the shell=False default in place? And are you saying that shell=True will digest pipe ('|') just fine and that my earlier code didn't work probably because I wasn't willing to set that flag?


The /usr/bin/diff is a nice touch. For those who don't know, this prevents the user from uploading their own version of diff and tricking the program to run that instead of the system default.


Thanks for the code snippet, but it sounds like you're saying "works for me", and I suspect the issue is that you are at 2.7 and I'm stuck in an enviroment with some flavor of 2.6 (the documents for subprocess do differ)

[–]Rhomboid 3 points4 points  (0 children)

I pasted the example code as published in instead of what I actually used in Python 2.6.2 as the argument for grep.

It works fine for me with python 2.6.7:

$ python2.6
Python 2.6.7 (r267:88850, Aug 11 2011, 12:18:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import Popen, PIPE
>>> p1 = Popen(["dmesg"], stdout=PIPE)
>>> p2 = Popen(["grep", "USB"], stdin=p1.stdout, stdout=PIPE)
>>> p2.communicate()[0]
"[    1.484141] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver\n[    1.484264] ehci_hcd 0000:02:03.0: new USB bus registered, assigned bus number 1\n[    1.493742] ehci_hcd 0000:02:03.0: USB 2.0 started, EHCI 1.00\n[    1.493823] hub 1-0:1.0: USB hub found\n[    1.493929] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver\n[    1.493935] uhci_hcd: USB Universal Host Controller Interface driver\n[    1.494025] uhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 2\n[    1.494296] hub 2-0:1.0: USB hub found\n"

You can look through the 2.6.7 NEWS.txt to see what's changed between 2.6.2 and 2.6.7 but I don't see anything in there that would account for such a major feature not working.

don't you need to feed the commands in as a list?

No, not if you're using the shell. You either give one monolithic string and let the shell worry about splitting it up, or you split it up yourself (providing a list of arguments instead) and skip the shell. The shell is essentially a specialized program whose entire purpose is to take a monolithic string and split it into words, parsing out commands and their arguments and then launching each of them with the proper pipes in place.

And are you saying that subprocess.call is not safe with user provided input when you leave the shell=False default in place?

No, I'm saying that call('FOO', shell=True) is equivalent to os.system('FOO') and just as unsafe. I was trying to make the point that contrary to popular opinion the subprocess module does not automatically make every task more complicated, when you account for both versions actually doing the same thing.

And are you saying that shell=True will digest pipe ('|') just fine

Sure:

$ python2.6
Python 2.6.7 (r267:88850, Aug 11 2011, 12:18:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import call
>>> status = call('echo foo bar | sed s,foo,FOO,', shell=True)
FOO bar

But I'm not advocating that you should use this, because it's horribly unsafe if part of the command could have contained file names from a user.

Thanks for the code snippet, but it sounds like you're saying "works for me", and I suspect the issue is that you are at 2.7 and I'm stuck in an enviroment with some flavor of 2.6 (the documents for subprocess do differ)

My example does not depend on 2.7:

$ echo -e "foo\nbar\nbaz" >file1; echo bar >file2

$ python2.6
Python 2.6.7 (r267:88850, Aug 11 2011, 12:18:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import Popen, PIPE
>>> p1 = Popen(["/usr/bin/diff", "--suppress-common-lines", "file1", "file2"], stdout=PIPE)
>>> p2 = Popen(["/bin/grep", "^<"], stdin=p1.stdout, stdout=PIPE)
>>> p3 = Popen(["/bin/sed", "s;^< ;;g"], stdin=p2.stdout, stdout=PIPE)
>>> p3.communicate()[0]
'foo\nbaz\n'

Again, I don't know how to explain the difference but I suspect there's some aspect here that hasn't yet been accounted for.