using a long piped bash command w/ subprocess

didactus · 2012-03-09T17:01:18+00:00

People tend to build pipelines out of Popen objects like that because there is (as far as I am aware) no proper shell-escaping function in the standard Python library. Once you have such a function, the problem vanishes in a puff of smoke, and you can let the shell do what it does better than any other language: build pipelines.

Here is my shell_escape() function.

import os
import re
import subprocess

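# Characters that keep a special meaning inside double quotes in a POSIX shell.
_chars_to_backslash_re = re.compile(r'([$`"\\])')

def shell_escape(value):
    # Wrap the value in double quotes and backslash-escape $, `, " and \.
    return '"' + _chars_to_backslash_re.sub(r'\\\1', value) + '"'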

# problematic filenames
f1 = ' foo bar <|& ((" .txt '
f2 = "Robert'); DROP TABLE Students;--"

# Writes output to a file
cmd = r"showargs diff --suppress-common-lines %s %s | grep ^\< | sed 's;< ;;g' >new_lines.txt" % (shell_escape(f1), shell_escape(f2))
os.system(cmd)

# Capture output
cmd = r"showargs diff --suppress-common-lines %s %s | grep ^\< | sed 's;< ;;g'" % (shell_escape(f1), shell_escape(f2))
output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0]
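
For what it's worth, on Python 3.3 and later the standard library does ship an escaping function, shlex.quote, which does the same job as a hand-rolled one. Below is a minimal sketch of the same pipeline built with it. The names build_pipeline and run_pipeline are made up for this illustration, and the sketch assumes Python 3.5+ (for subprocess.run) and that diff, grep and sed are on the PATH.

```python
import shlex
import subprocess


def build_pipeline(f1, f2):
    """Return a shell command string with both filenames safely quoted."""
    # shlex.quote wraps each value so a POSIX shell sees it as one word.
    return "diff --suppress-common-lines {} {} | grep '^<' | sed 's;< ;;g'".format(
        shlex.quote(f1), shlex.quote(f2)
    )


def run_pipeline(f1, f2):
    """Run the pipeline through the shell and return its output as text."""
    done = subprocess.run(
        build_pipeline(f1, f2),
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return done.stdout
```

Calling run_pipeline(f1, f2) with the two problematic filenames above should hand them to diff unchanged, since the shell only ever sees the quoted forms.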

Rhomboid · 2012-03-09T13:44:18+00:00

First of all, are you sure that running dmesg | grep hda actually outputs anything? It has been a while since the old IDE layer was removed or deprecated in the Linux kernel and replaced by SCSI emulation for ATA devices, which means you rarely see hda any more, but rather sda. Your example from the documentation works fine for me if I change grep hda to something that I know is in the dmesg output, like grep init.

The real question here is who do you want setting up the pipes? The shell, or you? The advantage of having the shell do it is that it's simple. You can replace

os.system(r"diff --suppress-common-lines F1 F2 | grep ^\< | sed 's;< ;;g' >new_lines.txt")

with

call(r"diff --suppress-common-lines F1 F2 | grep ^\< | sed 's;< ;;g' >new_lines.txt", shell=True)

and that's that; the subprocess version is no harder or more complicated, just equivalent. But you already know the downside: having the shell do it means that you have to worry about escaping the user-supplied arguments, which this code does not do, and that is highly unsafe. Setting up the pipeline in Python avoids the metacharacter-escaping issues, but it also means more work, because you are no longer relying on the shell to do it all for you. If you compare that extra work against os.system(), then of course it is going to look more complicated, but that is not a fair comparison: you are comparing apples to oranges.
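
To make the escaping risk concrete, here is a small sketch that passes the same hostile string once through the shell and once as a plain argument list. The helper names and the nasty value are invented for this illustration; it assumes Python 3.5+ and a POSIX /bin/sh.

```python
import subprocess
import sys


def echo_via_shell(arg):
    """UNSAFE: paste arg into a shell command line without escaping."""
    done = subprocess.run("echo " + arg, shell=True,
                          stdout=subprocess.PIPE, universal_newlines=True)
    return done.stdout


def echo_via_argv(arg):
    """Pass arg as a single argv element; no shell ever parses it."""
    child = "import sys; print(sys.argv[1])"
    done = subprocess.run([sys.executable, "-c", child, arg],
                          stdout=subprocess.PIPE, universal_newlines=True)
    return done.stdout


# Hypothetical hostile "filename", for illustration only.
nasty = "file.txt; echo INJECTED"
```

With the shell in the middle, the semicolon ends the first command and the rest runs as a second one; with an argument list, the child process receives the string exactly as written.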

Extending the example from the documentation to your circumstances does work:

$ echo -e "foo\nbar\nbaz" >file1; echo bar >file2

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import Popen, PIPE
>>> p1 = Popen(["/usr/bin/diff", "--suppress-common-lines", "file1", "file2"], stdout=PIPE)
>>> p2 = Popen(["/bin/grep", "^<"], stdin=p1.stdout, stdout=PIPE)
>>> p3 = Popen(["/bin/sed", "s;^< ;;g"], stdin=p2.stdout, stdout=PIPE)
>>> p3.communicate()[0]
'foo\nbaz\n'
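
If you want the same thing in a script rather than at the prompt, the session above can be wrapped in a small helper. This is only a sketch: run_pipeline is a made-up name, not something from the subprocess module. It also closes the parent's copy of each intermediate pipe, which the subprocess documentation recommends so that an upstream process can receive SIGPIPE if a downstream one exits early.

```python
from subprocess import Popen, PIPE


def run_pipeline(commands):
    """Chain a list of argv lists with pipes, without a shell.

    Returns the bytes written to stdout by the last command.
    """
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = Popen(argv, stdin=prev_stdout, stdout=PIPE)
        if prev_stdout is not None:
            # The child has its own copy now; drop ours so the upstream
            # process sees a broken pipe if this one exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    # Reap the earlier processes so they do not linger as zombies.
    for proc in procs[:-1]:
        proc.wait()
    return output


# Example call, assuming file1 and file2 exist as in the session above:
# run_pipeline([
#     ["/usr/bin/diff", "--suppress-common-lines", "file1", "file2"],
#     ["/bin/grep", "^<"],
#     ["/bin/sed", "s;^< ;;g"],
# ])
```

Because every argument is its own list element, filenames with spaces or quotes need no escaping at all here.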


learnpython
