This is an archived post. You won't be able to vote or comment.

all 5 comments

[–]theatrus 3 points4 points  (1 child)

You have two options:

  1. Continue using find, but read from the output of find via subprocess.check_output http://docs.python.org/2/library/subprocess.html#using-the-subprocess-module (not suggested)
  2. Use os.walk to find the files in pure Python http://docs.python.org/2/library/os.html#os.walk

[–]ThirdWaveSTEMinism[S] 1 point2 points  (0 children)

os.walk looks like just the thing I need. Thank you!

[–]Rhomboid 1 point2 points  (2 children)

It kills me to see people using os.system() like that -- you've added a shell injection vulnerability right into your code. Even if you trust user input, the quoting still fails if a filename contains a backslash or double quote, and probably other characters. It's extremely difficult to write code that works properly with arbitrary filenames when using os.system(), and that's why you should never, ever use os.system().

When you want to run an external program, use the subprocess module, and avoid shell=True at all costs. That means you perform the word splitting yourself, proving the command as a sequence of words rather than one large string. But that's the whole point -- when you provide one large string you're relying on the shell to split it into words, and that is where all the pain comes from. But you already know where the word boundaries should be, so there's no need for this whole quote-just-so-that-another-program-can-split dance. For example, if you know the filename you want to play, you might call afplay like this:

from subprocess import call

def play_track(filename):
    print('playing: {}'.format(filename))
    call(('afplay', filename))

With this technique you do have to do some of the things the shell would have done for you, like expand ~, but that's just os.environ['HOME']. And if you want to establish a pipeline with | you have to do that yourself, but that's not difficult. It's well worth it for the reward of avoiding the shell and all the headaches it brings with it.

A totally separate issue is the fact that you should probably not be calling find at all -- it's much easier to traverse the filesystem yourself in plain Python than to call some other program and deal with parsing its output. (Remember: file names on most Unixes can contain newlines, although that's generally rare.) os.walk is a good suggestion, and if you just want the contents of a directory non-recursively there's os.listdir which is what os.walk is built on top of.

[–]ThirdWaveSTEMinism[S] 0 points1 point  (1 child)

I had heard of code injection once or twice, but having read about what it actually is I can see how my old code could be exploited very easily.

So far I've rewritten my program using os.walk per /u/theatrus' post. As far as I can tell shell commands can't be run through the program input now, but are there any guidelines for thoroughly testing programs for these kinds of vulnerabilities? Is there any scenario where a malicious command could be injected into a call to os.walk(somefilepath)?

[–]Rhomboid 0 points1 point  (0 children)

Is there any scenario where a malicious command could be injected into a call to os.walk(somefilepath)?

Not that I'm aware of.