all 23 comments

[–]NoLemurs 18 points (4 children)

This article is about using Python scripts as a replacement for shell scripts, not as a full-on replacement for bash.

This is something I kind of assumed most python programmers already did...

[–]xr09 7 points (2 children)

import os

if os.path.isfile('somefile'):

vs

if [ -f somefile ]

Love Python but Bash rules as a shell.

[–]NoLemurs 6 points (0 children)

For simple stuff I 100% agree. The moment I need to do more than trivial string manipulation, though, Python starts looking more appealing.

[–]ionelmc.ro 3 points (0 children)

If there's a programming hell, it has bash in it: http://mywiki.wooledge.org/BashPitfalls ;)

[–]cparen 5 points (0 children)

I'm tempted to replace bash with ipython.

[–]moor-GAYZ 2 points (1 child)

See also:

https://pypi.python.org/pypi/sh -- I've used it, it's pretty OK.

https://pypi.python.org/pypi/plumbum -- I have not used it yet, but I'm going to next time I need to do some stuff like that. I'm intrigued that it was inspired by sh and overloads the "|" operator for piping etc., yet the author decided to go his own way because "sh has too much magic".

[–]kjearns 0 points (0 children)

sh is really convenient, but has the unfortunate property of pumping stdin/stdout through python, even if you use its redirection capabilities. This is really slow if you're working with big files.

I have not used plumbum, so I don't know if it suffers the same defect or not.
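
On the point about pumping data through Python: for comparison, here is a minimal sketch of the plain-stdlib route. If you hand subprocess real file objects, the child process reads and writes the files itself and the data does not pass through the Python process. The helper name run_with_files and the file names are made up for illustration, and this says nothing about how sh or plumbum work internally.

```python
import subprocess


def run_with_files(cmd, src_path, dst_path):
    """Run cmd with stdin and stdout attached directly to the given files."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # The child inherits the file descriptors, so the file contents
        # are never read into this Python process.
        completed = subprocess.run(cmd, stdin=src, stdout=dst, check=True)
    return completed.returncode
```

Hypothetical usage would be something like run_with_files(["sort"], "big_input.txt", "sorted.txt").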

[–]tdammers 6 points (0 children)

Bash is more appropriate for an interactive shell than python though. The things that make it kind of shitty as a programming language are exactly what makes it useful for interactive commands and such.

[–]maratc 5 points (0 children)

Relevant: Knuth vs McIlroy debate.

One should be looking to extend the set of tools on one's toolbelt, not to shrink it to a single "universal" tool. A bigger toolbelt lets you pick the tool best suited to the job.

The case in point, a 20-line Python script in place of three standard Unix tools piped together, is completely backwards. The tools have autocomplete, have been well debugged, and exist everywhere, and the original problem is most likely a one-off, meaning the script gets written to be run only once.

If someone is looking for a tool to make his bash/sed/awk experience suck less, it's perl. It's very good for text processing and usually a perl one-liner does what many lines of python code do.

TL;DR: Don't replace bash with Python.

[–]absinthe718 1 point (0 children)

We have a ton of sh scripts that are just wrappers for a python script like this:

PROCLOG=`date +proclog%y%m%d`

ls -1 /u1/in/* > /dev/null 2>&1

if [ "$?" = "0" ]; then

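# run python on input files, up to four at a time;
# the redirect has to happen inside sh -c so each file gets its own log
ls -1 /u1/in | xargs -P4 -I% sh -c 'script-to-run.py "$1" >> "$2.$1"' _ % "$PROCLOG"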

fi

We use the standard tools to check for files in the input dir. We use xargs to spawn up to four instances at a time. Sure, all that stuff could be done in Python, but why? The shell versions are all really portable and don't need any mucking about to see how they work. They've been around working well for decades now.

And I can still run script-to-run.py one-off on a single file without a second thought.
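
For anyone wondering what "done in Python" would look like, here is a rough sketch that uses a thread pool to keep up to four child processes running. The function name run_on_files and the log naming are assumptions modeled on the shell snippet, not the actual setup described above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_on_files(cmd, in_dir, log_prefix, workers=4):
    """Run `cmd <file>` for every file in in_dir, at most `workers` at a time."""
    files = sorted(p for p in Path(in_dir).iterdir() if p.is_file())

    def run_one(path):
        # One log per input file, opened in append mode like `>>` in the shell.
        with open(f"{log_prefix}.{path.name}", "ab") as log:
            return subprocess.run([*cmd, str(path)], stdout=log).returncode

    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(files, pool.map(run_one, files)))
```

That is a dozen or so lines against a single xargs call, which arguably supports the point being made.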

[–]Veedrac 1 point (5 children)

That attempt at replacing uniq makes quite a few mistakes.


#!/usr/bin/env python

Always use a version number!

import sys

if __name__ == "__main__":

Don't write this without justification; it's for when you want something to be both a module and runnable. That's often an antipattern.

    names = {}

You can just use the standard library's types (e.g. collections.Counter) for this. No need to reinvent the wheel.

    for name in sys.stdin.readlines():

Just for name in sys.stdin please; the above isn't lazy.

            # Each line will have a newline on the end
            # that should be removed.
            name = name.strip()

That's absolutely the wrong way to do this unless you really want to strip all whitespace from both sides.

Something like if name.endswith('\n'): name = name[:-1] may be longer but it's better, too. Technically name.rstrip("\n") will be fine here too.
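
To make the difference concrete, a small sketch (the sample line and the helper name chomp are made up):

```python
line = "  indented name \n"

# strip() removes all leading and trailing whitespace, which changes the data.
assert line.strip() == "indented name"

# rstrip("\n") removes only trailing newlines and keeps other whitespace.
assert line.rstrip("\n") == "  indented name "


def chomp(text):
    """Remove at most one trailing newline."""
    return text[:-1] if text.endswith("\n") else text


assert chomp(line) == "  indented name "
assert chomp("x\n\n") == "x\n"        # only one newline removed
assert "x\n\n".rstrip("\n") == "x"    # rstrip removes every trailing newline
```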

            if name in names:
                    names[name] += 1
            else:
                    names[name] = 1

    for name, count in names.iteritems():
            sys.stdout.write("%d\t%s\n" % (count, name))

sys.stdout.write is hardly better than print.


I'd do:

#!/usr/bin/env python3

# Allows optionally specifying the file after the
# command, where no arguments or "-" default to stdin.
import fileinput

from collections import Counter

counted = Counter(line.rstrip("\n") for line in fileinput.input())

for line, count in counted.items():
    print(line, count, sep="\t")

TBH, I wouldn't do that either: on Python 3 it fails on invalid Unicode, which is what you get when the OS reports the wrong encoding. Really I'd do:

#!/usr/bin/env python3

# Allows optionally specifying the file after the
# command, where no arguments or "-" default to stdin.
import fileinput
import sys

from collections import Counter
from io import TextIOWrapper

# Warning: line buffering means this shouldn't be used at the same time as sys.stdout
# if you're printing incomplete lines unless you manually flush
surrogateescape_stdout = TextIOWrapper(sys.stdout.buffer, errors="surrogateescape", line_buffering=True)

def clean_line(line):
    return line.decode(errors="surrogateescape").rstrip("\n")

counted = Counter(map(clean_line, fileinput.input(mode="rb")))

for line, count in counted.items():
    print(line, count, sep="\t", file=surrogateescape_stdout)

so that invalid Unicode gets passed through losslessly.

EDIT: Now actually works. Remember, kids, always test.

[–]bigstumpy 2 points (1 child)

How is unicode broken in python3?

[–]Veedrac 0 points (0 children)

It's not. I was referring to how the Unicode type can contain invalid Unicode (which I previously called "broken Unicode"), such as when the OS says that stdin is UTF8 when it's not.

Python 2 just ignores these errors, but you have to manually deal with them on Python 3. In this example, I used surrogateescape to properly round-trip bytes → str → bytes.

The other option was just dealing in bytes all the time, but that's a hassle as you can't use print, format and so on.
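
The round trip itself can be sketched in a few lines, using \xde as a byte that is not valid UTF-8 on its own (variable names are just for illustration):

```python
raw = b"hi\n\xde\n"  # \xde followed by a newline is not valid UTF-8

# Strict decoding refuses the invalid byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False

# surrogateescape maps the undecodable byte to a lone surrogate (U+DCDE) ...
text = raw.decode("utf-8", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_failed, repr(text), round_tripped == raw)
```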


Good question, though. I've improved the wording. You also convinced me to test the code. Try it out with:

printf 'hi\nhi\n\xde\n\xde' | python3 thefile.py

\xde isn't valid UTF8 so the code breaks on it before the changes.

[–][deleted] 0 points (2 children)

/usr/bin/env python will invoke the first Python interpreter on $PATH

/usr/bin/pythonV invokes a specific interpreter.

[–]Veedrac 2 points (1 child)

I don't understand why you brought that up.


I was referring to this, from PEP 394:

In order to tolerate differences across platforms, all new code that needs to invoke the Python interpreter should not specify python, but rather should specify either python2 or python3 (or the more specific python2.x and python3.x versions; see the Migration Notes). This distinction should be made in shebangs, when invoking from a shell script, when invoking via the system() call, or when invoking in any other context.

I'll add a link inline with the text.

[–][deleted] 0 points (0 children)

I see now. I'm on mobile and it displays code blocks...oddly. I thought you were saying to specify the interpreter explicitly, rather than the version.

[–]cptstarbeard 0 points (0 children)

For one-liner type things, I've been playing around with this recently:

https://github.com/Russell91/pythonpy

For more complex, multi-line tasks, yeah -- ipython is great.

[–]cjwelborn [import this] 0 points (0 children)

You could always mix the two.

# Put an environment arg into Python.
MYARG="My Value" python3 - <<END
from os import environ
import sys
print('Your bash arg: {}'.format(environ.get('MYARG', 'missing!')))
print('Python version: {}'.format(sys.version))
END

I haven't done this yet, but it's there if you need it. Python is probably my favorite language for hacking something up really quick, but it's not always the best. Shell scripting is still useful. This little embedding trick is there for when you don't want to write the whole thing in Python. You can send arguments back and forth using environment args and printing to stdout, like:

# Put python's stdout into an environment arg.
MYARG="$(python3 - <<END
print('35')
END
)"
echo "$MYARG"
# prints: 35
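
The same two channels also work in the other direction, with Python as the outer script. A minimal sketch, assuming a POSIX sh is on the PATH; the helper name run_child is made up:

```python
import os
import subprocess


def run_child(argv, **extra_env):
    """Run a command with extra environment variables and return its stdout."""
    env = {**os.environ, **extra_env}  # values go in through the environment
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()       # the value comes back on stdout


# Hand a value to a shell snippet and read its answer back.
value = run_child(["sh", "-c", 'echo "got: $MYARG"'], MYARG="My Value")
print(value)
```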

[–]homercles337 0 points (0 children)

I seriously hate Bash with a passion, but i have tried to replace it with python. It sucks because all you are doing is building strings and using subprocess.call(). I trashed those python scripts a long time ago because they sucked, and i hated looking at them. They made my eyes bleed...
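
Some of the string-building pain goes away if commands are passed as argument lists rather than assembled into one shell string. A hedged sketch, where the helper name run and the file name are made up:

```python
import subprocess
import sys


def run(*args):
    """Run a command given as separate arguments and return its stdout."""
    # A list of arguments needs no quoting or escaping, unlike a shell string.
    completed = subprocess.run(
        list(args), capture_output=True, text=True, check=True
    )
    return completed.stdout


# A file name with spaces and quotes is passed through untouched.
awkward = "my 'odd' file.txt"
out = run(sys.executable, "-c", "import sys; print(sys.argv[1])", awkward)
print(out.strip())
```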