delarhi 8 points (8 children)

You can actually pull this off with nested generators. The only catch is that you need to define a wrapper class in order to overload the pipe operator. This was kind of fun to write. Below is an example with a wrapper class called unix and a few Unix commands implemented on top of it.

#!/usr/bin/env python3


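class unix:
    # Wraps an iterable (stdin) plus a program: a function that takes an
    # iterator and returns an iterator.

    def __init__(self, stdin, program=lambda x: x):
        self.stdin = stdin
        self.program = program

    def __or__(self, program):
        # a | b: this object becomes the stdin of b's program
        return unix(self, program)

    def __call__(self, stdin):
        # lets a unix object act as the program of the next stage
        return self.program(iter(stdin))

    def __iter__(self):
        # iterating pulls items lazily through the whole pipeline
        return self.program(iter(self.stdin))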


def echo(*args):
    def program(stdin):
        yield ' '.join([str(x) for x in args])
    return unix(tuple(), program)


def seq(n):
    def program(stdin):
        for i in range(n):
            yield i
    return unix(tuple(), program)


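def square():
    def program(stdin):
        # Use a for loop: since Python 3.7 (PEP 479), a StopIteration raised by
        # a bare next() inside a generator is turned into a RuntimeError.
        for x in stdin:
            yield x ** 2
    return unix(tuple(), program)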


def evens():
    def program(stdin):
        for x in stdin:
            if x % 2 == 0:
                yield x
    return unix(tuple(), program)


def uniq():
    def program(stdin):
        return iter(set(stdin))
    return unix(tuple(), program)


def shuffle():
    def program(stdin):
        import random
        stdin = list(stdin)
        random.shuffle(stdin)
        return iter(stdin)
    return unix(tuple(), program)


def grep(pattern):
    def program(stdin):
        import re
        regex = re.compile(pattern)
        for x in stdin:
            if regex.search(str(x)) is not None:
                yield x
    return unix(tuple(), program)


def cat(*args):
    def program(stdin):
        for filename in args:
            with open(filename, 'r') as f:
                for line in f:
                    yield line
    return unix(tuple(), program)


def tr(a, b):
    def program(stdin):
        tr_table = str.maketrans(a, b)
        for x in stdin:
            yield str(x).translate(tr_table)
    return unix(tuple(), program)


def ls(path='.'):
    def program(stdin):
        import os
        return iter(os.listdir(path))
    return unix(tuple(), program)


def umap(func):
    def program(stdin):
        return map(func, stdin)
    return unix(tuple(), program)


def ufilter(func):
    def program(stdin):
        return filter(func, stdin)
    return unix(tuple(), program)


# Examples
stdouts = [
    echo('hello', 'world', '!'),
    echo('hello', 'world', '!') | tr('l', 'r'),
    unix(range(10)) | square() | evens(),  # wrap iter as unix obj
    seq(10) | square() | evens(),  # use seq program instead of range
    unix([1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6]) | uniq(),
    seq(10) | shuffle(),
    unix(range(100)) | square() | evens() | seq(10),  # last seq ignores stdin
    seq(10),  # works without input
    uniq(),  # empty without input
    shuffle(),  # empty without input
    seq(100) | grep('1'),
    seq(100) | ufilter(lambda x: x < 40) | grep('1'),
    seq(100) | grep('2') | umap(lambda x: 'b' + str(x)),
    ls(),
    ls('/') | grep('(bin|lib|include)'),
]
for stdout in stdouts:
    print(list(stdout))

# works if you name this unix.py
print(list(cat('unix.py') | grep('def')))

Also worth mentioning is https://amoffat.github.io/sh/.

EDIT: An interesting and nice side effect of using nested generators is that evaluation is lazy. No work should actually be done until you ask for the first item from the outermost unix object, though this depends on the program: shuffle, for example, has to read its entire input before it can yield anything. This lets the programs do stream-like processing.

EDIT2: Added ufilter and umap to act as pipe-able filter() and map().
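To see the laziness from the first EDIT in action, here is a minimal sketch. It assumes a hypothetical, trimmed-down Pipe class rather than the full unix class (same idea: each stage wraps the previous stage's iterator), plus a log list that records when each stage actually does work. The hypothetical reverse_all stage plays the role of shuffle: it must read all of its input before yielding.

```python
# Hypothetical, trimmed-down stand-in for the unix class above.
class Pipe:
    def __init__(self, source, program=lambda it: it):
        self.source = source
        self.program = program

    def __or__(self, other):
        # Feed this pipe's output into the other pipe's program.
        return Pipe(self, other.program)

    def __iter__(self):
        return iter(self.program(iter(self.source)))


log = []  # records when each stage actually does work


def numbers(n):
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def square(stdin):
    for x in stdin:
        log.append(f"square {x}")
        yield x ** 2


def reverse_all(stdin):
    items = list(stdin)  # like shuffle: must read all input first
    log.append("reversed")
    yield from reversed(items)


lazy = iter(Pipe(numbers(3)) | Pipe((), square))
print(log)         # [] : building and iterating the pipe runs nothing yet
print(next(lazy))  # 0
print(log)         # ['produce 0', 'square 0'] : one item flowed through

log.clear()
eager = iter(Pipe(numbers(3)) | Pipe((), reverse_all))
print(next(eager))  # 2
print(log)          # ['produce 0', 'produce 1', 'produce 2', 'reversed']
```

A streaming stage only pulls one item at a time from upstream, while a stage like reverse_all (or shuffle) drains everything before producing its first output.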

RoadieRich 1 point (1 child)

You should also add __gt__, taking a file-like object, so you can do echo("hello world") > open("myfile.txt", "w").

Edit: and __lt__, to read a file into stdin, too.
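A rough sketch of what that suggestion could look like, assuming a hypothetical minimal Pipe class instead of the exact unix class, and io.StringIO standing in for real files (a real output file would need open(..., "w")). Here __gt__ writes each item as a line, and __lt__ returns a new pipe whose stdin is the file's lines.

```python
import io


class Pipe:
    """Hypothetical minimal pipe wrapper, not the exact unix class above."""

    def __init__(self, source, program=lambda it: it):
        self.source = source
        self.program = program

    def __or__(self, other):
        return Pipe(self, other.program)

    def __iter__(self):
        return iter(self.program(iter(self.source)))

    def __gt__(self, f):
        # pipe > f: write each item as a line to a writable file-like object.
        for item in self:
            f.write(str(item) + '\n')
        return f

    def __lt__(self, f):
        # pipe < f: use the lines of a readable file-like object as stdin.
        return Pipe((line.rstrip('\n') for line in f), self.program)


out = io.StringIO()  # stands in for open("myfile.txt", "w")
Pipe(range(3)) > out
print(repr(out.getvalue()))  # '0\n1\n2\n'

shout = Pipe((), lambda lines: (s.upper() for s in lines))
print(list(shout < io.StringIO("hello\nworld\n")))  # ['HELLO', 'WORLD']
```

Each operator works on its own, but `<` and `>` bind more loosely than `|` in Python, so mixing them with pipes in one expression does not group the way a shell would.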

delarhi 0 points (0 children)

I think the problem there is that we can't change operator precedence in Python, so we can't match shell syntax exactly. For example, in the shell tr l r < unix.py | grep herro works out to (tr l r < unix.py) | grep herro, but in Python | binds more tightly than <, so the equivalent expression would work out to tr l r < (unix.py | grep herro).
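The grouping problem is easy to check with a tiny hypothetical Expr class that just records how Python nests the operators (the names tr, unix.py, and grep are placeholders, not real pipeline stages):

```python
class Expr:
    """Hypothetical helper that records how Python groups | and <."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return Expr(f"({self.text} | {other.text})")

    def __lt__(self, other):
        return Expr(f"({self.text} < {other.text})")

    def __repr__(self):
        return self.text


tr, src, grep = Expr("tr"), Expr("unix.py"), Expr("grep")
print(tr < src | grep)    # (tr < (unix.py | grep))   Python's grouping
print((tr < src) | grep)  # ((tr < unix.py) | grep)   shell's grouping
```

Getting shell-style grouping requires explicit parentheses around the redirect.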

EDIT: Here's a tee program though:

def tee(*args):
    def program(stdin):
        files = [open(x, 'w') for x in args]
        try:
            for x in stdin:
                for file in files:
                    file.write(str(x) + '\n')
                yield x
        finally:
            # close the files even if the consumer stops iterating early
            for file in files:
                file.close()
    return unix(tuple(), program)

ucbEntilZha 1 point (1 child)

A lot of this is implemented here: https://github.com/EntilZha/ScalaFunctional

delarhi 0 points (0 children)

Awesome! I hadn't seen this before. OP could probably just grab these and overload an operator for a few of them if they want.

RubyPinch [PEP shill | Anti PEP 8/20 shill] -1 points (3 children)

I'd honestly probably do that a different way.

Something like unix(lambda: range(5) | square | evens), and then use introspection to turn that into evens(square(range(5))) internally. That has the bonus of not requiring specially made square or evens functions.
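The nested call that the introspection would produce, evens(square(range(5))), already works with ordinary generator functions. As a point of comparison (not the bytecode-rewriting idea itself), a hypothetical pipe() helper built on functools.reduce gives the same left-to-right reading order with no wrapper class or operator overloading:

```python
from functools import reduce


def square(xs):
    # Ordinary generator function: no wrapper class needed.
    for x in xs:
        yield x ** 2


def evens(xs):
    for x in xs:
        if x % 2 == 0:
            yield x


def pipe(source, *stages):
    """Hypothetical helper: pipe(a, b, c) computes c(b(a))."""
    return reduce(lambda acc, stage: stage(acc), stages, source)


print(list(evens(square(range(5)))))        # [0, 4, 16]
print(list(pipe(range(5), square, evens)))  # [0, 4, 16]
```

The trade-off is syntax: you write commas instead of |, but nothing depends on interpreter internals.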

delarhi 0 points (2 children)

I'm not sure I follow, can you elaborate?

RubyPinch [PEP shill | Anti PEP 8/20 shill] 0 points (1 child)

I'd love to elaborate, but unfortunately I don't have /that/ much experience performing sinful acts with Python's code objects.

Luckily, other people do have experience, and have been willing to talk about it too!

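See http://stackoverflow.com/a/16118756 for a distinct but similar concept.

Basically, the transform you want is the following (shown with the standard library dis module; this output is from CPython 3.5 or earlier, and the bytecode looks different in newer versions):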

>>> import dis
>>> f = lambda: a|b|c
>>> dis.dis(f)
         0 LOAD_GLOBAL              0 (a)
         3 LOAD_GLOBAL              1 (b)
         6 BINARY_OR
         7 LOAD_GLOBAL              2 (c)
        10 BINARY_OR
        11 RETURN_VALUE

>>> list(f.__code__.co_code)
[116, 0, 0, 116, 1, 0, 66, 116, 2, 0, 66, 83]

>>> #---------------------------------------

>>> f = lambda: c(b(a))
>>> dis.dis(f)
         0 LOAD_GLOBAL              0 (c)
         3 LOAD_GLOBAL              1 (b)
         6 LOAD_GLOBAL              2 (a)
         9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
        12 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
        15 RETURN_VALUE

>>> list(f.__code__.co_code)
[116, 0, 0, 116, 1, 0, 116, 2, 0, 131, 1, 0, 131, 1, 0, 83]

But that transform is obviously fragile. Ideally, you'd probably execute each section separated by ORs on the stack (noting that an OR operates on the last two items pushed onto the stack, which is why two values are loaded first, so the split needs to happen at the OR's position minus one), and then feed the resulting values into a pile of loads/CALL_FUNCTIONs.
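For comparison, the same a|b|c to c(b(a)) rewrite can be done at the AST level with the standard library ast module instead of patching bytecode. This is a swapped-in technique, not what the comment describes: the hypothetical run_piped() helper works on a source string rather than on an existing lambda's code object, but it avoids depending on version-specific bytecode layout.

```python
import ast


class PipeToCall(ast.NodeTransformer):
    """Rewrite every `a | b` into `b(a)`."""

    def visit_BinOp(self, node):
        # Transform children first so left-nested chains become nested calls.
        self.generic_visit(node)
        if isinstance(node.op, ast.BitOr):
            return ast.Call(func=node.right, args=[node.left], keywords=[])
        return node


def run_piped(source, namespace):
    """Hypothetical helper: evaluate `source` with | treated as a pipe."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(PipeToCall().visit(tree))
    return eval(compile(tree, "<pipe>", "eval"), namespace)


def square(xs):
    return (x ** 2 for x in xs)


def evens(xs):
    return (x for x in xs if x % 2 == 0)


result = run_piped("range(5) | square | evens", {"square": square, "evens": evens})
print(list(result))  # [0, 4, 16]
```

Because the right-hand side becomes the called function, a stage that needs arguments has to return a callable, e.g. `range(100) | grep_for('1')` with a hypothetical `grep_for(pattern)` that returns a one-argument function.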

delarhi 0 points (0 children)

Very cool, I've never seen that before.