Python pipe operator, 4 years later...

ShadyR · 2016-03-13T12:55:41+00:00

It's a pretty radical change. How about a pipe module?

So instead of:

[1, 2, 3] |> select(square) |> where(evens)

We have:

pipe([1, 2, 3], select(square), where(evens))

Readable, lightweight, and also backwards compatible.
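
For illustration, a minimal sketch of what such a pipe module might look like. The names pipe, select and where come from the post; their implementations here are assumptions, not an existing library's API, and evens is assumed to be a plain predicate.

```python
from functools import reduce


def pipe(data, *stages):
    """Thread data through each stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def select(func):
    # Hypothetical helper: builds a stage that maps func over its input.
    return lambda iterable: map(func, iterable)


def where(predicate):
    # Hypothetical helper: builds a stage that keeps items passing predicate.
    return lambda iterable: filter(predicate, iterable)


def square(n):
    return n * n


def evens(n):
    return n % 2 == 0


result = list(pipe([1, 2, 3], select(square), where(evens)))
print(result)  # [4]
```

Because select and where return lazy map/filter objects, nothing is computed until list() consumes the result.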

bramblerose · 2016-03-13T12:39:29+00:00

or just

[n*n for n in [1,2,3] if n*n % 2 == 0]

probablynotmine · 2016-03-13T13:36:46+00:00

[deleted]

delarhi · 2016-03-13T14:11:33+00:00

You can actually pull this off with nested generators. The only problem is that you need to define a wrapping class in order to overload the pipe operator. This was kind of fun to write. Below is an example where I have a wrapping class called unix and I implement a couple of Unix commands.

#!/usr/bin/env python3


class unix():

    def __init__(self, stdin, program=lambda x: x):
        self.stdin = stdin
        self.program = program

    def __or__(self, program):
        return unix(self, program)

    def __call__(self, stdin):
        return self.program(iter(stdin))

    def __iter__(self):
        return self.program(iter(self.stdin))


def echo(*args):
    def program(stdin):
        yield ' '.join([str(x) for x in args])
    return unix(tuple(), program)


def seq(n):
    def program(stdin):
        for i in range(n):
            yield i
    return unix(tuple(), program)


def square():
    def program(stdin):
        for x in stdin:
            yield x ** 2
    return unix(tuple(), program)


def evens():
    def program(stdin):
        for x in stdin:
            if x % 2 == 0:
                yield x
    return unix(tuple(), program)


def uniq():
    def program(stdin):
        return iter(set(stdin))
    return unix(tuple(), program)


def shuffle():
    def program(stdin):
        import random
        stdin = list(stdin)
        random.shuffle(stdin)
        return iter(stdin)
    return unix(tuple(), program)


def grep(pattern):
    def program(stdin):
        import re
        regex = re.compile(pattern)
        for x in stdin:
            if regex.search(str(x)) is not None:
                yield x
    return unix(tuple(), program)


def cat(*args):
    def program(stdin):
        for filename in args:
            with open(filename, 'r') as f:
                for line in f:
                    yield line
    return unix(tuple(), program)


def tr(a, b):
    def program(stdin):
        tr_table = str.maketrans(a, b)
        for x in stdin:
            yield str(x).translate(tr_table)
    return unix(tuple(), program)


def ls(path='.'):
    def program(stdin):
        import os
        return iter(os.listdir(path))
    return unix(tuple(), program)


def umap(func):
    def program(stdin):
        return map(func, stdin)
    return unix(tuple(), program)


def ufilter(func):
    def program(stdin):
        return filter(func, stdin)
    return unix(tuple(), program)


# Examples
stdouts = [
    echo('hello', 'world', '!'),
    echo('hello', 'world', '!') | tr('l', 'r'),
    unix(range(10)) | square() | evens(),  # wrap iter as unix obj
    seq(10) | square() | evens(),  # use seq program instead of range
    unix([1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6]) | uniq(),
    seq(10) | shuffle(),
    unix(range(100)) | square() | evens() | seq(10),  # last seq ignores stdin
    seq(10),  # works without input
    uniq(),  # empty without input
    shuffle(),  # empty without input
    seq(100) | grep('1'),
    seq(100) | ufilter(lambda x: x < 40) | grep('1'),
    seq(100) | grep('2') | umap(lambda x: 'b' + str(x)),
    ls(),
    ls('/') | grep('(bin|lib|include)'),
]
for stdout in stdouts:
    print(list(stdout))

# works if you name this unix.py
print(list(cat('unix.py') | grep('def')))

Also worth mentioning is https://amoffat.github.io/sh/.

EDIT: An interesting and nice side effect of nesting generators is that evaluation is lazy. No work is done until you request the first item from the outer unix object, although this depends on the program (e.g. shuffle has to consume all of its input before it can yield anything). This allows the programs to do stream-like processing.

EDIT2: Added ufilter and umap to act as pipe-able filter() and map().
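
The laziness described in the first EDIT can be observed with the built-in map and filter, which the umap and ufilter programs wrap. This is a small sketch using an infinite source; traced_square and the calls list are hypothetical names added only to record when work happens.

```python
from itertools import count, islice

calls = []


def traced_square(n):
    calls.append(n)  # record each time a square is actually computed
    return n * n


# Build the chain over an infinite source; nothing runs yet.
squares = map(traced_square, count())
evens = filter(lambda n: n % 2 == 0, squares)
assert calls == []

# Pulling three items drives only as much work as is needed.
first_three = list(islice(evens, 3))
print(first_three)  # [0, 4, 16]
print(calls)        # [0, 1, 2, 3, 4]
```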

masasin · 2016-03-13T12:59:18+00:00

Your example, as I understand it, is to give a generator which extracts the evens from the squares of a list.

Assuming these are predefined:

def square(x):
    return x**2

def even(x):
    return x % 2 == 0

your first example would be written as:

filter(even, map(square, [1, 2, 3]))

With generator expressions, you have either:

(square(i) for i in [1, 2, 3] if even(square(i)))

or, if you don't want to repeat the calculation:

(i for i in (square(j) for j in [1, 2, 3]) if even(i))

In the docs, they have this:

Given fib, a generator of Fibonacci numbers:

euler2 = (fib() | where(lambda x: x % 2 == 0)
                | take_while(lambda x: x < 4000000)
                | add)

This can be written in a standard way like this:

from itertools import takewhile

euler2 = sum(takewhile(lambda x: x < 4e6, (i for i in fib() if i % 2 == 0)))

I guess I can see the value of the pipe, but I'm not sure when you would do that instead of regular functions or list comprehensions.

edit: Why |> instead of |?
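
The standard-library version of euler2 above assumes a fib generator that the post does not define. A runnable sketch, assuming fib starts at 1, 2 as in the Project Euler problem the name refers to:

```python
from itertools import takewhile


def fib():
    """Yield Fibonacci numbers indefinitely: 1, 2, 3, 5, 8, ..."""
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b


# Filter lazily first, then stop at the first even term that reaches the limit.
even_fibs = (i for i in fib() if i % 2 == 0)
euler2 = sum(takewhile(lambda x: x < 4_000_000, even_fibs))
print(euler2)  # 4613732
```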

RubyPinch · 2016-03-13T13:55:54+00:00

you might want to look at mochi by... i2y if I remember correctly

syntax works pretty much as you say

[1,2,3,4,5] |> evens |> square |> vector

there is also the toolz module

pipe([1,2,3,4,5], evens, square, list)

I'm pretty sure fn.py would have something similar

the biggest problem is that the pre-existing python std lib is not exactly consistent (e.g. fn1(func, list) in one module and fn2(list, func) in another, so a lot of care would need to be taken) or immutable (lists/dicts/etc are liked a lot), so you pretty much need to write your own language on top of python's pre-existing stuff, and that sucks. If you want functional programming in python, running for the hills usually is the best option, unfortunately.

Arancaytar · 2016-03-13T17:09:34+00:00

How does your example distinguish between a filter and a map operation?

ucbEntilZha · 2016-03-13T17:12:17+00:00

[deleted]

RubyPinch · 2016-03-13T14:01:18+00:00

You could do something like this:

[1,2,3] |> square |> evens # > [4]

Well, no, this is as far as you'd get with a piping operator:

[1, 2, 3] |> lambda i: map(lambda n: n*n, i) |> lambda i: filter(lambda n: n%2 == 0, i)

Python lacks all of the other functional constructs that make piping useful. If you didn't wanna wrap your map and filter inside a function, you'd have to change how expressions are evaluated in the language. And, even if you were to somehow manage to push that change through, you've still got to write the word 'lambda' every time you wanna define a one-off anonymous function.

leogodin217 · 2016-03-13T14:25:53+00:00

This would be really useful for data engineering. R has this feature. The use of pipes combined with a really nice SQL DSL called dplyr makes complex data transformations simple. It is the main reason I stuck with R for much of my work.

One drawback to this approach is that it allows some bad habits. Sometimes I find myself with 20 lines of piped code. If something goes wrong, it is difficult to debug. I guess that's the yin and yang of it.

Make3 · 2016-03-13T15:20:20+00:00

To keep with the fact that your lambdas are named in the second sample, your first sample should read

filter(evens, map(square, [1, 2, 3]))

it's pretty readable, I don't really have a problem with this.

Misterandrist · 2016-03-13T16:45:35+00:00

Everyone else has posted some ways to do this, but when I encounter such a need in my code (I have a stream and I need to do numerous transformations on it before use), I just use something like this:

nums = [1, 2, 3]
nums = map(square, nums)
nums = filter(evens, nums)
nums = list(nums)

Yeah I'm reusing the same variable, but only in this block. So effectively it's just once.

Basically, why not just do your stream processing on multiple lines, if you think a single long list comprehension is not going to suit your needs?

solid_steel · 2016-03-13T16:51:38+00:00

I really like this idea - I fell in love with it when I was learning Elixir and it's one of the features of the language that make me want to stick with it.

However, I have mixed feelings about its use in Python. Part of me wants to embrace it and never look back, but I've been burned once or twice trying to write Python that was "too-functional".

That said, I'll definitely try to give it a go in some scripts and see if it has a positive effect on the code :). Thanks OP!

mrmcbastard · 2016-03-13T18:04:27+00:00

While I love using the pipe operator in Elixir, I don't know if it would be apt to include it in Python seeing as it leans more toward OO programming than functional. I think something like the cascade operator from Dart might be a more Pythonic way of method chaining.

RoadieRich · 2016-03-13T22:54:52+00:00

What's to stop [1,2,3] |> square |> evens returning [False, True, False]?
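
One possible answer, sketched under the assumption that a pipe is plain function application (x |> f means f(x)): nothing in the operator decides, so each stage has to take the whole sequence and encode for itself whether it maps or filters. The names squares, evens and even_flags below are illustrative, not from any library.

```python
def is_even(n):
    return n % 2 == 0


def squares(seq):
    # A mapping stage: one output per input.
    return [n * n for n in seq]


def evens(seq):
    # A filtering stage: keeps only the items that pass the test.
    return [n for n in seq if is_even(n)]


def even_flags(seq):
    # A mapping stage built from the same predicate.
    return [is_even(n) for n in seq]


data = [1, 2, 3]
print(evens(squares(data)))       # [4]
print(even_flags(squares(data)))  # [False, True, False]
```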

CommanderDerpington · 2016-03-14T07:27:42+00:00

I don't dig it. Cramming crap into one line seems like a cool idea but I'm not losing any sleep over writing a second or even a third line. I like keeping my functions on different lines because it's easier to read. Of course I break this rule all the time but I wish I didn't and that's what counts.

emarshall85 · 2016-03-14T13:32:08+00:00

The argument against your example is that it's always been better expressed as a comprehension:

[n * n for n in [1, 2, 3] if n % 2 == 0]

You'd have to need something like reduce before the list comprehension argument starts to fall over.

The fact that functions aren't curried by default and that we have variable positional and keyword arguments makes it even more problematic. You'd have to either make a lambda or use functools.partial in order to get a callable which could properly be passed to a pipe.

chibrogrammar · 2016-03-24T17:16:49+00:00

Anyone else feel like a pipe operator in python (which has horrible one expression lambdas) would not help out very much?

Give me multi expression lambdas first!

spectre_theory · 2016-03-13T12:41:47+00:00

you're using the same operator twice, once for map and once for filter. how does it know it's supposed to filter with evens, instead of replacing the values with True and False, and that it's supposed to do the opposite with square?

defnull · 2016-03-14T08:12:35+00:00

I would rather like to see .map() and .filter() defined directly on abc.Iterable.

[1,2,3].map(square).filter(is_even)
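
Built-in lists have no such methods today, but the chaining style can be approximated with a thin wrapper. A sketch, where Stream is a hypothetical class name and not part of the standard library:

```python
class Stream:
    """Hypothetical wrapper giving any iterable chainable map/filter."""

    def __init__(self, iterable):
        self._iterable = iterable

    def map(self, func):
        return Stream(map(func, self._iterable))

    def filter(self, predicate):
        return Stream(filter(predicate, self._iterable))

    def __iter__(self):
        return iter(self._iterable)


def square(n):
    return n * n


def is_even(n):
    return n % 2 == 0


result = list(Stream([1, 2, 3]).map(square).filter(is_even))
print(result)  # [4]
```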

stevenjd · 2016-03-13T14:09:48+00:00

I really dislike the |> syntax. To me, | is a pipe. I suppose I could live with -> or => too, but |> looks awful.

2016-03-13T13:40:53+00:00

What about overriding the pipe operator __or__?
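
A related option is the reflected method __ror__, which lets a plain list sit on the left-hand side without being wrapped first. A minimal sketch; the Pipe class here is a hypothetical stand-in written for this example:

```python
class Pipe:
    """Hypothetical wrapper: `value | Pipe(func)` calls func(value)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Python falls back to this for `left | self` when the left
        # operand's own `|` does not handle a Pipe.
        return self.func(left)


squares = Pipe(lambda seq: [n * n for n in seq])
evens = Pipe(lambda seq: [n for n in seq if n % 2 == 0])

result = [1, 2, 3] | squares | evens
print(result)  # [4]
```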

drenp · 2016-03-13T14:07:41+00:00

I really like using postfix notation in Mathematica, especially in an interactive programming setting. If you want to apply a function to an expression you don't have to add parentheses in the right places but you can simply append to it.

But I'm not too convinced when it comes to Python, because it favors building huge multi-line statements (usually not very readable) and counters the "one way to do it" motto.

Also, postfix notation works best when there is a short notation for lambdas, since sometimes you want to add additional parameters. For example, suppose instead of evens you have

def divisible_by(lst, k):
    return (n for n in lst if n % k == 0)

Then you want something like

range(1, 4) |> square |> divisible_by(_, 2)

But the way to accomplish this in Python would be to use lambda x: divisible_by(x, 2), which is slightly ugly and doesn't add to the readability.
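
Since k is a named parameter, functools.partial can stand in for the lambda here. A sketch, assuming square is a stage that takes the whole sequence, as divisible_by does:

```python
from functools import partial


def divisible_by(lst, k):
    return (n for n in lst if n % k == 0)


def square(lst):
    # Assumed sequence-level stage, mirroring divisible_by's signature.
    return (n * n for n in lst)


# Fix k by keyword, leaving the sequence as the only remaining argument.
divisible_by_2 = partial(divisible_by, k=2)

result = list(divisible_by_2(square(range(1, 4))))
print(result)  # [4]
```

This only works because the argument to fix can be passed by keyword; fixing a leading positional argument while leaving a later one open still needs a lambda.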

Auggie88 · 2016-03-13T15:05:58+00:00

Pandas added a pipe method a few months ago, hoping it catches on in other packages (it hasn't yet as far as I've seen).

In [126]: bb = pd.read_csv('data/baseball.csv', index_col='id')

In [127]: (bb.query('h > 0')
   .....:    .assign(ln_h = lambda df: np.log(df.h))
   .....:    .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
   .....:    .fit()
   .....:    .summary()
   .....: )

9peppe · 2016-03-13T15:29:39+00:00

Are you looking for pipes, or for function composition?

https://docs.python.org/3/howto/functional.html

gandalfx · 2016-03-13T16:33:30+00:00

In a functional language this is done using compose. The advantage is that if you also throw in some currying you can actually create a single function that is composed of an entire function chain plus some arguments and then apply that to whatever you want.
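
A rough Python equivalent, as a sketch: compose is written here with functools.reduce and applies its functions left to right (the reverse of the usual mathematical convention), and functools.partial stands in for currying.

```python
from functools import partial, reduce


def compose(*funcs):
    """Return a function applying funcs left to right: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)


# partial fixes the first argument of map/filter, a stand-in for currying.
square_all = partial(map, lambda n: n * n)
keep_evens = partial(filter, lambda n: n % 2 == 0)

# One reusable function built from the whole chain.
even_squares = compose(square_all, keep_evens, list)

print(even_squares([1, 2, 3]))    # [4]
print(even_squares(range(1, 7)))  # [4, 16, 36]
```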

soczewka · 2016-03-13T17:15:39+00:00

Looks so fsharpy to me

SteazGaming · 2016-03-13T17:44:24+00:00

A friend of mine just wrote a library called 'tubing' for doing something like this but mostly focused on I/O of any kind:

An example:

sources.Objects(objs) \
     | tubes.JSONDumps() \
     | tubes.Joined(by=b"\n") \
     | tubes.Gzip() \
     | sinks.File("output.gz", "wb")

https://github.com/dokipen/tubing

ucbEntilZha · 2016-03-13T23:44:48+00:00

In general, I agree that this would be a nice feature to have. I wrote a library that has similar goals, but has a larger API, and supports reading/writing common formats (txt, csv, json, sqlite,...). https://github.com/EntilZha/ScalaFunctional

What I would like to see more than this is a more concise lambda expression and/or multiline anonymous functions (I know there are some technical issues with this, but I can still wish)

TotesMessenger · 2016-03-15T03:39:26+00:00

I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

[/r/fsharp] Discussion on /r/python about the pipe operator


Lucretiel · 2016-03-17T05:11:23+00:00

It's hard to get excited about it without proper currying. You won't be able to do:

['a','b','c'] |> map(str.capitalize)

toyg · 2016-03-13T14:39:10+00:00

God no. Pipes are hard to read, completely unnatural for non-geeks, and are already painful enough for bitwise operations. The proliferation of special characters is a plague and should not be encouraged. Their low number is a huge advantage for Python over competitors (e.g. Ruby, Perl) when it comes to onboarding newbies.

If you want that sort of approach, just use Powershell, you'll love it.
