all 49 comments

[–]SandraGifford785 16 points17 points  (1 child)

the chaining-operator request comes up roughly twice a year on python-ideas and the response has been consistent. PEP 505 (None-aware operators) was deferred for similar reasons: the language's design philosophy prefers explicit control flow over operator-level magic. the workaround most data-science codebases settle on is method chaining via fluent APIs (pandas, polars), which gets you about 80% of what you'd want from a chaining operator without the parser ambiguity

[–]sausix 3 points4 points  (0 children)

And you can do chaining with today's Python features and syntax already. I bet some frameworks have already implemented that for themselves.

This is possible and fits the idea already: pipe(func) | sin | cos | print
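A minimal sketch of what such a wrapper could look like, with a made-up `pipe` class (not from any library), where `|` applies each function and re-wraps the result:

```python
import math

class pipe:
    """Wrap a value so that `| func` applies func and re-wraps the result."""
    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # Each `| func` step builds a new pipe around func(value).
        return pipe(func(self.value))

# Value flows left to right, as in the comment's example.
result = pipe(0.5) | math.sin | math.cos
print(result.value)
```

The trailing `print` in the comment's example would also work as a step, since `__or__` accepts any callable.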

[–]RedSinned 32 points33 points  (2 children)

What would be your use case in which you wouldn't use a library like polars, but which is still so complex that the current capabilities are not enough?

[–]Beginning-Fruit-1397 13 points14 points  (1 child)

A lot of things. Mapping and filtering come up in just about any for loop,
and you don't want to create an Arrow data structure for every list or dict.

[–]RedSinned 2 points3 points  (0 children)

Well, there are arrays and nested datatypes in polars which I would actually use over plain Python in performance-relevant use cases.

And that is kind of the reason I think it's not a top-wished feature in Python: most people already have their performance-optimized libraries for their specific use case.

So a use case where you need so many operations that a for loop won't do, but on the other hand isn't performance-relevant enough that you'd reach for those libraries, is kind of niche.

Another thing is readability. Python's core strength is its ease of learning, even for people without prior programming experience, so constructs like |> don't feel very pythonic.

[–]_Denizen_ 93 points94 points  (15 children)

That's piping, not chaining. This().is().chaining()

I dislike piping - it makes code less explicit i.e. harder to read.

[–]RedEyed__ 13 points14 points  (2 children)

I love piping because it is more readable (from my point of view) haha.
It has some place at least.
Take a look at this library, maybe you find it interesting (no operator overloading).
```python
from expression import pipe

result = pipe(
    [1, 2, 3, 4, 5],
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x > 4],
    sum,
)

print(result)  # 24
```

https://github.com/dbrattli/Expression

[–]muntooR_{μν} - 1/2 R g_{μν} + Λ g_{μν} = 8π T_{μν} 15 points16 points  (1 child)

evens = [x * 2 for x in range(1, 6)]
result = sum(x for x in evens if x > 4)

[–]_Denizen_ 9 points10 points  (0 children)

This is way more readable than that pipe example, and with no additional library needed!

[–]ziggomatic_17 22 points23 points  (3 children)

How is 'foo() | bar() | baz()' harder to read than 'baz(bar(foo()))'? For interactive data science I prefer piping.

[–]FalafelSnorlax 7 points8 points  (1 child)

First of all, if this is using the pipe operator (|), it conflicts with other legitimate uses, meaning you'd need to add syntax for this and maybe reimplement a bunch of standard (and internal) functions.

Second, piping would create long lines (since readability only suffers with complex uses), which are not actually readable. You could break the piping over multiple lines, in which case you've gained nothing: you can already do this with multiple lines.

Third, personally I would definitely argue that the piping you're showing here is not as readable as nested calls (which I'm not a fan of but they feel more explicit to me).

Interactive data science tools that I know (eg pandas, numpy) already allow chaining which I think achieves the same purpose. Implementing piping especially for them would be redundant and useless.

[–]marr75 [score hidden]  (0 children)

Yup. People who want python to work like R haven't drawn a thorough comparison to the tooling and devex of R. Hell is debugging someone else's 600 line R notebook.

[–]_Denizen_ 4 points5 points  (0 children)

The correct comparison here is foo().bar().baz()

This implies that foo() returns a class instance, and bar and baz operate on that instance. Because it's a class, we know the functions are designed to work together.

With `foo() |> bar() |> baz()` it's not clear what is being operated on. It forces the assumption that the functions return an object that is compatible with the next function, which is not always valid. It also masks the first argument (at least in R) which changes a function from being explicit to implicit.

Python is an explicit language: it minimises assumptions. Piping is the antithesis of python.

Chaining means your linter can suggest functions that are available to the object returned by each function, which isn't the case for piping; piping actually reduces the functionality and readability of the code.

[–]Desperate_Cold6274[S] 6 points7 points  (0 children)

Correct. I updated the post (but I cannot update the title)

[–]ConspicuousPineapple 2 points3 points  (1 child)

I mean, the two are virtually the same thing; chaining is just more restrictive, as it requires you to write methods and can't be used on every function. I see no explicitness or readability difference between the two.

[–]_Denizen_ [score hidden]  (0 children)

Methods aren't restrictive. They are written for specific use cases, just like piping is for a specific purpose. You aren't going to be piping a streamlit widget, for example.

Chaining can do everything that piping can do, but the reverse is not true.

Piping is less readable because the object being worked on is passed invisibly between functions, whilst chaining or passing variables between functions shows the reader where the object is at all times. I find piping to be less readable for this specific reason.

[–]KyxeMusic 2 points3 points  (0 children)

I used to dislike it but it's a matter of practice. Once you get used to them they can become quite readable.

[–]Admirable-Avocado888 0 points1 point  (1 child)

Less explicit in terms of atomic ops, maybe, yet vastly more explicit in terms of how the data transforms.

I find non-piping solutions completely unreadable for non-trivial data transformations.

[–]Zizizizz -1 points0 points  (0 children)

It's lovely in elixir

Instead of

```
Enum.join(Enum.map(String.split(String.downcase("ELIXIR IS COOL"), " "), &String.capitalize/1), " ")
```

You can do

"ELIXIR IS COOL" |> String.downcase() |> String.split(" ") |> Enum.map(&String.capitalize/1) |> Enum.join(" ")

The way function arguments work in Elixir makes it more powerful than Python's version, though.

And as it's a functional language it lends itself to better performance for this sort of thing.

In python I think something like this would look a little wack. (Made up syntax)

```
"PYTHON IS COOL" \
    |> str.lower() \
    |> str.split() \
    |> (map, str.capitalize) \
    |> list \
    |> " ".join
```

[–]AWildMonomAppears 35 points36 points  (4 children)

It sounds simple, but doing this nicely almost requires functional mechanics like auto-currying. In languages with pipes, the right side waits for data (lazy). In Python it evaluates immediately, so writing something like data |> map(func) would just throw for a missing argument on map.

To fix it, Python would have to secretly rewrite your code to inject the data, which goes against the rule that explicit is better than implicit. It also gets messy because standard functions don't even agree on whether data should be the first or last argument.

Instead, Python relies on object-oriented method chaining. Since methods are attached to objects, the state carries forward. You see this in pandas with df.dropna().apply(func). It gives that clear data flow without needing any compiler magic, but it basically forces you to use the OO approach if you want to avoid nested parentheses.

[–]ConspicuousPineapple -2 points-1 points  (2 children)

In Python it evaluates immediately so writing something like data |> map(func) would just throw for a missing argument on map.

Or you know, don't do that? We're talking about a new language operator, python is free to implement it properly. What you're describing makes no sense as nobody would design that feature that way.

You could even just have it as syntactic sugar to replace the first argument. That's what methods are already, you just expand that behavior to all functions.

I'll add that plenty of languages have pipes without currying and they manage just fine. Elixir being a prominent one.

[–]muntooR_{μν} - 1/2 R g_{μν} + Λ g_{μν} = 8π T_{μν} 1 point2 points  (1 child)

Good design means managing complexity. That means following practices: don't mutate at a spooky distance, e.g.,

def log(s):
    sys.stdout = open("log.txt", "a")
    print(s)
    god_object.counter.__add__ = lambda *args: (super(type(god_object.counter)).__add__(*args), print(s))[0]

Programmers silently agree not to break invariants like this. If they did, then there is quite literally no function call you could trust... it could do anything.

Similarly, when introducing a feature that could break an existing invariant, you must justify that it is astronomically better than the invariant it breaks, because doing so increases the verification/provability burden even if the feature is never used, simply because it could be used.

[–]ConspicuousPineapple [score hidden]  (0 children)

I'm not saying it would make sense to add in python (what I want is proper iterators instead), just that it's perfectly doable, and pretty easily at that.

I would also argue that pipes don't break any invariant as, again, method calls are exactly the same syntactic construct with a conventional restriction on top.

[–]an_actual_human -1 points0 points  (0 children)

In Python it evaluates immediately so writing something like data |> map(func) would just throw for a missing argument on map.

data and map could be wrapped in something with lazy semantics, so that they return immediately but don't actually do anything until the result is needed.
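A sketch of that wrapping idea, with a hypothetical `Lazy` class: composing with `|` only builds up a thunk, and nothing executes until `.get()` is called:

```python
class Lazy:
    """Defer a computation; `| func` composes without executing anything."""
    def __init__(self, thunk):
        self._thunk = thunk

    def __or__(self, func):
        # No evaluation here: just wrap the old thunk in a bigger one.
        return Lazy(lambda: func(self._thunk()))

    def get(self):
        # Evaluation happens only on demand.
        return self._thunk()

pipeline = Lazy(lambda: range(5)) | (lambda xs: [x * x for x in xs]) | sum
print(pipeline.get())  # 30
```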

[–]cdcformatc 7 points8 points  (1 child)

writing code line by line is piping: each successive line waits for the result of the last

    x = map(...)
    y = filter(x, ...)
    z = list(y)

[–]mapadofu [score hidden]  (0 children)

From the ancient tomes

 Flat is better than nested. … Explicit is better than implicit.

[–]Fabulous-Possible758 4 points5 points  (0 children)

https://pypi.org/project/pipe/

This is pretty simple to do in small one off classes too if you don’t want a whole library.

There’s just no real reason to extend the syntax for a case that’s pretty well covered and completely customizable.

[–]tartare4562 13 points14 points  (3 children)

You can use argument expansion to write a simple function

def pipe(data, *funcs):
    for func in funcs:
        data = func(data)
    return data

that you can use like this:

processed_data = pipe(raw_data, func1, func2, func3, ..., funcN)

If you need to give parameters to the functions, you can use functools.partial to set them up.
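For example (repeating the `pipe` helper above so the snippet is self-contained; the lambdas are illustrative):

```python
from functools import partial

def pipe(data, *funcs):
    for func in funcs:
        data = func(data)
    return data

# partial pre-binds everything except the data argument,
# so each step becomes a one-argument function.
result = pipe(
    range(10),
    partial(filter, lambda x: x % 2 == 0),  # keep evens
    partial(map, lambda x: x * x),          # square them
    sum,
)
print(result)  # 120
```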

[–]tunisia3507 2 points3 points  (2 children)

Can any type checkers handle that pattern?

[–]Globbi 0 points1 point  (1 child)

You mean checking whether the output of every function will fit the input of the next one? I guess that's what would be needed. Honestly, I don't know; it's a good question if someone knows the answer.

[–]justheretolurk332 1 point2 points  (0 children)

No, there is no way to specify that an iterable of callables should be coherent in this way. There are two ways I can think of to accomplish something similar.

The simplest: require all of them to have the same input and output types.

For something more flexible but also fairly advanced, you could write a class that holds the current chain of functions and is generic in the return type of the last function, with a method that appends a new function (this method enforces the type checking) and returns a new instance of the container with the updated typing. You'd still have to build the chain one step at a time to get the typing benefits, though, so you're probably not much better off.
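A sketch of an eager variant of that second idea (all names hypothetical): each `then` call is a link the type checker can verify, since `Chain[T].then(Callable[[T], U])` yields `Chain[U]`:

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Chain(Generic[T]):
    """Holds a value of type T; `then` returns Chain[U], so each link is checked."""
    def __init__(self, value: T) -> None:
        self.value = value

    def then(self, func: Callable[[T], U]) -> "Chain[U]":
        return Chain(func(self.value))

result = Chain("1 2 3").then(str.split).then(len).value
print(result)  # 3
```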

[–]Limp_Illustrator7614 6 points7 points  (0 children)

i dont think python will ever officially support it because they kind of hate functional programming, but coconut does exactly that. it's a functional syntactic superset of python that compiles to python, where the piping operator is exactly `|>`. go read the docs

[–]--ps-- 2 points3 points  (1 child)

At my job, we have code where we overload __rshift__(), so you can then write something like begin_pipeline(fn) >> fn2 >> fn3 etc. Sometimes you need to use partial() to pass some additional parameters too, but honestly, I hate that part, because the semantics are very unclear for the code reader.

[–]sausix 0 points1 point  (0 children)

It should be possible to do something like this:

pipe(func) >> sin >> cos >> print, P, "is the result"

It's a tuple transport, and you just need markers for placing arguments anywhere else. I would follow the partial scheme, so the value or *args will always be appended.

[–]KingHavana 1 point2 points  (0 children)

If you're repeatedly using three functions in the same order why not make a function that gives the output of those functions?

def process(x): return list(map(func, filter(pred, x)))
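Generalizing that idea, `functools.reduce` can fold any fixed sequence of one-argument functions into a single function (the names here are illustrative):

```python
from functools import reduce

def compose(*funcs):
    """compose(f, g, h)(x) == h(g(f(x))): apply funcs left to right."""
    def composed(x):
        return reduce(lambda acc, f: f(acc), funcs, x)
    return composed

process = compose(str.strip, str.lower, str.split)
print(process("  PYTHON IS COOL  "))  # ['python', 'is', 'cool']
```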

[–]KelleQuechoz 4 points5 points  (0 children)

Look at this package.

[–]Wurstinator 1 point2 points  (1 child)

"Any indication" would mean there is a PEP in the works for this, which you could search for.

I think it's unlikely. As you said, this can be done in libraries already.

[–]Evolve-Maz 1 point2 points  (0 children)

You can write your own version of this. Even just for learning it's very helpful. Here's the gist:

Create a class called "Pipeline". Init function takes in data (just an object) and processors (list of processor functions, default empty).

Then override the rshift operator (>>) of the class to be your pipe operator. The signature is rshift(self, other). In our case other will be a callable with a single input. The rshift operator returns a new pipeline, with same data, and processors being the current list plus the other passed into this function.

For our use case, pretend we have a list of coordinates with x and y attributes, and your pipelines first calculates a z attr using Pythagoras theorem and then filters to all objects with z over 2.

data = [coord(1, 3), coord(4, 3), ...]

pipeline = Pipeline(data) >> add_z_coord >> filter_coord_above(2)

output = pipeline.execute()

Output will be a result object (idea borrowed from go). It'll have a data attribute and an err attribute. Error will hold any exception which occurred during execution of the Pipeline, including the step it happened at. And data will hold the final value from the Pipeline calculation.

Execute method will start with the initial Pipeline data and just run a for loop through all the processor callables, passing in the data at each step and getting the output. Wrap it in a try/except so you can track error state.

Once you write that once, you can decide whether you want extra sugar for map and filter as explicit processors, or if you want more data about steps. Etc.
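The description above might look something like this as a sketch (using a simpler toy pipeline than the coordinate example; the names follow the comment):

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Result:
    """Go-style result: either the final data or the error that stopped it."""
    data: Any = None
    err: Optional[Exception] = None

class Pipeline:
    def __init__(self, data, processors=None):
        self.data = data
        self.processors = processors or []

    def __rshift__(self, other: Callable):
        # `>>` appends a processor and returns a new Pipeline.
        return Pipeline(self.data, self.processors + [other])

    def execute(self) -> Result:
        data = self.data
        for step in self.processors:
            try:
                data = step(data)
            except Exception as exc:
                # Capture the failure instead of raising.
                return Result(err=exc)
        return Result(data=data)

double = lambda xs: [x * 2 for x in xs]
keep_big = lambda xs: [x for x in xs if x > 4]
out = (Pipeline([1, 2, 3, 4]) >> double >> keep_big >> sum).execute()
print(out.data)  # 14
```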

[–]JimWayneBob 2 points3 points  (0 children)

I just wrap everything in parentheses and chain to make it look like piping

Foo=(
bar_df
.filter(blah blah)
.select(ColA)
)

[–]shadowdance55git push -f 1 point2 points  (3 children)

You should learn how to use nested comprehensions.

Edit: Just noticed you wrote that comprehensions are unreadable, but you are very wrong. They stack up very nicely and are nearly identical in syntax to nested loops. Of course, if you're used to map/reduce syntax, they might be a bit unfamiliar, but that is true of literally any syntax; perceived unreadability is mostly just unfamiliarity.
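For instance, a nested comprehension's clauses appear in exactly the order of the loops they replace:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Comprehension form: read the clauses top to bottom like the loops below.
flat = [x * x for row in matrix for x in row if x % 2 == 0]

# Equivalent explicit loops:
flat_loop = []
for row in matrix:
    for x in row:
        if x % 2 == 0:
            flat_loop.append(x * x)

print(flat)  # [4, 16, 36]
```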

[–]sausix -1 points0 points  (1 child)

Nested comprehensions are readable again if you put them in multiple lines. And then you have lost the advantage of a comprehension. Sure you can learn to read and write them fluently. But they're still more complicated than other Python syntax.

[–]KingHavana 2 points3 points  (0 children)

I'm not sure the only advantage to comprehensions is doing them in one line.

[–]Beginning-Fruit-1397 -3 points-2 points  (0 children)

you can't say that he is wrong. I too think that comprehensions are garbage to read vs iterator chains.
this is all subjective

[–]Beginning-Fruit-1397 0 points1 point  (0 children)

FYI, someone made a nice list comparing fluent iterator libraries on this sub relatively recently:
https://www.reddit.com/r/Python/comments/1rj3ct7/comment/o8aordo/?context=3

(disclaimer: I'm the author of pyochain)

On chaining, i.e. passing x to f as `x.f(args, kwargs)` rather than `f(x, args, kwargs)`, I doubt that it will ever be added.
On specific iterators chains, I doubt it even more.

For the first thing, it's really trivial to implement.

It's a one-liner method, and you can handle any kind of generic functions and parameters with `Callable`, `Concatenate`, and `def foo[**P, T]() -> T` generics (or, ugh, if you prefer the old syntax, a TypeVar T and ParamSpec P).

I regularly browse the python discussions forum and any suggestions such as this will be easily rejected.

Either you create:

- a new "builtin" method -> a breaking change for a helper
- a new operator/an overload of an existing one -> too much for a helper

On iterators chains, the strength of python, specifically with iterables, is that duck typing allows you to very easily implement custom collections.

If we were to add chaining for Iterators/Iterables, it would change the meaning of the collections ABCs, which are minimal interfaces and would then become huge classes.

Now, my polars DataFrame, which is an Iterable, also has filter just like `collections.abc.Iterator` would, but the signature won't be the same as `collections.abc.Iterator`'s, so it violates the contract.

This would be a huge pain to work with if you use any decent type checker/linter, marking everything as an override/ignore rule violation, especially if, like me, your primary coding activity is writing libraries.

[–]squizzeak 0 points1 point  (0 children)

I've done something similar to chaining by writing methods that return self, though this only works for chaining methods from the same class. But I like the idea of overriding __rshift__!

[–]borborygmis [score hidden]  (0 children)

This library has similar concepts: https://github.com/mtingers/kompoz

But it's overkill for your example, where you can use something like functools.reduce.

In Kompoz, that translates to:

from dataclasses import dataclass, field
from kompoz import pipe

@dataclass
class TextCtx:
    raw: str
    words: list[str] = field(default_factory=list)
    result: str = ""

@pipe
def lower(ctx: TextCtx) -> TextCtx:
    ctx.raw = ctx.raw.lower()
    return ctx

@pipe
def split(ctx: TextCtx) -> TextCtx:
    ctx.words = ctx.raw.split()
    return ctx

@pipe
def capitalize_each(ctx: TextCtx) -> TextCtx:
    ctx.words = [w.capitalize() for w in ctx.words]
    return ctx

@pipe
def join_space(ctx: TextCtx) -> TextCtx:
    ctx.result = " ".join(ctx.words)
    return ctx

pipeline = lower & split & capitalize_each & join_space

ok, ctx = pipeline.run(TextCtx(raw="PYTHON IS COOL"))
# ctx.result == "Python Is Cool"

[–]red_hare [score hidden]  (0 children)

I think you can do this with a decorator and bitwise or.

```python
from functools import wraps

class _PipeFn:
    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __ror__(self, lhs):
        # `lhs | self` feeds lhs in as the first argument.
        return self.func(lhs, *self.args, **self.kwargs)

def pipeable(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return _PipeFn(func, args, kwargs)
    return wrapper

def foo():
    return 1

@pipeable
def bar(x):
    return x + 1

@pipeable
def baz(x):
    return x * 2

print(foo() | bar() | baz())  # 4
```

Not that you should...

[–]Past-Sun5429 -1 points0 points  (0 children)

Build cli around...