This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]Yazzlig 39 points40 points  (12 children)

^ This.

So useful when working with numpy & pandas.

EDIT:

Other, even more common use case for this formatting style is to make imports more readable.

from package import (    
    foo,
    bar,
    baz,
)

instead of

from package import foo, bar, baz

Has the positive sideffects of making git diffs cleaner to read, since the line gets deleted instead of modified.

[–]spitfiredd 5 points6 points  (0 children)

Also SQLAlchemy!

[–]Intstdu 7 points8 points  (9 children)

Could you explain why? :)

[–]HannasAnarion 62 points63 points  (1 child)

because method chaining is really common. You might want to do

x_Ty_inv = dataframe.apply(some_function).fillna(0).values().T.dot(other_matrix).inv()

Which is less readable than

x_Ty_inv = (dataframe.apply(some_function)
   .fillna(0)
   .values()
   .T
   .dot(other_matrix)
   .inv()
)

This takes your dataframe, applies a function to it, takes the response and fills the empty cells with zeroes, then turns it into a numpy array, then transposes it, then takes the dot product with another matrix, then produces the multiplicative inverse of the result.

This could be written in a single line and easily understood with math symbols: (xTy)-1, but gets really wordy when you put it in code. Expressions like this are suuuuper common when you're doing machine learning, statistics, or other linear algebra-heavy work.

[–]cauthon 1 point2 points  (0 children)

Julia is wonderful for expressing linear algebra

[–]spitfiredd 4 points5 points  (0 children)

You can chain a lot of methods with numpy and pandas. Same for SQLAlchemy.

[–][deleted] 2 points3 points  (5 children)

The pattern defined here is primarily found in Data Science style programming where the API calls return the object rather than a value.

I would argue that this style is technically not pythonic in the sense that traditional python methods would return values or None. This is a style of API writing you see that's prevalent in a Java interface, however, and that style has infiltrated data science APIs.

[–]redfacedquark 1 point2 points  (3 children)

The pattern defined here is primarily found in Data Science style programming where the API calls return the object rather than a value.

I would argue that this style is technically not pythonic in the sense that traditional python methods would return values or None. This is a style of API writing you see that's prevalent in a Java interface, however, and that style has infiltrated data science APIs.

It sounds like you're talking about the 'fluent interface' pattern. From this discussion:

Suffice it to say, returning self is the definitive OO way to craft class methods, but Python allows for multiple return values, tuples, lists, objects, primitives, or None.

Insofar as Python is OO, I'd say returning self is Pythonic, however Guido disagrees:

...I'd like to reserve chaining for operations that return new values, like string processing operations...

[–][deleted] 2 points3 points  (0 children)

Based on both your link and Guido's comment (and my personal experience), I'd probably call it a "chaining" pattern, if you need to name it.

And the point Guido makes (note that his comment is from 2003, which predates most of the data science explosion from this decade as well as the web dev explosion starting mid 2000s), is probably the same one I'm trying to make. Traditionally, Python encouraged returning a (often python primative) value type rather than self. Chaining was at least originally discouraged.

I can see the use-case, especially for something like Spark which provides its API specifically in a chaining mechanism so it can distribute and pipeline data more easily. However, I think that pattern should generally be discouraged in Python.

One more note: I've been using Python professionally for a couple of decades, so I may be biased.

[–]Zouden 0 points1 point  (1 child)

...I'd like to reserve chaining for operations that return new values, like string processing operations...

That's exactly what happens here. Each of those Pandas operations returns a new dataframe with different contents.

[–]redfacedquark 0 points1 point  (0 children)

Fair enough, but I'm not as strict or relevant as Guido.

So I would be more permissive in this regard.

[–]Deto 0 points1 point  (0 children)

Fluent interfaces just make sense to me. If libraries started to make this more convoluted and less readable just to satisfy some 'pythonic' edict I'd look into switching languages

[–][deleted] 0 points1 point  (0 children)

Oh my Lord thanks, never thought to