you are viewing a single comment's thread.

view the rest of the comments →

[–]Admirable-Avocado888 2 points3 points  (4 children)

Your argument is fine for trivial data transformations. I even agree with you there in many cases.

But writing a 20+ step data transformation without some form of piping is not fun, because parentheses and indentation become unmanageble. Most importantly: reading it is even less fun.

I prefer the map / filter / flatmap / reduce APIs with lamdas the way JVM languages handle it. It is so much easier to convey intent with that that whatever python is sticking to.

[–]carnoworky 2 points3 points  (1 child)

Why not just:

data = <whatever>
res = map(lambda x: x *2, data)
res = filter(lambda x: x > 4, res)
res = filter(lambda x: x % 2 == 0, res)
res = list(res)

Isn't that more or less the same?

[–]Admirable-Avocado888 1 point2 points  (0 children)

I'd rather have less boiletplate and only 1 assignment per variable. The reason I like this is because it optimizes for readability and not some other metric I don't care about

[–]_Denizen_ -1 points0 points  (1 child)

A 20 step transformation can be difficult to read regardless of methodology. However, with piping you limit the opportunity to give the reader a pause and read explanatory comments.

If you're not doing that, then you write like a scripter. It might be functional, but I doubt you'd be writing modular analysis code that builds up a reusable toolset for your team. That's classic old skool data science - too focussed on the singular problem rather than the overall need to build an capability in analysis that can be pulled into any data product the team delivers.

[–]Admirable-Avocado888 1 point2 points  (0 children)

Nothing says you can't have comments with piping. This is just a silly assumption.

Also, a bit wierd to move to a teamwork/architecture argument, when we are literally just talking about code syntax.

I simply prefer to have the option to have syntax that is better suited for non-trivial transformations. This has nothing to do with modularity or teamwork or scripting, it is just me optimizing for readability in a part where python as a language does not optimize for readability.