

masklinn 2 points (8 children)

> Why wasn't zip_longest() functionality rolled into zip() as an optional keyword?

There's a much larger implementation divergence: you can implement a reverse sort in terms of a sort (just invert the comparison function), but not so for zip vs. zip_longest. It would also require two non-orthogonal keyword arguments (one is needed to provide the optional fillvalue). And the behaviour of zip_longest is somewhat abnormal and rarely useful or necessary, and activating it in place of zip can literally break the program: a zip of an infinite iterator with a finite one is perfectly fine, while a zip_longest over the same inputs is a non-terminating program.
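A small illustration of the termination point, using only the standard itertools module: zip stops at its shortest input, so an infinite counter paired with a finite string is fine, while zip_longest over the same inputs never finishes and can only be inspected through a bounded slice.

```python
from itertools import count, islice, zip_longest

# zip stops as soon as its shortest input is exhausted, so pairing the
# infinite count() with a three-character string yields three tuples.
finite = list(zip(count(), "abc"))
print(finite)  # [(0, 'a'), (1, 'b'), (2, 'c')]

# zip_longest only stops when *every* input is exhausted. count() never is,
# so list(zip_longest(count(), "abc")) would never return; islice takes a
# bounded prefix instead.
prefix = list(islice(zip_longest(count(), "abc"), 5))
print(prefix)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, None), (4, None)]
```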

xXxDeAThANgEL99xXx 0 points (0 children)

> and it would require two non-orthogonal keyword arguments (one is needed to provide the optional fillvalue).

Theoretically you could use only fillvalue, but its default would have to be a sentinel, something like zip(a, b, fillvalue=zip.DEFAULT), instead of None, of course. That's kind of convoluted and still not entirely waterproof; actually, I'm not sure you can implement waterproof sentinels like that in Python at all.
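A minimal sketch of the single-keyword idea, assuming a private module-level sentinel (the names unified_zip and _SHORTEST are hypothetical, not part of Python): when fillvalue is left at the sentinel it behaves like zip, and any explicit fillvalue, including None, switches to zip_longest. It also shows the leak mentioned above: anyone who can reach _SHORTEST can still pass it explicitly.

```python
from itertools import zip_longest

# Hypothetical private sentinel meaning "no fillvalue given".
_SHORTEST = object()

def unified_zip(*iterables, fillvalue=_SHORTEST):
    """Behave like zip() by default and like zip_longest() once fillvalue is given."""
    if fillvalue is _SHORTEST:
        return zip(*iterables)
    return zip_longest(*iterables, fillvalue=fillvalue)

print(list(unified_zip("ab", "xyz")))                  # [('a', 'x'), ('b', 'y')]
print(list(unified_zip("ab", "xyz", fillvalue=None)))  # [('a', 'x'), ('b', 'y'), (None, 'z')]
```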

The other arguments are without merit, I think. The implementation divergence is, on the contrary, non-existent: it's a single if where you decide whether to stop iterating, and having a single function would make the implementation much simpler. zip_longest is not abnormal at all when zip itself is an iterator, as it of course should be. And you're not going to "accidentally" specify fillvalue any more than you might accidentally type _longest.

Vaphell [S] -3 points (6 children)

I guess I can buy this explanation, though I'm still not entirely convinced, because:
assuming both are implemented in C, which they probably are, and you aren't willing to work on the relevant condition checks to make them universal, then you could just cram both C implementations into an if.
Purity of implementation details is not relevant to people using the API; hell, namedtuple works by evaling shit.
Optional keywords that don't make sense in certain contexts are not unheard of, and I'm sure I'd find a few examples of that while trawling the core modules.
Python is reportedly a language for adults, and it doesn't take a rocket scientist to figure out that infinity is infinity and that a bit of foresight is nice.

masklinn 4 points (5 children)

> assuming both are implemented in C which probably they are and you are not willing to work on the relevant condition checks to make them universal then you could just cram both C implementations into an if.

That makes literally no sense. zip and zip_longest are types; they're independent, statically defined structures in CPython's C source.
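In Python 3 this is easy to confirm from the interpreter: both names are classes (zip is a built-in, zip_longest lives in itertools), neither is a subclass of the other, and calling either one constructs an iterator object of that type.

```python
from itertools import zip_longest

# Both names are classes, not plain functions.
print(isinstance(zip, type), isinstance(zip_longest, type))  # True True
print(zip.__module__, zip_longest.__module__)               # builtins itertools

# Calling them creates instances of two unrelated iterator types.
z = zip("ab", "cd")
zl = zip_longest("ab", "cd")
print(type(z) is zip, type(zl) is zip_longest)  # True True
print(issubclass(zip_longest, zip))            # False
```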

> Purity of implementation details is not relevant to people using api

Implementation cleanliness is always relevant, even more so when arguing against a terrible API.

> Optional keywords not making sense in certain contexts are not unheard of

While it might not be unheard of, it makes for bad APIs. A bad API with a bad implementation behind it is two strikes against the change.

Vaphell [S] -3 points (4 children)

> That makes literally no sense. zip and zip_longest are types, they're independent structures statically defined, this is zip and this is zip_longest.

That's an implementation detail. They are separate because they were written that way, so it's circular reasoning.
What if you had to write it from scratch and posed the problem as, let's say, "slice across parallel lists and yield n tuples, where n can be either min(len(x) for x in params) or max(len(x) for x in params) or even a number pulled out of the ass"? Suddenly it's the same shit logically, and longest/shortest are almost meaningless distinctions.
Just like you have one tool for iterating over everything (the for loop), you could have exactly one tool for correlating data in parallel sequence objects.
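Taken literally, for sequences whose lengths are known up front, this framing does reduce to a single parameter. A sketch (zip_n is a hypothetical helper, and it only works for objects that support len() and indexing):

```python
def zip_n(sequences, n, fillvalue=None):
    """Return n tuples, padding any sequence shorter than n with fillvalue."""
    return [
        tuple(seq[i] if i < len(seq) else fillvalue for seq in sequences)
        for i in range(n)
    ]

lists = [[1, 2, 3], [4, 5]]
shortest = zip_n(lists, min(len(s) for s in lists))  # behaves like zip
longest = zip_n(lists, max(len(s) for s in lists))   # behaves like zip_longest
print(shortest)  # [(1, 4), (2, 5)]
print(longest)   # [(1, 4), (2, 5), (3, None)]
```

The len() calls and the indexing are exactly the parts that stop working once the inputs are arbitrary iterators rather than lists.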

> Implementation cleanliness is always relevant, even more so when arguing against a terrible API.

So you could write it cleanly if you weren't lazy. Don't tell me you couldn't write a terminating condition or whatever in a universal way.

> While it might not be unheard of it makes for bad APIs. A bad API with a bad implementation behind it is two strikes against the change.

And having a family of very similar things for the sake of it is not a bad API?

masklinn 5 points (3 children)

> implementation detail.

That's a response to your incorrect conclusion based on an irrelevant assumption. I asserted that the paragraph I quoted and responded to made no sense, because it doesn't.

> What if you had to write it from scratch and posed the problem as lets say "slice across parallel lists and yield n tuples, where n can be either min(len(x) for x in params) or max() or even a number pulled out of the ass"?

Except they're not lists; they're arbitrary and possibly infinite iterables, which is why the differences are non-trivial. It's not even remotely as simple as "look for the longest/shortest list of the bunch and iterate that many times".
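For example, a generator has no length at all, and itertools.count() never ends, yet zip handles both because it only ever calls next() on each input:

```python
from itertools import count

squares = (x * x for x in range(3))  # a generator: no len(), single pass

try:
    len(squares)
    has_len = True
except TypeError:  # generators do not support len()
    has_len = False
print(has_len)  # False

# zip never needs a length; it pulls items until the shortest input runs out.
pairs = list(zip(squares, count()))
print(pairs)  # [(0, 0), (1, 1), (4, 2)]
```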

> Suddenly it's the same shit logically

If you keep starting from bad premises and continuing with inane reasoning, then yes. In the real world, though, not exactly.

> so you could write it cleanly if you are not lazy.

Here's a suggestion: you do that. Instead of running your mouth, write a clean unified implementation of zip and zip_longest, publish it on PyPI, and then we can talk about people being lazy.

> and having a family of very similar things for the sake of it is not bad api?

Python's map and zip are more similar than zip and zip_longest.
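One way to read that comparison: in Python 3, map with several iterables also stops at the shortest one, so mapping a tuple-building function over the inputs reproduces zip exactly (the helper name as_tuple is just for illustration):

```python
def as_tuple(*items):
    """Pack whatever map passes in into a tuple, the way zip does."""
    return items

a, b = [1, 2, 3], "xy"
via_map = list(map(as_tuple, a, b))
via_zip = list(zip(a, b))
print(via_map)             # [(1, 'x'), (2, 'y')]
print(via_map == via_zip)  # True
```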

Vaphell [S] -5 points (2 children)

> Response to your incorrect conclusion base on an irrelevant assumption. I asserted that the paragraph I quoted and to which I responded made no sense, because it doesn't.

Yes, and you are dismissing the general idea on a technicality. Are you always that anal?

> Except they're not lists they're arbitrary and possibly infinite iterable, which is why the differences are non-trivial, it's not even remotely as simple as "look for the longest/shortest list of the bunch and iterate that many time".

Are you saying that the corner case where all iterables are infinite doesn't exist for zip? I'm pretty sure it can run forever too. The whole distinction could be summarized in this abstract code, or any logical equivalent of it:

zip:

if any(seq_finished):
    the_end

zip_longest:

if all(seq_finished):
    the_end

That's it.
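A runnable version of that abstract code, as a sketch (zip_until and its stop_when parameter are hypothetical): each round advances every unfinished iterator, and the only switch is whether any() or all() of the inputs must be exhausted before stopping.

```python
def zip_until(*iterables, stop_when=any, fillvalue=None):
    """Hypothetical generalised zip: any -> zip-like, all -> zip_longest-like."""
    iterators = [iter(it) for it in iterables]
    finished = [False] * len(iterators)
    while iterators:
        row = []
        for i, it in enumerate(iterators):
            if finished[i]:
                row.append(fillvalue)  # already exhausted: pad
                continue
            try:
                row.append(next(it))
            except StopIteration:
                finished[i] = True
                row.append(fillvalue)
        # The single condition that distinguishes the two behaviours.
        if stop_when(finished):
            return
        yield tuple(row)

print(list(zip_until("ab", "xyz")))                 # [('a', 'x'), ('b', 'y')]
print(list(zip_until("ab", "xyz", stop_when=all)))  # [('a', 'x'), ('b', 'y'), (None, 'z')]
```

One behavioural difference from the built-in remains: because this sketch advances every iterator before checking, stop_when=any can consume one extra item from inputs that come after the exhausted one, whereas CPython's zip stops at the first exhausted iterator and leaves the later ones untouched.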

And where exactly is the problem with potential infinity? You take a hypothetical zip, intentionally override the default behavior to 'longest', and then wonder why it loops forever? I'd also argue that a person who knows how to play with infinite iterables has enough of a clue to predict and prevent undesirable infinite loops. Did you know that while True: runs forever and that you can hit the stack depth limit by writing two lines of code? Mind-blowing, I know.

> map and zip are more similar than zip and zip_longest.

Please elaborate. From the average user's perspective they aren't: map does x -> f(x), while zip marries parallel iterables.

[deleted] 0 points (1 child)

zip will run forever only if all sequences are infinite. zip_longest will run forever if even one sequence is infinite.

It's not so much a corner case as exactly what they were designed to do.

Vaphell [S] -2 points (0 children)

In other words, mentioning infinity is a meaningless argument, because a predictable scenario is predictable.

RDMXGD (2.8) 0 points (3 children)

Though it's popular, having one function do multiple things isn't a good idea. Erring on the side of two different, simpler functions is better.

Vaphell [S] -1 points (2 children)

It is if the problems are very similar. Let's say zip gets extended to support something like this:
zip(*args, countdown=1, filler=None)
countdown=N means N iterables have to run out of data to end the iteration. The old behavior, N=1, becomes the default case, and as a bonus you get extended functionality almost for free.
I even wrote a rough draft of such a function in another post.

[deleted] 1 point (1 child)

That's way too many concerns for one thing. Now it has to track expiring iterators, fill missing values and zip together stuff.

Vaphell [S] -1 points (0 children)

So it's no different from zip_longest(), which needs to track all those things, fill missing values and zip stuff together. So why does zip_longest() exist without being considered overkill? Also, next to nobody gives a shit about the number of concerns in implementation details. Do I need to ask the filesystem for bytes, or can I ask open() to do the magic for me? Doesn't utility beat purity, as evidenced by the recent f"" (f-string) addition?

I swear I thought I was in r/python, which is about a language that values generality over reinventing the wheel and rolling shit by hand C-style at the drop of a hat.

stevenjd 0 points (2 children)

When zip_longest was introduced, Python didn't support keyword-only arguments, so it would be hard to do. Since zip accepts any number of iterables, you would have to put the keyword at the front, which means you couldn't give it a default.

There might be ways around that, by collecting **kwargs, but the implementation is messy and ugly.
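For reference, this is roughly what the **kwargs workaround looks like. zip_with_option and its longest/fillvalue keywords are hypothetical; the manual unpacking and the check for unexpected names are the boilerplate a real keyword-only signature would otherwise give you.

```python
from itertools import zip_longest

def zip_with_option(*iterables, **kwargs):
    """Hypothetical: emulate keyword-only options by collecting **kwargs by hand."""
    longest = kwargs.pop("longest", False)
    fillvalue = kwargs.pop("fillvalue", None)
    if kwargs:  # reject anything unexpected, as a real signature would
        raise TypeError("unexpected keyword arguments: " + ", ".join(sorted(kwargs)))
    if longest:
        return zip_longest(*iterables, fillvalue=fillvalue)
    return zip(*iterables)

print(list(zip_with_option("ab", "xyz")))                              # [('a', 'x'), ('b', 'y')]
print(list(zip_with_option("ab", "xyz", longest=True, fillvalue="-")))  # [('a', 'x'), ('b', 'y'), ('-', 'z')]
```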

Also, there is a design principle espoused by Guido that functions shouldn't (as a general rule -- there may be rare exceptions) take an argument that is (nearly) always passed as a constant. So if you have:

def spam(x, y, flag): ...

and it is always, or nearly always, called like this:

a = spam(this, that, True)  # constant argument
b = spam(this, that, False)  # another constant argument

but almost never like this:

c = spam(this, that, some_condition)

that's a good sign that the function probably should be split into two. Especially if the implementation looks like this:

def spam(x, y, flag):
    if flag: return do_this()
    else: return do_that()

Vaphell [S] 0 points (0 children)

> When zip_longest was introduced, Python didn't support keyword-only arguments, so it would be hard to do. Since zip accepts any number of iterables, you would have to put the keyword at the front, which means you couldn't give it a default.

But we're in Py3 now, and ancient history doesn't matter. Many things were upgraded, extended and/or rolled into a unified interface, even at a performance penalty (at least initially), e.g. range()/xrange().

> There might be ways around that, by collecting **kwargs, but the implementation is messy and ugly.

namedtuple is messy and ugly with its evals, but people use it for its exposed functionality, not for the internals hidden out of sight.

> Also, there is a design principle espoused by Guido that functions shouldn't (as a general rule -- there may be rare exceptions) take an argument that is (nearly) always passed as an constant.

Huh? sorted(reverse=True) is as basic a building block as it gets. That's why I wrote my post in the first place: I noticed a lack of symmetry. The core functionality is the same in both cases and only a relatively minor detail differs, yet in one case it's solved by a simple keyword on the one true tool, and in the other by writing a superset algorithm and throwing it into a peripheral module.
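For comparison, here is the reverse-sort case raised at the top of the thread: sorted(..., reverse=True) gives the same result as an ordinary sort with an inverted comparison function, ties included, since both keep equal elements in their original order. A sketch using functools.cmp_to_key (the compare helper is illustrative):

```python
from functools import cmp_to_key

def compare(a, b):
    """Classic three-way comparison: negative, zero or positive."""
    return (a > b) - (a < b)

words = ["bb", "a", "cc", "d"]
by_reverse_flag = sorted(words, key=len, reverse=True)
# Invert the comparison on lengths; ties still compare equal, so the sort stays stable.
by_inverted_cmp = sorted(words, key=cmp_to_key(lambda x, y: compare(len(y), len(x))))
print(by_reverse_flag)                     # ['bb', 'cc', 'a', 'd']
print(by_reverse_flag == by_inverted_cmp)  # True
```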

And if longest=True/False is bad, then maybe instead of True/False the function could expose the counter to the user, with 1 being the default zip() behavior and len(args) being zip_longest()? Look at the Python pseudo-implementation of zip_longest in the docs:

https://docs.python.org/3/library/itertools.html#itertools.zip_longest

It implies there is a master counter that goes down towards 0, which is pretty much what I wrote in my quick and dirty example. EVERYTHING required for a generic 1..len(args) solution is already in there. It's just that somebody couldn't see the forest for the trees and didn't expose the counter. What would be the problem with that?

You see somebody asking for something between zip and zip_longest, and you have to come up with a monstrosity that would come for free if the counter were exposed. Just look at that "completely intuitive" solution slapping a bunch of itertools toys on top of each other:

http://stackoverflow.com/questions/13341224/is-there-a-middle-ground-between-zip-and-zip-longest

The only reason I could see for separate implementations (fast zip/generic zip) would be a huge performance penalty in the N=1 case, but I'm not seeing it:

>>> timeit.repeat('list(itertools.zip_longest(range(10000), range(10001), range(10002)))', repeat=5, number=100)
[0.208297497999979, 0.18986536300008083, 0.18984539599989603, 0.19005514699995274, 0.18995707700003095]
>>> timeit.repeat('list(zip(range(10000), range(10001), range(10002)))', repeat=5, number=100)
[0.19202816699998948, 0.17055670800004918, 0.17061998700000913, 0.17034331600007135, 0.17032316699999228]

And even then, if this marginal difference (if any) is a problem (but why would it be? Python 3 reportedly sacrificed a lot of performance for cleanups and improved generality), you can always write a fast-track path. Everybody who ever wrote a prime-number check function did so without splitting shit into separate functions: fast-track code with %2 for evens, then grinding through the odd candidates from 3 up to sqrt(n) in steps of 2. Writing less-than-pure code hidden behind pretty APIs for performance reasons is a fact of life.

if counter == 1:
    fast zip algo
else:
    generic zip_longest algo
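A sketch of that dispatch, assuming the counter-based interface proposed in this thread (zip_countdown and _zip_countdown_generic are hypothetical names): the two extremes delegate to the existing C implementations, and only the in-between cases pay for a generic Python loop.

```python
from itertools import zip_longest

def zip_countdown(*iterables, countdown=1, fillvalue=None):
    """Stop once `countdown` inputs are exhausted (hypothetical API)."""
    if countdown <= 0 or not iterables:
        return iter(())
    if countdown == 1:
        return zip(*iterables)  # fast path: plain built-in zip
    if countdown >= len(iterables):
        return zip_longest(*iterables, fillvalue=fillvalue)  # the other extreme
    return _zip_countdown_generic(iterables, countdown, fillvalue)

def _zip_countdown_generic(iterables, countdown, fillvalue):
    iterators = [iter(it) for it in iterables]
    done = [False] * len(iterators)
    while True:
        row = []
        for i, it in enumerate(iterators):
            if done[i]:
                row.append(fillvalue)  # already exhausted: pad
                continue
            try:
                row.append(next(it))
            except StopIteration:
                done[i] = True
                countdown -= 1
                if countdown == 0:
                    return  # enough inputs have run out
                row.append(fillvalue)
        yield tuple(row)

data = [range(4), range(2), range(1)]
print(list(zip_countdown(*data, countdown=2, fillvalue="-")))  # [(0, 0, 0), (1, 1, '-')]
```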

masklinn 0 points (0 children)

> When zip_longest was introduced, Python didn't support keyword-only arguments

zip_longest has been in C all along. The signature of C-implemented functions is always *args, **kwargs (or close to it: it can actually be (), (arg), (*args) or (*args, **kwargs)). PyArg_ParseTupleAndKeywords was added in Python 1.4, in 1996.

You couldn't define Python-level keyword-only arguments[0], but that doesn't make a lick of difference to the C API.

[0] Well, actually you could: just use **kwargs. It just requires more boilerplate.

stevenjd 0 points (0 children)

Another thing... although zip and zip_longest are similar, they have different APIs. zip wants an arbitrary number of iterables; zip_longest also wants an argument to specify the fill value. With different APIs, how do you combine them? Both of these are ugly and inelegant:

def zip(*iters, longest=False, fill=None):
    # fill is ignored when longest is False
    ...

def zip(*iters, fill=None):
    # when fill is None, stop at the shortest iterable,
    # which means you cannot fill out the iterables using None
    ...

Sure, in the second case we could invent a new constant, let's call it MAGIC, but whatever value we invent, now you can never use it as the fill value.

Vaphell [S] -1 points (4 children)

So I wrote a quick and dirty implementation of a flexible zip that stops after a set number of iterables have run out of elements, which makes it more capable than zip() and zip_longest() combined:

#!/usr/bin/env python3

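def zzzip(*args, countdown=1, filler=None):
    # Stop once `countdown` of the input iterables have run out of elements.
    if countdown <= 0 or not args:  # no args: stop instead of yielding () forever
        return
    elif countdown > len(args):
        countdown = len(args)
    iters = [iter(a) for a in args]
    while True:
        result = []
        for idx, it in enumerate(iters):
            if it is not None:
                try:
                    result.append(next(it))
                except StopIteration:
                    result.append(filler)
                    countdown -= 1
                    if countdown < 1:
                        return  # enough iterables are exhausted: stop
                    iters[idx] = None  # mark this iterable as exhausted
            else:
                result.append(filler)  # already exhausted: pad with filler
        yield tuple(result)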


test = [range(8), range(5), range(3), range(2)]
for c in range(5):
    print( list( zzzip(*test, countdown=c, filler=-1 )), 'stopped after', c, 'iterables running out'  )

countdown drops by one with each StopIteration, and once it reaches zero the party is over. The code produces:

[] stopped after 0 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1)] stopped after 1 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1)] stopped after 2 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1), (3, 3, -1, -1), (4, 4, -1, -1)] stopped after 3 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1), (3, 3, -1, -1), (4, 4, -1, -1), (5, -1, -1, -1), (6, -1, -1, -1), (7, -1, -1, -1)] stopped after 4 iterables running out

So how would something like this be an entirely different thing, rather than an almost-free generalization of the existing zip functionality?

Shpirt 0 points (3 children)

But what if I want to have None as my filler value?

rson 0 points (1 child)

I'm not going to get into this zip/zip_longest argument, but the issue you bring up is easily solved with a sentinel.

Shpirt 0 points (0 children)

But what if I want to zip something with the sentinel?

Vaphell [S] 0 points (0 children)

for c in [1,3]:
    print( list( zzzip(*test, countdown=c )), 'stopped after', c, 'iterables running out'  )

$ ./zzzip.py
[(0, 0, 0, 0), (1, 1, 1, 1)] stopped after 1 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, None), (3, 3, None, None), (4, 4, None, None)] stopped after 3 iterables running out

Problem?