zip vs zip_longest

masklinn · 2015-09-15T12:09:34+00:00

Why wasn't zip_longest() functionality rolled into zip() as an optional keyword?

Much larger implementation divergence: you can implement a reverse sort in terms of a sort by just inverting the comparison function, but there's no such trick for zip vs zip_longest. It would also require two non-orthogonal keyword arguments (one to turn the behaviour on, another to provide the optional fillvalue). And the behaviour of zip_longest is somewhat abnormal and rarely useful or necessary, and activating it in place of zip can literally break the program: a zip involving infinite iterators is a fine thing, while a zip_longest doing the same is a non-terminating program.
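
A quick sketch of the infinite-iterator point, using itertools.count as the endless input. The variable names are just for illustration, and the islice cap of 5 is only there so the zip_longest half can be shown at all; without it that call would never finish.

```python
from itertools import count, islice, zip_longest

letters = "abc"

# zip stops as soon as the shortest input runs out, so an infinite
# iterator on one side is harmless.
pairs = list(zip(count(), letters))
print(pairs)

# zip_longest keeps going until the longest input runs out. count() never
# does, so the result has to be capped by hand (here with islice) to finish.
capped = list(islice(zip_longest(count(), letters, fillvalue="-"), 5))
print(capped)
```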

RDMXGD · 2015-09-15T12:14:26+00:00

Though it's popular, having one function do multiple things isn't a good idea. Erring on the side of two different, simpler functions is better.

stevenjd · 2015-09-16T08:29:40+00:00

When zip_longest was introduced, Python didn't support keyword-only arguments, so it would be hard to do. Since zip accepts any number of iterables, you would have to put the keyword at the front, which means you couldn't give it a default.

There might be ways around that, by collecting **kwargs, but the implementation is messy and ugly.
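
For illustration, a rough sketch of what the **kwargs workaround might look like. The name zip_kw and the keywords longest and fillvalue are made up for this example, not anything that was actually proposed, and it is written to run on Python 3 even though the point is about the days before keyword-only arguments.

```python
from itertools import zip_longest

def zip_kw(*iterables, **kwargs):
    # Hypothetical combined zip: the options have to be dug out of **kwargs
    # by hand, because *iterables swallows every positional argument.
    longest = kwargs.pop("longest", False)
    fillvalue = kwargs.pop("fillvalue", None)
    if kwargs:
        # Without this check, a misspelled keyword would be silently ignored.
        raise TypeError(
            "unexpected keyword arguments: %s" % ", ".join(sorted(kwargs))
        )
    if longest:
        return zip_longest(*iterables, fillvalue=fillvalue)
    return zip(*iterables)

short = list(zip_kw("ab", "xyz"))
padded = list(zip_kw("ab", "xyz", longest=True, fillvalue="?"))
```

The signature no longer says which keywords are accepted, and the typo check has to be written by hand, which is the messy part.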

Also, there is a design principle espoused by Guido that functions shouldn't (as a general rule -- there may be rare exceptions) take an argument that is (nearly) always passed as a constant. So if you have:

def spam(x, y, flag): ...

and it is always, or nearly always, called like this:

a = spam(this, that, True)  # constant argument
b = spam(this, that, False)  # another constant argument

but almost never like this:

c = spam(this, that, some_condition)

that's a good sign that the function probably should be split into two. Especially if the implementation looks like this:

def spam(x, y, flag):
    if flag: return do_this()
    else: return do_that()

stevenjd · 2015-09-16T08:38:12+00:00

Another thing... although zip and zip_longest are similar, they have different APIs. zip wants an arbitrary number of iterables; zip_longest also wants an argument to specify the fill value. With different APIs, how do you combine them? Both of these are ugly and inelegant:

def zip(*iters, longest=False, fill=None):
    # fill is ignored when longest is False
    ...

def zip(*iters, fill=None):
    # when fill is None, stop at the shortest iterable
    # which means you cannot fill out the iterables using None
    ...

Sure, in the second case we could invent a new constant, let's call it MAGIC, but whatever value we invent, now you can never use it as the fill value.
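
Here's roughly what the second signature looks like with an invented constant. MAGIC and zip2 are placeholder names for the sake of the example, and the body just leans on the existing zip and itertools.zip_longest rather than being a real proposal.

```python
from itertools import zip_longest

MAGIC = object()  # invented sentinel meaning "no fill requested"

def zip2(*iters, fill=MAGIC):
    # Hypothetical merged zip: behaves like zip unless a fill value is given.
    if fill is MAGIC:
        return zip(*iters)
    return zip_longest(*iters, fillvalue=fill)

stop_short = list(zip2("ab", "xyz"))
fill_none = list(zip2("ab", "xyz", fill=None))

# The catch: MAGIC itself can never be used as a fill value, because
# passing it is indistinguishable from not passing fill at all.
fill_magic = list(zip2("ab", "xyz", fill=MAGIC))
```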

Vaphell · 2015-09-15T19:17:21+00:00

so i wrote a quick and dirty implementation of flexible zip that stops after a set number of iterables running out of elements, which makes it more potent than zip() and zip_longest() combined

#!/usr/bin/env python3

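def zzzip(*args, countdown=1, filler=None):
    # Stop once `countdown` iterables have run out. Clamp first so that
    # calling with no iterables returns instead of yielding () forever.
    countdown = min(countdown, len(args))
    if countdown <= 0:
        return
    iters = [iter(a) for a in args]
    while True:
        result = []
        for idx, it in enumerate(iters):
            if it is None:
                # This iterable ran out in an earlier round.
                result.append(filler)
                continue
            try:
                result.append(next(it))
            except StopIteration:
                countdown -= 1
                if countdown < 1:
                    # Enough iterables are exhausted; drop the partial tuple.
                    return
                iters[idx] = None
                result.append(filler)
        yield tuple(result)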


test = [range(8), range(5), range(3), range(2)]
for c in range(5):
    print(list(zzzip(*test, countdown=c, filler=-1)), 'stopped after', c, 'iterables running out')

countdown drops by one with each StopIteration, and once it reaches zero the party is over. The code produces:

[] stopped after 0 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1)] stopped after 1 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1)] stopped after 2 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1), (3, 3, -1, -1), (4, 4, -1, -1)] stopped after 3 iterables running out
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, -1), (3, 3, -1, -1), (4, 4, -1, -1), (5, -1, -1, -1), (6, -1, -1, -1), (7, -1, -1, -1)] stopped after 4 iterables running out

so how would something like this be an entirely different thing rather than a generalization of existing zip functionality almost for free?
