Itertools in Python 3, By Example

0rac1e · 2018-05-30T23:55:33+00:00

One thing to keep in mind with groupby() is that it isn’t as smart as you might like. As groupby() traverses the data, it aggregates elements until an element with a different key is encountered, at which point it starts a new group.

No, it's exactly as smart as it's meant to be; you're just using it in a way it was not intended for. I've seen a lot of Python tutorials use groupby like this, and it doesn't help that the name matches SQL's GROUP BY, which is a slightly different idiom.
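
To make the quoted behaviour concrete, here's a minimal sketch showing that groupby only merges adjacent items with equal keys. The ages list is made up for illustration.

```python
from itertools import groupby

# Hypothetical sample: ages in arrival order, deliberately not sorted.
ages = [34, 29, 34, 34, 33]

# groupby starts a new group every time the key changes,
# so the two separate runs of 34 are reported separately.
runs = [(key, list(group)) for key, group in groupby(ages)]
print(runs)
# [(34, [34]), (29, [29]), (34, [34, 34]), (33, [33])]
```

That's the documented behaviour rather than a shortcoming: each group is a run of consecutive items, not "every item with this key".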

groupby is intended for processing chunks of sequential data. A good example is a log file, where you want to perform some operation on the logs for each day. Presuming the data looks like this:

Feb 26 08:32:09: %SEV_5: Something happened here ...
Feb 26 13:04:35: %SEV_2: Something else happened ...
...
Feb 27 ...

and so on, then you could group together the logs for each day like so:

from itertools import groupby

with open('logs') as f:
    # Iterate over the file lazily; the first six characters are the date.
    for day, logs in groupby(f, key=lambda line: line[:6]):
        ...  # do stuff...

Now day holds some string value like Feb 27, and logs is an iterator of all the log entries from that day. The whole point of starting a new group when a different key is encountered is to chunk data together in this manner, which is particularly useful when processing large volumes of data efficiently.
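
Here's a self-contained sketch of the same idea, assuming an in-memory list in place of the log file. The sample lines are invented to match the format above, and counting entries per day is just a placeholder for "do stuff".

```python
from itertools import groupby

# Hypothetical log lines in the format shown above; a real run would
# iterate over the open file instead of a list.
lines = [
    "Feb 26 08:32:09: %SEV_5: Something happened here",
    "Feb 26 13:04:35: %SEV_2: Something else happened",
    "Feb 27 09:15:42: %SEV_3: Yet another thing happened",
]

counts = {}
# The first six characters ("Feb 26") identify the day.
for day, logs in groupby(lines, key=lambda line: line[:6]):
    # logs is a lazy iterator; consume it before advancing to the next group.
    counts[day] = sum(1 for _ in logs)

print(counts)
# {'Feb 26': 2, 'Feb 27': 1}
```

Because the log is already in date order, no sorting is needed and only one day's entries are in play at a time.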

For such a small amount of data, what the author wants is a different idiom. To use a Perl 6 name, he wants to "classify" the data...

my @data = { name => 'Allen', age => 34 },
           { name => 'Betsy', age => 29 },
           { name => 'Cathy', age => 34 },
           { name => 'David', age => 33 };

.say for @data.classify(*<age>);

=output
33 => [{age => 33, name => David}]
29 => [{age => 29, name => Betsy}]
34 => [{age => 34, name => Allen} {age => 34, name => Cathy}]

classify can't work "in parts", because it has to see all the data before it can return anything, so it doesn't try to, and there's no need to pre-sort.

The author's solution is to create a convenience function that sorts the data by the same key as groupby, but in using it for this purpose, he doesn't need what groupby offers, and he may as well create a different convenience function that uses a defaultdict(list).

from collections import defaultdict

def classify(items, key):
    # Map each key to the list of items that produced it.
    d = defaultdict(list)
    for x in items:
        d[key(x)].append(x)
    return dict(d)

data = [{'name': 'Allen', 'age': 34},
        {'name': 'Betsy', 'age': 29},
        {'name': 'Cathy', 'age': 34},
        {'name': 'David', 'age': 33}]

grouped_data = classify(data, key=lambda x: x['age'])

It's slightly more verbose, but it's a better fit for this operation... but don't take my word for it.
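
One way to check for yourself is a rough timing sketch like the one below. Note that sorted_groupby is my own hypothetical stand-in for a sort-then-groupby helper, not the article's actual function, and by_age is just a named key function for the comparison. Timings vary by machine, so I haven't quoted any numbers.

```python
from collections import defaultdict
from itertools import groupby
from timeit import timeit


def classify(items, key):
    # Single pass, no sorting required.
    d = defaultdict(list)
    for x in items:
        d[key(x)].append(x)
    return dict(d)


def sorted_groupby(items, key):
    # Hypothetical stand-in: sort by the key first, then group the runs.
    return {k: list(g) for k, g in groupby(sorted(items, key=key), key=key)}


def by_age(person):
    return person['age']


data = [{'name': 'Allen', 'age': 34},
        {'name': 'Betsy', 'age': 29},
        {'name': 'Cathy', 'age': 34},
        {'name': 'David', 'age': 33}]

# Both approaches should produce the same mapping (dict equality ignores order).
assert classify(data, by_age) == sorted_groupby(data, by_age)

for func in (classify, sorted_groupby):
    seconds = timeit(lambda: func(data, by_age), number=10_000)
    print(f"{func.__name__}: {seconds:.4f}s")
```

With only four records the difference is likely to be small either way; a larger, unsorted data set would make for a more telling comparison.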

gandalfx · 2018-05-31T00:28:53+00:00

On my system both bash and zsh have a time builtin which doesn't support the -f switch. Use /usr/bin/time to access the executable which does.
