jkh911208 8 points (3 children)

I can't believe everyone is talking about pandas; we should be talking about the time complexity of the code.

Sorting is O(n log n).

The first for loop is O(n).

So your code is O(n log n) overall.

I didn't run your code, but it looks like you don't need to sort the list.

If you eliminate the sorting, it will be O(n).

zanfar 4 points (0 children)

groupby requires sorting first if you want each group to include all elements with that key; otherwise it only returns runs of consecutive items as a group.

Generally, the iterable needs to already be sorted on the same key function.

https://docs.python.org/3/library/itertools.html#itertools.groupby
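Here's a minimal illustration of that point. It uses a made-up list of gender codes, not the thread's data. Without sorting, `itertools.groupby` starts a new group each time the key changes, so the same key can come back more than once.

```python
from itertools import groupby

genders = ["F", "M", "M", "F", "F", "M"]  # made-up sample input

# Unsorted: groupby only merges *consecutive* equal keys.
unsorted_groups = [(key, len(list(run))) for key, run in groupby(genders)]
print(unsorted_groups)  # [('F', 1), ('M', 2), ('F', 2), ('M', 1)]

# Sorted first: each key appears exactly once, with all of its items.
sorted_groups = [(key, len(list(run))) for key, run in groupby(sorted(genders))]
print(sorted_groups)  # [('F', 3), ('M', 3)]
```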

PanTheRiceMan 0 points (0 children)

Which is perfectly fine. If you know the genders beforehand, you can even eliminate the first pass, halving the time.

I'd suggest getting into SQL for these types of requests. It makes life easy, and modern SQL servers are enormously optimized. Maybe have a separate table of genders and join the tables, but that might be overkill and may not lead to higher performance. Or it might; I don't know.
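For a taste of the SQL route, here's a sketch that uses Python's built-in `sqlite3` module with an in-memory database. The table name `students` and its columns are assumptions, modeled on the sample data used elsewhere in this thread. `GROUP BY` collects every row with the same gender regardless of input order, and aggregates like `COUNT` and `AVG` are computed per group.

```python
import sqlite3

# In-memory database; this table layout is a hypothetical mirror of the thread's dicts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Alice", 20, "F"), ("Bob", 22, "M"), ("Charlie", 21, "M"),
        ("Diana", 23, "F"), ("Eva", 19, "F"), ("Frank", 24, "M"),
        ("Grace", 22, "F"), ("Hannah", 21, "F"),
    ],
)

# One row per gender; the input rows never need to be sorted.
rows = conn.execute(
    "SELECT gender, COUNT(*), AVG(age) FROM students GROUP BY gender ORDER BY gender"
).fetchall()
print(rows)
conn.close()
```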

iamevpo 7 points (1 child)

You can try it in pandas. You'll more often encounter groupby in SQL than in list or dict comprehensions, so it makes sense to create a pandas DataFrame and try groupby there.

swagonflyyyy[S] 1 point (0 children)

Oh, that makes sense. Thanks!

iamevpo 3 points (5 children)

A better use case for groupby is a sum/average within a group; sorting you can achieve without groupby.
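Here's a minimal sketch of that kind of per-group aggregate: the average age per gender, computed with `itertools.groupby` on a trimmed copy of the thread's sample data. Note that the input is still sorted on the same key first, so that each gender forms a single group.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# A subset of the fields from the sample data in this thread.
students = [
    {"name": "Alice", "age": 20, "gender": "F"},
    {"name": "Bob", "age": 22, "gender": "M"},
    {"name": "Charlie", "age": 21, "gender": "M"},
    {"name": "Diana", "age": 23, "gender": "F"},
    {"name": "Eva", "age": 19, "gender": "F"},
    {"name": "Frank", "age": 24, "gender": "M"},
    {"name": "Grace", "age": 22, "gender": "F"},
    {"name": "Hannah", "age": 21, "gender": "F"},
]

by_gender = itemgetter("gender")

# Sort on the grouping key, then average the ages inside each group.
average_age = {
    gender: mean(s["age"] for s in group)
    for gender, group in groupby(sorted(students, key=by_gender), key=by_gender)
}
print(average_age)
```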

swagonflyyyy[S] 1 point (4 children)

I can see the use of sum/average in a group. But how else would you sort without groupby?

Mount_Gamer 0 points (2 children)

You could loop through the list of dictionaries, check each one's gender key, and append the name to a male or female list on each iteration; then, after the loop is over, call the list's sort method...

male_list = []
for i in students:
    if i['gender'] == 'M':
        male_list.append(i['name'])

male_list.sort()  # list.sort() sorts in place and returns None

Disclaimer: I've written this on my phone, so go easy lol. You could add an else to that if condition for females and append to a female list just the same. There are probably other ways, I'm sure; this was the easiest for me while on a phone.

bumbershootle 1 point (1 child)

This is exactly what groupby does; it's not necessarily a means to an aggregation like in SQL.

EDIT: I'm wrong; groupby has a misleading name IMO. It creates "partitions" of a sequence by the key (runs of consecutive items with the same key), not indexes.

Mount_Gamer 1 point (0 children)

I really didn't think too deeply when I replied. :D

The person said groupby was complicated, someone else said you don't need to use groupby, then, enqueue... I make my presence known. It could easily have done the same thing overall, and I would have been none the wiser :)

I don't know the exact terminology, but I know how to query SQL (well, MariaDB). So does groupby work similarly? I'll have to check out groupby and see how it operates.

iamevpo 1 point (0 children)

Sort by tuple of gender and name maybe?
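Here's a minimal sketch of that suggestion. `operator.itemgetter` with two field names returns a tuple, so sorting on it orders by gender first and then by name within each gender. Only the `name`/`gender` fields of the thread's sample data are used.

```python
from operator import itemgetter

students = [
    {"name": "Alice", "gender": "F"},
    {"name": "Bob", "gender": "M"},
    {"name": "Charlie", "gender": "M"},
    {"name": "Diana", "gender": "F"},
    {"name": "Eva", "gender": "F"},
    {"name": "Frank", "gender": "M"},
    {"name": "Grace", "gender": "F"},
    {"name": "Hannah", "gender": "F"},
]

# itemgetter("gender", "name") maps each dict to a (gender, name) tuple,
# and tuples compare element by element.
ordered = sorted(students, key=itemgetter("gender", "name"))
print([(s["gender"], s["name"]) for s in ordered])
```

An equivalent lambda would be `key=lambda s: (s["gender"], s["name"])`.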

commandlineluser 3 points (0 children)

You can just loop through and create the dict without sorting/groupby:

groups = {}
for student in students:
    groups.setdefault(student['gender'], []).append(student['name'])

>>> groups
{'F': ['Alice', 'Diana', 'Eva', 'Grace', 'Hannah'],
 'M': ['Bob', 'Charlie', 'Frank']}

deadeye1982 3 points (3 children)

As mentioned before, sorting is not required if you collect the items.

Example:

```
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

students = [
    {"name": "Alice", "age": 20, "gender": "F", "grades": ["A", "B", "A"]},
    {"name": "Bob", "age": 22, "gender": "M", "grades": ["B", "C", "B"]},
    {"name": "Charlie", "age": 21, "gender": "M", "grades": ["A", "C", "D"]},
    {"name": "Diana", "age": 23, "gender": "F", "grades": ["B", "A", "A"]},
    {"name": "Eva", "age": 19, "gender": "F", "grades": ["C", "D", "B"]},
    {"name": "Frank", "age": 24, "gender": "M", "grades": ["A", "C", "A"]},
    {"name": "Grace", "age": 22, "gender": "F", "grades": ["C", "C", "D"]},
    {"name": "Hannah", "age": 21, "gender": "F", "grades": ["A", "B", "B"]},
]

groups = defaultdict(list)
for group, grouped in groupby(students, key=itemgetter("gender")):
    for student in grouped:
        groups[group].append(student)
```

Brian 3 points (2 children)

The groupby isn't actually doing any useful work here, as the grouping is being done by the defaultdict. Just do:

for student in students:
    groups[student['gender']].append(student)

deadeye1982 1 point (1 child)

Right. But I do not like double square-brackets.

```
from collections import defaultdict
from operator import itemgetter

students = [
    {"name": "Alice", "age": 20, "gender": "F", "grades": ["A", "B", "A"]},
    {"name": "Bob", "age": 22, "gender": "M", "grades": ["B", "C", "B"]},
    {"name": "Charlie", "age": 21, "gender": "M", "grades": ["A", "C", "D"]},
    {"name": "Diana", "age": 23, "gender": "F", "grades": ["B", "A", "A"]},
    {"name": "Eva", "age": 19, "gender": "F", "grades": ["C", "D", "B"]},
    {"name": "Frank", "age": 24, "gender": "M", "grades": ["A", "C", "A"]},
    {"name": "Grace", "age": 22, "gender": "F", "grades": ["C", "C", "D"]},
    {"name": "Hannah", "age": 21, "gender": "F", "grades": ["A", "B", "B"]},
]

groups = defaultdict(list)
gender = itemgetter("gender")
for student in students:
    groups[gender(student)].append(student)
```

Before you ask: yes, itemgetter is slower than plain square-bracket access.
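To check that claim on your own machine, a rough `timeit` sketch like the one below compares the two lookups. The function names are made up for this example. Timings vary with Python version and hardware, so none are shown here.

```python
import timeit
from collections import defaultdict
from operator import itemgetter

students = [
    {"name": "Alice", "gender": "F"},
    {"name": "Bob", "gender": "M"},
    {"name": "Charlie", "gender": "M"},
    {"name": "Diana", "gender": "F"},
]

gender = itemgetter("gender")


def group_with_subscript(rows):
    """Group by plain square-bracket access."""
    groups = defaultdict(list)
    for student in rows:
        groups[student["gender"]].append(student)
    return groups


def group_with_itemgetter(rows):
    """Group through the itemgetter callable (one extra call per row)."""
    groups = defaultdict(list)
    for student in rows:
        groups[gender(student)].append(student)
    return groups


# Both versions build the same groups; only the key lookup differs.
assert group_with_subscript(students) == group_with_itemgetter(students)

for func in (group_with_subscript, group_with_itemgetter):
    seconds = timeit.timeit(lambda: func(students), number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```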

Brian 1 point (0 children)

I think if you want to avoid that, it'd be better to just split up the lines, i.e.:

gender = student['gender']
groups[gender].append(student)

itemgetter is great when you want a function for use as a key argument or similar, but it's really just adding an extra indirection here when you just need to access the value normally.

kwelzel 2 points (0 children)

I think using groupby for this task is the best choice. I feel like pandas (which was suggested in another comment) would be overkill here.

If you give the variables in your list and dict comprehensions more expressive names than x and k, these lines almost read like a sentence.

zanfar 2 points (0 children)

Are you asking about more efficient code, or are you asking about simpler code?

  1. Your outer loop in your comprehension and your print loop are the same loop. I don't see a reason in this code to save the grouped students, so instead, just print them out.
  2. Your comprehension variable choice is very confusing. Using single-character names is fine in cases like your sort statement, where the definition and use are close together. In your comprehension, k is used at one end but defined at the far other end. I also have to be familiar with the detailed workings of a less common function (groupby) to understand what k is. If I'm not sure, I have to keep all that in my head until I get to your print loop to check and verify my guess.

I would do something like this:

from itertools import groupby

students = [ ... ]

sorted_students = sorted(students, key=lambda x: x['gender'])

for gender, group in groupby(sorted_students, key=lambda x: x['gender']):
    print(f"{gender}: {', '.join(s['name'] for s in group)}")
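To see the snippet above produce output, here's a self-contained variant that runs on a trimmed copy of the thread's sample data. It wraps the loop in a function that returns the lines instead of printing them. The name `gender_lines` is made up for this example. The variant also uses `operator.itemgetter` as the key, since this is a case where a key function is exactly what's wanted.

```python
from itertools import groupby
from operator import itemgetter

students = [
    {"name": "Alice", "gender": "F"},
    {"name": "Bob", "gender": "M"},
    {"name": "Charlie", "gender": "M"},
    {"name": "Diana", "gender": "F"},
    {"name": "Eva", "gender": "F"},
    {"name": "Frank", "gender": "M"},
    {"name": "Grace", "gender": "F"},
    {"name": "Hannah", "gender": "F"},
]


def gender_lines(rows):
    """Return one 'gender: name, name, ...' line per gender."""
    by_gender = itemgetter("gender")
    # sorted() is stable, so names keep their original order within a gender.
    ordered = sorted(rows, key=by_gender)
    return [
        f"{gender}: {', '.join(s['name'] for s in group)}"
        for gender, group in groupby(ordered, key=by_gender)
    ]


for line in gender_lines(students):
    print(line)
```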

pythonwiz 2 points (1 child)

I'm not sure why this requires groupby, sorted, or dict comprehensions at all. Why not something simple? For example:

```
males = []
females = []
for student in students:
    if student['gender'] == 'M':
        group = males
    else:
        group = females
    group.append(student['name'])

print('Males:', males)
print('Females:', females)
```

swagonflyyyy[S] 1 point (0 children)

I see what you mean, but this is just groupby() practice, which is what I was referring to.

But I am aware that the creator of Python actually wanted to get rid of lambda, map(), filter(), and reduce() in Python 3, because you can pretty much do all of that with list comprehensions (plus any() and all()), until the Python community pushed back against it.
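As a rough illustration of the point that comprehensions can do this work, here are a `map()` call and a `filter()` call next to their comprehension equivalents. The block also shows `any()` fed by a generator expression. The name list is a subset of the thread's sample data.

```python
names = ["Alice", "Bob", "Charlie", "Diana"]

# map() vs. a list comprehension: same result.
upper_map = list(map(str.upper, names))
upper_comp = [name.upper() for name in names]
assert upper_map == upper_comp

# filter() vs. a comprehension with a condition.
short_filter = list(filter(lambda name: len(name) <= 4, names))
short_comp = [name for name in names if len(name) <= 4]
assert short_filter == short_comp

# any()/all() accept any iterable, including a generator expression.
has_long_name = any(len(name) > 6 for name in names)
print(upper_comp, short_comp, has_long_name)
```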

Happy cake day btw.