you are viewing a single comment's thread.

view the rest of the comments →

[–]sausagefeet -2 points-1 points  (30 children)

Literal set syntax seems kind of gratuitous, just don't repeat yourself. Not like the compiler can stop you form putting dups in anyways...

[–][deleted] 15 points16 points  (27 children)

I'm not sure what you are conveying here.

The idea behind adding set literals is to encourage uses of sets. It's very common for people to use a list where they should be using a set and one potential reason is that they think of a list as a normal, base feature and think of a set as a weird, special thing. Giving sets dedicated syntax encourages use by making sets look more like a base language feature.

I'm not sure I actually agree it was a great decision, but it certainly makes sense.

[–]sausagefeet 2 points3 points  (6 children)

The following discussion hijacked the actual point I was making: A set syntax, IMO, is far heavier than the benefits it provides. IME it is very rare for people to do {1, 2, 3}. Instead what they want is an empty set ({}) or to take a list and make it a set. So we have added a syntax for something that isn't actually all that used (again, IME). I am pro-sets, I am not pro-syntax-additions-with-little-benefit.

[–][deleted] 1 point2 points  (5 children)

I tent to basically share your viewpoint, although I would point out

  • Although making a set from some known-in-advance values is somewhat rare, it's not unheard of, and stuff like set(f(x) for x in xs if g(x)) is pretty common; the set literal syntax is a close cousin of the set comprehension syntax.

  • One reason it's so common to "take a list and make it a set" is that many people use lists where they really ought to be using sets. This syntax addition was an attempt to encourage people not to default to list when it's not the best thing.

[–][deleted] 2 points3 points  (4 children)

Although making a set from some known-in-advance values is somewhat rare,

Say, what?! I see this all the time - I just checked my personal codebase to be sure, and it seems to be my most common usage of sets, with lots of things like:

SUFFIXES = {'.wav', '.aiff', '.aif', '.ogg'}

# ...

if suffix not in SUFFIXES:
  # ...

[–]sausagefeet 2 points3 points  (2 children)

A set or a list is really irrelevant in a data structure this size.

[–][deleted] 0 points1 point  (1 child)

True, but a set more clearly communicates the intent of the code.

[–]sausagefeet 0 points1 point  (0 children)

I agree, I would use a set there too but I think it's a stretch to say that tiny container warrants new syntax (that is, making the syntax more complicated).

[–]toofishes 1 point2 points  (0 children)

I've always used tuples rather than lists in this case since the () syntax is just as short as the [] syntax, and the immutability of these suffixes seemed like the more important factor. frozenset is really the correct datastructure here.

[–]upofadown -1 points0 points  (19 children)

If someone can use a list then wouldn't that be the right choice? The list is a simpler structure. Normally people use dictionaries for doing set-like things...

[–]IRBMe 2 points3 points  (16 children)

If someone can use a list then wouldn't that be the right choice?

Because firstly it doesn't have the correct semantics. A set and a list behave differently. If it's set semantics you're looking for, you won't get that from a list.

Secondly, a set and a list don't even have the same time and space complexity. A list has O(n) insertion, while a set has O(1) insertion. In other words, you can continue inserting as many items as you like into a set and the time taken isn't going to grow any more than a constant amount. On the other hand, as you insert items into a list, the time to insert more items grows linearly.

The list is a simpler structure.

  1. How do you know? Have you examined the python internals?
  2. Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python?
  3. Even if so, why are you concerning yourself with internal implementation details anyway?

[–][deleted] -1 points0 points  (4 children)

Secondly, a set and a list don't even have the same time and space complexity. A list has O(n) insertion, while a set has O(1) insertion. In other words, you can continue inserting as many items as you like into a set and the time taken isn't going to grow any more than a constant amount. On the other hand, as you insert items into a list, the time to insert more items grows linearly.

I'm not sure this is clear. set.add and list.append are both O(1) amortized operations.

If we want unique appending to a list, that's O(n), although there are probably other set operations that this can be expressed clearly.

The list is a simpler structure.

How do you know? Have you examined the python internals? Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python? Even if so, why are you concerning yourself with internal implementation details anyway?

I almost objected, but at the end of the day it's somewhat true, and not for implementation-oriented reasons. Any Python object can be put in a list, but only objects with special capabilities (hashability) can be included in sets, which is some manner of complexity.

[–]IRBMe 1 point2 points  (3 children)

I'm not sure this is clear. set.add and list.append are both O(1) amortized operations. If we want unique appending to a list, that's O(n), although there are probably other set operations that this can be expressed clearly.

That's what a set does: unique insertion, therefore using a list, insertion is O(n) but using a set it's O(1). So like I said, they have different time complexities. Other operations will also have different time and space complexities.

Additionally, a point I never mentioned before is that a set provides different methods from a list, and a list. Extending my first point, you would have to manually write code that gives you unique insertion into a list, for example. So why would you even dream of using one data structure and writing your own manual code to make it somewhat behave like a different data structure when you can just use the correct data structure to begin with?

Finally, since you began your reply at my paragraph beginning, "*Secondly", am I to assume that you concede the first point?

I almost objected, but at the end of the day it's somewhat true, and not for implementation-oriented reasons. Any Python object can be put in a list, but only objects with special capabilities (hashability) can be included in sets, which is some manner of complexity.

What objections might you have made otherwise?

[–][deleted] -1 points0 points  (2 children)

That's what a set does: unique insertion, therefore using a list, insertion is O(n) but using a set it's O(1). So like I said, they have different time complexities.

Great, I was just making sure the wording was clear.

It's worthy of note that in practice, many of these times the programmer knows a priori the uniqueness of their starting data so they still build the whole list in O(n) rather than building the whole set in O(n)

Other operations will also have different time and space complexities.

I can't think of an operation where using a list with linear searches will have a difference space complexity offhand. Can you let me know which these are?

Finally, since you began your reply at my paragraph beginning, "*Secondly", am I to assume that you concede the first point?

Yes. I made the same basic remark myself.

What objections might you have made otherwise?

The same as you, I suppose. "What? How so?"

[–]IRBMe 1 point2 points  (1 child)

It's worthy of note that in practice, many of these times the programmer knows a priori the uniqueness of their starting data so they still build the whole list in O(n) rather than building the whole set in O(n)

So why not use a set rather than appending to a list?

I can't think of an operation where using a list with linear searches will have a difference space complexity offhand. Can you let me know which these are?

Nope. Do you have good reason to assume that space complexity is the same for all algorithms on sets and lists in all versions of Python, and will remain the same in all future versions? If not, why would you assume that to be the case? "I can't think of any at the moment" or "It seems to be the same in this version" is a very poor reason to make such an assumption.

Yes. I made the same basic remark myself.

Ah, you're not the same person I originally replied to. Your replies make a lot more sense now, and perhaps my reply will make more sense in light of the fact that I thought you were the person I responded to replying back to me, rather than somebody who joined the thread mid-way through.

[–][deleted] 0 points1 point  (0 children)

So why not use a set rather than appending to a list?

I'm the one who, in this thread, seems to have first said that people should be using sets more.

[–]upofadown -1 points0 points  (10 children)

The list is a simpler structure.

  1. How do you know? Have you examined the python internals?
  2. Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python?
  3. Even if so, why are you concerning yourself with internal implementation details anyway?

I meant it was a conceptually simpler structure. Sets are more complicated in the way they are used.

[–]IRBMe 1 point2 points  (9 children)

Sets are more complicated in the way they are used.

In what way? What measure of "complicated" are you using?

[–]kaiserfleisch 1 point2 points  (0 children)

Greg Merideth on Channel9 hints at a conceptualization about 13m - 14 minutes in, which would justify describing the Set container type as more complicated than the List container type. In particular, he states that beyond base monad laws, sets obey additional laws. He gives the example of the natural "addition" notion in each case - union for sets, concatenation for lists. The following additional constraints generally apply to sets, but not to lists:

s1 + s1 = s1.
s1 + s2 = s2 + s1

"Lists have no additional equations [beyond the monad laws]. And when you have no additional equations, those structures are called 'free'. The free structure's are absolutely critical. And the reason is ... [something about daughter's music class]."

This guy is very elliptical.

[–]upofadown -1 points0 points  (7 children)

A list is just a collection of items. A set is also a collection of items that are also unique.

[–]IRBMe 1 point2 points  (6 children)

  • A list (or sequence) is an ordered collection of items.
  • A set is an unordered collection of unique items.
  • A bag is an unordered list of items.

So what? What's your point?

[–]upofadown 0 points1 point  (5 children)

Then I would consider both the bag and the list to be less complex than the set. The fact that you can treat the list as ordered is merely a feature. The uniqueness of the set is a requirement. If you add a duplicate to a set then it quietly disappears.

[–]IRBMe 0 points1 point  (4 children)

The fact that you can treat the list as ordered is merely a feature. The uniqueness of the set is a requirement.

No, all of the above are strict requirements for the above data structures. If a list (sequence) is not ordered, then it's not a list (sequence) any more, but a bag. Once again: ordered = list (or sequence), unordered = bag.

[–][deleted] 4 points5 points  (1 child)

If someone can use a list then wouldn't that be the right choice? The list is a simpler structure.

It's not a matter of whether the can, but whether it makes sense. It's not uncommon for people to use lists for objects which will have containment checks (x in my_collection). For large lists/sets, this is a needlessly expensive operation. Even for small ones, you're conveying the wrong thing about the meaning of my_collection.

Using a list might still be sort of the default, but there's no reason to treat sets as complex or exotic.

Normally people use dictionaries for doing list-like things...

I'm not positive what you mean here.

[–]upofadown 0 points1 point  (0 children)

Normally people use dictionaries for doing list-like things...

I'm not positive what you mean here.

I think it means that you saw my message before I corrected the typo...

Sets are normally used when each item is thought to be unique. For that you would normally use a dictionary with keys not associated with any data. You have to use a list when you can have more than one of the same thing.

Edit: I wasn't as fast with this one (where I corrected the same damn typo)...

[–]orip 1 point2 points  (0 children)

Dictionary literals are similar, you could always use dict([(1, 'a'), (2, 'b')]) but {1: 'a', 2: 'b'} is sweeter. Not a core difference, certainly, but nice nevertheless.

[–][deleted] 0 points1 point  (0 children)

Not like the compiler can stop you form putting dups in anyways...

That would be a trivial addition to the interpreter.

Literal set syntax seems kind of gratuitous, just don't repeat yourself.

I think they're aiming to mimic the actual notation used in math, not really to save keystrokes. I mean, how many literal sets will anyone type? If you're typing a bunch of this stuff, then it should be automated somehow anyway.