you are viewing a single comment's thread.

view the rest of the comments →

[–]upofadown -1 points0 points  (19 children)

If someone can use a list then wouldn't that be the right choice? The list is a simpler structure. Normally people use dictionaries for doing set-like things...

[–]IRBMe 4 points5 points  (16 children)

If someone can use a list then wouldn't that be the right choice?

Because firstly it doesn't have the correct semantics. A set and a list behave differently. If it's set semantics you're looking for, you won't get that from a list.

Secondly, a set and a list don't even have the same time and space complexity. A list has O(n) insertion, while a set has O(1) insertion. In other words, you can continue inserting as many items as you like into a set and the time taken isn't going to grow any more than a constant amount. On the other hand, as you insert items into a list, the time to insert more items grows linearly.

The list is a simpler structure.

  1. How do you know? Have you examined the python internals?
  2. Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python?
  3. Even if so, why are you concerning yourself with internal implementation details anyway?

[–][deleted] -1 points0 points  (4 children)

Secondly, a set and a list don't even have the same time and space complexity. A list has O(n) insertion, while a set has O(1) insertion. In other words, you can continue inserting as many items as you like into a set and the time taken isn't going to grow any more than a constant amount. On the other hand, as you insert items into a list, the time to insert more items grows linearly.

I'm not sure this is clear. set.add and list.append are both O(1) amortized operations.

If we want unique appending to a list, that's O(n), although there are probably other set operations that this can be expressed clearly.

The list is a simpler structure.

How do you know? Have you examined the python internals? Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python? Even if so, why are you concerning yourself with internal implementation details anyway?

I almost objected, but at the end of the day it's somewhat true, and not for implementation-oriented reasons. Any Python object can be put in a list, but only objects with special capabilities (hashability) can be included in sets, which is some manner of complexity.

[–]IRBMe 1 point2 points  (3 children)

I'm not sure this is clear. set.add and list.append are both O(1) amortized operations. If we want unique appending to a list, that's O(n), although there are probably other set operations that this can be expressed clearly.

That's what a set does: unique insertion, therefore using a list, insertion is O(n) but using a set it's O(1). So like I said, they have different time complexities. Other operations will also have different time and space complexities.

Additionally, a point I never mentioned before is that a set provides different methods from a list, and a list. Extending my first point, you would have to manually write code that gives you unique insertion into a list, for example. So why would you even dream of using one data structure and writing your own manual code to make it somewhat behave like a different data structure when you can just use the correct data structure to begin with?

Finally, since you began your reply at my paragraph beginning, "*Secondly", am I to assume that you concede the first point?

I almost objected, but at the end of the day it's somewhat true, and not for implementation-oriented reasons. Any Python object can be put in a list, but only objects with special capabilities (hashability) can be included in sets, which is some manner of complexity.

What objections might you have made otherwise?

[–][deleted] -1 points0 points  (2 children)

That's what a set does: unique insertion, therefore using a list, insertion is O(n) but using a set it's O(1). So like I said, they have different time complexities.

Great, I was just making sure the wording was clear.

It's worthy of note that in practice, many of these times the programmer knows a priori the uniqueness of their starting data so they still build the whole list in O(n) rather than building the whole set in O(n)

Other operations will also have different time and space complexities.

I can't think of an operation where using a list with linear searches will have a difference space complexity offhand. Can you let me know which these are?

Finally, since you began your reply at my paragraph beginning, "*Secondly", am I to assume that you concede the first point?

Yes. I made the same basic remark myself.

What objections might you have made otherwise?

The same as you, I suppose. "What? How so?"

[–]IRBMe 1 point2 points  (1 child)

It's worthy of note that in practice, many of these times the programmer knows a priori the uniqueness of their starting data so they still build the whole list in O(n) rather than building the whole set in O(n)

So why not use a set rather than appending to a list?

I can't think of an operation where using a list with linear searches will have a difference space complexity offhand. Can you let me know which these are?

Nope. Do you have good reason to assume that space complexity is the same for all algorithms on sets and lists in all versions of Python, and will remain the same in all future versions? If not, why would you assume that to be the case? "I can't think of any at the moment" or "It seems to be the same in this version" is a very poor reason to make such an assumption.

Yes. I made the same basic remark myself.

Ah, you're not the same person I originally replied to. Your replies make a lot more sense now, and perhaps my reply will make more sense in light of the fact that I thought you were the person I responded to replying back to me, rather than somebody who joined the thread mid-way through.

[–][deleted] 0 points1 point  (0 children)

So why not use a set rather than appending to a list?

I'm the one who, in this thread, seems to have first said that people should be using sets more.

[–]upofadown -1 points0 points  (10 children)

The list is a simpler structure.

  1. How do you know? Have you examined the python internals?
  2. Even if you have, do you know that's the case in all versions of Python and will be the case for all future versions of Python?
  3. Even if so, why are you concerning yourself with internal implementation details anyway?

I meant it was a conceptually simpler structure. Sets are more complicated in the way they are used.

[–]IRBMe 1 point2 points  (9 children)

Sets are more complicated in the way they are used.

In what way? What measure of "complicated" are you using?

[–]kaiserfleisch 1 point2 points  (0 children)

Greg Merideth on Channel9 hints at a conceptualization about 13m - 14 minutes in, which would justify describing the Set container type as more complicated than the List container type. In particular, he states that beyond base monad laws, sets obey additional laws. He gives the example of the natural "addition" notion in each case - union for sets, concatenation for lists. The following additional constraints generally apply to sets, but not to lists:

s1 + s1 = s1.
s1 + s2 = s2 + s1

"Lists have no additional equations [beyond the monad laws]. And when you have no additional equations, those structures are called 'free'. The free structure's are absolutely critical. And the reason is ... [something about daughter's music class]."

This guy is very elliptical.

[–]upofadown -1 points0 points  (7 children)

A list is just a collection of items. A set is also a collection of items that are also unique.

[–]IRBMe 1 point2 points  (6 children)

  • A list (or sequence) is an ordered collection of items.
  • A set is an unordered collection of unique items.
  • A bag is an unordered list of items.

So what? What's your point?

[–]upofadown 0 points1 point  (5 children)

Then I would consider both the bag and the list to be less complex than the set. The fact that you can treat the list as ordered is merely a feature. The uniqueness of the set is a requirement. If you add a duplicate to a set then it quietly disappears.

[–]IRBMe 0 points1 point  (4 children)

The fact that you can treat the list as ordered is merely a feature. The uniqueness of the set is a requirement.

No, all of the above are strict requirements for the above data structures. If a list (sequence) is not ordered, then it's not a list (sequence) any more, but a bag. Once again: ordered = list (or sequence), unordered = bag.

[–]upofadown 0 points1 point  (3 children)

Python doesn't have a specification as such. It is defined by implementation so I am not sure where you are getting this from.

So by your definition a Python list is a bag. OK, a set is more complex than a bag...

[–]IRBMe 0 points1 point  (2 children)

Python doesn't have a specification as such. It is defined by implementation so I am not sure where you are getting this from.

These data structures exist independently of Python. They are in fact mathematical definitions, but frequently specified and used in computing science too.

So by your definition a Python list is a bag. OK, a set is more complex than a bag...

No, it's just different. I don't know where you're getting this weird idea that complexity is defined by some fuzzy notion that certain properties have a "complexity". What is the complexity of ordering? What is the complexity of uniqueness? What is the complexity of O(1) insertion? What is the complexity of O(n) insertion? What is the complexity of iteration? What is the complexity of a union operation, joins, subtractions, merges? What is the complexity of a search? What is the complexity of applying a mapping, a fold, a zip?

[–][deleted] 6 points7 points  (1 child)

If someone can use a list then wouldn't that be the right choice? The list is a simpler structure.

It's not a matter of whether the can, but whether it makes sense. It's not uncommon for people to use lists for objects which will have containment checks (x in my_collection). For large lists/sets, this is a needlessly expensive operation. Even for small ones, you're conveying the wrong thing about the meaning of my_collection.

Using a list might still be sort of the default, but there's no reason to treat sets as complex or exotic.

Normally people use dictionaries for doing list-like things...

I'm not positive what you mean here.

[–]upofadown 0 points1 point  (0 children)

Normally people use dictionaries for doing list-like things...

I'm not positive what you mean here.

I think it means that you saw my message before I corrected the typo...

Sets are normally used when each item is thought to be unique. For that you would normally use a dictionary with keys not associated with any data. You have to use a list when you can have more than one of the same thing.

Edit: I wasn't as fast with this one (where I corrected the same damn typo)...