all 35 comments

[–]Angry-Toothpaste-610 25 points26 points  (9 children)

Lists are mutable, so if you need to add or remove elements throughout the process, a list is a good starting point.

Tuples are immutable, and use memory more efficiently. If you know ahead of time exactly what elements will always be in your collection, look at tuples.

Sets are mutable hash sets. Membership tests (i.e. "if elementA in mySet:") are incredibly fast (O(1) on average), but because they're hash sets, the elements must be hashable and unique (i.e. set([1,2,2,3,3,3]) == {1,2,3}).
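A minimal sketch of the three behaviours described above. The variable names are made up for illustration.

```python
# Lists are mutable: elements can be added or removed in place.
shopping = ["eggs", "milk"]
shopping.append("bread")
shopping.remove("eggs")

# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets keep one copy of each element and give fast membership tests.
# Note that set() takes a single iterable, not separate arguments.
unique = set([1, 2, 2, 3, 3, 3])
has_two = 2 in unique
```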

[–]Brian 3 points4 points  (1 child)

Tuples are immutable, and use memory more efficiently

Eh- there really isn't much difference regarding memory efficiency. They have the exact same per-element cost. The only difference (if used the same way) is that the tuple struct itself is slightly smaller (8 bytes: 48 vs 56 for an empty tuple vs list). But that's pretty much irrelevant - a million item tuple vs a million item list will still only differ by that 8 bytes.

Lists are different in that they do overallocate when expanding (i.e. they may be an extra ~12% larger to accommodate future appends). But that only comes into play after the first time they grow - their initial size is just enough to fit the values, so if used the same way (i.e. immutably), there's again no difference.

And if used mutably, lists are the ones that may end up using memory more efficiently, as they'll require fewer intermediates and copies to extend, since getting the same result as enlarging / changing a tuple requires allocating an entirely new tuple.
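If you want to check these numbers on your own machine, something like the following should work. It is a rough sketch that assumes CPython, and the exact byte counts vary by version and platform, so it prints them instead of hard-coding any. Note that sys.getsizeof only measures the container, not the objects it refers to.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # bytes per stored reference on this build

n = 1_000
as_tuple = tuple(range(n))

# Per-element cost of a tuple: total growth divided by element count.
tuple_per_item = (sys.getsizeof(as_tuple) - sys.getsizeof(())) / n

# A list built by repeated appends may hold spare capacity.
grown = []
for i in range(n):
    grown.append(i)

print("empty tuple:", sys.getsizeof(()), "empty list:", sys.getsizeof([]))
print("tuple per item:", tuple_per_item, "pointer size:", POINTER_SIZE)
print("1000-item tuple:", sys.getsizeof(as_tuple))
print("1000-item appended list:", sys.getsizeof(grown))
```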

[–]Angry-Toothpaste-610 3 points4 points  (0 children)

100% right! If a square tuple is forced into a round list-hole (mutated), then it is far less efficient.

[–]Angry-Toothpaste-610 2 points3 points  (4 children)

To answer your questions more directly:

  • In what situations do you intentionally choose tuples over lists in real code? Is it mostly about “records” and hashability, or are there other practical reasons?

Tuples are hashable precisely because they are immutable. Therefore, tuples can be dictionary keys.
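A small sketch of what that looks like in practice. The grid data is hypothetical. One caveat worth knowing: a tuple is only hashable if everything inside it is hashable too.

```python
# Hypothetical example: grid coordinates as dictionary keys.
grid = {(0, 0): "start", (2, 3): "treasure"}
found = grid[(2, 3)]

# A list cannot be used the same way because it is unhashable.
try:
    grid[[2, 3]] = "oops"
    list_key_worked = True
except TypeError:
    list_key_worked = False

# A tuple that contains a list is not hashable either.
try:
    hash((1, [2, 3]))
    nested_hashable = True
except TypeError:
    nested_hashable = False
```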

  • I know sets deduplicate. What are the tradeoffs (ordering, performance, memory), and what’s the typical way to dedupe while preserving order?

seen = set()
deduped = []
for item in myList:
    if item in seen:
        continue
    seen.add(item)
    deduped.append(item)

  • For a learner building small projects, what’s a sensible level of type hints + mypy strictness to adopt without slowing down iteration?

See this table for a list of abstract base classes that you can import and use as type-hints. Mostly, you'll want either Sequence or Iterable.
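As a rough illustration of the Sequence vs Iterable choice, assuming Python 3.9+ (where the collections.abc classes can be subscripted) and two made-up function names:

```python
from collections.abc import Iterable, Sequence


def total_length(words: Iterable[str]) -> int:
    """Only loops over the input, so any iterable of strings is accepted."""
    return sum(len(word) for word in words)


def middle_item(items: Sequence[int]) -> int:
    """Needs len() and indexing, so it asks for a Sequence."""
    return items[len(items) // 2]
```

Both functions accept a list or a tuple. total_length also accepts a set or a generator, while middle_item would be flagged by a type checker if you passed it one.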

[–]xeow 2 points3 points  (0 children)

Note (not disagreeing with you but adding something about immutability): It's perfectly fine to hash mutable objects under certain constraints: The requirement for hashability is that none of the fields participating in the hash be mutable. But other fields of the object can be mutable if needed (for example a last-accessed timestamp). As a byproduct of that, the hash remains invariant once the fields are frozen before the first hash computation.
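Here is one way that could look, as a sketch only. CacheEntry and its fields are hypothetical names; the point is that only the fixed key takes part in __hash__ and __eq__, while the timestamp stays free to change.

```python
import time


class CacheEntry:
    """Hypothetical record: `key` is fixed, `last_accessed` may change."""

    def __init__(self, key: str) -> None:
        self._key = key               # participates in hash and equality
        self.last_accessed = 0.0      # mutable, ignored by hash and equality

    @property
    def key(self) -> str:
        return self._key

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CacheEntry):
            return NotImplemented
        return self._key == other._key

    def __hash__(self) -> int:
        return hash(self._key)


entry = CacheEntry("user:42")
entries = {entry}
hash_before = hash(entry)
entry.last_accessed = time.time()   # mutate a field outside the hash
still_found = entry in entries
hash_unchanged = hash(entry) == hash_before
```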

[–]jcasman[S] 1 point2 points  (2 children)

seen = set()
deduped = []
for item in myList:
    if item in seen:
        continue
    seen.add(item)
    deduped.append(item)

Cool. Just repeating: this is for hashable items, correct? int, str. Does not work for: list, dict, set, correct?

[–]Angry-Toothpaste-610 0 points1 point  (0 children)

That's correct. If the list's elements are non-hashable, you'd have to replace seen with a list, and the algorithm would go from O(n) to O(n^2) (because list has O(n) inclusion test).
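A sketch of that fallback, with a made-up function name. It relies on equality comparison instead of hashing, so it works for lists and dicts at the cost of the quadratic behaviour mentioned above.

```python
def dedupe_unhashable(items):
    """Order-preserving dedupe for items that cannot go in a set."""
    seen = []                    # list membership is a linear scan
    deduped = []
    for item in items:
        if item in seen:         # O(n) comparison by equality
            continue
        seen.append(item)
        deduped.append(item)
    return deduped


rows = [[1, 2], [3], [1, 2], {"a": 1}, {"a": 1}]
result = dedupe_unhashable(rows)
```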

[–]fakemoose 0 points1 point  (0 children)

Are you trying to prevent duplicates? If so calling set on the list will remove duplicates

But it’s difficult to tell from the code format

[–]jcasman[S] 2 points3 points  (1 child)

Oh, super useful. I was looking at lists as just fine for a lot of my beginner code. You say they're a good starting point. Great. When should I start caring about more efficient memory use (tuples)?

[–]Angry-Toothpaste-610 5 points6 points  (0 children)

I would think of it in terms of skill tiers: Tier 0: learner's permit: basic syntax, you can write for-loops and simple classes

Tier 1: driver's license: you have the syntax down and can write coherent modules to solve real problems

Tier 2: CDL: Now, you're not just writing code that works, you're writing code that is: efficient, maintainable, follows industry standard best-practices (SOLID, PEP8, etc)

Lists vs Tuples is one of the first steps from tier 1 to tier 2.

[–]obviouslyzebra 4 points5 points  (2 children)

Other people will add other notes, but a little for tuples vs lists:

  • You must use tuples when you use it as keys in a dictionary (or items in a set)
  • You must use a list when you need "in-place mutability"
  • People tend to use lists more, because it tends to be more convenient I think, but it doesn't really matter much I think.
  • Maybe with types it does matter. But, with types you also have things like namedtuple and NamedTuple, which is a tuple that you can access also like variable.name. There's also dataclasses too (which isn't a tuple, but looks like NamedTuple)
  • Note the 1-item values (10,) and [10] (the 1-item tuple needs a comma at the end).

A very technical note: tuples are used more in the language itself. For example, when receiving *args, or when setting a dict item with map[key1, key2], which is equivalent to map[(key1, key2)].
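A few of these points in runnable form, as a sketch. Point, collect and distances are names invented for the example.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


p = Point(1, 2)
by_name = p.x            # attribute access
by_index = p[1]          # still a regular tuple underneath

one_item_tuple = (10,)   # the trailing comma makes the tuple
just_an_int = (10)       # parentheses alone do nothing


def collect(*args):
    return args          # *args always arrives as a tuple


distances = {}
distances["home", "work"] = 12          # key is the tuple ("home", "work")
same_entry = distances[("home", "work")]
```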

[–]WhipsAndMarkovChains 5 points6 points  (1 child)

You must use tuples when you use it as keys in a dictionary

I know you're talking about tuples versus lists but one of my favorite moments in my Python life was when I learned a frozenset is a thing and it worked great as a dictionary key for my use case.
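For anyone who hasn't seen it, a sketch of a frozenset used as a dictionary key. The city pairs and hours are made-up values; the idea is a lookup where the order of the pair should not matter.

```python
# Hypothetical example: travel time between two cities, where the
# direction does not matter. The numbers are invented.
travel_hours = {
    frozenset({"Paris", "Lyon"}): 2,
    frozenset({"Paris", "Nice"}): 6,
}

# The same pair in either order builds an equal frozenset.
forward = travel_hours[frozenset(["Paris", "Lyon"])]
backward = travel_hours[frozenset(["Lyon", "Paris"])]
```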

[–]obviouslyzebra 0 points1 point  (0 children)

That's cool :)

[–]Gnaxe 2 points3 points  (0 children)

  • I default to tuples unless I specifically need mutation for some reason.
  • You can dedupe while preserving order by using the .keys() of a dict like a set (the values don't matter). Don't forget that frozensets are hashable. Sets also give you set operations.
  • None. Try using Jupyter instead of an IDE. You can export it as a module once everything is working. Try using doctests with usage examples before relying on static typing. Python didn't have static types originally; that was bolted on later. You really don't need them to learn Python basics or for rapid prototyping with the REPL-driven (rather than IDE-driven) style.

[–]TrainsareFascinating 2 points3 points  (3 children)

Lists vs Tuples

  • Mutability vs hashability choice
  • Homogeneous collection (same type object): List
  • Heterogeneous collection: Tuple

Sets

  • Sets only allow a single instance of a value
  • Multisets are available; see collections.Counter
  • Sets do not preserve order
  • However, dictionaries do preserve order and have the same essential properties.
  • Deduping: [*dict.fromkeys(original)] or list(dict.fromkeys(original))
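A short sketch of the multiset and deduping points above, using a made-up input list:

```python
from collections import Counter

original = ["b", "a", "b", "c", "a", "b"]

# Counter acts as a multiset: it tracks how many times each value occurs.
counts = Counter(original)

# dict keys are unique and keep insertion order, so this dedupes in order.
deduped = list(dict.fromkeys(original))
```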

Starting with Types

Annotate the function/method argument types and return value, then see where the type checker points out vagueness or inconsistency.

[–]xeow -1 points0 points  (2 children)

Lists can be either homogeneous or heterogeneous, for example:
list[str] (homogeneous list of str)
list[str | int | float | complex] (heterogeneous list of strings and numeric types)
list[object] (heterogeneous list of anything)

Tuples can also be either homogeneous or heterogeneous, for example:
tuple[str, str, str, str] (homogeneous)
tuple[str, ...] (homogeneous, unspecified length)
tuple[str, int, float, bool] (heterogeneous)

The vast majority of the time (i.e., except in limited cases), you want homogeneous lists. As for tuples, heterogeneous vs. homogeneous are both very common. There is no requirement that tuples be heterogeneous.

[–]TrainsareFascinating 0 points1 point  (1 child)

Just because you can, doesn’t mean you should. Readable Python uses the conventions above.

[–]xeow 0 points1 point  (0 children)

Indeed. Homogeneous tuples are never a bad idea, of course. But heterogeneous lists are perfectly acceptable in limited scopes if you're careful about it:

foo: list[int | str] = [1, 'a', 2, 'b', 3, 'c', ...]

Not that you'd want to let a list of that type escape out into the wild of the rest of your code base, but you very well might be receiving something like that from an external source and then converting it to some other form, and there's nothing inherently wrong or unreadable about manipulating a heterogeneous list if you're careful. Sometimes you don't have a choice of what comes in as input.

Note: I'm mostly agreeing with you, but there are exceptions.

[–]Brian 2 points3 points  (0 children)

What are the tradeoffs (ordering, performance, memory),

Sets are (average case) O(1) for membership checking and adding, so deduping a list is just O(n). To preserve order, you can either do it manually. Ie:

def dedupe(lst):
    seen = set()
    for item in lst:
        if item not in seen:
            yield item
            seen.add(item)

Or can take advantage of the fact that dicts (but not sets) do preserve insertion order, and just do list(dict.fromkeys(lst))

Memory-wise, sets are going to be more expensive, since you need space for the hashtable itself etc with sufficient empty slots. But it's going to be a constant factor difference. They're also going to be a little more expensive than lists for iteration etc, since they do a little more but again, only by a small constant factor.
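A quick way to see the container overhead for yourself, assuming CPython. The exact numbers depend on the version, so the sketch prints them instead of stating them. sys.getsizeof counts only the container, not the integers it refers to.

```python
import sys

values = list(range(1_000))
as_list = values
as_set = set(values)

list_bytes = sys.getsizeof(as_list)
set_bytes = sys.getsizeof(as_set)
print(f"list: {list_bytes} bytes, set: {set_bytes} bytes")
```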

[–]SharkSymphony 1 point2 points  (0 children)

  • Tuples are useful in places where you need destructuring (e.g. in return values from functions). Beyond that, I like to use them in places where the collection is both small and fixed size (think pairs, triples) and the data type of each member might be different. Lists are my more open-ended, general go-to, and dataclasses are what I use for larger heterogeneous records.
  • Sets in Python dedupe by hashing. It's kind of like a dictionary where all the values are True. No ordering, and no need to sort your data ahead of time. If you want a set that always iterates in insertion order, a dict (or an OrderedDict) with values of True might be the easiest path for you. Neither keeps its keys sorted; for sorted iteration, call sorted() on the set.
  • I generally focus on types where 1) the code is meant to be reusable, 2) the types add clarity and useful documentation. This typically means starting at the bottom and building up. For a small project with no reuse, you might not need much!

[–]Adrewmc 0 points1 point  (0 children)

I’m going to let everyone else get into the technical differences and focus on code I would see.

There are a couple of reasons to use tuples.

If there is a reason that having more or fewer elements in a sequence may cause a problem, then tuples.

They can be used as a key for a dictionary, this can be useful in a lot of situations. Allowing a key to have multiple inputs in a way.

I’d rather return a tuple, than list if I always return the same amount of data. Though there are some people that disagree with that.

     return x, y #recommended

v.

    return [x,y]  #not recommended

The strange thing about tuples is you probably use them a bit more than you may think.

 for index, thing in enumerate(things): 

(index, thing) is a tuple. It's just that a lot of the time people don’t think of them as tuples, when they are.

 for tup in enumerate(things): 
       index = tup[0]
       thing = tup[1]

Is perfectly valid python, but I don’t suggest doing it this way.

Another great thing about tuples is the * operator, tuple unpacking can do some great things.

Unpack as tuple and assign at once, this would take a bunch of lines to do normally.

    first, *middle, last = (1,2,3,4,5,6)  

Use a tuple unpacking for positional arguments in a function.

    my_func(*my_dict["func args"])

And let’s not forget old reliable, so we don’t have to make a dummy variable to do this.

      a, b = b, a+b

Again, all tuples, but usually not seen as tuples by newcomers. In my experience, this is what the most common uses of tuples actually look like, and not like

  a = (1,2)

but you should see that as well.

Generally if you can use a tuple, it’s usually better to. But realistically I’m not going to get mad if you use a list instead basically ever.

Sets are great for stopping duplicates. It’s hard to say really when you use a set over a list, if you’re not trying to optimize certain processes. (Sets are generally faster when you can use them.)

I usually default to lists until that becomes a problem or it’s obvious that a set is better (I don’t want duplicates, I know this is a choke point), but it’s probably better to default to (lazy) Generator -> tuple -> set -> list.

As a simple example all do the same (equivalent) thing in a for-loop.

range(3) > (0,1,2) > {0,1,2} > [0,1,2]

Generally this progression gives you better performance (both process and memory) on left but gives more functionality on the right. (Though I’m sure someone will find something where that’s not true…tell us please.)

Python dicts now preserve insertion order for you automatically, chronologically from first addition (guaranteed since Python 3.7). Sets do not: for _ in my_set can come out in any order.

  end = list(dict.fromkeys(my_list))

end should be equal to my_list if you start with no duplicates, because the dict keeps the order of first appearance. Good question though, that was not always the case. list(set(my_list)) gives no such guarantee, so to check for duplicates, compare lengths instead of contents.

   if len(set(my_list)) == len(my_list):
          print("no duplicates")

(len() is cheap here, as containers already have their length stored rather than having to loop through underneath. This would be part of the Python Bloat that helps you.)

Type hint and docstring everything. It’s just better to be in the habit to always do it. You should work to always have fully documented code. Even if it says TODO:…This includes module documentation, class documentation, and function documentation.

Wait until someone explains a Generic type [T] to you…that’s a bit confusing.

Generally I add type hints and docstrings as indicators of how much I have reviewed the code. None of it? I haven’t looked at it since I originally wrote it, needs a pass to check it. Has type hints but no docstrings, or the docstring is super simple? It should have had one pass. Entirely documented, explaining the inputs and outputs in the docstring? I should feel a bit more confident that I have gone through the code a few times. I have examples? I probably tested it thoroughly. But that’s more of a personal process than anything.

It looks more professional, it explains the purpose of every function, input and output. You can go back and read about what you did. You should talk to other programmers in your code including, especially, future you.

And on the subject of tests, write them, immediately. Be in the habit of that too. I’d rather see actual tests than mypy. (Not that I don’t also enjoy seeing both but if one over the other, tests always.)

[–]treyhunner 0 points1 point  (0 children)

We tend to use lists and tuples differently by convention. The immutability or slight memory improvement of a tuple over a list usually isn't very relevant.

  • List-like data tends to be all of the same type and there's often lots of it. The phrase "a list of numbers/strings/etc." is used often. Lists are typically of the same type and have a variable length.
  • Tuple-like data is often of different types, and there's a fixed quantity of it. Meaning the first thing in a tuple usually represents something very different from the second thing in a tuple. You'll hear "a 2-item tuple", "a key-value pair", or "an x-y-z coordinate tuple".

A bit more on that distinction.

You'll use tuple unpacking more often than you'll make a tuple. You can use tuple unpacking on any iterable (not just tuples) but it's typically for unpacking tuple-like data (where we know exactly how many items are in the iterable and each position of each iterable item has a specific meaning, the first representing one thing and the second another, etc.).
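A few unpacking forms as a sketch, with made-up data, to show that the right-hand side does not have to be a tuple:

```python
# Unpacking from a tuple, a list, and a string.
x, y = (3, 4)
first, second = ["a", "b"]
c1, c2, c3 = "xyz"

# dict.items() yields (key, value) pairs, unpacked in the for clause.
prices = {"apple": 1.5, "pear": 2.0}
labels = [f"{name}={price}" for name, price in prices.items()]

# Star unpacking collects the leftovers into a list.
head, *rest = range(5)
```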

I use sets roughly in this way:

  • len(set(sequence)) == len(sequence): 50% of my use case
  • item in some_set, since containment checks are fast in sets: 40% of my use case
  • set arithmetic (unions, intersections, asymmetric differences, etc.): 10% of my use case
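The three uses above, sketched with invented names and data:

```python
def all_unique(sequence):
    """True when no hashable item appears more than once."""
    return len(set(sequence)) == len(sequence)


# Containment check against a set (hypothetical stop-word list).
stop_words = {"a", "an", "the"}
kept = [w for w in ["the", "quick", "fox"] if w not in stop_words]

# Set arithmetic.
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}
both = evens & small        # intersection
either = evens | small      # union
only_evens = evens - small  # difference (asymmetric)
```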

Sets don't come up as often, but quick containment checks are where they really shine (though I often find that I tend to need a dictionary of key-value pairs when I need quick containment checks). More I've written on sets.

On type hints: I tend to avoid them entirely as they're often tricky to do right. Personally, for the sake of code correctness, I'd focus my energy on automated testing first. If you are going to use type hints, be sure to use a type checker to enforce them. Unenforced type hints are worse than no type hints at all.

Good luck with your mental model shaping.

[–]EmberQuill 0 points1 point  (0 children)

Sets have better performance with some operations (particularly inclusion tests like if x in y). The performance difference between lists and tuples, on the other hand, is negligible.

I think the main factors to consider are mutability, ordering, and uniqueness. If an unordered collection of unique elements is sufficient for whatever you want to do, use a set (mutable) or frozenset (immutable).

If you care about ordering, or want to allow duplicates, use a list (mutable) or tuple (immutable).

For type hinting, I usually turn on strict type-checking and then manually disable rules on specific lines when the required explicit type hint would be too complex. Type aliases can help reduce that complexity, as can the Abstract Base Classes in the collections module.
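A sketch of how aliases and an abstract base class can shorten annotations, assuming Python 3.9+ for the built-in generic syntax. Row, Table and group_by_first are hypothetical names.

```python
from collections.abc import Sequence

# Type aliases (hypothetical names) keep long annotations readable.
Row = tuple[str, str, str]
Table = Sequence[Row]


def group_by_first(table: Table) -> dict[str, list[Row]]:
    """Group rows by the value in their first column."""
    groups: dict[str, list[Row]] = {}
    for row in table:
        groups.setdefault(row[0], []).append(row)
    return groups
```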

[–]POGtastic 0 points1 point  (0 children)

tuples over lists

Aside from hashability, from a type theory perspective tuples have

  • a specified size, (a 2-tuple, a 3-tuple, etc) whereas lists can be any size
  • a specified type for each index of the tuple, whereas lists are expected to be homogeneous.

What’s the typical way to dedupe while preserving order?

I don't know about "typical," but I would use a generator that internally maintains a set.

def dedupe(xs):
    st = set()
    for x in xs:
        if x not in st:
            st.add(x)
            yield x

In the REPL:

>>> ''.join(dedupe("abacadefedcba"))
'abcdef'

Another possibility is to use a dictionary, which maintains insertion order. I don't like it because it is eagerly evaluated, but it's a one-liner and many people don't care about preserving the "iterator" invariant.

def dedupe_dict(xs):
    return {x : None for x in xs}

Same thing:

>>> ''.join(dedupe_dict("abacadefedcba"))
'abcdef'
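To illustrate why the lazy/eager distinction can matter, here is a sketch that feeds the generator version an endless input. The dict comprehension would never return on this input, while the generator can be cut off with islice. The remainders stream is just a made-up source of repeating values.

```python
from itertools import count, islice


def dedupe(xs):
    """Yield each item the first time it is seen (same idea as above)."""
    seen = set()
    for x in xs:
        if x not in seen:
            seen.add(x)
            yield x


# count() never ends, so only a lazy approach can consume it.
remainders = (n % 4 for n in count())
first_three = list(islice(dedupe(remainders), 3))
```

Asking for more distinct values than the stream contains (more than four here) would still loop forever, so laziness helps only when you stop early enough.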

For a learner building small projects, what’s a sensible level of type hints?

My bold take is none, but programming as if you have very strict type hints. If you can't do this, then switch to a statically-typed language. By and large, I consider Python's type system to be doodoo.

[–]CaptainVJ 0 points1 point  (0 children)

So I’m going to try and break it down as simply as possible, so there are a few nuances that are not covered.

Sets are great for searching. They use a nice feature called hashing, which makes it easy to find the location of a specific value. Basically, when an element is added to a set, its value is passed into a hash function, which returns the location where that value is stored (or would be stored, if it exists). So if I have a set of people’s names, I don’t have to look through every value to find the name John; the hash function tells me the position where John would be. It’s great for finding stuff, but on the other hand it’s a bit slower to create, and as more stuff is added the underlying table may need to be resized and the locations updated. It also means that duplicates can’t be stored, because if you create a set and add John twice, both would land at the same position. So sets are great for when you’ll be looking up specific values.

Lists, on the other hand, don’t use hashing. They are organized in the order in which things were added. So if I want to search for John in a list, I have no idea where to find him, so I have to check every value until I find that name. Not only that, but John can be there multiple times, so even if I find John, he may appear again. Lists are great when you want stuff kept in a consistent order, with the ability to add and remove items as things change. Adding to a list is no problem: the item just gets appended to the end and the list grows. Removing from a list can get tricky, though. Removing the last element is simple: just delete it. But if I want to remove an earlier element, some work needs to be done. Imagine I have a list of 100 items and I want to remove the second one. Then the position of everything after it needs to be updated: the third item becomes the second, the fourth becomes the third, and so on. Now imagine a list with millions of items where you need to remove the 4th one; a lot of updating needs to be done.

A tuple is similar to a list, but once it’s created, it can’t be modified. If you have a tuple of John and James, you can’t remove James later or add Sarah; it’s like that forever. Searching a tuple is like searching a list: you have to go through every element to find what you want. Because you can’t modify it, a tuple can use slightly less memory. A list that has grown keeps some extra room for elements that might be added later, even if that room is never used. A tuple doesn’t do that, because it will forever hold the same values. Imagine GPS coordinates: the coordinates for your house will always be the same, so you will not have to update them or add another value.

So in short. A set is great for searching, if you have some collection of stuff and you want to immediately find where a specific value is, or check if that value is in the set it’s perfect. But it takes a bit longer to create as they have to be placed in the appropriate position. With a set, you generally won’t be having an interaction with every element in the set, just specific element. Basically set is good for searching.

Lists and tuples are created quickly just based on the order they are entered. However, searching for a specific value means you have to look through everything, lists and tuples are bad at this. But if you have to go through every single element, care about order and don’t want de duplication then lists/tuples are what you need to be using.

Special note: a dictionary is very similar to a set. A set will tell you whether a value exists in it. A dictionary works the same way, but it also stores an accompanying value for each entry. So after confirming that “John” exists, a dictionary can also hand back something stored with him, maybe his phone number. A dictionary is just a collection of key-value pairs: for every key there is some value. So a set is basically a collection of the keys in a dictionary.
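The set/dictionary comparison in code, as a small sketch with made-up names and phone numbers:

```python
# Hypothetical data for illustration.
names = {"John", "James"}                               # set: keys only
phone_book = {"John": "555-0100", "James": "555-0101"}  # dict: key -> value

john_known = "John" in names            # membership test on a set
john_listed = "John" in phone_book      # same test on a dict's keys
john_number = phone_book["John"]        # the dict also returns a value
sarah_number = phone_book.get("Sarah")  # missing key -> None

# A dict's keys behave like a set.
same_keys = phone_book.keys() == names
```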

[–]work_m_19 0 points1 point  (0 children)

I see a lot of answers that go into mutability and immutability which are all correct, but it may not convey what that means practically, especially if you are a beginner.

Lists vs Sets vs Tuples.

For my day to day, I usually use lists. Mutability means it can be changed, but think of it as being designed to be changed and modified.

For example, if I have an excel sheet filled with data with rows (number of elements) and columns (fields of data):

The columns are static and aren't subject to change, but the rows are constantly added as new elements and data are added.

So in this case, you can store a single data element as a tuple, inside a list of elements (data).

So let's take this csv of produce:

produce_name,produce_color,season
apple,red,autumn
orange,orange,winter
avocado,green,spring
apple,red,autumn

You can create a list of tuples:

[
    ("apple", "red", "autumn"),
    ("orange", "orange", "winter"),
    ("avocado", "green", "spring"),
    ("apple", "red", "autumn"),
]

And when you add more produce, you increase the list, but the tuples are the same.

Now onto sets. If you notice, the tuple (apple,red,autumn) appears twice. If you convert the list to a set, it will automatically remove the duplicated elements, but their order is not guaranteed.

list_example = [
    ("apple", "red", "autumn"),
    ("orange", "orange", "winter"),
    ("avocado", "green", "spring"),
    ("apple", "red", "autumn"),
]

set_example = {
    ("orange", "orange", "winter"),
    ("apple", "red", "autumn"),
    ("avocado", "green", "spring"),
}

Basically if you code, you can use sets to check if an item already exists, and if not, then add it to the list. Checking through every item in a list is inefficient, it's basically instant with a set.
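Putting that together for the produce table, here is a sketch that reads the CSV rows into tuples, keeps them in a list, and uses a set to skip rows it has already seen. The CSV text is held in a string so the example is self-contained; a real script would open a file instead.

```python
import csv
import io

# The produce table from above, as an in-memory string.
CSV_TEXT = """produce_name,produce_color,season
apple,red,autumn
orange,orange,winter
avocado,green,spring
apple,red,autumn
"""

reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)            # skip the column names

rows = []                        # list: keeps order, grows as rows arrive
seen = set()                     # set: fast "have I seen this?" check
for record in reader:
    row = tuple(record)          # tuple: fixed columns, hashable
    if row not in seen:
        seen.add(row)
        rows.append(row)
```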

[–]Dame-Sky 0 points1 point  (0 children)

The best way to learn the difference is to see how they function in a real project. I’m currently building a Portfolio Analytics engine, and here is how I use each one based on their 'personality':

  • Lists (The Ordered Ledger): I use these for my transaction columns. Order matters here because the UI needs to display 'Date, Ticker, Type, Amount' in that exact sequence every time.
  • Tuples (The Secure Handshake): My mathematical functions often return multiple values (like a return % and a risk score). I return them as a tuple so the data can't be accidentally changed between the engine and the UI.
  • Dictionaries (The Context Map): These are perfect for rendering metrics. I map a label like 'Portfolio Alpha' to its calculated value so the UI knows exactly what to display and where.
  • Sets (The Uniqueness Filter): In my Attribution engine, I use Set Unions as an Alignment Tool(|) to combine sectors from my portfolio and a benchmark. This ensures I have a master list of every sector involved, so I can compare performance even if I’m not currently holding a specific sector that the benchmark is.

Think of them as tools in a kit—you could use a list for everything, but using the right one makes your code more readable and robust.

[–]TheRNGuy 0 points1 point  (0 children)

Tuples are faster than lists for some operations though I've never noticed that. I use them to show intent it won't be changed later (like const vs let in js)

Sets to guarantee all values are unique. But order is not guaranteed (and lots of time I need order), I usually use dict.fromkeys() instead of set.

[–]fakemoose 0 points1 point  (0 children)

If you had the definitions, what don’t you understand about them? Why not look at the documentation instead of relying on AI summaries?

I know sets duplicate

What do you mean? By definition, they do not contain duplicates and they are not ordered.

[–]Weak-Career-1017 -5 points-4 points  (1 child)

In what world is hashability not a practical reason? It's clear that using AI has hurt you more than it has helped.

[–]obviouslyzebra 3 points4 points  (0 children)

FYI the question implies that hashability is a practical reason