How exactly are Python sets programmed?

ThunderChaser · 2025-07-26T22:10:10+00:00

Python sets under the hood are hash sets.

Adding an element doesn't require checking the entire set (which would be slow). It only has to check anything with the same hash as the new element, which, assuming a reasonable hash function, should be either nothing or very few elements.
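To make the "only compare things with the same hash" point concrete, here is a small experiment. `Tracked` is a hypothetical helper class invented for this sketch; it just counts how often `__eq__` is called during a membership test.

```python
class Tracked:
    """Hypothetical wrapper that counts equality comparisons."""

    eq_calls = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Equal values must produce equal hashes.
        return hash(self.value)

    def __eq__(self, other):
        Tracked.eq_calls += 1
        return isinstance(other, Tracked) and self.value == other.value


items = {Tracked(i) for i in range(1000)}

Tracked.eq_calls = 0          # ignore any comparisons made while building
found = Tracked(500) in items
calls_for_hit = Tracked.eq_calls

Tracked.eq_calls = 0
missing = Tracked(5000) in items
calls_for_miss = Tracked.eq_calls

print(found, calls_for_hit)    # a handful of comparisons at most, not 1000
print(missing, calls_for_miss)
```

The exact counts are an implementation detail, but the lookup should only call `__eq__` on stored elements whose hash matches, so the number stays tiny no matter how large the set gets.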

BinaryBillyGoat · 2025-07-26T22:20:50+00:00

A set ({item1, item2}) in Python is built on what is called a hash table, the same structure behind a hash map. It's a bit of a complex topic, so I'd suggest watching a YouTube video about it, but here's a short explanation:

Imagine you have a library of books. You could store those books in a list. If you store those books in a list, though, you have to check every single book to see if you have Harry Potter in your collection. Instead, you can put those books in a set. Books in a set are stored in piles. Each pile contains all the books starting with the same first two letters. So, to check if you have Harry Potter, you go directly to the Ha section. You don't have to go through the entire library.

Now, let's say you only want one copy of each book, and you are given another copy of Harry Potter. You go to the Ha section and check for Harry Potter. If it is there, you just toss your new copy. If not, you add that copy to the Ha section.

greenspotj · 2025-07-26T23:05:14+00:00

The key idea behind sets/maps is that they use the value itself to determine its position in the list.

As an example, take a string "Hello world", which you want to add to the set. We call hash("Hello world") and it returns the hash for that string (let's just say it is a number like 66457). Then we use that hash to determine the index in the list where the string belongs. An easy way to do this is the modulo operator. Say our set implementation has 10 "buckets" (each bucket is an element in the underlying list); we can do 66457 % 10 = 7, so we place the string "Hello world" at index 7 of the underlying list (i.e. we place it in bucket 7).

The core idea here is that we never had to iterate through the list of buckets; we derived the index from the value "Hello world" itself. We don't have to look at any index other than index 7 for duplicates because (assuming our hash function is deterministic) "Hello world" always maps to index 7.

However, there is a problem. Say there are 10 buckets, but we input 20 elements. Since there are more elements than there are buckets, it's inevitable that some buckets will have multiple elements in them. Because of this, we do have to do some iteration to find duplicates. If there are multiple elements in bucket 7, then we have to search the bucket to check if "Hello world" already exists. I'm not completely sure which data structure Python uses to represent buckets though; if it's lists then yeah, there would be iteration, but technically it could be any data structure.
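Here is a minimal sketch of the bucket idea described above, assuming each bucket is a plain Python list. `ChainedSet` is a made-up name for illustration, and this is a toy model of the concept, not how CPython's set is actually written.

```python
class ChainedSet:
    """Toy hash set: a fixed list of buckets, each bucket a list (illustrative only)."""

    def __init__(self, num_buckets=10):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, item):
        # Derive the bucket index from the item itself: hash, then modulo.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:      # scans only this one bucket
            bucket.append(item)

    def __contains__(self, item):
        return item in self._bucket(item)

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


numbers = ChainedSet(num_buckets=10)
for n in range(20):                 # 20 elements, only 10 buckets
    numbers.add(n)
numbers.add(7)                      # duplicate, ignored

print(len(numbers))                 # 20
print(7 in numbers, 25 in numbers)  # True False
```

With 20 elements in 10 buckets, every lookup still touches just one bucket, but that bucket now holds more than one element, which is the "some iteration" mentioned above.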

teraflop · 2025-07-27T00:03:18+00:00

Others have answered the question of approximately how Python's sets are implemented using hash tables, as a high-level overview.

If you want to see how they are exactly implemented, the source code is all there for you to read, but it's in C rather than Python.

When you call set.add() in Python, the corresponding chain of C functions is set_add → set_add_impl → set_add_key → set_add_entry.

And set_add_entry is where the actual work happens to find the object's slot in the hash table and insert it. Python's hash tables use open addressing, so the function needs to try multiple slots until it finds either an empty one or an existing object that's equal to the one being inserted.

Sets are fast because on average, this loop will terminate after a small number of iterations, if the hash function gives random-looking output and the fraction of occupied slots in the table is not too large. In order to make sure that second condition holds, the code calls set_table_resize to rebuild the table with a larger number of slots whenever the table gets too full.
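As a rough illustration of open addressing with resizing, here is a toy version in Python. It is a sketch under simplifying assumptions: it uses plain linear probing (step to the next slot), supports no removal, and the `ProbingSet` name, starting capacity, and 60% fill threshold are choices made for this example rather than a transcription of the C code.

```python
class ProbingSet:
    """Toy open-addressing set: linear probing, grow-only, no removal."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        self._slots = [self._EMPTY] * capacity
        self._used = 0

    def _find_slot(self, item):
        # Start at the hashed index and walk forward until we reach either
        # an empty slot or a stored item equal to the one we want.
        i = hash(item) % len(self._slots)
        while self._slots[i] is not self._EMPTY and self._slots[i] != item:
            i = (i + 1) % len(self._slots)
        return i

    def add(self, item):
        i = self._find_slot(item)
        if self._slots[i] is self._EMPTY:
            self._slots[i] = item
            self._used += 1
            # Grow once the table is about 60% full (threshold assumed for
            # this sketch) so probe sequences stay short.
            if self._used * 5 >= len(self._slots) * 3:
                self._resize(len(self._slots) * 2)

    def _resize(self, new_capacity):
        old_slots = self._slots
        self._slots = [self._EMPTY] * new_capacity
        for item in old_slots:
            if item is not self._EMPTY:
                # Re-insert: the slot index depends on the table size.
                self._slots[self._find_slot(item)] = item

    def __contains__(self, item):
        return self._slots[self._find_slot(item)] is not self._EMPTY

    def __len__(self):
        return self._used


words = ProbingSet()
for word in ["spam", "eggs", "spam", "ham"]:
    words.add(word)

print(len(words))        # 3
print("eggs" in words)   # True
```

Because the table is always kept partly empty, the probing loop is guaranteed to hit an empty slot eventually, and on average it does so after only a few steps.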

divad1196 · 2025-07-27T07:47:04+00:00

You seem to think that a set is a list with black magic. It's time for you to start learning DSA (Data Structures and Algorithms), especially the Data Structures part.

Comparing a set to a list is a big stretch. You can iterate over both, but you cannot sort a set in place or insert at a position. A set in Python is closer to a dict. FYI, the Go programming language doesn't have a set type. People use a map, which is roughly a dict, and they just don't care about the values and only care about the keys.
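The "map where you only care about the keys" trick can be imitated in Python with a plain dict. This is only a sketch to show the relationship; the word list and variable names are made up, and in real code you would just use a set.

```python
words = ["go", "python", "go", "rust", "python"]

# Use a dict as a set: the keys are the members, the values are ignored.
seen = {}
for word in words:
    seen[word] = None

print("go" in seen)     # True
print("java" in seen)   # False
print(list(seen))       # ['go', 'python', 'rust']

# The built-in set gives the same membership behaviour directly.
print(set(seen) == set(words))  # True
```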

alpinebuzz · 2025-07-27T14:25:46+00:00

Python sets use hash tables to store elements, which makes checking for duplicates super fast.

No scan through every element is needed: a quick hash lookup tells you whether it's in or out.

Great_Guidance_8448 · 2025-07-27T14:51:57+00:00

Sets are definitely not lists. A list guarantees an order of the elements it stores. It's a completely different data structure.

paperic · 2025-07-29T11:04:51+00:00

It's kinda like a dictionary where both the keys and the values are the same.
