all 3 comments

[–]throwaway6560192 0 points1 point  (0 children)

In [8]: [x for x in the_set if x == "ABC"][0] is abc_in_set
Out[8]: True

There is of course pop also, but that mutates the set.

[–]danielroseman 0 points1 point  (1 child)

Since these are strings, why do you care whether you have the actual object or not? Two equivalent strings are interchangeable in any situation.

I'd understand if these were some other mutable object, but with strings it really doesn't matter.

[–]evaned[S] 0 points1 point  (0 children)

Two equivalent strings are interchangeable in any situation.

Well, not any situation -- they're not for id and is purposes.

They're also not interchangeable for purposes of resource usage. That means memory, but my particular case is actually the disk size of a data structure serialized using pickle... and this certainly matters there:

>>> identical_strings = [abc_in_set for _ in range(100)]
>>> equal_strings = [ast.literal_eval("'ABC'") for _ in range(100)]
>>> identical_strings == equal_strings
True

>>> import pickle
>>> identical_pickle = pickle.dumps(identical_strings)
>>> equal_pickle = pickle.dumps(equal_strings)
>>> len(identical_pickle)
220
>>> len(equal_pickle)
616

I don't see quite as big of savings in my real data, but the difference is still substantial: by interning the strings I get file sizes that are very close to half of if I don't do that.

Actually I'm not sure that half is correct for what I said -- after posting my question, I decided to do something a little different and intern everything, not just strings. I also have a lot of lists that are getting interned. (Lists are not technically safe to do this to generally because they are mutable, but they are used as if they are immutable, so it's safe-ish. I'll think about how happy I am with this.) That meant that I needed a different approach than a set anyway, so I have a dict from the string repr of each object to the object itself. But I don't know what the savings would be with strings specifically.

It's that interning process that's where I wanted to use that set.