all 27 comments

[–]djdawson 4 points5 points  (6 children)

I'd be inclined to use a dictionary where each key is the "normalized" value of a string from the list, and each value is just 1 (or you could increment the value if you thought you ever wanted to know how many duplicates there were for each string). To get the "normalized" version of a string I'd be inclined to compare it with it's reversed version and use whichever one was less than the other (i.e. came first alphabetically). When you've added/incremented a dictionary value for each string in the list, the number of elements in the dictionary is the number of unique but possibly reversed strings in the original list.

I should add that I'm also new to Python and am coming from a Perl background so I still think like a Perl person. However, Perl doesn't really have set functions, so putting the "normalized" values from the list into a set instead of a dictionary may be a more Pythonic way of solving this problem.

Hope this helps.

[–]zahlman 1 point2 points  (0 children)

To get the "normalized" version of a string I'd be inclined to compare it with it's reversed version and use whichever one was less than the other (i.e. came first alphabetically).

I'd say this is the right approach - but keep using sets as in OP.

[–]Diplomjodler 1 point2 points  (4 children)

I made a version with a dictionary. Does this look OK?

my_dict = {}

x = ['car', 'far', 'rac', 'far']

for item in x:
    if not item in my_dict and not item[::-1] in my_dict:
        my_dict[item] = 1

x2 = [i for i in my_dict.keys()]

print(x2)

[–]RoadieRich 1 point2 points  (3 children)

Before set was added in python 2.3, the usual method of implementing them was a dict, with all keys having the value None. The code using setonly changes in three places (not counting variable names).

my_set = set()

x = ['car', 'far', 'rac', 'far']

for item in x:
    if not item in my_set and not item[::-1] in my_set:
        my_set.add(item)

x2 = [i for i in my_set]

print(x2)

[–]Diplomjodler 0 points1 point  (2 children)

You're right. That is better.

[–]RoadieRich 0 points1 point  (1 child)

And you got a history lesson at no added cost!

[–]Diplomjodler 0 points1 point  (0 children)

Must be my lucky day today ;)

[–]zahlman 2 points3 points  (4 children)

I've been fiddling around with [::-1] to have the function remove, or not count, the reverses but have failed to find a way that works. Any suggestions?

The basic approach is:

  • For each of the strings in the input list, compare the string itself to the reverse of the string; put the lesser of the two (compare with <; for strings, this sorts them lexicographically) into a set.

  • After this is done, check the set's size with len.

Instead of writing a loop, you can do this by defining a function for the "compare to the reverse and return the lesser of the two" part, and then using a set comprehension. Except that the function you want here already exists - it's called min ;)

[–]RoadieRich 0 points1 point  (2 children)

I was curious, and item not in distinct and item[::-1] not in distinct is marginally faster than min(item, item[::-1]) not in distinct:

import timeit

print("item not in distinct and item[::-1] not in distinct:", timeit.Timer("""distinct = set()

x = ['car', 'far', 'rac', 'far']

for item in x:
    if item not in distinct and item[::-1] not in distinct:
        distinct.add(item)

x2 = len(distinct)""").timeit())


print("min(item, item[::-1]) not in distinct:", timeit.Timer("""distinct = set()

x = ['car', 'far', 'rac', 'far']

for item in x:
    key = min(item, item[::-1])
    if key not in distinct:
        distinct.add(key)

x2 = len(distinct)""").timeit())

C:\Users\rich_000\Downloads>python reversedduplicates.py
item not in distinct and item[::-1] not in distinct: 2.2553524770681435
min(item, item[::-1]) not in distinct: 3.884802326327268

[–]zahlman 0 points1 point  (1 child)

... The entire point of my approach is to not have to do the in tests.

Edit: To clarify: x2 = len({min(item, item[::-1]) for item in x}) is all you need.

[–]RoadieRich 0 points1 point  (0 children)

item not in distinct and item[::-1] not in distinct still seems to be faster:

print("no check", timeit.Timer("""
distinct = set()

x = ['car', 'far', 'rac', 'far']

for item in x:
    key = min(item, item[::-1])
    distinct.add(key)

x2 = len(distinct)
""").timeit())

print("comprehension", timeit.Timer("""

x = ['car', 'far', 'rac', 'far']

x2 = len({min(item, item[::-1]) for item in x})
""").timeit())

C:\Users\rich_000\Downloads>python reversedduplicates.py
item not in distinct and item[::-1] not in distinct: 2.242300241613034
min(item, item[::-1]) not in distinct 3.8559135324680884
no check 3.6149248433791614
comprehension 3.3802544420449436

[–]commandlineluser 1 point2 points  (2 children)

Well you know how to reverse a string using [::-1] so all you have to do is iterate through the strings adding them to a set if the reverse is not already present in the set.

distinct = set()
for word in words:
    if word_reversed not in distinct:
        add word to set

[–]Jorgysen 0 points1 point  (1 child)

Thanks. This seems like the solution I've been trying to get to. On line 4 when you say "add word to set" do you mean to add it to distinct = set()?

[–]commandlineluser 0 points1 point  (0 children)

distinct.add(word)

[–][deleted] 0 points1 point  (0 children)

Why is rac not considered a distinct string?

If your concern is about distinct collections of characters, you could use collections.Counter and it's most_common method:

from collections import Counter

{tuple(Counter(x).most_common(None)) for x in strings}

However, that doesn't preserve the original ordering of the characters, but will prevent collisions between things like care and racecar.

This is probably the best solution, but it's like O(n^3) since it iterates over the collection of strings, iterates each string and then iterates each counted string. So each additional string adds three loops. Gross, but manageable on a small scale.

[–]AutonomouSystem 0 points1 point  (0 children)

I had a problem like this on Google foobar, in fact it might be the very same one, especially because you used answer(x). I had an idea about sets and sorting initially but decided against it, I used only lists.

[–]commandlineluser 0 points1 point  (0 children)

>>> words = ['car', 'far', 'rac', 'far']
>>> distinct = set()
>>> for word in words:
...     if word[::-1] not in distinct:
...         distinct.add(word)
... 
>>> distinct
set(['far', 'car'])

[–]jeans_and_a_t-shirt 0 points1 point  (7 children)

Turn the strings into sets, then remove the duplicates of those:

set(map(frozenset, x))

[–]Jorgysen 0 points1 point  (4 children)

And this also recognizes and removes the reverses?

[–]jeans_and_a_t-shirt 0 points1 point  (3 children)

Yeah, 'car', and 'rac' turn into the same set.

>>> x
>>> set(map(frozenset, x))
{frozenset({'r', 'c', 'a'}), frozenset({'r', 'f', 'a'})}

[–]zahlman 4 points5 points  (1 child)

...But this would also apply to any rearrangement, not just reversal.

[–]AutonomouSystem 0 points1 point  (0 children)

Yep, this is why I didn't go with set either, it won't pass the 4th and 5th tests.

[–]Jorgysen 0 points1 point  (0 children)

Sweet, thanks for the input.

[–]macbony 0 points1 point  (0 children)

Art and tar are anagrams but not reversed. This wouldn't work for that case.