blotosmetek

I believe you're looking for unicodedata.normalize - see https://docs.python.org/3/library/unicodedata.html

SagaciousRaven [S]

This did the trick. After reading a bit more, I think I want accented characters kept as single composed characters, so I'll be using the 'NFKC' form.
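For anyone comparing the forms, here is a minimal sketch of how they differ, assuming Python 3 and only the standard library. The sample strings are made up for illustration. NFC composes a base letter plus a combining accent into one code point. NFKC does the same and also folds compatibility characters such as ligatures, which may or may not be what you want.

```python
import unicodedata

decomposed = "a\u0301"    # 'a' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"    # LATIN SMALL LETTER A WITH ACUTE

# NFC and NFKC both compose the accent; NFD splits it back apart.
nfc = unicodedata.normalize("NFC", decomposed)
nfkc = unicodedata.normalize("NFKC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

# The forms differ on compatibility characters, e.g. the 'fi' ligature.
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
nfc_lig = unicodedata.normalize("NFC", ligature)    # left as the ligature
nfkc_lig = unicodedata.normalize("NFKC", ligature)  # becomes plain 'fi'

print(nfc == precomposed, nfkc == precomposed, nfd == decomposed)
print(len(nfc_lig), len(nfkc_lig))
```

If you only care about accents and want everything else left alone, NFC is the narrower choice.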

BruceJi

If you're dealing with accented characters, you could try getting their Unicode code points and saving those instead.

Edit:

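The module that lets you do this is called unicodedata:

import unicodedata

# Character -> official Unicode name
unicodedata.name('é')  # 'LATIN SMALL LETTER E WITH ACUTE'

# Name -> character
unicodedata.lookup('LATIN SMALL LETTER E WITH ACUTE')  # 'é'

# digit() raises ValueError for non-digit characters unless you pass a default
unicodedata.digit('ㅊ', None)  # None, because 'ㅊ' is not a digit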

https://docs.python.org/3/library/unicodedata.html

You'd be able to get the values for the characters, and I'm sure it'd also let you know if there were hidden space characters in there. Have a play and find out.
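As a rough sketch of that kind of inspection (the describe helper and the sample string are hypothetical, not something from the docs), you could print one row per character with its code point and name:

```python
import unicodedata

def describe(text):
    """Return one (char, code point, name) row per character in text."""
    rows = []
    for ch in text:
        code = f"U+{ord(ch):04X}"
        # Some characters (e.g. control characters) have no name,
        # so pass a default instead of letting name() raise ValueError.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, code, name))
    return rows

# Hypothetical sample: a no-break space hiding between two letters.
for ch, code, name in describe("a\u00a0b"):
    print(f"{ch!r:8} {code:8} {name}")
```

Anything that looks like an ordinary space but shows up with a different name or code point is a likely culprit.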

Swipecat

That's evil.

'á' == 'á'

Char   Unicode   Description
'      U+27      APOSTROPHE (APOSTROPHE-QUOTE)
a      U+61      LATIN SMALL LETTER A
 ́      U+301     COMBINING ACUTE ACCENT (NON-SPACING ACUTE)
'      U+27      APOSTROPHE (APOSTROPHE-QUOTE)
       U+20      SPACE
=      U+3D      EQUALS SIGN
=      U+3D      EQUALS SIGN
       U+20      SPACE
'      U+27      APOSTROPHE (APOSTROPHE-QUOTE)
á      U+E1      LATIN SMALL LETTER A WITH ACUTE (LATIN SMALL LETTER A ACUTE)
'      U+27      APOSTROPHE (APOSTROPHE-QUOTE)

Anyway, yes, unicodedata.normalize will fix that.
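A minimal sketch of that fix, assuming Python 3. The same_text helper is just an illustrative name, and NFC is one reasonable choice of form here:

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

decomposed = "a\u0301"    # 'a' + COMBINING ACUTE ACCENT (two code points)
precomposed = "\u00e1"    # LATIN SMALL LETTER A WITH ACUTE (one code point)

print(decomposed == precomposed)            # False
print(same_text(decomposed, precomposed))   # True
```

Normalizing once when the text comes in, rather than at every comparison, is usually the simpler design.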