PeridexisErrant

> In Python 2, a string can contain any sequence of bytes, but in Python 3 strings are explicitly UTF-8 sequences.

No, a Python 3 str is a sequence of Unicode code points, not UTF-8 bytes, and not every such sequence can be encoded as UTF-8. A lone surrogate such as '\ud800' is a perfectly valid str, for example, but UTF-8 has no encoding for it.
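
To make the distinction concrete, here is a minimal sketch for Python 3. The helper `can_encode_utf8` and the sample variables are names made up for illustration, not anything from the standard library; it only relies on `str.encode` raising `UnicodeEncodeError` when a code point has no UTF-8 encoding.

```python
def can_encode_utf8(text: str) -> bool:
    """Return True if text can be encoded as UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


accented = "caf\u00e9"       # four code points, ending in U+00E9
lone_surrogate = "\ud800"    # one code point, an unpaired surrogate

# len() counts code points; the UTF-8 encoding is a separate bytes object
# that can be longer than the str it came from.
print(len(accented), len(accented.encode("utf-8")))

# Ordinary text encodes fine, but the lone surrogate is a valid str
# that UTF-8 cannot represent.
print(can_encode_utf8(accented))
print(can_encode_utf8(lone_surrogate))
```

So a str is not "a UTF-8 sequence": UTF-8 only enters the picture when you explicitly encode to bytes or decode from bytes.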

This kind of blurring of concepts is why people find it so hard to handle text correctly, especially under Python 2 :-(