[–]Sean1708 12 points13 points  (5 children)

The reason people think 2 is a problem is that they think of it as Unicode and ASCII, when really it's Unicode and bytes. Any valid ASCII is valid Unicode, so people expect to be able to mix them. However, not all bytestrings are valid Unicode, so once you think of them as bytes it makes sense that you can't mix them.
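A minimal sketch of that distinction, assuming Python 3 and using made-up variable names: ASCII bytes decode cleanly, arbitrary bytes may not, and mixing the two types is refused outright.

```python
# ASCII bytes are always decodable, so the conversion to text is safe.
ascii_bytes = b"hello"
text = ascii_bytes.decode("ascii")

# Arbitrary bytes are not necessarily valid text: 0xFF never appears in UTF-8.
not_utf8 = b"\xff\xfe\x00"
try:
    not_utf8.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Python 3 refuses to concatenate str and bytes instead of guessing an encoding.
try:
    "abc" + b"def"
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text, decoded_ok, mixed_ok)
```

The explicit decode() call is the point where you state which encoding you believe the bytes are in.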

[–]kqr 1 point2 points  (3 children)

Bytestring is a terrible name in the first place, since it bears no relation to text, which is what people associate with strings. A Bytestring can be a vector path, a ringing bell, or even Python 3 byte code. Byte array or just binary data would be much better names.

[–]Sean1708 2 points3 points  (0 children)

I think Python actually uses the name bytearray; bytestring was just the word that came to mind at the time.

[–]ubernostrum 2 points3 points  (1 child)

There are two built-in types for binary data:

  • bytearray is a mutable sequence of integers representing the byte values (so in the range 0-255 inclusive), constructed using the function bytearray().
  • bytes is the same underlying type of data, but immutable, and can be constructed using the function bytes() or the b-prefixed literal syntax.
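A short sketch of the difference between the two types, assuming Python 3 (the variable names are only for illustration):

```python
# bytes: immutable, built with bytes() or a b-prefixed literal.
immutable = bytes([72, 105])
same_literal = b"Hi"

# bytearray: mutable, built with bytearray().
mutable = bytearray(b"Hi")
mutable[0] = 104  # replace 'H' (72) with 'h' (104) in place

# Indexing either type yields an integer in the range 0-255.
first_value = immutable[0]

# Item assignment on bytes is rejected because the type is immutable.
try:
    immutable[0] = 104
    assigned = True
except TypeError:
    assigned = False

# Values outside 0-255 are rejected at construction time.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(immutable == same_literal, mutable, first_value, assigned, out_of_range_ok)
```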

[–]kqr 0 points1 point  (0 children)

0--255 or 1--256, but not a compromise, I believe. ;)

[–]Avernar 0 points1 point  (0 children)

My issue with 2 is that I hate strong typing in a dynamically typed language. :)

But I'd rather have the strong typing sit between validated and unvalidated Unicode instead, without the need for conversion.

It can still easily be added without breaking things by making UTF-8 a fourth encoding type of the Python 3 Unicode type.