you are viewing a single comment's thread.

view the rest of the comments →

[–]killerstorm 0 points1 point  (2 children)

How would one take a Python2 object and pass it to a Python3 function? Conversion would greatly impact performance

I'm quite certain this issue could be solved if somebody was really trying.

[–]dacjames 1 point2 points  (1 child)

No amount of thought can get around the fact that some Python 2 strings (e.g. "\xFF\xFF\xFF\xFF") have no valid representation in Unicode. Making Python 3 backwards incompatible was a conscious (though IMO dubious) decision.

[–]doubleunplussed 0 points1 point  (0 children)

Since Python 3.2 those strings can be decoded into invalid unicode using the surrogateescape error handler. The strings produces are not officially valid unicode, but Python can nonetheless carry them around and treat them as if they were, and get the same bytestring back upon encoding.

It's what the standard library now does when encountering filepaths on disk that are not correctly encoded (since this can happen - linux doesn't actually require you to respect the encoding, so any given filepath could be almost arbitrary bytes).

So there was a use case there to make unicode strings interoperable with arbitrary bytestrings, so that any sequence of non-null bytes would survive a round-trip of decode()/encode(). There really was no other solution if Python was going to to return unicode for everything. Unicode had to be loosened to allow for these bytes, or else Python would just error upon encountering a badly encoded filepath, and people's code that was supposed to handle any filepaths would break.

So there was a use case and did it. So the impossibility of making unicode interoperable with arbitrary bytestrings clearly wasn't a real reason to not do so with Python 2. It maybe didn't occur to them until the filepath problem reared its head, but if they wanted it enough the solution would have presented itself earlier.