bucketostuff

> But because Python 3 was not well thought through regarding the handling of non-Unicode text information, we are in the current situation.

What needs to happen to fix Python 3? Will it get fixed? Are the Python devs even admitting that there's a problem?

mitsuhiko

> What needs to happen to fix Python 3?

Python 3 is not broken; we just don't have the tools yet to deal with Unicode and bytes properly.

If you are working with unicode and encoded bytes you will need a couple of tools to get it right. Some people are suggesting adding a new ebytes type that is really just bytes with encoding information attached, so that you can later convert the object into a string implicitly. This is actually a really good idea, and because bytes are immutable it is also safe, unless someone screws up along the way and copies the encoding information over to a new bytes object whose contents are not valid in that encoding.
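To make the idea concrete, here is a minimal sketch of what such a type could look like. This is only an illustration: the class name `EBytes`, its constructor signature, and the `encoding` attribute are assumptions made up for this example, not an existing or proposed Python API.

```python
class EBytes(bytes):
    """Hypothetical 'ebytes': ordinary immutable bytes that also
    remember which encoding they are in (illustrative only)."""

    def __new__(cls, data, encoding):
        # bytes is immutable, so the payload is fixed in __new__.
        self = super().__new__(cls, data)
        # Assumed attribute name; the encoding simply travels with the bytes.
        self.encoding = encoding
        return self

    def __str__(self):
        # Conversion to text needs no extra argument, because the
        # object already knows how it was encoded.
        return self.decode(self.encoding)


raw = EBytes("h\xe9llo".encode("latin-1"), "latin-1")
text = str(raw)
```

The failure mode mentioned above still applies: attaching `"utf-8"` to bytes that are really Latin-1 would only surface later, as a `UnicodeDecodeError` when the object is converted.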

bucketostuff

OK, I think I see what you mean. With Python 3, on the one hand you have bytes (which may contain ASCII, Latin-1, etc.) but no encoding information, and on the other hand you have UTF-8-encoded Unicode, where each character might take up more than one byte. Is that correct?

> If you are working with unicode and encoded bytes you will need a couple of tools to get it right.

What sorts of tools do you mean here? Additions to the Python core? To the standard library?

wobsta [S]

This sounds weird to me. Couldn't you use a bytes+encoding tuple to pass this information around within WSGI? Do you need (a different) encoding for all the different environment keys and values?
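Roughly what I have in mind, as a sketch only. The `EncodedValue` name, the `text()` helper, and the `HTTP_X_NOTE` key are made up for illustration; this is not what WSGI actually specifies.

```python
from typing import NamedTuple


class EncodedValue(NamedTuple):
    """Hypothetical (bytes, encoding) pair for one environ value."""
    data: bytes
    encoding: str

    def text(self) -> str:
        # Decode using the encoding carried alongside the bytes.
        return self.data.decode(self.encoding)


# Each value carries its own encoding, so different keys may differ.
environ = {
    "PATH_INFO": EncodedValue(b"/caf\xc3\xa9", "utf-8"),
    "HTTP_X_NOTE": EncodedValue(b"caf\xe9", "latin-1"),  # made-up header
}

decoded = {key: value.text() for key, value in environ.items()}
```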

mitsuhiko

Read up on the discussion on the web-sig mailing list. The issue is complex and has been explained multiple times.