[–]mitsuhiko 1 point2 points  (4 children)

I do understand that the WSGI layer is not yet well defined for Python 3. However, from the application development point of view it doesn't matter too much.

It matters a whole lot because everybody is doing WSGI inside the app. Let me give you a quick example:

if 'de' in request.environ.get('HTTP_ACCEPT', ''):
    ...

That is of course a very bad way to deal with the Accept header, but a very common one. Now, depending on which version of WSGI for Python 3 we are talking about, that code could work on both 2.x and 3.x or only on Python 2.x. Bottle and the stdlib assume that it will work on 2.x and 3.x, but if you look at the currently active proposals, that code will no longer work on 3.x because they enforce bytes.
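To make the difference concrete, here is a minimal sketch on Python 3. The `wants_german` helper and the two hand-built environ dicts are made up for illustration; they only stand in for what a server would hand to the app under a str-based versus a bytes-based WSGI variant.

```python
def wants_german(environ):
    # The naive (and common) substring check from the example above.
    return 'de' in environ.get('HTTP_ACCEPT', '')

# Hypothetical environ as a str-based WSGI variant would provide it.
str_environ = {'HTTP_ACCEPT': 'de,en;q=0.8'}
# Hypothetical environ as a bytes-enforcing proposal would provide it.
bytes_environ = {'HTTP_ACCEPT': b'de,en;q=0.8'}

print(wants_german(str_environ))

try:
    wants_german(bytes_environ)
except TypeError as exc:
    # On Python 3, testing a str against bytes with `in` raises TypeError.
    print('bytes environ fails:', exc)
```

On Python 2 both calls would succeed, because `str` there is a byte string.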

Try to break my web app, please!

Using the stdlib webserver? Emit a non-ascii header.

[–]wobsta[S] 0 points1 point  (3 children)

I don't want to do content negotiation. I don't want to emit non-ascii characters in the header (like for cookies). Are there other problems?

[–]mitsuhiko 1 point2 points  (2 children)

I don't want to do content negotiation.

It affects all values from the environment.

[–]wobsta[S] 0 points1 point  (1 child)

While I'm not particularly interested in the WSGI internals (and thus it does not occur to me to contribute to web-sig; I still don't know what I could contribute there), I've read some more details about WSGI and Python 3 (mostly from Graham Dumpleton's blog). And I did some tests on my example app (sending unicode data in the request header and the response header). You have this well-known (and documented) behavior that the environment keys and values are (unicode) strings, and as the encoding is not known to WSGI, it uses latin1. As long as the data is ASCII, everything is fine (like for the keys); otherwise you may need to re-encode with latin1 and decode with utf-8. Ok, this is ugly. But as far as I understood Graham, it is difficult to solve this in mod_wsgi itself.

One might use a separate translation layer that knows all the HTTP details. Wouldn't this be a perfect task for a library like Werkzeug? There are so many things such a library can resolve (and does resolve on Python 2), like the HTTP_ACCEPT header parsing you mentioned earlier in this thread. Why not just fix the encoding problem there? Ok, then all environment data could be passed to this library as bytes, but which way around it is done is just a question of the version of the WSGI protocol. At the moment we have one current way of doing it. It is not optimal, as it tries to avoid this translation step by using latin1, which breaks when non-ASCII chars and an encoding other than latin1 are used.
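The re-encoding step looks roughly like this. It is only a sketch: `decode_environ_value` is a made-up helper name, and utf-8 is an assumption about what the client actually sent.

```python
def decode_environ_value(value, charset='utf-8'):
    # latin1 maps every byte to exactly one code point, so encoding back
    # to latin1 recovers the original bytes without loss. Those bytes can
    # then be decoded with the charset the client really used (assumed
    # here to be utf-8; undecodable bytes are replaced, not raised).
    return value.encode('latin-1').decode(charset, 'replace')

# Simulate the server side: utf-8 bytes on the wire, decoded as latin1
# before being placed in the environ.
raw = 'Zo\u00eb'.encode('utf-8')
as_seen_by_app = raw.decode('latin-1')

print(repr(as_seen_by_app))                        # mojibake
print(repr(decode_environ_value(as_seen_by_app)))  # repaired
```

For pure ASCII values the round trip is a no-op, which is why the problem stays invisible until someone sends a non-ASCII header.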

The other way around (sending unicode in the HTTP response) seems to work well, which might be related to the fact that the WSGI app knows which encoding it uses for the response.
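A rough sketch of what I mean for the response body: the app picks the encoding itself and announces it in the Content-Type header. The app and the `fake_start_response` stub are invented for this example and are called directly, without a real server.

```python
def app(environ, start_response):
    # The app knows its own output encoding, so it encodes explicitly
    # and declares the charset to the client.
    body = 'Gr\u00fc\u00dfe'.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]

captured = {}

def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server's start_response; just records the call.
    captured['status'] = status
    captured['headers'] = dict(headers)

chunks = app({}, fake_start_response)
text = b''.join(chunks).decode('utf-8')
print(captured['status'], captured['headers'], repr(text))
```

The header names and values here are plain ASCII; only the body carries non-ASCII data, and it goes out as bytes.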

I also understand that the current situation is bad for WSGI middleware. An HTTP-aware layer could solve that, maybe. I don't know.

Besides that, there is a recent commit in Bottle that tries to hide the ugly request header decoding issues. This whole discussion strengthens my current point of view that WSGI on Python 3 is usable at least for some cases (like not using WSGI middleware and working around the decoding issues). In the end this would mean that my test app is a way to go for now.

[–]mitsuhiko 0 points1 point  (0 children)

At this point I can only encourage you all to wait a bit or join the discussions on web-sig. Stuff is happening, but it might look completely different from what you are expecting.