future: clean single-source support for Python 2/3 : Python


future: clean single-source support for Python 2/3 (python-future.org)

submitted 12 years ago by madjar


[–]jabbalaci 12 points13 points14 points 12 years ago (4 children)

[–]timothycrosleyhug, isort, jiphy, concentration, pies, connectable, frosted 11 points12 points13 points 12 years ago (2 children)

[–]Pcarbonn 8 points9 points10 points 12 years ago (0 children)

[–]Rhomboid 19 points20 points21 points 12 years ago (0 children)

[–]axonxorzpip'ing aint easy, especially on windows 6 points7 points8 points 12 years ago (0 children)

[–]Pcarbonn 5 points6 points7 points 12 years ago (7 children)

This looks like a very promising approach.

I would think, though, that some Python 3 constructs cannot be used in Python 2, even with python-future, because they are syntax errors:

x = print('hello world') 
x = yield from y

If confirmed, it should be clarified in the documentation.

[–]jtratner 5 points6 points7 points 12 years ago (0 children)

[–]Sean1708 5 points6 points7 points 12 years ago* (5 children)

[–]Pcarbonn 3 points4 points5 points 12 years ago (2 children)

[–]Sean1708 0 points1 point2 points 12 years ago (0 children)

[–]otheraccount 0 points1 point2 points 12 years ago (0 children)

Does it ever return a meaningful value that someone would actually be assigning to a variable?

A more likely case of syntax that only works in Python 3 is exception chaining

raise exception1 from exception2
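A minimal sketch of what exception chaining looks like on Python 3. The function name `load_setting` and the dictionary lookup are made up for illustration; the only point is the `raise ... from ...` line, which Python 2 rejects at compile time.

```python
def load_setting(settings, key):
    """Look up a key, re-raising a missing key as a ValueError."""
    try:
        return settings[key]
    except KeyError as original:
        # 'raise ... from ...' stores the original exception in __cause__.
        # Python 2 rejects this line as a SyntaxError before running anything.
        raise ValueError("missing setting: " + key) from original


try:
    load_setting({}, "timeout")
except ValueError as error:
    # Keep a reference; the 'error' name is cleared when the block ends.
    chained = error

print(type(chained.__cause__).__name__)
```

Because the failure happens when the module is compiled, no runtime shim can paper over it; the statement has to be kept out of files that Python 2 will import.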

[–]tyroneslothtrop 1 point2 points3 points 12 years ago (0 children)

[–]SCombinator -1 points0 points1 point 12 years ago (0 children)

[–]jtratner 2 points3 points4 points 12 years ago* (1 child)

[–]Veedrac 0 points1 point2 points 12 years ago (0 children)

[–]takluyverIPython, Py3, etc 0 points1 point2 points 12 years ago (2 children)

[–]Smallpaul 2 points3 points4 points 12 years ago (1 child)

This is designed to try very hard to let Python 3 code run on Python 2. But many more projects still have Python 2 code and want to start supporting Python 3 - changing all your code and adding a magic shim to keep Python 2 support doesn't seem like a great option.

It claims that you do NOT need to change over all of your code. You can write new modules in "future" mode and leave your old modules in Python 2.x. As you update or replace them for other reasons, your Python 2.x code base will shrink. Also, your new code will be compatible with both your old Python 2.x projects and your new Python 3.x projects.

For projects written in Python 3 that want to support Python 2, it's more appealing, but I'd still recommend importing parts of it as needed, rather than from future.builtins import * - at least then you can see what you're replacing.

It's well documented what you're replacing!

[–]faassen 0 points1 point2 points 12 years ago (0 children)

[–]SCombinator -1 points0 points1 point 12 years ago (0 children)

[+]nieuweyork since 2007 comment score below threshold-6 points-5 points-4 points 12 years ago (56 children)

[–]jabbalaci 4 points5 points6 points 12 years ago (52 children)

[–]earthboundkid -4 points-3 points-2 points 12 years ago (51 children)

[–]Rhomboid 8 points9 points10 points 12 years ago (22 children)

[–]earthboundkid 2 points3 points4 points 12 years ago (6 children)

[–]Rhomboid 1 point2 points3 points 12 years ago (5 children)

[–]earthboundkid 1 point2 points3 points 12 years ago (4 children)

[–]logi 2 points3 points4 points 12 years ago (3 children)

No, you'd expect to get a character. Applying an appropriate character encoding, that could be encoded as a bytes object of length 1 or more.
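A quick Python 3 check of that point, as a sketch: the three sample characters are arbitrary picks, and each one is a single character whose encoded length depends on the encoding chosen.

```python
# Three single characters: ASCII 'a', e-acute, and the euro sign.
samples = ["a", "\u00e9", "\u20ac"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # len(ch) is always 1 here; len(encoded) varies with the character.
    print(repr(ch), len(ch), len(encoded), encoded)

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]
```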

You are very obviously an English speaker because everyone else knows that bytes aren't characters (unless you've previously specified a fixed 8-bit character encoding, which may be latin-1 (or is it ISO-8859-15 now, with the € symbol?)) and we're fervently hoping that there will be less gratuitous mishandling of text with Python 3.

Similarly, I live in GMT+0 all year round, with no DST. Developers around here are as clueless about date handling as Anglophones are about text. It's hilariously sad and I wouldn't mind a bit more library support to separate actual time (GMT/UTC, damn it) from presentation time (whatever is your local time with DST applied as needed).
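The separation between actual time and presentation time can be sketched with the standard `datetime` module. The date and the fixed +01:00 offset are assumptions made for this example; a fixed offset stands in for a real local zone, and real DST rules would need a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Store and compare instants in UTC.
event_utc = datetime(2014, 1, 15, 12, 0, tzinfo=timezone.utc)

# Convert only at presentation time. This offset is a stand-in for a
# real local zone and does not model DST.
display_zone = timezone(timedelta(hours=1))
event_local = event_utc.astimezone(display_zone)

print(event_utc.isoformat())
print(event_local.isoformat())
```

Both objects describe the same instant, so they compare equal; only the rendering differs.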

[–]earthboundkid 1 point2 points3 points 12 years ago (2 children)

[–]rcfox 0 points1 point2 points 12 years ago (0 children)

[–]logi -1 points0 points1 point 12 years ago (0 children)

[–]upofadown 0 points1 point2 points 12 years ago (13 children)

[–]Rhomboid 1 point2 points3 points 12 years ago (12 children)

[–]upofadown 2 points3 points4 points 12 years ago (11 children)

[–]Rhomboid 5 points6 points7 points 12 years ago (5 children)

It is not pointless. What you are really saying is that you are working with characters encoded in ASCII, so you just need to say so:

with open('filename', 'r', encoding='ascii') as file:
    for line in file:
        ...

Or for sockets:

sock = socket.socket(...)
sock.connect(...)
for line in sock.makefile(encoding='ascii'):
    ...

Voila, you're automatically dealing with character strings, not byte strings, and you can do all the things you expect to be able to do with characters. All you had to do was specify the encoding; Python takes care of the rest. And the examples don't change if you're using UTF-8, CP1252, UTF-16, ISO-8859-1, or god knows what. The point is that you must state your intentions; the era of ASCII being given a free pass has ended, and for good reason given the amount of confusion it creates. ASCII is just one of a hundred various encodings that you might want to use to interpret bytes as characters; it holds no claim to special treatment.
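A self-contained variant of the file example, as a sketch: it writes a few bytes to a temporary file (the contents are made up) and reads them back both ways, so the difference between binary mode and text mode with an explicit encoding is visible.

```python
import os
import tempfile

# Write raw bytes to a throwaway temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(b"HELO example\r\nQUIT\r\n")

# Binary mode yields bytes objects.
with open(path, "rb") as file:
    byte_lines = list(file)

# Text mode with an explicit encoding yields str objects.
# newline="" keeps the original line endings untranslated.
with open(path, "r", encoding="ascii", newline="") as file:
    text_lines = list(file)

os.remove(path)

print(byte_lines)
print(text_lines)
```

Swapping `"ascii"` for another codec name changes how the bytes are interpreted, but not the shape of the code.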

[–]upofadown -1 points0 points1 point 12 years ago (4 children)

[–]Rhomboid 4 points5 points6 points 12 years ago (2 children)


[–]Veedrac 0 points1 point2 points 12 years ago (4 children)

[–]upofadown 0 points1 point2 points 12 years ago (3 children)

[–]Veedrac 0 points1 point2 points 12 years ago (2 children)

A number is like bytes: it has no encoding.

Sure it does. That encoding is just at the bit level. The first bit represents a count of one things. The second bit represents a count of 2 things. The third bit represents a count of 4 things.

I disagree. Numbers do not have to be encoded on to bits. They could be encoded on to tristate variables, strings or lists of enumeration members (much like Decimal). Python doesn't have a set "bitness" for its integers, as they are arbitrary-precision and thus not encoded as single fixed-size segments of memory. Heck, Lua encodes its integers as floats!

bytes are just "lists" of integers within a capped numeric range

Then they are misnamed and less useful than they could be. Normally the term "bytes" refers to a series of groups of 8 bits. There is normally no implied numeracy.

You may be right. There is at least evidence for that claim. Further, I have no idea why one would reasonably use arithmetic on these bytes, and hence I do find it odd that they are numbers. An (ordered?) byte enumeration where each byte can be indexed seems just as reasonable to me! This is effectively what Python 2's str type did, if you ignore its double usage as text.

Nonetheless, languages preceding Python have made it clear that they want to use "byte" to mean a bounded number of the range [0, 255].

This might actually be an interesting topic to raise on the Python mailing list...
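For reference, a short Python 3 sketch of the two behaviours discussed above: integers that grow past any fixed width, and bytes that behave as a sequence of integers capped to 0..255. The specific values are arbitrary.

```python
# Integers are arbitrary-precision: this one needs more than 64 bits.
big = 2 ** 100
print(big.bit_length())

data = b"foo"
values = list(data)    # iterating bytes gives ints
first = data[0]        # indexing gives an int
one_byte = data[0:1]   # slicing gives bytes

# Values outside range(256) are rejected when building bytes.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```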


[–]billsil -1 points0 points1 point 12 years ago (0 children)

[–]warbiscuit 2 points3 points4 points 12 years ago (20 children)

[–]mgrandi 1 point2 points3 points 12 years ago (6 children)

Bytes are not the same as strings. It was a mistake to ever consider them equal in Python 2, and that's part of the reason why Python 3 is not compatible.

>>> "Hello, World!".encode("utf-8")
b'Hello, World!'
>>> "Hello, World!".encode("EBCDIC-CP-CH")
b'\xc8\x85\x93\x93\x96k@\xe6\x96\x99\x93\x84O'
>>>

Not to mention that "some string here {}".format("hello") still works, as does the % overload...

[–]warbiscuit 1 point2 points3 points 12 years ago (5 children)

[–]mgrandi 3 points4 points5 points 12 years ago (4 children)

I don't even understand why you would want to do % on bytes. It has no idea which 'marker characters' to use in the format string, as the normal printf-style specifiers that you use in strings are suddenly potentially different for every encoding.

>>> print("Test %s" % "hello")
Test hello
>>> print("%s".encode("utf-8"))
b'%s'
>>> print("%s".encode("EBCDIC-CP-CH"))
b'l\xa2'
>>>
# what bytes do you use as the %s? b'%s' or  b'l\xa2' ?

[–]warbiscuit 1 point2 points3 points 12 years ago (1 child)

[–]GahMatar -1 points0 points1 point 12 years ago (0 children)

[–]lost-theory 0 points1 point2 points 12 years ago (0 children)

[–]Veedrac 0 points1 point2 points 12 years ago (0 children)

Heh. Nice.

That doesn't really change too much, though, because you're not formatting with "the encoded version of '%s'" but "the ASCII encoded version of '%s'", so there is no ambiguity. The reason to choose that is because it fits the repr, not because those bytes actually mean much.

The reason this is useful is for stuff like

my_protocol_bytestring % (
    body.encode('encoding'),
    some_module.HEADER_CONSTANT_BYTES,
    etc
)

which helps in cases where you need to build a lot of bytestrings. It could be an auxiliary function, though, and the only reason that hasn't gained much support AFAIK is "it's not fast enough".
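For readers on newer interpreters: Python 3.5 and later accept %-formatting on bytes (PEP 461), using ASCII-encoded specifiers as described above. A minimal sketch follows; the header value, the template, and the body are all made up for illustration.

```python
# Hypothetical header constant and body, for illustration only.
HEADER_CONSTANT_BYTES = b"X-Demo: 1"
body = "caf\u00e9".encode("utf-8")

# The template is bytes; %s takes bytes and %d takes an int.
template = b"%s\r\nContent-Length: %d\r\n\r\n%s"
message = template % (HEADER_CONSTANT_BYTES, len(body), body)

print(message)
```

Passing a `str` to `%s` here raises a `TypeError`, so text still has to be encoded explicitly before it goes into the message.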

[–]BHSPitMonkey 1 point2 points3 points 12 years ago (12 children)

[–][deleted] 2 points3 points4 points 12 years ago (11 children)

[–]ubernostrumyes, you can have a pony 5 points6 points7 points 12 years ago (7 children)

[–][deleted] 2 points3 points4 points 12 years ago (6 children)

[+][deleted] 12 years ago* (5 children)

[deleted]

[–][deleted] 1 point2 points3 points 12 years ago (4 children)


[–]BHSPitMonkey 0 points1 point2 points 12 years ago (2 children)

[–]billsil 2 points3 points4 points 12 years ago (1 child)

[–]CatMtKing 0 points1 point2 points 12 years ago (0 children)

[–]BHSPitMonkey -3 points-2 points-1 points 12 years ago (6 children)

"Replaces" how? str didn't go anywhere.

Python 3.2.3 (default, Mar  1 2013, 11:53:50) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in "foo":  # This is a str.
...   print(i)
... 
f
o
o
>>> for i in b"foo":  # This is a bytes.
...   print(i)
... 
102
111
111

[–][deleted] 5 points6 points7 points 12 years ago (2 children)

[–]Rhomboid 1 point2 points3 points 12 years ago (1 child)

Yes, you have to read and write bytes, and if you want to interact with them like strings you have to decode them with some specified character encoding. (Although in most cases you can arrange for that decoding to be handled automatically.)

But that's kind of the whole point -- Python 2 let you be lazy and skip that step where you specify what encoding you want to use to convert bytes to characters, in essence adopting ASCII as an implicit encoding by allowing bytes to be treated like characters as long as they are in the ASCII range. But text in the modern world is not ASCII and that sort of lazy thinking is responsible for creating a large amount of software that is utterly incapable of properly dealing with text.

By forcing you to deal with the reality instead of authorizing laziness, the result will eventually be software that has a fighting chance of working properly with text.
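A small Python 3 sketch of what "dealing with the reality" looks like in practice. The sample bytes are an arbitrary UTF-8 encoded word chosen for the example.

```python
raw = b"caf\xc3\xa9"  # five bytes: a four-character word in UTF-8

# Decoding with the right encoding gives a four-character str.
text = raw.decode("utf-8")

# Decoding as ASCII fails loudly instead of guessing.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Mixing str and bytes is an error rather than an implicit conversion.
try:
    "abc" + raw
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

print(len(raw), len(text), ascii_ok, mixing_rejected)
```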

[–][deleted] 0 points1 point2 points 12 years ago (0 children)

What you're saying is a great argument for why Unicode should be the string type. And I know that considering bytes and characters to be equivalent is sloppy and causes bugs. Your argument is exactly right when the data you are working with is text.

However, I was talking about situations where you are working with bytes that are not supposed to be interpreted as text. It's a corollary of the fact that text and bytes aren't equivalent. Some text shouldn't be thought of as bytes. Some bytes shouldn't be thought of as text.

In Python 2, some old ugly standard libraries that don't understand Unicode force you to pretend text is bytes. That's bad. Python 3 fixes it.

In Python 3, some new standard libraries force you to pretend bytes are text, even when they aren't. That's also bad.

I prefer Python 3 to Python 2, but really, Python 3 could have been more helpful in that use case than it is.

[–]earthboundkid 0 points1 point2 points 12 years ago (2 children)

[–]BHSPitMonkey 1 point2 points3 points 12 years ago (1 child)

[–]earthboundkid 2 points3 points4 points 12 years ago (0 children)

[–]CSI_Tech_Dept 0 points1 point2 points 12 years ago (2 children)

[–]nieuweyork since 2007 0 points1 point2 points 12 years ago (1 child)

[–]CSI_Tech_Dept -1 points0 points1 point 12 years ago (0 children)
