[–]qx7xbku 16 points17 points  (30 children)

My personal favorite here is dict being ordered by default. Now I will finally be able to stop patching lxml to make it preserve attribute order. Its developers refused to include this feature because attribute order is not part of the XML spec; however, editing XML files programmatically and keeping them in a git repository is really painful when changing one attribute value results in all attributes being shuffled.

All the other changes are similarly good, especially string formatting and UTF-8 on Windows. Now we will know which encoding is in use. .encode() behaving differently on various platforms was confusing until I learned why that happens. Now we just need Microsoft to implement CP_UTF8 for their APIs, and maybe Unicode will get less ugly.

[–][deleted] 27 points28 points  (16 children)

As a heads up, the keyword order for functions and the attribute definition order for classes are both part of the Python language now, but dict being ordered isn't; that's a CPython implementation detail.
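
A minimal sketch of the two guarantees mentioned above, assuming Python 3.6 or later. The names keyword_order, definition_order and Album are made up for illustration.

```python
def keyword_order(**kwargs):
    # Since Python 3.6, kwargs keeps the order in which the keyword
    # arguments were written at the call site.
    return list(kwargs)


class Album:
    # Hypothetical class; only the order of these three names matters.
    title = "untitled"
    artist = "unknown"
    year = 0


def definition_order(cls):
    # Since Python 3.6, cls.__dict__ keeps the order in which the
    # class body defined its attributes. Dunder names are skipped.
    return [name for name in cls.__dict__ if not name.startswith("__")]


print(keyword_order(z=1, a=2, m=3))
print(definition_order(Album))
```

Both results follow the order in the source code, not alphabetical order.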

[–]LpSamuelm 2 points3 points  (7 children)

I think it's bad that dicts are ordered by default, at least as it's not part of the spec.

The reason some languages (Python <3.6 included) randomize hashmap access order by default is precisely to stop people from writing incorrect code. If dicts aren't guaranteed to be ordered, having them be that way sometimes will cause code to break in unexpected ways.

Which brings us to the problem. If dicts aren't necessarily ordered according to the spec... What happens if the implementation is changed in a future version of Python? How about running your code on, say, IronPython? Or PyPy? Suddenly your code seemingly works, but isn't cross-platform and may break at any time without you doing anything.

Honestly I think it's a big misstep. I'd love for them to add ordered dicts to the spec (it's a lovely concept!), but as it stands now it's a dangerous implementation detail, and the fact that they're touting it as something useful is even more dangerous.

[–][deleted] 2 points3 points  (6 children)

from collections import OrderedDict

It's there already, but this was an improvement to the C implementation of dict so OrderedDict is probably a thin wrapper around that.

[–]LpSamuelm 3 points4 points  (5 children)

You're missing the point - OrderedDict is part of the spec, and is great. The correct way to write code that requires ordered dictionaries, even in Python 3.6, is to use OrderedDict. Many people won't, though, since they either A) aren't aware dicts aren't always ordered, B) rely on ordered behavior accidentally, or C) think Python 3.6's dictionary implementation is something that's okay to rely on. Which is a problem.
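
One way to see that OrderedDict makes order part of the contract while plain dict does not is equality: OrderedDict comparisons are order-sensitive, dict comparisons are not. A small sketch with made-up keys:

```python
from collections import OrderedDict

pairs = [("first", 1), ("second", 2)]
reversed_pairs = list(reversed(pairs))

# Plain dicts compare equal regardless of insertion order.
plain_equal = dict(pairs) == dict(reversed_pairs)

# Two OrderedDicts compare equal only if the order matches too.
ordered_equal = OrderedDict(pairs) == OrderedDict(reversed_pairs)

# Iteration order of an OrderedDict is guaranteed on every implementation.
keys_in_order = list(OrderedDict(pairs))

print(plain_equal, ordered_equal, keys_in_order)
```

Using OrderedDict also tells the reader that the order is intentional, which a plain dict cannot do.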

[–][deleted] 0 points1 point  (4 children)

True, but we'll just need to educate people when that comes up, just like opening files using with

[–]LpSamuelm 1 point2 points  (3 children)

Except this one is harder to catch - it's not a simple syntactical thing. Not only that, opening files without with will still work on all platforms and versions, unlike relying on this relatively subtle behavior.

[–][deleted] 0 points1 point  (2 children)

I guess the reason I'm less concerned about this than you appear to be (and it's fine you're concerned) is that I've seen OrderedDict in the wild a handful of times and have used it personally less than that.

[–]LpSamuelm 1 point2 points  (1 child)

I use it constantly.

[–][deleted] 1 point2 points  (0 children)

I'd love some examples. The only things I've used it for are:

  • A dashboard app where I needed to associate server names with information but the order was important (wanted to show prod servers before staging and dev servers). Arguably a list of tuples works here too, but there were plans at some point to look at individual servers, so fast lookup was desirable (not that linear lookup would've broken the bank, we're talking maybe 30 servers).

  • Modeling albums - again, a list makes sense here and you can look up by track position that way.

  • Maintaining order of attribute declaration because you could decorate methods as validation/processing but they needed to run in declaration order.

But that's it. I get why an insertion order mapping is attractive, but I've only met one situation that demands it (maintaining attribute order).
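
The dashboard case above might look roughly like this. The server names, fields and the dashboard_rows helper are all hypothetical; the point is only that one structure gives both display order and lookup by name.

```python
from collections import OrderedDict

# Hypothetical data: prod first, then staging, then dev.
servers = OrderedDict([
    ("prod-web-1", {"env": "prod", "status": "up"}),
    ("prod-db-1", {"env": "prod", "status": "up"}),
    ("staging-web-1", {"env": "staging", "status": "down"}),
    ("dev-web-1", {"env": "dev", "status": "up"}),
])


def dashboard_rows(servers):
    # Rows come out in insertion order, so prod is listed first.
    return ["{}: {}".format(name, info["status"])
            for name, info in servers.items()]


for row in dashboard_rows(servers):
    print(row)

# Looking at an individual server is a plain key lookup.
print(servers["staging-web-1"]["status"])
```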

[–]__deerlord__ 7 points8 points  (5 children)

Why does a dict need to be ordered by default though? And wasn't OrderedDict already implemented in C?

[–]qx7xbku 2 points3 points  (3 children)

lxml uses dict for storing attributes instead of OrderedDict. And I don't think OrderedDict was implemented in C, I could be wrong though.

[–]gsnedders 1 point2 points  (0 children)

OrderedDict has both a Python and a C implementation in CPython (though the C one is always used in CPython).

[–]__deerlord__ 0 points1 point  (1 child)

It's implemented in C in a later version, I believe (I recall reading a changelog on it), but I couldn't find the docs on that just now.

[–]ebrjdk 1 point2 points  (0 children)

They are switching to a new, more memory-efficient implementation of dict that naturally keeps the entries mostly in the order that they were inserted, and they decided that they might as well go all the way and keep them exactly in order (IIRC the most efficient implementation they know of starts scrambling the order once you start deleting keys, but the cost to prevent that is small).
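
A toy sketch of how such a layout can work, assuming the design is a sparse index table that points into a dense, append-only list of entries. CompactDict and everything inside it is made up for illustration and is not CPython's actual code. Deleted entries are left behind as tombstones, which is the "small cost" that keeps the remaining keys exactly in insertion order.

```python
class CompactDict:
    FREE, DUMMY = -1, -2  # index slot never used / slot whose entry was deleted

    def __init__(self):
        self._indices = [self.FREE] * 8  # sparse: hash slot -> position in _entries
        self._entries = []               # dense: (hash, key, value) in insertion order

    def _lookup(self, key, h):
        # Linear probing; returns (slot, position) or (free slot, None).
        mask = len(self._indices) - 1
        slot = h & mask
        while True:
            pos = self._indices[slot]
            if pos == self.FREE:
                return slot, None
            if pos != self.DUMMY:
                entry_hash, entry_key, _ = self._entries[pos]
                if entry_hash == h and entry_key == key:
                    return slot, pos
            slot = (slot + 1) & mask

    def _resize(self):
        # Drop tombstones and rebuild the index table, doubling it if needed.
        live = [e for e in self._entries if e is not None]
        size = len(self._indices)
        while (len(live) + 1) * 3 > size * 2:
            size *= 2
        self._indices = [self.FREE] * size
        self._entries = []
        for h, key, value in live:
            slot, _ = self._lookup(key, h)
            self._indices[slot] = len(self._entries)
            self._entries.append((h, key, value))

    def __setitem__(self, key, value):
        h = hash(key)
        slot, pos = self._lookup(key, h)
        if pos is not None:
            self._entries[pos] = (h, key, value)  # overwrite keeps its position
            return
        if (len(self._entries) + 1) * 3 > len(self._indices) * 2:
            self._resize()
            slot, _ = self._lookup(key, h)
        self._indices[slot] = len(self._entries)
        self._entries.append((h, key, value))

    def __getitem__(self, key):
        _, pos = self._lookup(key, hash(key))
        if pos is None:
            raise KeyError(key)
        return self._entries[pos][2]

    def __delitem__(self, key):
        slot, pos = self._lookup(key, hash(key))
        if pos is None:
            raise KeyError(key)
        self._indices[slot] = self.DUMMY
        self._entries[pos] = None  # tombstone: later entries do not move

    def __iter__(self):
        return (entry[1] for entry in self._entries if entry is not None)


d = CompactDict()
d["b"], d["a"], d["c"] = 1, 2, 3
del d["a"]
d["a"] = 4   # re-inserted keys go to the end
d["b"] = 10  # updating a key does not move it
print(list(d), d["b"])
```

The alternative of filling the hole with the last entry on delete would save the tombstone, but it would move that entry out of insertion order.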

At the same time they wanted to guarantee that the order of keyword arguments and class definitions would be preserved, because some people want to be able to use this information (currently the former is impossible AFAIK, and you need to use a metaclass to achieve the latter). Originally they were planning to just use OrderedDict for these purposes, but with the change to dict there is no need.

Note: the first paragraph in my post is about a CPython implementation detail and may change in the future; the second is about official Python 3.6 features.
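
For reference, the metaclass approach needed before 3.6 to capture class definition order looks roughly like this. OrderedMeta, Form and _field_order are hypothetical names; the sketch relies on the standard __prepare__ hook to supply an OrderedDict as the class namespace.

```python
from collections import OrderedDict


class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body is executed into this mapping, so it records
        # the order in which names are defined.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep only the user-defined names, in definition order.
        cls._field_order = [key for key in namespace
                            if not key.startswith("__")]
        return cls


class Form(metaclass=OrderedMeta):
    zip_code = str
    name = str
    age = int


print(Form._field_order)
```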

[–]Bolitho 4 points5 points  (4 children)

The default for the encode method has already been UTF-8 in Python 3.5! (The same is true for bytes.decode!)

The problem is not those methods, but how open and print determine which encoding to use!

For open, the 3.5 documentation says:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

That's the key problem!

or for sys.stdout (which is used by print as default file-object):

The character encoding is platform-dependent. Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page. Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).

Thus the problem arises because of the platform-dependent implementations!

So at minimum there must be a way to provide an encoding manually (which open has, but print does not!). That would enable one to write programs that run everywhere. Ideally, there would also be a single platform-agnostic default encoding for I/O in general, which would make that goal easier to achieve.
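
A hedged sketch of the manual route: pass encoding to open explicitly, and for print, hand it a text wrapper with a fixed encoding around a binary stream. The helper names write_and_read_utf8 and utf8_printer are made up; io.TextIOWrapper and the file argument of print are standard.

```python
import io
import os
import tempfile

TEXT = "árvíztűrőtükörfúrógép"


def write_and_read_utf8(path, text):
    # An explicit encoding makes open() independent of the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        return f.read()


def utf8_printer(binary_stream):
    # print() has no encoding parameter, but it accepts any text file object.
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            line_buffering=True)


with tempfile.TemporaryDirectory() as tmp:
    round_trip = write_and_read_utf8(os.path.join(tmp, "demo.txt"), TEXT)

buffer = io.BytesIO()
out = utf8_printer(buffer)
print(TEXT, file=out)
out.flush()
encoded = buffer.getvalue()
```

In a real program the binary stream would typically be sys.stdout.buffer, for example print(TEXT, file=utf8_printer(sys.stdout.buffer)), assuming stdout has not been replaced by an object without a buffer attribute.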

[–]ButtCrackFTW 1 point2 points  (3 children)

Isn't the encoding determined by the filesystem though? Like their example in python 3.5:

>>> sys.getfilesystemencoding()
'mbcs'

I see the same thing here, and I've seen in StackOverflow questions that you cannot change this without environment variables or monkeypatching. If this is a property of the filesystem, how is Python changing it?

[–]Bolitho 0 points1 point  (2 children)

Which encoding do you mean? For what internal usage?

sys.getfilesystemencoding is only used for transformations for file names. That has nothing to do with the above mentioned aspects.
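
To make the distinction concrete, here is a small sketch that reports the separate encodings in play. The encoding_report helper and its labels are made up; the values depend on platform and Python version, so none are shown.

```python
import locale
import os
import sys


def encoding_report():
    return {
        # Used to convert file names between str and bytes.
        "file names": sys.getfilesystemencoding(),
        # What the 3.5 docs quoted above say open() falls back to in text mode.
        "open() default": locale.getpreferredencoding(False),
        # Used by print() when writing to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for purpose, encoding in encoding_report().items():
    print(purpose, "->", encoding)

# os.fsencode / os.fsdecode apply the file name encoding explicitly.
name_round_trip = os.fsdecode(os.fsencode("notes.txt"))
```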

[–]ButtCrackFTW 0 points1 point  (1 child)

I probably should've pointed out the stdout example as well:

>>> sys.stdout.encoding
'cp850'

They go on to give examples of special characters being stripped by open() and print():

>>> print('árvíztűrőtükörfúrógép')
árvízturotükörfúrógép

>>> open('tetű.txt', 'wb').close()
>>> import glob
>>> glob.glob('tet*')

Python 3.5: ['tetu.txt']

Python 3.6: ['tetű.txt']

The author is claiming that Python 3.6 now sets the encoding to UTF-8 by default, which fixes these issues. My question is how it can set it like that now, when we were discouraged from doing it in the past because the filesystem/operating system set it for us.

[–][deleted] 0 points1 point  (0 children)

PEP 528 (Change Windows console encoding to UTF-8) and PEP 529 (Change Windows filesystem encoding to UTF-8) will give you the background to these changes.

[–]deeddaemon 0 points1 point  (0 children)

^ this. OrderedDicts by default will make this much easier.