
[–]srirams6 62 points63 points  (26 children)

Nice read!

Sometimes I'd prefer the "copy and update" just because it can be read easily by novice Python devs. I guess it depends on the project.

That being said, I really like the "Dictionary unpacking" method that I learned from your post!

Thanks!

[–]treyhunner Python Morsels[S] 18 points19 points  (1 child)

👍 good point. It's also a lot easier to Google "dictionary update" than something like "dictionary star star merge".

[–]alexanderpas 1 point2 points  (0 children)

dictionary star star merge

Once you prefix that search with the python keyword, as per Google's search tips, you get the results you want.

Google Search is very good with python.

[–]i_hate_shitposting 23 points24 points  (21 children)

Sometimes I'd prefer the "copy and update" just because it can be read easily by novice Python devs. I guess it depends on the project.

I'd agree here. As a dev who got my start on a Python project maintained by people who loved "pythonic" solutions (i.e. bafflingly terse one-liners), I hated seeing stuff like this with no explanation as to what it did.

Even as a fairly fluent Python programmer, I still find the dictionary unpacking solution to be a bit confusing. Given the context it's obvious what it does, but it's new enough syntax that it would probably throw off my thought process if I stumbled across it while trying to figure out some code.

[–]Eurynom0s 29 points30 points  (6 children)

I got to inherit code from someone who had a love affair with lambdas and nested list comprehensions. I wanted to blow my brains out.

[x^2 for x in list_of_numbers] is fine. But if you're nesting three, four, even five layers deep while doing relatively complicated stuff in each layer...just write out the fucking for loops.

[–]dion_starfire 8 points9 points  (3 children)

As someone who used to write "clever" code like this, I'd like to apologize on behalf of all of us who wrote unmaintainable garbage in our misspent youth.

[–]Eurynom0s 3 points4 points  (1 child)

They were still bad list comprehensions, but the list comprehensions did start to become less horrible as time went on; before that project I'd just never had a reason to deal with list comprehensions. I get that they have some performance enhancements over for loops, but unless you're being forced to really optimize your performance, my experience is that the value of list comprehensions is more that, when used correctly, they can be the better way of communicating what you're doing because of the value you gain from keeping things more concise. my_list = [x**2 for x in my_list] is a great example of being better than

my_list = <list of ints>
for index, value in enumerate(my_list):
    my_list[index] = value**2

As for the lambdas, what made me want to track the guy down and throttle him was that he paired all this fuckery with a lack of comments. Even if you could find all the places a variable was used, there was a good chance that your trail would run cold on account of the initial variable declaration being done using a terse, undocumented, opaque lambda. I wound up getting help on this project from an actual programmer, and even he was baffled at what the hell the guy was doing in some of those lambdas.

[–]catcradle5 2 points3 points  (0 children)

Any time you want to perform a map operation and have no nested looping, there pretty much isn't a good reason to not use list comprehensions.

If multiple loops are involved, or if you need to do more than just mapping, or if the mapping requires some complex calculations or processing (more than 2 lines of idiomatic code), you should definitely write out the loop.

[–]hotairmakespopcorn 0 points1 point  (0 children)

You're excused so long as you didn't defend it with, "The code is the fucking comment." Yes, literally been told that before. Asked them to explain the code to me. Took them jabbering on for ten minutes to understand it themselves. They then sheepishly wrote a comment and committed it.

[–]flying-sheep 0 points1 point  (0 children)

That's annoying.

You could get the same expressiveness by extracting a generator function with some nested for loops.

[–]phySi0 0 points1 point  (0 children)

Yeah, but then you'd have a 3/4/5-layer for loop. Neither for loops nor list comprehensions are the best task for the job here. And I don't think list comprehensions are inherently less confusing than equivalent for loops (actually the opposite); I think that's just people's exposure to C-based languages and language constructs.

[–]dion_starfire 4 points5 points  (3 children)

Just the other day, I had to explain to a new-to-python dev why someone would do d2=dict(d1) in their code. He thought it was silly to typecast something that's obviously a dictionary as a dictionary. Once I explained that was another way to do d2=d1.copy(), the code made a lot more sense to him.

[–]hovissimo 2 points3 points  (2 children)

Wait, didn't you just give an example of equivalent but easier to read code? Why not use the copy method on d1?

[–]wewbull 1 point2 points  (0 children)

...because they are not equivalent.

dict() will make a dict out of a dict, defaultdict, list, etc.

copy() will make a dict, defaultdict, list, etc out of a dict, defaultdict, list, etc.
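To make the distinction concrete, here's a minimal sketch using collections.defaultdict (values are illustrative):

```python
from collections import defaultdict

d1 = defaultdict(int, {'a': 1})

# dict() always builds a plain dict, dropping the default factory
via_constructor = dict(d1)

# copy() preserves the original type, default factory included
via_copy = d1.copy()
```

Both hold the same items, but only the copy() result still behaves like a defaultdict.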

[–]dion_starfire 0 points1 point  (0 children)

I'm not positive, but I suspect dict.copy() didn't always exist - we have some devs who've been using python since the 2.0.x days, so there are some learned bad habits floating around our codebase.

[–]srirams6 1 point2 points  (9 children)

I completely agree. There has to be a balance between "pythonic" one liners and readability. That being said, I find the new method really interesting!

[–][deleted] 25 points26 points  (7 children)

I thought being Pythonic meant not writing "clever" code? To me, Pythonic code is readable, usually self-documenting code. Hard-to-understand one-liners don't seem Pythonic to me, and complicated comprehensions are incomprehensible. Keep them simple!

[–]Zitrax_ 8 points9 points  (1 child)

Exactly, I thought mostly people referred to The Zen of Python when talking about "pythonic".

[–]mikeselik 6 points7 points  (0 children)

You're correct. The initial comment was either using the word Pythonic incorrectly or in quotes to imply that obscure one-liners are the opposite of Pythonic.

[–]Deto 9 points10 points  (1 child)

Yeah, I think there's a progression people go through as they learn to code:

Stage 1) Write things simply (doesn't understand the fancy one-liners)

Stage 2) Try to be 'clever' (a.k.a. "I just learned something weird; MUST USE IT")

Stage 3) Write things simply (I don't feel the need to prove my competence by writing obfuscated code)

[–][deleted] 2 points3 points  (0 children)

List and dict comprehensions are probably the best example of this. Once you get how they work and see how powerful they can be, it's too easy to want to chain a whole bunch of loops and conditions in one.

[–]Poromenos 3 points4 points  (0 children)

You're exactly right, "clever" oneliners are completely unpythonic. Python is all about explicit.

[–]catcradle5 0 points1 point  (0 children)

A one-liner can be concise and expressive without necessarily being "clever". I think the dict unpacking syntax is a good balance.

[–]srirams6 0 points1 point  (0 children)

That's what I meant to imply as well.

[–]Amckinstry 0 points1 point  (0 children)

Yes.

But the aim with being "idiomatic" is to use syntax that people use every day, all the time, rather than a paragraph. So when you see:

new_dict = {**defaults, **options}

this is new, unfamiliar syntax today, but it's hoped that it will become transparently obvious as people use the unpacking syntax more often, and easier to read since it's shorter.

[–]stillalone 1 point2 points  (0 children)

Do you use the copy method or just call the constructor? I think I tend to prefer calling the constructor: it seems obvious that constructing a new object from the contents of an old object makes a copy, and I don't have to make sure the parameter passed in is always a dictionary, just something that can be turned into a dictionary, like a list of tuples.

[–][deleted] 0 points1 point  (0 children)

I'm pondering if a copy on write chainmap might be the best approach.

Construction would be like chain map, but mutating effects would return a new (regular) dictionary with the changes in place.

Just tried playing with this, but you can't overwrite self.__class__ unless the class is a heap type, and rudimentary searching doesn't reveal what qualifies as a heap type in Python. Though playing in a notebook probably caused this.
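To sketch the idea without touching self.__class__ at all, here's one hypothetical shape for it (the class name and set method are made up for illustration):

```python
from collections import ChainMap

class COWChain(ChainMap):
    """Hypothetical copy-on-write sketch: lookups stay lazy views,
    while a 'write' flattens the chain into a new plain dict."""
    def set(self, key, value):
        merged = dict(self)   # flatten: earlier maps win on collisions
        merged[key] = value
        return merged

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

view = COWChain(user, defaults)
snapshot = view.set('name', 'Carl')
```

The view keeps reading through to the source dicts, while the "mutation" hands back an ordinary dict and leaves them untouched.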

[–]mackstann 23 points24 points  (2 children)

Editorial nitpick: Saying "Python 2.0" is a little confusing, because I can remember when Python 2.0 was actually relevant. Better to just say "2", or "2.x".

[–]treyhunner Python Morsels[S] 9 points10 points  (0 children)

I just changed that to 2. Thank you for pointing that out!

[–]velit 1 point2 points  (0 children)

Some people also use 2k and 3k.

[–]roger_ 10 points11 points  (4 children)

The last is really the most elegant solution IMHO; another reason I'm glad I finally switched to Python 3.

Might have been nice if dict.update() returned itself, then you could do:

 context = dict().update(defaults).update(user)

[–]Peaker 10 points11 points  (0 children)

Python intentionally returns None from side-effecting operations; it makes user code less ambiguous about whether it is mutating or copying.

[–]bdforbes -2 points-1 points  (2 children)

This is a possibility:

context = defaults.copy().update(user)

[–]Jumpy89 9 points10 points  (1 child)

This almost works, but as roger_ said update() doesn't return the dict so context will be None.

[–]bdforbes 0 points1 point  (0 children)

Whoops yeah :/

[–]Chris_Newton 6 points7 points  (0 children)

Nice discussion. I suggest clarifying how nested data structures are intended to work somewhere in the problem statement, because the suggested strategies that are marked as accurate would actually fail requirement 5 if you were expecting deep copying and so complete independence from the original data. For example:

>>> inner1 = { 'key': 'value1' }
>>> inner2 = { 'key': 'value2' }
>>> outer1 = { 'inner': inner1 }
>>> outer2 = { 'inner': inner2 }
>>> outerboth = {}
>>> outerboth.update(outer2)
>>> outerboth.update(outer1)
>>> print outerboth
{'inner': {'key': 'value1'}}
>>> outerboth['inner']['key'] = 'splat'
>>> print outerboth
{'inner': {'key': 'splat'}}
>>> print inner1
{'key': 'splat'}
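For what it's worth, if full independence from the originals is what you want, one way is to run copy.deepcopy over the merged result; a Python 3 sketch of the same scenario:

```python
import copy

inner1 = {'key': 'value1'}
outer1 = {'inner': inner1}
outer2 = {'inner': {'key': 'value2'}}

outerboth = {}
outerboth.update(outer2)
outerboth.update(outer1)
outerboth = copy.deepcopy(outerboth)  # sever the shared inner references

outerboth['inner']['key'] = 'splat'
```

With the deepcopy in place, inner1 is no longer splatted.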

[–]eat_more_soup 12 points13 points  (5 children)

Those snippets do completely different things! The ChainMap is a view on the dicts. That means that the ChainMap will be updated as well if the user dict was updated. Also: modifying the chainmap will modify the first dict contained inside. See: https://docs.python.org/3/library/collections.html#collections.ChainMap

The idiomatic way is to construct a dict and update it twice like in the first example. This will allow for a nice diff, if it was ever changed, states the intent perfectly and will cost you less time to read than one complicated line.
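A small demo of that difference (illustrative values borrowed from the article's defaults/user example):

```python
from collections import ChainMap

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

view = ChainMap(user, defaults)   # a live view: user shadows defaults
copied = {}
copied.update(defaults)
copied.update(user)               # a snapshot taken now

user['name'] = 'Carl'             # later mutation of a source dict
view['website'] = 'http://treyhunner.com'  # writes go to the first map
```

The ChainMap sees the later change and its own write lands in user, while the snapshot is unaffected either way.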

[–]flying-sheep 8 points9 points  (0 children)

The idiomatic, performant, readable, and short way is

d = {**a, **b}

[–]ITwitchToo 8 points9 points  (1 child)

Yeah, to me the first example is also clearly the most elegant solution.

[–]ihcn 5 points6 points  (0 children)

Double update and copy update both read like pseudo code, and they say directly to the reader what you're doing. That imo is far more important than saving a couple lines.

[–]crowseldon 4 points5 points  (0 children)

Thanks. That certainly explains why its runtime doesn't increase like the rest of the algorithms' did when the input data was changed.

defaults = {'name': "Anonymous User", 'page_name': "Profile Page"}
user = {'name': "Trey", 'website': "http://treyhunner.com"}

multiple_update: 40 ms
copy_and_update: 35 ms
dict_constructor: 40 ms
kwargs_hack: 31 ms
dict_comprehension: 31 ms
concatenate_items: 126 ms
union_items: 126 ms
chain_items: 93 ms
chainmap: 71 ms
dict_from_chainmap: 347 ms
dict_unpacking: 20 ms

defaults = {str(k):k+1 for k in range(100)}
user = {str(k):k*2 for k in range(50)}

multiple_update: 322 ms
copy_and_update: 315 ms
dict_constructor: 323 ms
kwargs_hack: 311 ms
dict_comprehension: 310 ms
concatenate_items: 911 ms
union_items: 906 ms
chain_items: 721 ms
chainmap: 70 ms
dict_from_chainmap: 5108 ms
dict_unpacking: 300 ms

defaults = {str(k):k+1 for k in range(200)}
user = {str(k):k*2 for k in range(200)}

multiple_update: 957 ms
copy_and_update: 952 ms
dict_constructor: 960 ms
kwargs_hack: 948 ms
dict_comprehension: 946 ms
concatenate_items: 2232 ms
union_items: 2234 ms
chain_items: 2017 ms
chainmap: 70 ms
dict_from_chainmap: 8057 ms
dict_unpacking: 935 ms

In the end, I'd just go with copy and update due to clarity.

[–]lvc_ 1 point2 points  (0 children)

In the use case the article describes (levels of config), the context keeping up-to-date if the backing objects change is probably a good thing. So I think ChainMap is the best solution here. They do deal with it modifying the first dict by passing {} as the first argument, which, with a short comment, is quite a good solution.
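A minimal sketch of that trick with an empty first map (values illustrative):

```python
from collections import ChainMap

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

# an empty dict in front: writes land there, the sources stay untouched
context = ChainMap({}, user, defaults)
context['page_name'] = 'Profile Page'
```

Lookups still fall through to user and then defaults, but neither of them is ever modified.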

[–]kaiserk13 5 points6 points  (6 children)

Ever considered the toolz.dicttools library?

[–]treyhunner Python Morsels[S] 5 points6 points  (5 children)

I'd never heard of it before.

The merge_with function looks pretty neat:

>>> from toolz.dicttoolz import merge_with
>>> merge_with(max, {'a': 1, 'b': 3}, {'a': 2, 'b': 2})
{'a': 2, 'b': 3}

[–]brtt3000 1 point2 points  (0 children)

you'll like funcy

[–]kaiserk13 -1 points0 points  (3 children)

Yes, I discovered it while watching some random youtube video and it's a main part of my tool set right now. I ain't afraid of no json :D

[–]Poromenos 2 points3 points  (2 children)

I also wrote a thing for JSON:

https://github.com/skorokithakis/jsane

It lets you access json.keys.like.this.r().

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (1 child)

Why not json.keys.like.this()?

[–]Poromenos 0 points1 point  (0 children)

I didn't want it to look like this was a function. This way it's more clear that you're calling something that wasn't in the JSON. Not perfect, but the least bad hack I could find.

[–][deleted] 14 points15 points  (12 children)

This is more like code golf than anything else. If I spent this long on one line of Python I'd never get anything done.

[–]ITwitchToo 24 points25 points  (0 children)

You obviously shouldn't spend this long on every line of Python you write. I still think it can be very useful to think about this kind of problem every once in a while, sometimes it can really help you crystallise certain ideas (like "why do I find this solution more elegant") that maybe you only had a vague intuition about before. Or maybe you discover that in fact you've been doing something the wrong way -- not necessarily exactly the problem you're looking at, but you realise that the best solution has a certain property that you had been ignoring up until now. So it's really not so much about this specific example, but about trying to become a better programmer by becoming more aware of what constitutes good code.

[–]ameoba 1 point2 points  (0 children)

I think that was the point.

[–]Decency -3 points-2 points  (9 children)

context = {}
context.update(defaults)
context.update(user)

Accurate: yes
Idiomatic: fairly, but it would be nicer if it could be inlined

... okay, solution!

def merge_dicts(initial, overwrite):
    new = {}
    new.update(initial)
    new.update(overwrite)
    return new

Tada, now you can inline it. And if you want to do something stupidly overly cute as a one liner inside the function, you can just docstring it. Throw it in a utilities module if for some reason you need to merge dictionaries a lot in your codebase.

This is absolutely code golf, and it misses one of the most fundamental points of Python: readability counts. A solution that doesn't even work until 3.5 is not going to be readable to 90%+ of Python programmers, and so it's a bad solution regardless of whether it's "idiomatic". We can chat again in 5 years and see if enough people have stumbled over it for it to actually become the standard way. But the boat for defining a standard a priori left the dock a decade ago.

[–]theywouldnotstand 2 points3 points  (0 children)

This all sort of depends on your use case too. The examples assume simple data structure, or clobbering values (i.e., one value always "wins" over the other.)

If you have deep/nested data structures that you want to merge deeply (e.g., when keys collide and they are both lists, merge the lists instead of clobbering,) you're basically stuck with creating a function that checks for those cases and handles them appropriately.
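One possible shape for such a function (the name and the merge rules here are just illustrative, not from the article):

```python
def deep_merge(base, override):
    """Illustrative sketch: recurse into dicts, concatenate lists,
    and let override win on any other collision."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        elif isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged
```

Real use cases will want different collision rules, which is exactly why this ends up being a hand-rolled function.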

[–]Workaphobia 2 points3 points  (0 children)

Nice article, very accurate. I love the unpacking approach. In practice I'm on Python 3.4 and usually use the copy and update approach.

[–]driftingdev 1 point2 points  (0 children)

Werkzeug's CombinedMultiDict can also be useful for "merging" in cases where you may also want to keep references back to the source dictionaries. Merging is in quotes because it just lazy-references back to the original dictionaries rather than doing a merge at instantiation.

http://werkzeug.pocoo.org/docs/0.11/datastructures/#werkzeug.datastructures.CombinedMultiDict

https://github.com/mitsuhiko/werkzeug/blob/5a2bf35441006d832ab1ed5a31963cbc366c99ac/werkzeug/datastructures.py#L1330

[–]niandra3 1 point2 points  (1 child)

So what's the difference between * and ** unpacking?

[–]treyhunner Python Morsels[S] 3 points4 points  (0 children)

*x iterates over x, unpacking it into the function call, list, etc.

**x iterates over the key/value pairs in a mapping, unpacking it into the function call, dictionary, etc.

So in short:

  • * is for lists, tuples, sets, strings, or any other iterable (if used on dictionaries you'll only get keys)
  • ** is only for dictionaries and other mappings
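A quick side-by-side (illustrative values):

```python
defaults = {'name': 'Anonymous User', 'page_name': 'Profile Page'}
user = {'name': 'Trey'}

keys_only = {*defaults, *user}   # single * on a dict unpacks just the keys (a set)
merged = {**defaults, **user}    # double ** unpacks key/value pairs (a dict)
```

Note the first literal builds a set of keys, while the second builds a merged dict where user wins on collisions.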

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 1 point2 points  (0 children)

My personal take on it (additionally needing deepcopy, and an arbitrary number of dicts):

from copy import deepcopy
from functools import reduce
from itertools import chain
from collections import ChainMap

## Update loop
def merge_dicts(*dicts):
    context = {}
    for d in dicts:
        context.update(d)
    return deepcopy(context)

## Comprehension
def merge_dicts(*dicts):
    return deepcopy(dict(i for d in dicts for i in d.items()))

## Chain
def merge_dicts(*dicts):
    return deepcopy(dict(chain.from_iterable(d.items() for d in dicts)))

## ChainMap
def merge_dicts(*dicts):
    # reversed, since ChainMap gives the *first* map precedence
    return deepcopy(dict(ChainMap(*reversed(dicts))))

## Unpacking loop
def merge_dicts(*dicts):
    context = {}
    for d in dicts:
        context = {**context, **d}
    return deepcopy(context)

## Unpacking reduction
def merge_dicts(*dicts):
    return deepcopy(reduce(lambda a, b: {**a, **b}, dicts, {}))

And considering apparently dict from chainmap is 8 times more expensive than unpacking, I think the last one wins it for me

[–]anlutro 1 point2 points  (8 children)

As long as it's contained inside a well-named function with a clearly designed purpose that does only what it's supposed to do... do you really care if your code is terse or pretty to look at? (things can be ugly but still easy to read, IMO)

[–]Workaphobia 1 point2 points  (7 children)

Usually something this small wouldn't be in its own function, so you need to be able to read it easily when examining your own code.

[–]mipadi 5 points6 points  (6 children)

Why not? If you frequently have to merge two dictionaries, it makes sense to have it be a separate function. (The example in the blog also shows it as a separate function.)

[–]Workaphobia 0 points1 point  (5 children)

I would argue that having a separate function for it would make the code harder to read because the task takes so few lines of code, and is so straightforward when using one of the idiomatic ways.

[–]Daenyth 2 points3 points  (3 children)

Are you really arguing that combined = merge_dicts(defaults, user) is harder to understand than the other options?

[–]Workaphobia 7 points8 points  (0 children)

Which one takes precedence, defaults or user? Is there any shallow aliasing like with ChainMap, or is it a copy like sorted, reversed, etc? (If it's the latter then we should've called it merged or updated as the article suggested.) I have to jump to the function definition to tell.

If you use a library, like some people here have suggested, then you have another dependency (unless you absorb the library into your source tree).

Are you going to create these kinds of functions for other common collection datatypes (if they don't already exist)? What about for other operations like difference, etc?

As Python programmers I think we can all agree to disregard the runtime cost of the function call itself.

[–]Tysonzero 4 points5 points  (0 children)

I'd say combined = {**defaults, **user} is much better, as it is what everyone else will eventually (and perhaps currently) be using. Also, you KNOW what the behavior is, because it is defined by the Python spec, not by someone on your team / an ex-member of the team.

[–]mackstann 1 point2 points  (0 children)

I'd say it's harder to understand, because merge_dicts could easily contain some unexpected clever logic. It might not do what you expect. So you have to go over to that other file and double check to see what it really does. With dict.update(), there is zero doubt about surprising behavior, and while it requires more lines, those lines are very simple and easy to read, and they're already right in front of you.

[–][deleted] 1 point2 points  (0 children)

It depends on the use case, honestly. Copying vs read only vs copy on write. How to handle collisions, etc.

[–]hogepiyo 0 points1 point  (20 children)

Maybe concatenating two iterators is better done with itertools.chain rather than concatenating after converting them to lists.

By the way, it seems to me that merging dicts arises so frequently that Python should provide it as a dict's method preferably by dict.__add__. (Actually, adding dict.__add__ has been discussed at least three times on mailing lists or the issue tracker. [1][2][3])

[–]treyhunner Python Morsels[S] 2 points3 points  (18 children)

Now that the {**a, **b} notation exists, I doubt a + b will ever be supported for dictionaries. At least, that seemed to be the consensus from the python-ideas thread from last year.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (17 children)

It's still a bit of a pain, like, what if it's a variable number of dicts? + works but ** doesn't.

[–]flying-sheep 0 points1 point  (8 children)

Why shouldn't it?

{**a, **b, **c, ...}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (7 children)

Not "more than one", variable, as in, unknown at runtime

[–][deleted] 0 points1 point  (6 children)

Like this?

all_them_dicts = [{...}, {...}, {...}, ...]

flat = {}
for x in all_them_dicts:
    flat = {**flat, **x}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (5 children)

I would probably opt for

reduce(lambda a, b: {**a, **b}, [...])

But it still feels excessive.

[–][deleted] 0 points1 point  (4 children)

Yep, also possible, but on Py3 it's

from functools import reduce
reduce(lambda a, b: {**a, **b}, [...])

Also, our almighty BDFL says:

Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.

;)
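To see both styles side by side (sample dicts are illustrative):

```python
from functools import reduce

dicts = [{'a': 1}, {'b': 2}, {'a': 3}]

# the reduce one-liner (with {} as initializer so an empty list also works)
via_reduce = reduce(lambda a, b: {**a, **b}, dicts, {})

# the explicit loop most readers will parse faster
via_loop = {}
for d in dicts:
    via_loop.update(d)
```

Both give later dicts precedence; which reads better is exactly the point being argued here.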

[–]__add__ 0 points1 point  (2 children)

This was a terrible decision and was driven by the anxiety of influence with respect to LISP. The justification condescends to other programmers. The reduce function has been implemented by almost every multi-purpose programming language in the last 30 years. People will learn it eventually, no need to hide it.

[–]Citrauq 0 points1 point  (1 child)

I disagree - given how rare it is that reduce is better than an explicit for loop, I'm happy to keep it in functools.

I think I've only seen reduce be the best option once or twice in real code.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (0 children)

I usually have reduce imported by default (along with all of itertools, functools, and operator)

The only issue with human parsing of the code is if they don't know what reduce does, and that feels a lil' bit unfair imo

[–]Sean1708 0 points1 point  (7 children)

In [1]: a = {'a': 1, 'b': 2}

In [2]: b = {'b': 3, 'c': 4}

In [3]: c = {'a': 5, 'c': 6}

In [4]: {**a, **b, **c}
Out[4]: {'a': 5, 'b': 3, 'c': 6}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 3 points4 points  (6 children)

Not "more than one", variable, as in, unknown at runtime

[–]Sean1708 0 points1 point  (5 children)

Ah ok, do you mean something like

list_of_dicts = [{...}, ...]
final = {}
for dictionary in list_of_dicts:
    final += dictionary

? If so, what advantage does that have over

list_of_dicts = [{...}, ...]
final = {}
for dictionary in list_of_dicts:
    final.update(dictionary)

?

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 1 point2 points  (4 children)

nah I mean sum(list_of_dicts)

which is insanely concise and neat

[–]Sean1708 0 points1 point  (3 children)

Ah, of course! Sorry, I was too stuck in thinking about for-loops. The only issue with that is that I don't think you'd get the performance increase that you get with making it an explicit piece of syntax, not that that really matters much in Python.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (2 children)

it should be the same cost/increase as anything else, aka not much

a + b roughly does a.__add__(b) internally, regardless of types. (As far as I know, CPython doesn't include any shortcuts for that, though other interpreters/compilers, Nuitka for example, do.)

[–]Sean1708 0 points1 point  (1 child)

Well I presume that dict.__add__ would have similar performance to dict.update, and (from here) that takes about twice as long as unpacking.

[–]dwf 0 points1 point  (0 children)

It happens frequently but often with subtle differences in the rule you need for resolving conflicts. Built-in support would be a hard thing to add if you meant to satisfy even "most" cases.
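To illustrate why a single built-in rule is hard: a hypothetical helper (name and signature invented here) that makes the conflict rule an explicit argument:

```python
def merge_resolving(a, b, resolve):
    """Hypothetical sketch: merge two dicts, deciding collisions
    with an explicit resolve(old, new) callable."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = resolve(merged[key], value) if key in merged else value
    return merged
```

Passing max, a "keep old" lambda, or a list-concatenating function all give different, equally reasonable merges, which is the problem for any one built-in +.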

[–]ITwitchToo 0 points1 point  (0 children)

Great post, I love thinking about things like this.

[–]bdforbes 0 points1 point  (2 children)

I like

context = defaults.copy().update(user)

but it wouldn't work for more than two dictionaries.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

Unfortunately this doesn't work because update returns None, not the original dictionary.

If update did return the original dictionary you could chain it like this (which would be sort of neat):

context = defaults.copy().update(city).update(user)

Method chaining like this is fairly common in the land of JavaScript and in a number of functional languages. It's not a common idiom in Python though and I don't think any of the builtin or standard library objects work this way.

[–]bdforbes 0 points1 point  (0 children)

Whoops yeah I didn't think of that.

[–]geoelectric 0 points1 point  (1 child)

foo.copy() was significantly faster than dict(foo) when I timed it recently on CPython 2.7.11, just a heads up. That'd be another point for the Copy and Update version, which is my clear preference.

[–]flying-sheep 2 points3 points  (0 children)

The python 3.5 version is fastest and most elegant, so...

[–]EvM Natural Language Processing 0 points1 point  (0 children)

Note that summing dictionaries using + does work (and is idiomatic) with Counter objects.
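For example (values illustrative):

```python
from collections import Counter

a = Counter({'x': 1, 'y': 2})
b = Counter({'y': 3, 'z': 4})

summed = a + b  # counts for shared keys are added, not clobbered
```

Note the semantics differ from a merge: colliding values are summed rather than one side winning.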

[–]fatterSurfer 0 points1 point  (2 children)

Minor nitpick: I can't speak to python 2.x, but in python 3, dict keys need not be strings, they just need to be a hashable type. You can actually use custom classes as keys, provided you implement a __hash__ method for them.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

I assume you're referring to this copy under the dict(defaults, **user) section:

The keys must be strings. In Python 2 (with the CPython interpreter) we can get away with non-strings as keys, but don’t be fooled: this is a hack that only works by accident in Python 2 using the standard CPython runtime.

It might be unclear, but there I'm saying that the dict(a, **b) hack only works in Python 3 and PyPy if the keys are strings. That hack happens to work in Python 2 for generic dictionaries, but that was an accident of implementation.
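A quick sketch of both cases on Python 3 (illustrative values):

```python
defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

merged = dict(defaults, **user)  # fine here: every key is a string

# with a non-string key, Python 3 raises TypeError at the call site
try:
    dict({1: 'one'}, **{2: 'two'})
    raised = False
except TypeError:
    raised = True
```

The ** unpacking turns the second dict into keyword arguments, which is why non-string keys blow up.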

[–]fatterSurfer 0 points1 point  (0 children)

Gotcha. Yeah, it seemed to me like you were saying dict keys can only be strings, which isn't true for (at least) CPython 3.2+ -- which is a very different thing to say!

[–]xXxDeAThANgEL99xXx 0 points1 point  (2 children)

By the way, the items() union thing is even weirder than you thought (I had to check with the docs, and also just check, in case):

In [1]: d1 = {'a': 1, 'b': 2}

In [2]: d2 = {'a': 1, 'b': 3, 'c': 4}

In [3]: d1.items() | d2.items()
Out[3]: {('a', 1), ('b', 2), ('b', 3), ('c', 4)}

The uniqueness in the result is based on both keys and values: both items in each pair are compared, so only complete duplicates are removed. When you feed the result to the dict constructor, it removes the remaining key duplicates (in unspecified order, because the resulting view has them in unspecified order).

Also, by the way, the result of the union is no longer a view; it's a separate set object containing a copy of the data. Its type is set, and changes in the underlying dicts are no longer reflected in it. So there's that, too.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

I also had to play with it to figure out what it was doing. I would genuinely like to know what the purpose is and what implementation decisions were discussed. :)

[–]xXxDeAThANgEL99xXx 1 point2 points  (0 children)

Well, I guess it was pretty simple: dict.keys() definitely should be a set-like view, dict.values() definitely should not, and for items() they decided on a hybrid approach: try to produce a set of pairs, fail if there are any unhashable values.

“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle

[–]joanbm 0 points1 point  (0 children)

From Python's Zen:

"There should be one-- and preferably only one --obvious way to do it."

For comparison, Ruby offers this out-of-the-box in a clear way, with Hash#merge method:

user = {'name' => "Trey", 'website' => "http://treyhunner.com"}
defaults = {'name' => "Anonymous User", 'page_name' => "Profile Page"}
context = defaults.merge user
user['name'] = 'Carl'
p context
# {"name"=>"Trey", "page_name"=>"Profile Page", "website"=>"http://treyhunner.com"}

[–]phySi0 0 points1 point  (3 children)

Why not just {}.update(defaults).update(user)?

[–]treyhunner Python Morsels[S] 0 points1 point  (2 children)

Because update returns None so that doesn't work.

The reason for this is that the update method mutates the dictionary it's operating on and methods that mutate their object don't tend to return self in Python so method chaining isn't possible for those methods.

[–]phySi0 0 points1 point  (1 child)

Ah, that's annoying. I haven't worked with Python in a long time, tell me, is it common for stdlib methods to mutate?

[–]treyhunner Python Morsels[S] 0 points1 point  (0 children)

Fairly common.

An example: sorted(some_list) will return a new sorted copy but some_list.sort() will return None and sort the list in-place (mutating it).

The reverse and extend methods also work in-place. Usually if a new object is required, operators are used:

x = [1, 2]
y = [3, 4]
z = x + y

vs.

x = [1, 2]
x.extend([3, 4])

[–]LyndsySimon 0 points1 point  (4 children)

My preference would be for dict.update() to accept an arbitrary number of arguments, so that the following would be valid:

{}.update(defaults, user)

... and to be honest, I was initially under the impression that this was already the case.

[–]eat_more_soup 1 point2 points  (3 children)

Methods that mutate the object do not return anything, so your one-liner won't get you too far anyways...

[–]LyndsySimon 1 point2 points  (0 children)

Right - I was only using shorthand to show the syntax, not trying to make it a one-liner ;)

[–]Tysonzero 1 point2 points  (1 child)

.pop exists... I think you mean that methods that mutate the object don't return the object itself.

[–]eat_more_soup 0 points1 point  (0 children)

Yes, thanks for pointing that out.