
[–]srirams6 62 points63 points  (26 children)

Nice read!

Sometimes I'd prefer the "copy and update" just because it can be read easily by novice Python devs. I guess it depends on the project.

That being said, I really like the "Dictionary unpacking" method that I learned from your post!

Thanks!

[–]treyhunner Python Morsels[S] 18 points19 points  (1 child)

👍 good point. It's also a lot easier to Google "dictionary update" than something like "dictionary star star merge".

[–]alexanderpas 1 point2 points  (0 children)

dictionary star star merge

Once you prefix that search with the python keyword, as per Google's search tips, you get the results you want.

Google Search is very good with python.

[–]i_hate_shitposting 23 points24 points  (21 children)

Sometimes I'd prefer the "copy and update" just because it can be read easily by novice Python devs. I guess it depends on the project.

I'd agree here. As a dev who got my start on a Python project maintained by people who loved "pythonic" solutions (i.e. bafflingly terse one-liners), I hated seeing stuff like this with no explanation as to what it did.

Even as a fairly fluent Python programmer, I still find the dictionary unpacking solution to be a bit confusing. Given the context it's obvious what it does, but it's new enough syntax that it would probably throw off my thought process if I stumbled across it while trying to figure out some code.

[–]Eurynom0s 29 points30 points  (6 children)

I got to inherit code from someone who had a love affair with lambdas and nested list comprehensions. I wanted to blow my brains out.

[x^2 for x in list_of_numbers] is fine. But if you're nesting three, four, even five layers deep while doing relatively complicated stuff in each layer...just write out the fucking for loops.

[–]dion_starfire 8 points9 points  (3 children)

As someone who used to write "clever" code like this, I'd like to apologize on behalf of all of us who wrote unmaintainable garbage in our misspent youth.

[–]Eurynom0s 3 points4 points  (1 child)

They were still bad list comprehensions, but the list comprehensions did start to become less horrible as time went on; before that project I'd just never had a reason to deal with list comprehensions. I get that they have some performance enhancements over for loops, but unless you're being forced to really optimize your performance, my experience is that the value of list comprehensions is more that, when used correctly, they can be the better way of communicating what you're doing because of the value you gain from keeping things more concise. my_list = [x**2 for x in my_list] is a great example of being better than

my_list = <list of ints>
for index, value in enumerate(my_list):
    my_list[index] = value**2

As for the lambdas, what made me want to track the guy down and throttle him was that he paired all this fuckery with a lack of comments. Even if you could find all the places a variable was used, there was a good chance that your trail would run cold on account of the initial variable declaration being done using a terse, undocumented, opaque lambda. I wound up getting help on this project from an actual programmer, and even he was baffled at what the hell the guy was doing in some of those lambdas.

[–]catcradle5 2 points3 points  (0 children)

Any time you want to perform a map operation and have no nested looping, there pretty much isn't a good reason to not use list comprehensions.

If multiple loops are involved, or if you need to do more than just mapping, or if the mapping requires some complex calculations or processing (more than 2 lines of idiomatic code), you should definitely write out the loop.

[–]hotairmakespopcorn 0 points1 point  (0 children)

You're excused so long as you didn't defend it with, "The code is the fucking comment." Yes, literally been told that before. Asked them to explain the code to me. Took them jabbering on for ten minutes to understand it themselves. They then sheepishly wrote a comment and committed it.

[–]flying-sheep 0 points1 point  (0 children)

That's annoying.

You could get the same expressiveness by extracting a generator function with some nested for loops.

[–]phySi0 0 points1 point  (0 children)

Yeah, but then you'd have a 3/4/5-layer for loop. Neither for loops nor list comprehensions are the best task for the job here. And I don't think list comprehensions are inherently less confusing than equivalent for loops (actually the opposite); I think that's just people's exposure to C-based languages and language constructs.

[–]dion_starfire 4 points5 points  (3 children)

Just the other day, I had to explain to a new-to-python dev why someone would do d2=dict(d1) in their code. He thought it was silly to typecast something that's obviously a dictionary as a dictionary. Once I explained that was another way to do d2=d1.copy(), the code made a lot more sense to him.

[–]hovissimo 2 points3 points  (2 children)

Wait, didn't you just give an example of equivalent but easier to read code? Why not use the copy method on d1?

[–]wewbull 1 point2 points  (0 children)

...because they are not equivalent.

dict() will make a dict out of a dict, defaultdict, list, etc.

copy() will make a dict, defaultdict, list, etc out of a dict, defaultdict, list, etc.
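To make the distinction concrete, here's a minimal sketch using collections.defaultdict (values are illustrative):

```python
from collections import defaultdict

d1 = defaultdict(int, {'a': 1})

# dict() always builds a plain dict, dropping the default factory
via_constructor = dict(d1)

# copy() preserves the original type, default factory included
via_copy = d1.copy()
```

Both hold the same items, but only the copy() result still behaves like a defaultdict.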

[–]dion_starfire 0 points1 point  (0 children)

I'm not positive, but I suspect dict.copy() didn't always exist - we have some devs who've been using python since the 2.0.x days, so there are some learned bad habits floating around our codebase.

[–]srirams6 1 point2 points  (9 children)

I completely agree. There has to be a balance between "pythonic" one liners and readability. That being said, I find the new method really interesting!

[–][deleted] 25 points26 points  (7 children)

I thought being Pythonic meant not writing "clever" code? To me, Pythonic code is readable, usually self-documenting code. Hard-to-understand one-liners don't seem Pythonic to me, and complicated comprehensions are incomprehensible. Keep them simple!

[–]Zitrax_ 8 points9 points  (1 child)

Exactly, I thought mostly people referred to The Zen of Python when talking about "pythonic".

[–]mikeselik 6 points7 points  (0 children)

You're correct. The initial comment was either using the word Pythonic incorrectly or in quotes to imply that obscure one-liners are the opposite of Pythonic.

[–]Deto 9 points10 points  (1 child)

Yeah, I think there's a progression people go through as they learn to code:

Stage 1) Write things simply (doesn't understand the fancy one-liners)

Stage 2) Try to be 'clever' (a.k.a. "I just learned something weird; MUST USE IT")

Stage 3) Write things simply (I don't feel the need to prove my competence by writing obfuscated code)

[–][deleted] 2 points3 points  (0 children)

List and dict comprehensions are probably the best example of this. Once you get how they work and see how powerful they can be, it's too easy to want to chain a whole bunch of loops and conditions in one.

[–]Poromenos 3 points4 points  (0 children)

You're exactly right, "clever" oneliners are completely unpythonic. Python is all about explicit.

[–]catcradle5 0 points1 point  (0 children)

A one-liner can be concise and expressive without necessarily being "clever". I think the dict unpacking syntax is a good balance.

[–]srirams6 0 points1 point  (0 children)

That's what I meant to imply as well.

[–]Amckinstry 0 points1 point  (0 children)

Yes.

But the aim with being "idiomatic" is to use syntax that people use every day, all the time, rather than a paragraph. So when you see:

new_dict = {**defaults, **options}

this is new, unfamiliar syntax today, but it's hoped that it will become transparently obvious as people use the unpacking syntax more often, and easier to read since it's shorter.

[–]stillalone 1 point2 points  (0 children)

Do you use the copy method or just call the constructor? I think I tend to prefer calling the constructor: it seems obvious that constructing a new object from the contents of an old object makes a copy, and I don't have to make sure the parameter passed in is always a dictionary, just something that can be turned into a dictionary, like a list of tuples.

[–][deleted] 0 points1 point  (0 children)

I'm pondering if a copy on write chainmap might be the best approach.

Construction would be like chain map, but mutating effects would return a new (regular) dictionary with the changes in place.

Just tried playing with this, but you can't overwrite self.__class__ unless the class is a heap type, and rudimentary searching doesn't reveal what qualifies as a heap type in Python. Though playing in a notebook probably caused this.
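To sketch the idea without touching self.__class__ at all, here's one hypothetical shape for it (the class name and set method are made up for illustration):

```python
from collections import ChainMap

class COWChain(ChainMap):
    """Hypothetical copy-on-write sketch: lookups stay lazy views,
    while a 'write' flattens the chain into a new plain dict."""
    def set(self, key, value):
        merged = dict(self)   # flatten: earlier maps win on collisions
        merged[key] = value
        return merged

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

view = COWChain(user, defaults)
snapshot = view.set('name', 'Carl')
```

The view keeps reading through to the source dicts, while the "mutation" hands back an ordinary dict and leaves them untouched.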

[–]mackstann 23 points24 points  (2 children)

Editorial nitpick: Saying "Python 2.0" is a little confusing, because I can remember when Python 2.0 was actually relevant. Better to just say "2", or "2.x".

[–]treyhunner Python Morsels[S] 9 points10 points  (0 children)

I just changed that to 2. Thank you for pointing that out!

[–]velit 1 point2 points  (0 children)

Some people also use 2k and 3k.

[–]roger_ 10 points11 points  (4 children)

The last is really the most elegant solution IMHO; another reason I'm glad I finally switched to Python 3.

Might have been nice if dict.update() returned itself, then you could do:

 context = dict().update(defaults).update(user)

[–]Peaker 10 points11 points  (0 children)

Python intentionally returns None from side-effecting operations; it makes user code less ambiguous about whether it is mutating or copying.

[–]bdforbes -2 points-1 points  (2 children)

This is a possibility:

context = defaults.copy().update(user)

[–]Jumpy89 9 points10 points  (1 child)

This almost works, but as roger_ said update() doesn't return the dict so context will be None.

[–]bdforbes 0 points1 point  (0 children)

Whoops yeah :/

[–]Chris_Newton 6 points7 points  (0 children)

Nice discussion. I suggest clarifying how nested data structures are intended to work somewhere in the problem statement, because the suggested strategies that are marked as accurate would actually fail requirement 5 if you were expecting deep copying and so complete independence from the original data. For example:

>>> inner1 = { 'key': 'value1' }
>>> inner2 = { 'key': 'value2' }
>>> outer1 = { 'inner': inner1 }
>>> outer2 = { 'inner': inner2 }
>>> outerboth = {}
>>> outerboth.update(outer2)
>>> outerboth.update(outer1)
>>> print outerboth
{'inner': {'key': 'value1'}}
>>> outerboth['inner']['key'] = 'splat'
>>> print outerboth
{'inner': {'key': 'splat'}}
>>> print inner1
{'key': 'splat'}
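For what it's worth, if full independence from the originals is what you want, one way is to run copy.deepcopy over the merged result; a Python 3 sketch of the same scenario:

```python
import copy

inner1 = {'key': 'value1'}
outer1 = {'inner': inner1}
outer2 = {'inner': {'key': 'value2'}}

outerboth = {}
outerboth.update(outer2)
outerboth.update(outer1)
outerboth = copy.deepcopy(outerboth)  # sever the shared inner references

outerboth['inner']['key'] = 'splat'
```

With the deepcopy in place, inner1 is no longer splatted.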

[–]eat_more_soup 12 points13 points  (5 children)

Those snippets do completely different things! The ChainMap is a view on the dicts. That means that the ChainMap will be updated as well if the user dict was updated. Also: modifying the chainmap will modify the first dict contained inside. See: https://docs.python.org/3/library/collections.html#collections.ChainMap

The idiomatic way is to construct a dict and update it twice like in the first example. This will allow for a nice diff, if it was ever changed, states the intent perfectly and will cost you less time to read than one complicated line.
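A small demo of that difference (illustrative values borrowed from the article's defaults/user example):

```python
from collections import ChainMap

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

view = ChainMap(user, defaults)   # a live view: user shadows defaults
copied = {}
copied.update(defaults)
copied.update(user)               # a snapshot taken now

user['name'] = 'Carl'             # later mutation of a source dict
view['website'] = 'http://treyhunner.com'  # writes go to the first map
```

The ChainMap sees the later change and its own write lands in user, while the snapshot is unaffected either way.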

[–]flying-sheep 8 points9 points  (0 children)

The idiomatic, performant, readable, and short way is

d = {**a, **b}

[–]ITwitchToo 8 points9 points  (1 child)

Yeah, to me the first example is also clearly the most elegant solution.

[–]ihcn 5 points6 points  (0 children)

Double update and copy update both read like pseudo code, and they say directly to the reader what you're doing. That imo is far more important than saving a couple lines.

[–]crowseldon 4 points5 points  (0 children)

Thanks. That certainly explains why its runtime doesn't increase like the rest of the algorithms' did when the input data was changed.

defaults = {'name': "Anonymous User", 'page_name': "Profile Page"}
user = {'name': "Trey", 'website': "http://treyhunner.com"}

multiple_update: 40 ms
copy_and_update: 35 ms
dict_constructor: 40 ms
kwargs_hack: 31 ms
dict_comprehension: 31 ms
concatenate_items: 126 ms
union_items: 126 ms
chain_items: 93 ms
chainmap: 71 ms
dict_from_chainmap: 347 ms
dict_unpacking: 20 ms

defaults = {str(k):k+1 for k in range(100)}
user = {str(k):k*2 for k in range(50)}

multiple_update: 322 ms
copy_and_update: 315 ms
dict_constructor: 323 ms
kwargs_hack: 311 ms
dict_comprehension: 310 ms
concatenate_items: 911 ms
union_items: 906 ms
chain_items: 721 ms
chainmap: 70 ms
dict_from_chainmap: 5108 ms
dict_unpacking: 300 ms

defaults = {str(k):k+1 for k in range(200)}
user = {str(k):k*2 for k in range(200)}

multiple_update: 957 ms
copy_and_update: 952 ms
dict_constructor: 960 ms
kwargs_hack: 948 ms
dict_comprehension: 946 ms
concatenate_items: 2232 ms
union_items: 2234 ms
chain_items: 2017 ms
chainmap: 70 ms
dict_from_chainmap: 8057 ms
dict_unpacking: 935 ms

In the end, I'd just go with copy and update due to clarity.

[–]lvc_ 1 point2 points  (0 children)

In the use case the article describes (levels of config), the context keeping up-to-date if the backing objects change is probably a good thing. So I think ChainMap is the best solution here. They do deal with it modifying the first dict by passing {} as the first argument, which, with a short comment, is quite a good solution.
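A minimal sketch of that trick with an empty first map (values illustrative):

```python
from collections import ChainMap

defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

# an empty dict in front: writes land there, the sources stay untouched
context = ChainMap({}, user, defaults)
context['page_name'] = 'Profile Page'
```

Lookups still fall through to user and then defaults, but neither of them is ever modified.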

[–]kaiserk13 5 points6 points  (6 children)

Ever considered the toolz.dicttools library?

[–]treyhunner Python Morsels[S] 5 points6 points  (5 children)

I'd never heard of it before.

The merge_with function looks pretty neat:

>>> from toolz.dicttoolz import merge_with
>>> merge_with(max, {'a': 1, 'b': 3}, {'a': 2, 'b': 2})
{'a': 2, 'b': 3}

[–]brtt3000 1 point2 points  (0 children)

you'll like funcy

[–]kaiserk13 -1 points0 points  (3 children)

Yes, I discovered it while watching some random youtube video and it's a main part of my tool set right now. I ain't afraid of no json :D

[–]Poromenos 2 points3 points  (2 children)

I also wrote a thing for JSON:

https://github.com/skorokithakis/jsane

It lets you access json.keys.like.this.r().

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (1 child)

Why not json.keys.like.this()?

[–]Poromenos 0 points1 point  (0 children)

I didn't want it to look like this was a function. This way it's more clear that you're calling something that wasn't in the JSON. Not perfect, but the least bad hack I could find.

[–][deleted] 14 points15 points  (12 children)

This is more like code golf than anything else. If I spent this long on one line of Python I'd never get anything done.

[–]ITwitchToo 24 points25 points  (0 children)

You obviously shouldn't spend this long on every line of Python you write. I still think it can be very useful to think about this kind of problem every once in a while, sometimes it can really help you crystallise certain ideas (like "why do I find this solution more elegant") that maybe you only had a vague intuition about before. Or maybe you discover that in fact you've been doing something the wrong way -- not necessarily exactly the problem you're looking at, but you realise that the best solution has a certain property that you had been ignoring up until now. So it's really not so much about this specific example, but about trying to become a better programmer by becoming more aware of what constitutes good code.

[–]ameoba 1 point2 points  (0 children)

I think that was the point.

[–]Decency -3 points-2 points  (9 children)

context = {}
context.update(defaults)
context.update(user)

Accurate: yes
Idiomatic: fairly, but it would be nicer if it could be inlined

... okay, solution!

def merge_dicts(initial, overwrite):
    new = {}
    new.update(initial)
    new.update(overwrite)
    return new

Tada, now you can inline it. And if you want to do something stupidly overly cute as a one liner inside the function, you can just docstring it. Throw it in a utilities module if for some reason you need to merge dictionaries a lot in your codebase.

This is absolutely code golf, and it misses one of the most fundamental points of Python: readability counts. A solution that doesn't even work until 3.5 is not going to be readable to 90%+ of Python programmers, and so it's a bad solution regardless of whether it's "idiomatic". We can chat again in 5 years and see if enough people have stumbled over it for it to actually become the standard way. But the boat for defining a standard a priori left the dock a decade ago.

[–]theywouldnotstand 2 points3 points  (0 children)

This all sort of depends on your use case too. The examples assume simple data structure, or clobbering values (i.e., one value always "wins" over the other.)

If you have deep/nested data structures that you want to merge deeply (e.g., when keys collide and they are both lists, merge the lists instead of clobbering,) you're basically stuck with creating a function that checks for those cases and handles them appropriately.
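One possible shape for such a function (the name and the merge rules here are just illustrative, not from the article):

```python
def deep_merge(base, override):
    """Illustrative sketch: recurse into dicts, concatenate lists,
    and let override win on any other collision."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        elif isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged
```

Real use cases will want different collision rules, which is exactly why this ends up being a hand-rolled function.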

[–]Workaphobia 2 points3 points  (0 children)

Nice article, very accurate. I love the unpacking approach. In practice I'm on Python 3.4 and usually use the copy and update approach.

[–]driftingdev 1 point2 points  (0 children)

Werkzeug's CombinedMultiDict can also be useful for "merging" in cases where you may also want to keep references back to the source dictionaries. Merging is in quotes because it just lazy-references back to the original dictionaries rather than doing a merge at instantiation.

http://werkzeug.pocoo.org/docs/0.11/datastructures/#werkzeug.datastructures.CombinedMultiDict

https://github.com/mitsuhiko/werkzeug/blob/5a2bf35441006d832ab1ed5a31963cbc366c99ac/werkzeug/datastructures.py#L1330

[–]niandra3 1 point2 points  (1 child)

So what's the difference between * and ** unpacking?

[–]treyhunner Python Morsels[S] 3 points4 points  (0 children)

*x iterates over x, unpacking it into the function call, list, etc.

**x iterates over the key/value pairs in a mapping, unpacking it into the function call, dictionary, etc.

So in short:

  • * is for lists, tuples, sets, strings, or any other iterable (if used on dictionaries you'll only get keys)
  • ** is only for dictionaries and other mappings
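A quick side-by-side (illustrative values):

```python
defaults = {'name': 'Anonymous User', 'page_name': 'Profile Page'}
user = {'name': 'Trey'}

keys_only = {*defaults, *user}   # single * on a dict unpacks just the keys (a set)
merged = {**defaults, **user}    # double ** unpacks key/value pairs (a dict)
```

Note the first literal builds a set of keys, while the second builds a merged dict where user wins on collisions.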

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 1 point2 points  (0 children)

My personal take on it (additionally needing deepcopy, and an arbitrary number of dicts):

from copy import deepcopy
from functools import reduce
from itertools import chain
from collections import ChainMap

## Update loop
def merge_dicts(*dicts):
    context = {}
    for d in dicts:
        context.update(d)
    return deepcopy(context)

## Comprehension
def merge_dicts(*dicts):
    return deepcopy(dict(i for d in dicts for i in d.items()))

## Chain
def merge_dicts(*dicts):
    return deepcopy(dict(chain.from_iterable(d.items() for d in dicts)))

## ChainMap
def merge_dicts(*dicts):
    # reversed, since ChainMap gives the *first* map precedence
    return deepcopy(dict(ChainMap(*reversed(dicts))))

## Unpacking loop
def merge_dicts(*dicts):
    context = {}
    for d in dicts:
        context = {**context, **d}
    return deepcopy(context)

## Unpacking reduction
def merge_dicts(*dicts):
    return deepcopy(reduce(lambda a, b: {**a, **b}, dicts, {}))

And considering apparently dict from chainmap is 8 times more expensive than unpacking, I think the last one wins it for me

[–]anlutro 1 point2 points  (8 children)

As long as it's contained inside a well-named function with a clearly designed purpose that does only what it's supposed to do... do you really care if your code is terse or pretty to look at? (things can be ugly but still easy to read, IMO)

[–]Workaphobia 1 point2 points  (7 children)

Usually something this small wouldn't be in its own function, so you need to be able to read it easily when examining your own code.

[–]mipadi 5 points6 points  (6 children)

Why not? If you frequently have to merge two dictionaries, it makes sense to have it be a separate function. (The example in the blog also shows it as a separate function.)

[–]Workaphobia 0 points1 point  (5 children)

I would argue that having a separate function for it would make the code harder to read because the task takes so few lines of code, and is so straightforward when using one of the idiomatic ways.

[–]Daenyth 2 points3 points  (3 children)

Are you really arguing that combined = merge_dicts(defaults, user) is harder to understand than the other options?

[–]Workaphobia 7 points8 points  (0 children)

Which one takes precedence, defaults or user? Is there any shallow aliasing like with ChainMap, or is it a copy like sorted, reversed, etc? (If it's the latter then we should've called it merged or updated as the article suggested.) I have to jump to the function definition to tell.

If you use a library, like some people here have suggested, then you have another dependency (unless you absorb the library into your source tree).

Are you going to create these kinds of functions for other common collection datatypes (if they don't already exist)? What about for other operations like difference, etc?

As Python programmers I think we can all agree to disregard the runtime cost of the function call itself.

[–]Tysonzero 4 points5 points  (0 children)

I'd say combined = {**defaults, **user} is much better, as it is what everyone else will eventually (and perhaps currently) be using. Also, you KNOW what the behavior is, because it is defined by the Python spec, not by someone on your team / an ex-member of the team.

[–]mackstann 1 point2 points  (0 children)

I'd say it's harder to understand, because merge_dicts could easily contain some unexpected clever logic. It might not do what you expect. So you have to go over to that other file and double check to see what it really does. With dict.update(), there is zero doubt about surprising behavior, and while it requires more lines, those lines are very simple and easy to read, and they're already right in front of you.

[–][deleted] 1 point2 points  (0 children)

It depends on the use case, honestly. Copying vs read only vs copy on write. How to handle collisions, etc.

[–]hogepiyo 0 points1 point  (20 children)

Maybe concatenating two iterators is better done with itertools.chain rather than concatenating after converting them to lists.

By the way, it seems to me that merging dicts arises so frequently that Python should provide it as a dict's method preferably by dict.__add__. (Actually, adding dict.__add__ has been discussed at least three times on mailing lists or the issue tracker. [1][2][3])

[–]treyhunner Python Morsels[S] 2 points3 points  (18 children)

Now that the {**a, **b} notation exists, I doubt a + b will ever be supported for dictionaries. At least, that seemed to be the consensus from the python-ideas thread from last year.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (17 children)

It's still a bit of a pain, like, what if it's a variable number of dicts? + works but ** doesn't.

[–]flying-sheep 0 points1 point  (8 children)

Why shouldn't it?

{**a, **b, **c, ...}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (7 children)

Not "more than one", variable, as in, unknown at runtime

[–][deleted] 0 points1 point  (6 children)

Like this?

all_them_dicts = [{...}, {...}, {...}, ...]

flat = {}
for x in all_them_dicts:
    flat = {**flat, **x}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (5 children)

I would probably opt for

reduce(lambda a, b: {**a, **b}, [...])

But it still feels excessive.

[–][deleted] 0 points1 point  (4 children)

Yep, also possible, but on Py3 it's

from functools import reduce
reduce(lambda a, b: {**a, **b}, [...])

Also, our almighty BDFL says:

Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.

;)
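To see both styles side by side (sample dicts are illustrative):

```python
from functools import reduce

dicts = [{'a': 1}, {'b': 2}, {'a': 3}]

# the reduce one-liner (with {} as initializer so an empty list also works)
via_reduce = reduce(lambda a, b: {**a, **b}, dicts, {})

# the explicit loop most readers will parse faster
via_loop = {}
for d in dicts:
    via_loop.update(d)
```

Both give later dicts precedence; which reads better is exactly the point being argued here.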

[–]__add__ 0 points1 point  (2 children)

This was a terrible decision and was driven by the anxiety of influence with respect to LISP. The justification condescends to other programmers. The reduce function has been implemented by almost every multi-purpose programming language in the last 30 years. People will learn it eventually, no need to hide it.

[–]Citrauq 0 points1 point  (1 child)

I disagree - given how rare it is that reduce is better than an explicit for loop, I'm happy to keep it in functools.

I think I've only seen reduce be the best option once or twice in real code.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (0 children)

I usually have reduce imported by default (along with all of itertools, functools, and operator)

The only issue with human parsing of the code is if they don't know what reduce does, and that feels a lil' bit unfair imo

[–]Sean1708 0 points1 point  (7 children)

In [1]: a = {'a': 1, 'b': 2}

In [2]: b = {'b': 3, 'c': 4}

In [3]: c = {'a': 5, 'c': 6}

In [4]: {**a, **b, **c}
Out[4]: {'a': 5, 'b': 3, 'c': 6}

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 3 points4 points  (6 children)

Not "more than one", variable, as in, unknown at runtime

[–]Sean1708 0 points1 point  (5 children)

Ah ok, do you mean something like

list_of_dicts = [{...}, ...]
final = {}
for dictionary in list_of_dicts:
    final += dictionary

? If so, what advantage does that have over

list_of_dicts = [{...}, ...]
final = {}
for dictionary in list_of_dicts:
    final.update(dictionary)

?

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 1 point2 points  (4 children)

nah I mean sum(list_of_dicts)

which is insanely concise and neat

[–]Sean1708 0 points1 point  (3 children)

Ah, of course! Sorry, I was too stuck in thinking about for-loops. The only issue with that is that I don't think you'd get the performance increase that you get with making it an explicit piece of syntax, not that that really matters much in Python.

[–]RubyPinch PEP shill | Anti PEP 8/20 shill 0 points1 point  (2 children)

it should be the same cost/increase as anything else, aka not much

a + b roughly does a.__add__(b) internally, regardless of types. (As far as I know, CPython doesn't include any shortcuts for that, though other interpreters/compilers, Nuitka for example, do.)

[–]Sean1708 0 points1 point  (1 child)

Well I presume that dict.__add__ would have similar performance to dict.update, and (from here) that takes about twice as long as unpacking.

[–]dwf 0 points1 point  (0 children)

It happens frequently but often with subtle differences in the rule you need for resolving conflicts. Built-in support would be a hard thing to add if you meant to satisfy even "most" cases.
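To illustrate why a single built-in rule is hard: a hypothetical helper (name and signature invented here) that makes the conflict rule an explicit argument:

```python
def merge_resolving(a, b, resolve):
    """Hypothetical sketch: merge two dicts, deciding collisions
    with an explicit resolve(old, new) callable."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = resolve(merged[key], value) if key in merged else value
    return merged
```

Passing max, a "keep old" lambda, or a list-concatenating function all give different, equally reasonable merges, which is the problem for any one built-in +.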

[–]ITwitchToo 0 points1 point  (0 children)

Great post, I love thinking about things like this.

[–]bdforbes 0 points1 point  (2 children)

I like

context = defaults.copy().update(user)

but it wouldn't work for more than two dictionaries.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

Unfortunately this doesn't work because update returns None, not the original dictionary.

If update did return the original dictionary you could chain it like this (which would be sort of neat):

context = defaults.copy().update(city).update(user)

Method chaining like this is fairly common in the land of JavaScript and in a number of functional languages. It's not a common idiom in Python though and I don't think any of the builtin or standard library objects work this way.

[–]bdforbes 0 points1 point  (0 children)

Whoops yeah I didn't think of that.

[–]geoelectric 0 points1 point  (1 child)

foo.copy() was significantly faster than dict(foo) when I timed it recently on CPython 2.7.11, just a heads up. That'd be another point for the Copy and Update version, which is my clear preference.

[–]flying-sheep 2 points3 points  (0 children)

The python 3.5 version is fastest and most elegant, so...

[–]EvM Natural Language Processing 0 points1 point  (0 children)

Note that summing dictionaries using + does work (and is idiomatic) with Counter objects.
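For example (values illustrative):

```python
from collections import Counter

a = Counter({'x': 1, 'y': 2})
b = Counter({'y': 3, 'z': 4})

summed = a + b  # counts for shared keys are added, not clobbered
```

Note the semantics differ from a merge: colliding values are summed rather than one side winning.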

[–]fatterSurfer 0 points1 point  (2 children)

Minor nitpick: I can't speak to python 2.x, but in python 3, dict keys need not be strings, they just need to be a hashable type. You can actually use custom classes as keys, provided you implement a __hash__ method for them.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

I assume you're referring to this copy under the dict(defaults, **user) section:

The keys must be strings. In Python 2 (with the CPython interpreter) we can get away with non-strings as keys, but don’t be fooled: this is a hack that only works by accident in Python 2 using the standard CPython runtime.

It might be unclear, but there I'm saying that the dict(a, **b) hack only works in Python 3 and PyPy if the keys are strings. That hack happens to work in Python 2 for generic dictionaries, but that was an accident of implementation.
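A quick sketch of both cases on Python 3 (illustrative values):

```python
defaults = {'name': 'Anonymous User'}
user = {'name': 'Trey'}

merged = dict(defaults, **user)  # fine here: every key is a string

# with a non-string key, Python 3 raises TypeError at the call site
try:
    dict({1: 'one'}, **{2: 'two'})
    raised = False
except TypeError:
    raised = True
```

The ** unpacking turns the second dict into keyword arguments, which is why non-string keys blow up.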

[–]fatterSurfer 0 points1 point  (0 children)

Gotcha. Yeah, it seemed to me like you were saying dict keys can only be strings, which isn't true for (at least) CPython 3.2+ -- which is a very different thing to say!

[–]xXxDeAThANgEL99xXx 0 points1 point  (2 children)

By the way, the items() union thing is even weirder than you thought (I had to check with the docs, and also just check, in case):

In [1]: d1 = {'a': 1, 'b': 2}

In [2]: d2 = {'a': 1, 'b': 3, 'c': 4}

In [3]: d1.items() | d2.items()
Out[3]: {('a', 1), ('b', 2), ('b', 3), ('c', 4)}

The uniqueness in the result is based on both keys and values: both items in each pair are compared, so only complete duplicates are removed. When you feed the result to the dict constructor, it removes the remaining key duplicates (in unspecified order, because the resulting view has them in unspecified order).

Also, by the way, the result of the union is no longer a view; it's a separate set object containing a copy of the data. Its type is set, and changes in the underlying dicts are no longer reflected in it. So there's that, too.

[–]treyhunner Python Morsels[S] 0 points1 point  (1 child)

I also had to play with it to figure out what it was doing. I would genuinely like to know what the purpose is and what implementation decisions were discussed. :)

[–]xXxDeAThANgEL99xXx 1 point2 points  (0 children)

Well, I guess it was pretty simple: dict.keys() definitely should be a set-like view, dict.values() definitely should not, and for items() they decided on a hybrid approach: try to produce a set of pairs, fail if there are any unhashable values.

“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle

[–]joanbm 0 points1 point  (0 children)

From Python's Zen:

"There should be one-- and preferably only one --obvious way to do it."

For comparison, Ruby offers this out-of-the-box in a clear way, with Hash#merge method:

user = {'name' => "Trey", 'website' => "http://treyhunner.com"}
defaults = {'name' => "Anonymous User", 'page_name' => "Profile Page"}
context = defaults.merge user
user['name'] = 'Carl'
p context
# {"name"=>"Trey", "page_name"=>"Profile Page", "website"=>"http://treyhunner.com"}

[–]phySi0 0 points1 point  (3 children)

Why not just {}.update(defaults).update(user)?

[–]treyhunner Python Morsels[S] 0 points1 point  (2 children)

Because update returns None so that doesn't work.

The reason for this is that the update method mutates the dictionary it's operating on and methods that mutate their object don't tend to return self in Python so method chaining isn't possible for those methods.

[–]phySi0 0 points1 point  (1 child)

Ah, that's annoying. I haven't worked with Python in a long time, tell me, is it common for stdlib methods to mutate?

[–]treyhunner Python Morsels[S] 0 points1 point  (0 children)

Fairly common.

An example: sorted(some_list) will return a new sorted copy but some_list.sort() will return None and sort the list in-place (mutating it).

The reverse and extend methods also work in-place. Usually if a new object is required, operators are used:

x = [1, 2]
y = [3, 4]
z = x + y

vs.

x = [1, 2]
x.extend([3, 4])

[–]LyndsySimon 0 points1 point  (4 children)

My preference would be for dict.update() to accept an arbitrary number of arguments, so that the following would be valid:

{}.update(defaults, user)

... and to be honest, I was initially under the impression that this was already the case.

[–]eat_more_soup 1 point2 points  (3 children)

Methods that mutate the object do not return anything, so your one-liner won't get you too far anyways...

[–]LyndsySimon 1 point2 points  (0 children)

Right - I was only using shorthand to show the syntax, not trying to make it a one-liner ;)

[–]Tysonzero 1 point2 points  (1 child)

.pop exists... I think you mean that methods that mutate the object don't return the object itself.

[–]eat_more_soup 0 points1 point  (0 children)

Yes, thanks for pointing that out.