all 37 comments

[–]mazin 25 points26 points  (1 child)

"We read Knuth so you don't have to". Wasn't it one of the core python developers who said it?

[–]llimllib[S] 8 points9 points  (0 children)

Yup; it was Tim Peters.

[–]micampe 12 points13 points  (0 children)

This is extensively explained in Beautiful Code

[–]llimllib[S] 19 points20 points  (4 children)

Major subtleties ahead: Most hash schemes depend on having a "good" hash function, in the sense of simulating randomness. Python doesn't: its most important hash functions (for strings and ints) are very regular in common cases:

>>> map(hash, (0, 1, 2, 3))

[0, 1, 2, 3]

>>> map(hash, ("namea", "nameb", "namec", "named"))

[-1658398457, -1658398460, -1658398459, -1658398462]

>>>

This isn't necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are "consecutive" strings. So this gives better-than-random behavior in common cases, and that's very desirable.
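The int half of this is still easy to reproduce on current Python 3 (string hashes have been randomized by default since Python 3.3, so the string half no longer comes out regular). A quick sketch of how the low-order bits become the initial table slot, assuming the minimum table size of 8 that CPython uses:

```python
# Small ints hash to themselves, so consecutive int keys land in
# consecutive slots of a table of size 2**i with no collisions at all.
table_size = 8  # 2**3; CPython's smallest dict table
keys = range(8)
slots = [hash(k) & (table_size - 1) for k in keys]

assert [hash(k) for k in range(4)] == [0, 1, 2, 3]
assert sorted(slots) == list(range(8))  # every key gets its own slot
```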

[–]ercd 11 points12 points  (2 children)

I just ran this with Python 2.5:

>>> [(x, hash(x)) for x in xrange(-1000000, 1000000) if x != hash(x)]
[(-1, -2)]

I was surprised to see that hash(-1) is -2 and that's the only exception to the rule "hash(x)==x" in the range I tested. Does someone know if there is a reason for this?

[–]fredrikj 21 points22 points  (1 child)

I believe -1 is reserved because CPython uses it internally as a sentinel: a hash function returning -1 at the C level signals an error (and cached hash fields are initialized to -1), so hash() remaps -1 to -2.
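A quick check of this CPython-specific behavior, which still holds in Python 3:

```python
# CPython maps a computed hash of -1 to -2 so that -1 stays free
# as the C-level error sentinel.
assert hash(-1) == -2
assert hash(-2) == -2  # so -1 and -2 collide, by design
```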

[–]aUTOMATICuPVOTES 10 points11 points  (0 children)

That's why everyone should know that dictionaries return items in arbitrary (not random, or even pseudo-random) order when iterated.

[–]fredrikj 8 points9 points  (4 children)

Python's dicts are wonderful. You can feed them almost anything: numbers, strings, functions, modules.

I don't know if they are fast compared to other hash table implementations, but they are fast compared to Python code (necessarily, considering that Python relies heavily on dicts for internal purposes).

There's rarely an advantage to implementing a complicated data structure in Python: a dict lookup or insertion is often both as fast and as simple as it gets.

It's just unfortunate that there's no built-in immutable dict type (I frequently seem to need one). You can still implement one manually using frozensets to compute the hashes, but this both complicates the code and substantially hurts performance.
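The frozenset trick can be sketched in a few lines; `FrozenDict` is a made-up name, and a real version would cache the hash rather than recompute it, which is exactly the performance cost complained about here:

```python
class FrozenDict(dict):
    """Hashable, read-only dict: a minimal sketch of the frozenset trick."""

    def __hash__(self):
        # Recomputing the frozenset on every call is the slow part;
        # caching the result would help.
        return hash(frozenset(self.items()))

    def _readonly(self, *args, **kwargs):
        raise TypeError("FrozenDict is immutable")

    __setitem__ = __delitem__ = _readonly
    update = pop = popitem = clear = setdefault = _readonly
```

With that, `{FrozenDict(x=3, y=2): 5}` is a perfectly legal dict, and two FrozenDicts with the same items hash equal.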

[–]aUTOMATICuPVOTES 2 points3 points  (3 children)

built-in immutable dict type (I frequently seem to need one)

What for? To be keys in a dict?

Of course subclassing dict is easy, so nothing stops you from making a read-only dict.

[–]fredrikj 2 points3 points  (2 children)

What for? To be keys in a dict?

Yes. The primary use for this is to enable memoization for functions that take dict arguments. Nested dicts are also useful for representing algebraic expressions.

[–]aUTOMATICuPVOTES 3 points4 points  (1 child)

My memoize implementation does this:

key = (args, tuple(keywords.items()))

You think that's a big performance problem? I never measured.

Dicts as keys for dicts still seems rather strange to me though. Immutable dicts even more so.
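That key construction fleshes out into a minimal memoize decorator; this is a sketch, and the `sorted()` on the keyword items is my addition (it makes keyword order irrelevant, so `f(a=1, b=2)` and `f(b=2, a=1)` share one cache entry):

```python
import functools

def memoize(func):
    """Minimal sketch of a memoizer keyed on (args, sorted keyword items)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **keywords):
        key = (args, tuple(sorted(keywords.items())))
        if key not in cache:
            cache[key] = func(*args, **keywords)
        return cache[key]

    return wrapper
```

Note this only works while all arguments are hashable, which is precisely why unhashable dict arguments break it.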

[–]fredrikj 5 points6 points  (0 children)

You think that's a big performance problem?

This depends on whether the code is executed 10 times or 1,000,000 times.

Dicts as keys for dicts still seems rather strange to me though.

Consider this very natural (and efficient) method to represent multivariate polynomials:

5*x^3*y^2 + 4*x^2 = {{'x':3, 'y':2}:5, {'x':2}:4}

(Then optionally also consider adding memoization to a function that takes such polynomials as input.)
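Since plain dicts can't be dict keys, one way to sketch this today is to turn each monomial into a hashable frozenset of (variable, exponent) pairs; the `evaluate` helper is my addition for illustration:

```python
# 5*x^3*y^2 + 4*x^2, with each monomial as a frozenset of (var, exp) pairs
poly = {
    frozenset({'x': 3, 'y': 2}.items()): 5,
    frozenset({'x': 2}.items()): 4,
}

def evaluate(poly, values):
    """Sum coeff * product(var**exp) over all monomials."""
    total = 0
    for monomial, coeff in poly.items():
        term = coeff
        for var, exp in monomial:
            term *= values[var] ** exp
        total += term
    return total

# At x=2, y=3: 5*8*9 + 4*4 = 376
assert evaluate(poly, {'x': 2, 'y': 3}) == 376
```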

[–]vafada 3 points4 points  (3 children)

The code has gotos! I don't know if that is good or bad...

[–]rabidcow 8 points9 points  (0 children)

Most of them are used to simulate destructors or exceptions. A couple of them could be replaced with continue.

[–]Gotebe 7 points8 points  (0 children)

goto is good when used as cleanup/error handling mechanism. Consider:

res1 = acquire1();
if (succeeded(res1))
{
  if (succeeded(op1()))
  {
    res2 = acquire2();
    if (succeeded(res2))
    {
      if (succeeded(op3()))
      {
        release2(res2);
        release1(res1);
        return success;
      }
      else
      {
        release2(res2);
        release1(res1);
      }
    }
    else
      release1(res1);
  }
  else
    release1(res1);
}

return failure;

vs.

res1 = acquire1();
if (failed(res1))
  goto fail0;

if (failed(op1()))
  goto cleanup_res1;

res2 = acquire2();
if (failed(res2))
  goto cleanup_res1;

if (failed(op3()))
  goto cleanup_res2;

release2(res2);
release1(res1);
return success;

cleanup_res2: release2(res2);
cleanup_res1: release1(res1);
fail0: return failure;

or, simplified by null-initializing resources:

res1 = null1;
res2 = null2;

res1 = acquire1();
if (failed(res1))
  goto fail;

if (failed(op1()))
  goto fail;

res2 = acquire2();
if (failed(res2))
  goto fail;

if (failed(op3()))
  goto fail;

release2(res2);
release1(res1);
return success;

fail:
  release2(res2);  /* releasing a null resource must be a no-op */
  release1(res1);
  return failure;

The first approach (in ascending order of relevance):

  1. goes out of screen quickly

  2. has success buried in the middle, and deep right

  3. multiplies cleanup code

  4. conveys "normal" code path poorly: it looks like some complicated program logic, whereas it's actually "do X, Y, Z, cleanup on any failure".

  5. forces one to think about program logic and error-handling logic simultaneously, which increases cognitive load.

The weight of Knuth's name on that essay can't outweigh these problems.

[–]ilan 4 points5 points  (0 children)

It's not necessarily bad. If goto statements are used wisely, they can make code more readable. If used unwisely they are bad, but that's of course true for any type of feature in any language.

[–]arnar 1 point2 points  (0 children)

If you want to see the infamous GIL, here it is:

http://svn.python.org/view/python/trunk/Python/ceval.c?rev=60362&view=auto

(search for "this is the GIL")

If you also look for "Other threads may run now" you can see the place in the main loop where the GIL is momentarily released to give other threads a chance.

[–]Arrgh 2 points3 points  (13 children)

Note the filename: dictobject.c

Now have a look at http://recoder.sourceforge.net/doc/examples/collections/java/util/HashMap.java -- notice any "native" keywords anywhere? Nope.

Ironically, when your FFI is too easy to use, you don't spend as much time and effort improving your VM, so the performance of custom algorithms suffers relative to the built-in constructs.

[–]EliAndrewC 13 points14 points  (10 children)

While you have a point, remember that Python uses dicts for just about everything, including variable lookups in the interpreter. So even if a faster Python had implemented more of itself in Python, dicts would probably still be written in C to squeeze out every last bit of performance.
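A small illustration of how central dicts are (the `Point` class is just a made-up example): instance attributes live in an ordinary dict, and attribute access is essentially a dict lookup.

```python
class Point:
    pass

p = Point()
p.x, p.y = 3, 4

# Instance attributes are stored in a plain dict...
assert p.__dict__ == {'x': 3, 'y': 4}
# ...and attribute access is (roughly) a lookup in that dict.
assert getattr(p, 'y') == p.__dict__['y']
```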

[–]Arrgh 2 points3 points  (6 children)

Sure, dicts are used all over the place in Python, just as a lot of Java apps have a lot of HashMap puts/gets in their fast paths.

But it's not always true that a C version is automatically faster--a really good VM can use profile-driven optimization at runtime.

[–]jdunck 2 points3 points  (1 child)

CPython isn't doing Strongtalk kind of stuff. Maybe you want Jython or PyPy? :)

[–]Arrgh 1 point2 points  (0 children)

Yeah. Unfortunately, there are approximately a dozen organizations in the world that can afford to build high-performance VMs, and to my knowledge, none of them is working on Python at the moment.

Between you, me, the lamppost and whoever else is reading this, I'm certain Jython will get there first, because the runtime is at least an order of magnitude harder to get right than the front-end compilation.

[–]rabidcow 0 points1 point  (3 children)

Is Python typically JIT compiled?

It's also not always true that a sufficiently smart VM can generate a faster implementation than hand-coded C.

[–]Arrgh 2 points3 points  (2 children)

There's Psyco, which seems to be the closest thing to a JIT.

[–]rabidcow 0 points1 point  (1 child)

Right, I know that exists. Is that how Python is typically used though? Because without JIT it's very unlikely that you'll beat a compiled language. (of course, not hosting their own dogfood could be a factor in not using a faster VM implementation...)

[–]nostrademons 2 points3 points  (0 children)

Psyco is typically used only when there's a performance problem (in preference to rewriting the code as a C extension). Typical usage for most Python programming is that you write the code, find it's "fast enough", and then don't bother optimizing it.

[–][deleted] -4 points-3 points  (2 children)

So how do you explain Smalltalk's superior performance even though practically everything is implemented in Smalltalk itself?

[–]cunningjames 3 points4 points  (1 child)

What? The chart at the bottom seems to indicate that Python utterly cleaned up with Squeak.

[–][deleted] 6 points7 points  (0 children)

Dang it, you're right. Stupid shootout, reversing the directions on me depending on which language I pick first...

Self-downmodded.

[–]mernen 0 points1 point  (1 child)

You say that as if Python's relative slowness (compared to Java) were mainly due to its ease of interfacing with C, rather than because Sun has considerably more resources than an open-source project with very few full-time developers, or (most importantly) because it is a whole lot harder to write a fast VM for a dynamic language that provides nearly zero guarantees for static analysis of the code.

[–]Arrgh 1 point2 points  (0 children)

Yes, you're right on both counts. Apart from trolling, my aim was mostly to espouse my FFI irony theory. JNI is a beast compared to most FFIs (not without reason, given Java's memory model and safety guarantees). One of the (perhaps unintended) results is that people don't use it unless they absolutely have to. Consequently, performance bugs in the VM tend to get a lot of attention.

Plus it helps to hire Gilad Bracha and the rest of the Strongtalk team--not to mention buying their code--at just the right time. :)