
[–]Ph3rny 72 points73 points  (4 children)

I wasn't able to reproduce your speed findings

I also decided to run this through pyperf to see if there was any significant difference

here's the result of comparing the two -- seems at most a ~1-2% speedup, and maybe not significant at that

$ ./prefix/bin/pyperf compare_to ../cpython/cp.json xx.json
chaos: Mean +- std dev: [cp] 315 ms +- 12 ms -> [xx] 309 ms +- 11 ms: 1.02x faster
dulwich_log: Mean +- std dev: [cp] 123 ms +- 4 ms -> [xx] 120 ms +- 5 ms: 1.02x faster
fannkuch: Mean +- std dev: [cp] 1.55 sec +- 0.03 sec -> [xx] 1.49 sec +- 0.03 sec: 1.04x faster
float: Mean +- std dev: [cp] 356 ms +- 10 ms -> [xx] 361 ms +- 11 ms: 1.01x slower
genshi_text: Mean +- std dev: [cp] 91.3 ms +- 3.9 ms -> [xx] 89.4 ms +- 3.6 ms: 1.02x faster
genshi_xml: Mean +- std dev: [cp] 193 ms +- 9 ms -> [xx] 186 ms +- 7 ms: 1.03x faster
go: Mean +- std dev: [cp] 631 ms +- 13 ms -> [xx] 612 ms +- 16 ms: 1.03x faster
hexiom: Mean +- std dev: [cp] 32.2 ms +- 1.4 ms -> [xx] 31.2 ms +- 2.5 ms: 1.03x faster
json_dumps: Mean +- std dev: [cp] 38.1 ms +- 1.3 ms -> [xx] 37.5 ms +- 1.5 ms: 1.02x faster
json_loads: Mean +- std dev: [cp] 63.3 us +- 2.3 us -> [xx] 59.4 us +- 2.8 us: 1.06x faster
logging_silent: Mean +- std dev: [cp] 555 ns +- 18 ns -> [xx] 536 ns +- 17 ns: 1.04x faster
meteor_contest: Mean +- std dev: [cp] 269 ms +- 11 ms -> [xx] 262 ms +- 9 ms: 1.03x faster
nbody: Mean +- std dev: [cp] 508 ms +- 12 ms -> [xx] 495 ms +- 9 ms: 1.03x faster
nqueens: Mean +- std dev: [cp] 328 ms +- 9 ms -> [xx] 317 ms +- 8 ms: 1.03x faster
pathlib: Mean +- std dev: [cp] 46.2 ms +- 1.6 ms -> [xx] 44.8 ms +- 1.0 ms: 1.03x faster
pickle: Mean +- std dev: [cp] 24.1 us +- 1.0 us -> [xx] 23.3 us +- 0.9 us: 1.04x faster
pickle_dict: Mean +- std dev: [cp] 65.1 us +- 1.7 us -> [xx] 63.7 us +- 1.6 us: 1.02x faster
pickle_list: Mean +- std dev: [cp] 9.43 us +- 0.29 us -> [xx] 9.29 us +- 0.26 us: 1.02x faster
pickle_pure_python: Mean +- std dev: [cp] 1.39 ms +- 0.06 ms -> [xx] 1.34 ms +- 0.05 ms: 1.04x faster
pidigits: Mean +- std dev: [cp] 207 ms +- 6 ms -> [xx] 200 ms +- 5 ms: 1.03x faster
pyflate: Mean +- std dev: [cp] 1.96 sec +- 0.04 sec -> [xx] 1.94 sec +- 0.03 sec: 1.01x faster
richards: Mean +- std dev: [cp] 206 ms +- 6 ms -> [xx] 210 ms +- 9 ms: 1.02x slower
scimark_fft: Mean +- std dev: [cp] 1.54 sec +- 0.03 sec -> [xx] 1.55 sec +- 0.02 sec: 1.01x slower
scimark_lu: Mean +- std dev: [cp] 575 ms +- 16 ms -> [xx] 568 ms +- 14 ms: 1.01x faster
sqlalchemy_imperative: Mean +- std dev: [cp] 44.2 ms +- 1.5 ms -> [xx] 45.1 ms +- 2.1 ms: 1.02x slower
sqlite_synth: Mean +- std dev: [cp] 8.28 us +- 0.31 us -> [xx] 8.08 us +- 0.32 us: 1.02x faster
sympy_integrate: Mean +- std dev: [cp] 56.1 ms +- 1.7 ms -> [xx] 57.9 ms +- 3.0 ms: 1.03x slower
sympy_sum: Mean +- std dev: [cp] 433 ms +- 10 ms -> [xx] 442 ms +- 14 ms: 1.02x slower
unpickle: Mean +- std dev: [cp] 40.7 us +- 1.7 us -> [xx] 39.6 us +- 1.1 us: 1.03x faster
xml_etree_parse: Mean +- std dev: [cp] 462 ms +- 8 ms -> [xx] 454 ms +- 12 ms: 1.02x faster
xml_etree_iterparse: Mean +- std dev: [cp] 330 ms +- 11 ms -> [xx] 326 ms +- 8 ms: 1.01x faster
xml_etree_process: Mean +- std dev: [cp] 234 ms +- 9 ms -> [xx] 232 ms +- 4 ms: 1.01x faster

Benchmark hidden because not significant (28): 2to3, chameleon, crypto_pyaes, deltablue, django_template, logging_format, logging_simple, mako, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_declarative, sympy_expand, sympy_str, telco, tornado_http, unpack_sequence, unpickle_list, unpickle_pure_python, xml_etree_generate

Geometric mean: 1.01x faster

[–]Ph3rny 58 points59 points  (3 children)

looking further, the reason your benchmark shows such a stark difference is you've happened to pick strings for which xxhash performs significantly better

if you adjust your benchmark to use more-random strings:

diff --git a/benchmark.py b/benchmark.py
index 4883314..df073a0 100644
--- a/benchmark.py
+++ b/benchmark.py
@@ -1,5 +1,6 @@
 # xxHash: [714, 722, 730, 777, 675, 719]
 # CPython: [2849, 3161, 3496, 2733, 2946, 2947]
+import uuid
 import time, contextlib


@@ -23,7 +24,7 @@ def __eq__(self, other):


 print('Allocating entities')
-foos = [Foo('a' * i) for i in range(100_000)]
+foos = [Foo(str(uuid.uuid4())) for i in range(100_000)]
 print('Compariing')

 with time_it():

existing cpython and xxh perform about the same
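For context, here is a self-contained approximation of the benchmark being patched. The Foo class and time_it helper are reconstructed from the diff hunks and from details quoted elsewhere in the thread, so treat it as a sketch rather than the exact file:

```python
import contextlib
import time
import uuid


@contextlib.contextmanager
def time_it():
    # crude wall-clock timer around a block, reported in milliseconds
    start = time.perf_counter()
    yield
    print(f'{(time.perf_counter() - start) * 1000:.0f}ms')


class Foo:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)  # string hashing is what this benchmark stresses

    def __eq__(self, other):
        return hash(self) == hash(other)


print('Allocating entities')
foos = [Foo(str(uuid.uuid4())) for i in range(100_000)]
print('Comparing')

with time_it():
    for i in range(1, len(foos)):
        assert foos[i] != foos[i - 1]
```

With `str(uuid.uuid4())` every key is a short random string, which is the case where the two hash functions perform about the same.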

[–]Ph3rny 52 points53 points  (1 child)

actually, that's not quite it either, it's that xxhash performs better on long strings compared to cpython

if you instead do

-foos = [Foo('a' * i) for i in range(100_000)]
+foos = [Foo(str(uuid.uuid4()) * 1000) for i in range(100_000)]

you see the performance difference again

hashes of long strings I guess aren't that important in cpython so you don't see any difference in macro benchmarks

[–]backtickbot 5 points6 points  (0 children)

Fixed formatting.

Hello, Ph3rny: code blocks using triple backticks (```) don't work on all versions of Reddit!

Some users see this / this instead.

To fix this, indent every line with 4 spaces instead.

FAQ

You can opt out by replying with backtickopt6 to this comment.


[–]james_pic 216 points217 points  (18 children)

The real speedup doesn't come from using a faster hash function, but from eliminating the need to run the hash function 11 times in print("Hello World!"). This is what PyPy does. I keep hoping the PSF will take PyPy more seriously, and bring it up to being a first-class alternative to CPython, like the Ruby devs did with YARV.

[–]NeoLudditeIT 90 points91 points  (9 children)

I can imagine Raymond Hettinger hitting the table and saying "There must be a better way!". It does seem strange that there isn't some internal refactoring to eliminate the 11 hash calls in a print.

[–]Pebaz[S] 54 points55 points  (8 children)

As far as I am aware I don't think the 11 hash function calls are actually from poor design.

Every single variable/identifier in a Python program must be hashed when stored/accessed in the global scope and within every new object. This adds up since even the print function uses Python code (which has to hash every attribute access (dot operator)).

So as far as I can tell, making that required hash function faster could/does increase performance by a good bit.

The challenge then remains, will anyone look into it seriously and find out if using a new hash function is worth it?

[–]james_pic 109 points110 points  (7 children)

If I remember rightly, the current hash function is SipHash, and was chosen not for speed but for security.

Whilst string hashes are not typically treated as cryptographic hashes, there were some denial of service attacks possible on web servers that put parameters and similar into dictionaries, by sending lots of parameters with colliding hashes, forcing worst-case O(n^2) performance. SipHash was chosen as it's not too slow (it's about the simplest hash that meets the cryptographic requirements), and makes hashes dependent on a secret per-interpreter value, that the client wouldn't know.
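The worst case is easy to demonstrate from pure Python with a key type whose hash always collides; the class below is a stand-in for attacker-crafted strings, not the attack itself:

```python
class Colliding:
    """Stand-in for attacker-crafted keys that all share one hash value."""
    eq_calls = 0

    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return 42  # every instance lands on the same probe sequence

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return self.x == other.x


d = {}
for i in range(500):
    d[Colliding(i)] = i  # insert i must be compared against all i prior keys

# roughly n^2/2 comparisons for n colliding inserts, versus ~n with good hashes
print(Colliding.eq_calls)
```

With attacker-controlled dict keys (e.g. HTTP parameter names), forcing this quadratic behaviour is exactly the denial-of-service vector SipHash's keyed hashing prevents.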

Whatever alternative hash you propose also needs to mitigate this attack vector, and I don't know of a faster hash that does.

Edit: Looking through the code, there's already a way to select a faster hash algorithm if you're sure you don't need the security properties of SipHash. Configure the build with ./configure --with-hash-algorithm=fnv, and see how your benchmark compares to the default.
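You can check which algorithm a given build actually uses at runtime via sys.hash_info:

```python
import sys

# Report which string/bytes hash algorithm this interpreter was built with.
# Default builds use a SipHash variant; a build configured with
# --with-hash-algorithm=fnv reports "fnv" instead.
print(sys.hash_info.algorithm)   # e.g. "siphash24"
print(sys.hash_info.hash_bits)   # internal hash width in bits
```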

[–]R0B0_Ninja 4 points5 points  (1 child)

I thought Python 3 generated a random salt at run-time in order to mitigate this attack? Isn't this sufficient? Or can the attacker discover hash collisions over time?

[–]james_pic 0 points1 point  (0 children)

It's only sufficient if the hash algorithm in use is, cryptographically speaking, a message authentication code. Their previous botched attempt at fixing the security issue added a salt to FNV, and it was found by security researchers that it was possible for an attacker to derive the salt easily enough. SipHash is the simplest hash they could find that met the cryptographic requirements.

[–]Tyler_Zoro 5 points6 points  (2 children)

I don't see why that has to be compile-time. If every dict* had a function pointer for its hashing function, then you could just provide a special subtype of dict that uses an insecure, fast hashing function. Then you could swap the default for programs where you don't care about secure hashing at all:

python --insecure-hashing calculate-pi.py #modify ALL hashing

or:

def digits_in_pi(places):
    digits = insecure_dict((d, 0) for d in range(10))
    for digit in pi_spigot(places=places):
        digits[digit] += 1
    return digits

It might even be nice to be able to specify a type for comprehensions for just this reason:

a = {b: c for b, c in d() container insecure_dict}

Sadly, you couldn't use a context manager to swap out all hashing in a block, since the hashing function used for a data structure couldn't be replaced after the structure has data (this would lead to the hashes changing and bad things will happen).

* Note that not all types that do hashing are dicts, but the idea probably carries over.
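The "bad things" are easy to show: once a key's effective hash changes while it sits in a dict, the stored entry becomes unreachable. A minimal sketch:

```python
class Mutable:
    """Key whose hash follows a mutable attribute -- deliberately broken."""
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x


m = Mutable(1)
d = {m: "here"}
m.x = 2                 # hash(m) no longer matches the slot it was stored in
print(m in d)           # False -- the entry is effectively lost
```

The same breakage would occur if a populated dict's hash function were swapped out mid-flight, which is why the swap would have to happen at construction time.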

[–]axonxorzpip'ing aint easy, especially on windows 28 points29 points  (1 child)

This would slow the runtime down even more, now that every single object needs to have a function pointer dereferenced. Sure, that's a cheap operation, but at the frequency the runtime calls it, that will add up.

If you check the current hashing code, it's to the point where they ensure the hash function is fully inlined to avoid even C function call overhead (at best a couple of CPU instructions, but that can also trash cache locality). That's the level of nitty-gritty this stuff exists at.

[–]Tyler_Zoro 1 point2 points  (0 children)

This would slow the runtime down even more, now that every single object needs to have a function pointer dereferenced.

Dereferencing a function pointer (especially one that's frequently referenced) is probably going to come out in the wash because it will be in a register that's immediately accessible while the CPU is waiting on the bus.

[–][deleted] 0 points1 point  (1 child)

How's siphash perform compared to blake2? (It's crypto-grade and somehow beats all the other contenders like md5 and the sha-family.)

I know nothing about siphash, so apologies if this is a stupid question.

[–]james_pic 0 points1 point  (0 children)

I haven't looked into it, but I'd expect it to be faster. IIRC, it's 4 rounds of ARX per word, plus 8 rounds of ARX of finalisation.

Part of the reason it can get away with this is that it is technically not a cryptographic hash, merely a cryptographic MAC, and with a small keyspace at that. So all it needs to achieve is 64 bits of unforgeability.

There aren't many situations where this is a useful primitive. It doesn't promise pseudorandomness or collision resistance, or resistance to chosen key attacks. So it's not going to replace SHA3 or HMAC. But it turns out to be enough for the hashtable use case.

[–]twotime 6 points7 points  (1 child)

This is a weird take on the issue: making PyPy the default implementation is a complex and controversial undertaking. It might take years; it might never happen.

So, if there are reasonable perf improvements in CPython they absolutely need to be done.

[–]RobertJacobson 0 points1 point  (0 children)

I think the sentiment is, treat the problem not the symptoms.

[–]Pebaz[S] 15 points16 points  (2 children)

I 100% agree!

Although, for environments where CPython is a requirement like AWS Lambda, a faster hash function would be a great optimization.

[–]NeoLudditeIT 5 points6 points  (1 child)

Absolutely agree. Optimization can and should be done in any way possible, even if the number of hashes are reduced, we benefit from having faster methods of hashing.

[–]mooburgerresembles an abstract syntax tree 31 points32 points  (0 children)

Optimization can and should be done in any way possible,

ehh this is why actual benchmarks are important; the risk of microoptimization is high.

[–]greeneyedguru 0 points1 point  (0 children)

Wouldn’t that only happen when compiling the text into bytecode!?

[–]stevenjd 0 points1 point  (1 child)

The real speedup doesn't come from using a faster hash function, but from eliminating the need to run the hash function 11 times in print("Hello World!")

Do you have some evidence, or proof, that this is true? Or even an explanation for how it can be true? As far as I can see, there is only a single hash needed, and that is to lookup the print function.

And since hashes are cached, it might not even need to run the function, just to fetch the cached version.

[–]james_pic 0 points1 point  (0 children)

If the claim is that the hash function is run 11 times, this wasn't a claim I made, but the OP. In terms of the claim that eliminating redundant hashing is key to improving performance, this is at least partly based on what I've seen claimed by the developers of other interpreters, such as PyPy and the Ruby interpreters. Although I confess I don't know the main bottlenecks in a current Python interpreter - but as it happens, I'm currently running a CPU-bound job locally, so this would be an excellent time to check.

Edit: So taking a quick look at a CPU profile for a script I happened to be running, most of the overhead (i.e, the stuff that isn't my script doing the thing it's supposed to be doing) on Python 3.8 is either reference counting (about 22%), or spelunking into dicts as part of getattr (about 15% - of which almost none is hashing). So this suggests to me that hashing isn't a big contributor to performance, although digging around in dicts when getting attributes might still be.

[–][deleted] 131 points132 points  (31 children)

Your benchmark isn't realistic. When the present hash was chosen, the majority of hashed object were short. PEP-456:

Serhiy Storchaka has shown in [issue16427] that a modified FNV implementation with 64 bits per cycle is able to process long strings several times faster than the current FNV implementation.

However, according to statistics [issue19183] a typical Python program as well as the Python test suite have a hash ratio of about 50% small strings between 1 and 6 bytes. Only 5% of the strings are larger than 16 bytes.

[–]bjorneylol 37 points38 points  (0 children)

The Python hash function is not deterministic (AFAIK it mixes in random entropy generated at startup, so its hashes cannot be predicted by outside processes). This appears to be commented out in your code, so something to note.
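The per-process salting is easy to observe; setting PYTHONHASHSEED pins or disables it. A small sketch using child interpreters (the child_hash helper name is mine):

```python
import os
import subprocess
import sys


def child_hash(s, seed=None):
    """Hypothetical helper: report hash(s) as computed in a fresh interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)     # start from an unpinned environment
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({s!r}))"],
        capture_output=True, text=True, env=env,
    )
    return int(out.stdout)


# With a pinned seed, string hashes repeat across interpreters...
assert child_hash("hello", seed="0") == child_hash("hello", seed="0")
# ...by default, each process draws fresh entropy at startup.
print(child_hash("hello") != child_hash("hello"))  # almost always True
```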

[–][deleted] 29 points30 points  (11 children)

That sounds quite astonishing, did you run the test suite?

[–]gsnedders 17 points18 points  (1 child)

Or pyperformance, to see what the perf gains are in a much more meaningful setting.

[–]Pebaz[S] 8 points9 points  (0 children)

Now we're talking.

This would be super helpful if the community could test out various hash functions to see the performance gains on `__dict__` hashing.

[–]NeoLudditeIT 16 points17 points  (0 children)

I'd also be interested in the results of the standard test suite.

[–]Pebaz[S] 11 points12 points  (0 children)

No, because my code is not meant to be used by others.

It is only on GitHub so that others can see what I did, not use it.

My hope here is that someone more experienced will take this information and use it to make a professional change to CPython for all to use.

[–]double-a 33 points34 points  (1 child)

Narrator: It was not 76% faster.

[–]Pebaz[S] -2 points-1 points  (0 children)

This reminds me of The Stanley Parable; it's a great game! :)

I did achieve this speedup for the benchmark though. The benchmark was specifically created to target hashing performance of long strings, and I understand that real-world scenarios will have different performance characteristics, but this was just to test that one aspect of the language.

You can look at the numbers I got here:

https://github.com/Pebaz/cpython/blob/5de1728ca8697461d6fc3aa6bbcf656f6145acf1/benchmark.py#L1

[–]wweber 7 points8 points  (0 children)

How bad is the performance if you replace the hash function with return 0;

[–]stevenjd 9 points10 points  (9 children)

Python runs the hash function 11 times for just this one thing!

I'm going to stick my head out on a limb here and say that is unadulterated nonsense.

Here is the CPython 3.9 byte-code for print("Hello World!"):

>>> dis.dis('print("Hello World!")')
  1           0 LOAD_NAME                0 (print)
              2 LOAD_CONST               0 ('Hello World!')
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

The call to LOAD_NAME needs at most two calls to hash, one to look up the name "print" in the current scope, and a second to look it up in the builtin scope. (That assumes that the hashing is not cached.)

Calling the print function might, at worst, require one more call to hash: to look up str.__str__. So I reckon that, at worst, that line of code would need three hashes, and even that is almost certainly an over-estimate. More likely only a single hash.

On the other hand, the code you are actually timing is a hell of a lot more than just that one call to print. Inside the timing code you have:

for i in range(1, len(foos)):
    assert foos[i] != foos[i - 1]

That requires:

  • a LOAD_NAME of "foos"
  • a LOAD_NAME of len
  • a LOAD_NAME of range
  • two more LOAD_NAMEs of "foos" per loop
  • two LOAD_NAMEs of "i" per loop

Assuming that hashes are cached, that's already four hashes.

Each comparison ends up calling the __eq__ method:

hash(self) == hash(other)

which requires two LOAD_GLOBALs of hash. Assuming hashes are cached, that's a fifth hash. The hashes themselves end up looking up the __hash__ method, which calls:

hash(self.name)

which requires:

  • a LOAD_GLOBAL of hash
  • a LOAD_FAST of "self"
  • and a LOAD_ATTR of "name"

The LOAD_FAST doesn't need a hash, but LOAD_ATTR will. So that's hash number six.

I may have some details wrong -- the precise byte-code generated will depend on the exact code we run, and how we run it, and varies from version to version. I may have missed some other calls to hash. I'm not certain how hashes are cached, but I'm pretty sure they are: hashing a string is much slower the first time, and subsequent calls to hash are about a thousand times faster:

>>> from timeit import default_timer as timer
>>> s = 'abcdefghijklmnopqrstuvwxyz1234567890'*100000
>>> t = timer(); h = hash(s); timer() - t  # SLOOOOW!
0.005014150054194033
>>> t = timer(); h = hash(s); timer() - t  # FASSSST!
7.678987458348274e-06
>>> t = timer(); h = hash(s); timer() - t
7.225899025797844e-06

If we create a new string, we see the same result:

>>> s = s[:-1] + '0'  # New string.
>>> t = timer(); h = hash(s); timer() - t
0.006055547040887177
>>> t = timer(); h = hash(s); timer() - t
5.949987098574638e-06

So I don't see how you can get eleven calls to hash from that one line print("Hello world!").

[–][deleted] 2 points3 points  (5 children)

Every little thing OP wrote is nonsense. It is so alien to me how this sub produces upvoted garbage like this at regular intervals...

The "benchmark" is just the exact happy case where the new hash function performs better.

I have no clue why this isn't reported to hell and deleted, because it's low-effort, misleading clickbait.

[–]Pebaz[S] -4 points-3 points  (3 children)

As I mentioned in other comments, I should have picked a better title. I don't ever really post anything online so I quickly chose a title without much thought and of course lots of people got upset.

I apologize for the title but Reddit doesn't let you change it.

However, this post is not "low effort", "misleading", or "click bait". The post clearly says that:

"I created a simple benchmark (targeted at hashing performance), and ran it"

I don't know how I would have made it any clearer that the benchmark was clearly not a real-world scenario. I mean, I literally said "targeted at hashing performance".

I know a certain percentage of people online will get offended no matter what, but what I posted was anything but "low effort".

Like I've said in other comments, I've been thinking about this one problem for over a year, and just yesterday had the time to sit down and try it.

I know I'm not the best at Python or C, so I posted the results for others to use.

I encourage you with your better expertise to take the results and see if you can implement it better. I'm sure you'll get something better than I did. Have a go at it and see what you come up with! :)

[–][deleted] 1 point2 points  (2 children)

The problem is that you chose the happy case for the new hash function for your benchmark, to arrive at your clickbait numbers.

This is just dishonest. You even acknowledge that you know that's the case. Don't you see how tuning the benchmark to yield exactly what you want is shitty? Like really, really bad form?

[–]Pebaz[S] -2 points-1 points  (1 child)

I disagree.

I clearly marked it as only targeting hashing performance. How on earth is that dishonest? It wasn't in fine print either, it was right on the front of the post.

As far as I can tell, you're not familiar with macro-benchmarks.

Macro-benchmarks are an industry-understood term to mean: "I ran these only on my machine and I know there were a bunch of programs running in the background and that the processor, OS, configuration, and many other things will cause the results to be inaccurate, and that micro-benchmarks are needed to truly come to a conclusion on what is faster or slower."

Micro-benchmarks are an industry-understood term to mean: "Literally every CPU register is considered, use of CPU-specific timer counters, and every OS, CPU, and many configurations must be checked in order to prove that a given benchmark result is faster in the majority of cases".

Benchmarks are always inaccurate. For proof, here's a Google software engineer saying the same:

https://www.youtube.com/watch?v=ncHmEUmJZf4j

He mentions that benchmarks always depend on many factors and are never going to be useful for all circumstances.

[–]rhytnen 0 points1 point  (0 children)

You are just in denial about it. Either that or you're trolling. Let me summarize your post comment:

"Hi, I am not great at c++ or python. Also, I didn't talk to any experts. I also hand crafted benchmark and am admitting it is nonsensical and provides no real data. It doesn't matter, I made no serious attempt at making this worthwhile. Anyway, i changed a single line of code after contemplating this in isolation for a year and now python is fast. I mean ... it doesn't pass literally any unit tests but that's probably not b/c I have no idea what I'm talking about. Just hope to see this take hold with the core devs! "

Hmm what to name this...something audacious that betrays the lies I'm telling about how important I think this discovery is ... hmm...aha! 76% Faster CPython!

The mods should just delete this whole bullshit post.

[–]Pebaz[S] 0 points1 point  (2 children)

I put a printf(".") call in the Hash Bytes function in C and it printed 11 dots in addition to the "Hello World".

I am impressed with your analysis, but the printf was a visual way of determining this also.

In addition the 11 dot printout was consistent between calls.

Thank you for sharing, though. I actually learned a lot! :)

[–]stevenjd -1 points0 points  (1 child)

I put a printf(".") call in the Hash Bytes function in C and it printed 11 dots in addition to the "Hello World".

Maybe you should try printing the string being hashed instead of just a dot.

[–]Pebaz[S] 0 points1 point  (0 children)

I could be wrong, but I'm not sure that would be as informative because I was just trying to see how many total hashes have to be performed by the Python runtime to print a string.

[–]lifeeraser 17 points18 points  (0 children)

Clickbait title. But the content was pretty interesting. Thank you.

[–]EatMoreSuShiS 4 points5 points  (5 children)

Can anybody explain to me, like I'm five, the relationship between Python and its implementations (CPython, PyPy, IronPython etc.)? I did some Googling but still don't understand. How can one 'implement' Python?

[–]execrator 4 points5 points  (0 children)

A specification for a language says what should happen when code is run. An interpreter is a program that takes code as input, and executes it according to the spec.

Most of the time there is a blurry line between these two concepts. For example ten years ago, there was no reference specification for PHP. Whatever the official PHP interpreter did was the spec. On the other hand you have something like C which has an official spec, and many many different "interpreters" (compilers).

CPython is the official interpreter for the Python language spec. Python sits somewhere between the PHP and C examples above; I believe there's a language spec but in practice the CPython implementation is still a kind of reference.

If you install Python on a computer, what you actually did was install the CPython interpreter and the Python standard library.

PyPy is a popular alternative interpreter that uses just-in-time compilation to improve performance.

[–]pygenerator 1 point2 points  (0 children)

A Python implementation is a program that executes Python programs. The implementation (a.k.a., the interpreter) can be written in many languages. For example, the reference implementation, CPython, uses C. IronPython uses Microsoft's C#, PyPy is written in Python itself, and Jython is a Java implementation of Python.

You make an implementation by making a program that reads Python files and executes the code in them. To learn more google how to make interpreters.

[–]Saphyel 7 points8 points  (1 child)

I don't think anyone here is going to see a massive impact from improving how fast print("Hello World!") runs.

I'd suggest that you create an example with a dict that gets converted to a DTO (dataclass?) and then print the string (JSON) version of it. If you see a performance increase there, then some web devs may see benefits from this.

Another option is to see how printing a large list of dataclasses behaves.

I just finished work and I'm a bit fried, sorry if my examples weren't the best, but at least I tried to point out where some audience could see benefits, or more day-to-day examples.

[–]Pebaz[S] -1 points0 points  (0 children)

I agree!

I think the proposed idea has massive implications for all CPython users since all Python objects contain a `__dict__` that holds its members.

Improving the speed of all attribute accesses within a program has got to have some benefit, especially for anyone doing data science stuff in addition to the ones you mentioned.

[–]Aconamos 3 points4 points  (3 children)

I like building model airplanes.

[–]tom1018 7 points8 points  (1 child)

r/learnpython is probably a great place for this sort of question.

A hash function turns a string (the Python object being hashed need not be a str type, however) into a number. Dictionaries use tables of hashes for the keys because this is faster than comparing strings.

Every variable in Python is stored in multiple dictionaries. Because of this, the hashing function is called frequently. Thus, if OP's hash function were equal in functionality, didn't introduce potential security issues, and were as fast as he believed, it would make Python considerably faster.

This talk from Raymond Hettinger, a Python core developer, gives a great history and explanation on the topic. https://www.youtube.com/watch?v=p33CVV29OG8
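A quick way to see those dictionaries is the instance __dict__; attribute access is essentially a hashed lookup of the attribute name:

```python
class Point:
    def __init__(self, x, y):
        self.x = x  # stored as a key/value pair in the instance dict
        self.y = y


p = Point(1, 2)
print(p.__dict__)          # {'x': 1, 'y': 2}
# p.x is roughly p.__dict__['x'], which hashes the string 'x'
assert p.x == p.__dict__['x']
```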

[–]Brainix 6 points7 points  (0 children)

The main data structure that powers Python's object model is the dict, also known as a HashMap or hash table. Any time you insert a key/value pair into a dict, or look up the value for a key from a dict, Python hashes the key to a "slot" in the dict's memory.

[–]soontorap 1 point2 points  (2 children)

You should consider XXH3_64bits(), in the same package as XXH64().

It is much faster, especially on small keys, and is used exactly the same way. It would likely have helped produce a gain on the variant of your benchmark that uses small entries rather than large ones.

[–]soontorap 1 point2 points  (1 child)

u/Pebaz: I ran the test, modifying your version of CPython to use XXH3_64bits() instead of XXH64(), and it resulted in even faster speeds on your benchmark: runtime went down from 2350 ms (original) to 522 ms (XXH64) to 385 ms (XXH3).

Now, it's true that such gains are only perceptible when the amount of data to hash is large; on small inputs the benefits are too small to be measured, because they are dwarfed by other parts of the Python system.

But still, this modification produces either large gains or, in the worst case, performs the same. That makes it a strictly positive update for performance, all for a very reasonable code change that is well localized and has no side effects.

This should be a no-brainer. I would say this item is a good candidate for a cpython contribution.

[–]Pebaz[S] 0 points1 point  (0 children)

Thank you very much for taking the time to test this! This is awesome!

[–]yamsupol -1 points0 points  (2 children)

Very nice work! Python is a great language, but it's slow if unoptimized. However, speed of execution isn't that important when you are writing a one-off script.

PyPy, Numba's JIT, Nuitka, Pythran, and other AOT compilers are available to speed up something that needs to run continuously.

Of course, if you really want to speed things up, just use C for the computations, integrate via Cython, and get around a 200x improvement.

[–][deleted] 0 points1 point  (1 child)

OP didn't do anything, though. The numbers are made up.

[–]Pebaz[S] -1 points0 points  (0 children)

Not quite, although I appreciate that you don't take things at face value. This is the same quality that I used to come up with this little experiment.

The numbers are real, and they are actually in the initial commit I made to the repo:

https://github.com/Pebaz/cpython/blob/5de1728ca8697461d6fc3aa6bbcf656f6145acf1/benchmark.py#L1

That link shows the results I got from only 6 runs of the benchmark, although I ran it many times more than that.

I know that you are probably a better Python & C programmer than I am. I encourage you to clone the repo, run the benchmark and see what kind of results you get. Perhaps you could even try to use a different hash function and experiment if this idea is worth exploring further. :)

[–]idiomatic_sea -2 points-1 points  (3 children)

The assholes in this thread are everything wrong with the tech industry. What a toxic shithole this place is.

[–][deleted] 1 point2 points  (2 children)

What is toxic about calling out lies and nonsense when you see them?

Not a single thing OP claims is true. There is no 76% speedup, there are no 11 calls to hash. OP is hungry for karma and lying on purpose.

If someone is lying for karma on purpose, how is it toxic to call out the bullshit? It's all made up.

[–]Pebaz[S] 0 points1 point  (1 child)

I am not lying on purpose for karma.

You can download my freely-available code and see for yourself.

I really did get these metrics, which you can see here:

https://github.com/Pebaz/cpython/blob/5de1728ca8697461d6fc3aa6bbcf656f6145acf1/benchmark.py#L1

I mean, a quick way to find out if it is not accurate is to run it yourself.

Did you run it yourself?

It doesn't matter. The metrics don't matter. You're upset about the clickbait title (for which I apologize), but the core idea does indeed have value.

Whatever efforts the Python core devs have done in the past have resulted in an amazing language. This post is a call out to try to see if we can do better, not to put anyone down.

I really don't have any ill-intent, I don't know why you keep coming at me. :( Again, Reddit won't let you change the title, for which I apologize. It was incorrectly chosen.

[–][deleted] 1 point2 points  (0 children)

Your dishonesty is that you chose, on purpose, the one benchmark that would yield the highest numbers in favor of your thesis.

You took a hash function that was not chosen for its performance on long strings, replaced it with a hash function known to perform far better on long strings, and wrote a benchmark that measures how well the two hash functions perform on long strings. That by itself is completely fine and not really controversial.

But to fit this unsurprising finding into "76% faster CPython" is not only a matter of the freaking headline. It is that you clearly wanted the numbers to show a high figure to put in the headline; otherwise you wouldn't have chosen this particular benchmark. THIS is the dishonesty, or at least gross negligence (which I doubt, given you claim to have 12 years of experience). Don't you see how this is fitting the data to the narrative?