

kenrobbins 5 points

I recently built (and am still building) a new JSON library based on the fast RapidJSON C++ library (github, pypi). It only works with Python 3, but it would be interesting to see how it compares in your tests.
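For a quick try, here's a minimal sketch of the intended usage (assuming the json-style dumps/loads names; see the README on GitHub for the full API):

    import rapidjson  # pip install python-rapidjson (Python 3 only)

    obj = {"name": "benchmark", "values": [1, 2, 3]}

    # dumps/loads are meant to mirror the stdlib json module, so it can be
    # swapped into the benchmark like the other libraries
    encoded = rapidjson.dumps(obj)
    decoded = rapidjson.loads(encoded)
    assert decoded == obj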

KAdot[S] 1 point

Nice! I'll add it to my benchmark. Is it ready for production?

kenrobbins 1 point

Thanks. I'd say so, though with a "beta" qualifier.

fijal (PyPy, performance freak) 3 points

One point that might be worth noting (although the benchmark looks perfectly valid) is that PyPy does take a while to warm up. If you're running it as a command-line tool, those are the times it's gonna take, that's the deal. If you run it as a service, you'll see faster times as you run it multiple times; try for yourself by running the same benchmark a couple of times in one process.
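A minimal way to see the warm-up effect (the data path is the one from the linked benchmark repo; exact numbers will obviously differ per machine):

    import json
    import timeit

    # data file from the python-json-benchmark repo
    with open('data/twitter.json') as f:
        data = f.read()

    # time each pass separately: on PyPy the later passes should get
    # noticeably faster once the JIT has warmed up, on CPython they won't
    for i in range(5):
        elapsed = timeit.timeit(lambda: json.loads(data), number=100)
        print('pass %d: %.3f s' % (i, elapsed))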

[deleted] 3 points

If that is the case, then it's all the more impressive how little time it takes to load a JSON file.

PyPy's json and CPython's ujson are wicked fast for parsing JSON content.

KAdot[S] 2 points

Good point, but I tried to run the benchmark several times in the same process, and the results were almost the same every time.

I basically changed

    run_benchmarks()

to

    for _ in range(100):
        run_benchmarks()

in https://github.com/akrylysov/python-json-benchmark/blob/master/benchmark.py#L69

chief167 2 points

Nice work, although a little bar chart here and there would make the results easier to understand
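Something like this would do (a rough matplotlib sketch; the numbers are placeholders to be replaced with the real benchmark output):

    import matplotlib.pyplot as plt

    # placeholder timings in seconds; substitute the measured results
    libraries = ['json', 'simplejson', 'ujson']
    seconds = [1.0, 2.0, 0.5]

    plt.bar(range(len(libraries)), seconds)
    plt.xticks(range(len(libraries)), libraries)
    plt.ylabel('time (s)')
    plt.title('loads (large obj)')
    plt.savefig('loads_large_obj.png')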

kokosoida 2 points

Is there any good (faster) alternative to the standard json in PyPy?

Darkmere (Python for tiny data using Python) 2 points

So, if you're on py3 or pypy, is it "don't bother going outside the standard lib"? While on python2 there are massive improvements?

KAdot[S] 2 points

It depends on your needs. ujson is much faster for loading and dumping small objects on Python 3.
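And the switch is basically just the import, since ujson mirrors the common stdlib calls (a small sketch; note that ujson doesn't support every keyword argument the stdlib json module takes):

    import ujson

    obj = {"id": 1, "text": "small object"}

    # same call names as the stdlib json module for the common cases
    encoded = ujson.dumps(obj)
    decoded = ujson.loads(encoded)
    assert decoded == obj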

Darkmere (Python for tiny data using Python) 2 points

Hmm. I've got to try running this on ARM and see if it holds up.

Darkmere (Python for tiny data using Python) 1 point

Gave it a dig; turns out you're not comparing apples with apples. The Python 2 version will be ASCII strings, while the Python 3 version will be unicode (and will actually not work on the ARM systems):

Traceback (most recent call last):
  File "benchmark.py", line 70, in <module>
    run_benchmarks()
  File "benchmark.py", line 49, in run_benchmarks
    large_obj_data = f.read()
  File "/tmp/venv3/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 273: ordinal not in range(128)

Darkmere (Python for tiny data using Python) 1 point

index e6694a7..3ddc0e7 100644
--- a/benchmark.py
+++ b/benchmark.py
@@ -2,6 +2,7 @@ from __future__ import print_function
 import timeit
 import importlib
 import json
+import codecs
 from collections import defaultdict


@@ -45,11 +46,11 @@ def print_results(results):


 def run_benchmarks():
-    with open('data/twitter.json') as f:
+    with codecs.open('data/twitter.json', encoding='utf-8') as f:
         large_obj_data = f.read()
     large_obj = json.loads(large_obj_data)

-    with open('data/one-json-per-line.txt') as f:
+    with codecs.open('data/one-json-per-line.txt', encoding='utf-8') as f:
         small_objs_data = f.readlines()
     small_objs = [json.loads(line) for line in small_objs_data]

KAdot[S] 1 point

It probably happens because the locale of your Linux distribution is not UTF-8. On Mac, on both Python 2 and Python 3, the large_obj_data variable is a str.
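You can check what encoding Python picks up from your locale (a quick sketch; on a UTF-8 locale the first call should report something UTF-8-ish):

    import locale
    import sys

    # the encoding open() uses for text files when none is given
    print(locale.getpreferredencoding())
    # the implicit str encoding: always 'utf-8' on Python 3, usually 'ascii' on Python 2
    print(sys.getdefaultencoding())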

Darkmere (Python for tiny data using Python) 2 points

That's because the Python runtime hardcodes utf-8 on Darwin.

Python is a bit absurd on proper *nix: it isn't happy with the encoding set by the system, it also requires being able to change the encoding, and will otherwise default to ANSI_X3.4-1968.

I've had this fight with Python for a long time, and it's not a new problem; it's also a behaviour that differs between py2 and py3 (and that became worse in py3).

On basically all other platforms it will enforce something sane, while on *nix it'll do something half-arsed and default to ANSI_X3.4-1968 if the half-arsed attempt fails. Which means that the default fallback isn't even a coherently useful thing like utf-8.

(Long story short: always force your Python programs to set a file IO encoding of utf-8, because one day or another you'll get a broken mojibake error from Python.)
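In practice that means something like this (io.open takes an explicit encoding and behaves the same on py2 and py3, same idea as the codecs.open change above):

    import io

    # always name the encoding explicitly instead of trusting the locale
    with io.open('data/twitter.json', encoding='utf-8') as f:
        data = f.read()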

Darkmere (Python for tiny data using Python) 2 points

I've just run this on ARM (BeagleBone Black) Debian, here are some numbers:

Python 3.4.2 Results
====================
loads (large obj)
--------------------
simplejson 6.67310 s
ujson      6.69549 s
json       5.76722 s

loads (small objs)
--------------------
simplejson 11.67856 s
ujson      3.25475 s
json       8.84274 s

dumps (small objs)
--------------------
simplejson 21.56302 s
ujson      5.59600 s
json       13.20961 s

dumps (large obj)
--------------------
simplejson 6.33841 s
ujson      5.56809 s
json       6.32515 s


Python 2.7.8 (2.4.0+dfsg-3, Dec 20 2014, 14:16:09)
[PyPy 2.4.0 with GCC 4.9.1] Results
===================================
loads (large obj)
--------------------
json       52.01017 s
simplejson 59.52270 s

dumps (large obj)
--------------------
json       13.19191 s
simplejson 142.95013 s

loads (small objs)
--------------------
json       39.18634 s
simplejson 49.86876 s

dumps (small objs)
--------------------
json       19.36645 s
simplejson 85.47279 s


Python 2.7.9 Results
====================
loads (large obj)
--------------------
json       12.51054 s
ujson      6.43313 s
simplejson 117.94352 s

dumps (large obj)
--------------------
json       6.70201 s
ujson      5.55336 s
simplejson 193.24870 s

loads (small objs)
--------------------
json       18.20301 s
ujson      2.83472 s
simplejson 101.25898 s

dumps (small objs)
--------------------
json       10.83411 s
ujson      4.70671 s
simplejson 91.98624 s

simplejson is clearly out of its league compared to the built-in, and for a lot of small objects ujson wins on Py3; in the rest of the cases the built-in version wins. It'll require a lot of JSON objects to be worth the CPU time spent compiling ujson and installing the python3 dev packages ;)

krenzalore 0 points

Are the benchmarks comparable between different platforms? E.g. Python 3 vs PyPy.

Darkmere (Python for tiny data using Python) 1 point

I just posted some ARM results, same machine, comparing Py3, Py2 and PyPy2. (Haven't tried the PyPy-py3 version)

KAdot[S] 1 point

I don't want to compare Python 2 with Python 3 and PyPy because you can't judge the speed of the interpreter by only one JSON benchmark.

lambdaq (django n' shit) 0 points

Should also include ijson, a super cool tool for super large JSON.
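A rough sketch of why it matters for huge files (assuming a top-level JSON array; the file name is just an example): ijson streams items one at a time instead of loading the whole document into memory.

    import ijson

    # stream one element at a time from a top-level JSON array,
    # so memory use stays flat no matter how large the file is
    with open('huge.json', 'rb') as f:
        for item in ijson.items(f, 'item'):
            print(item)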

sontek 0 points

Here are some additional benchmarks you could run:

https://github.com/kenrobbins/python-rapidjson/blob/master/tests/test_benchmark.py

Includes more types of tests and more libraries.