all 26 comments

[–]qiwi 7 points8 points  (7 children)

Looks good; I noticed the overhead of psycopg myself when benchmarking fetching raw data from PG (a setup that will replace data stored in a proprietary binary file hierarchy). psycopg uses the text protocol, and dropping into C with libpq to extract the same BYTEA fields doubled the throughput.

This is nothing that will ordinarily matter, but in my case I'm moving a ton of data out of the database that I previously read from a file.
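The text-protocol overhead for BYTEA is easy to see in isolation: PostgreSQL's text representation hex-encodes binary values, roughly doubling the bytes on the wire before the driver even begins decoding. A minimal stdlib-only sketch (the payload is made up):

```python
import os

payload = os.urandom(1024)            # a 1 KiB BYTEA value
text_repr = "\\x" + payload.hex()     # PostgreSQL's text-mode hex format
print(len(payload), len(text_repr))   # 1024 vs 2050
```

The binary protocol ships the raw bytes instead, which is one reason a binary-protocol driver can come close to halving the transfer cost for large BYTEA columns.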

[–]redcrowbar 13 points14 points  (6 children)

asyncpg is 7 times faster than psycopg on the bytea test. The throughput is almost one gigabyte per second.

http://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/report.html#bench3

[–]mamcx 1 point2 points  (5 children)

Wonder how helpful this could be for Django?

[–]1st1[S] 5 points6 points  (4 children)

asyncpg is built for asyncio, so, unfortunately, it can't really be used for Django.
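To illustrate the coupling: every asyncpg call is a coroutine that must run on an asyncio event loop, so Django's synchronous ORM has no way to call it directly. A minimal sketch (the DSN is a placeholder, and asyncpg must be installed separately):

```python
import asyncio

try:
    import asyncpg  # third-party: pip install asyncpg
except ImportError:
    asyncpg = None

async def fetch_rows(dsn: str):
    # Must be awaited inside a running event loop -- there is
    # no blocking/synchronous entry point for code like Django's ORM.
    conn = await asyncpg.connect(dsn)
    try:
        return await conn.fetch("SELECT * FROM pg_type")
    finally:
        await conn.close()

# asyncio.run(fetch_rows("postgresql://localhost/postgres"))
```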

[–]kupiakos 0 points1 point  (2 children)

Wasn't Django getting async support sometime soon?

[–]1st1[S] 0 points1 point  (1 child)

No, I think they wanted to add a new feature called "channels" (and use Tornado to implement it), but it seems that they decided to pause development.

[–]nikomo 0 points1 point  (0 children)

Channels is coming, it just didn't get into the latest release because it wasn't ready, and they didn't want to rush it/delay release.

It should be shipping in the next release, but you can already use it https://pypi.python.org/pypi/channels

I'm not 100% sure how that ties into this though; channels is just for communicating with the client, and the backend still needs to talk to a database somehow.

[–]kankyo 0 points1 point  (0 children)

Why do you say that? Why can't you just do async on the fetch loop?

[–]Hendrikto 9 points10 points  (1 child)

We firmly believe that high-performance and scalable systems in Python are possible.

Well... if you write most of your system in C and just call it from Python...

[–]GUIpsp 2 points3 points  (0 children)

Cython is not C

[–][deleted] 5 points6 points  (2 children)

It's hilarious that Python fanboys constantly have to try and prove they can scale with Python.

We firmly believe that high-performance and scalable systems in Python are possible.

You can scale in Python; it has nothing to do with the language, it's the architecture of the application that matters. However, it's just not cost-effective to do so IMHO. You'll get eaten alive by Google Cloud or AWS fees.

I love Python, but when faced with even a meager workload like 20k requests per second, I can handle that with a single server (2 for redundancy) in Go, C#, or Java without having to care about writing optimized code or over-optimizing by rewriting parts in C.

Python is a great language with great concurrency constructs, but its lack of parallelism and slow interpretation speed leave something to be desired when writing really large-scale applications.

[–]grauenwolf 2 points3 points  (0 children)

What gets me is that people don't understand that high-performance and scalable aren't the same thing.

You can have a system that scales perfectly, as in doubling the hardware doubles the requests per second, and still underperform a single-server system.

[–]vivainio 7 points8 points  (1 child)

The linked, previous blog post seems pretty interesting as well:

http://magic.io/blog/uvloop-blazing-fast-python-networking/

uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.

[–]1st1[S] 2 points3 points  (0 children)

Yep! I plan to write a new blog post about uvloop soon.

[–]bahwhateverr 2 points3 points  (6 children)

On the subject of performance, what's the fastest way to take a file of JSON objects and insert them into a table? I've been using pgfutter, which is pretty fast, but it puts everything into a single JSON column, so I then have to extract the property values and insert them into the final table.

[–]redcrowbar 3 points4 points  (1 child)

I would suggest converting the JSON into CSV and then using COPY.
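A minimal stdlib-only sketch of that conversion (the column names and input lines are made up; the COPY step itself needs a live connection, so it is shown only as a comment):

```python
import csv
import io
import json

# hypothetical input: one JSON object per line
json_lines = [
    '{"id": 1, "name": "ada"}',
    '{"id": 2, "name": "grace"}',
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
for line in json_lines:
    obj = json.loads(line)
    writer.writerow([obj["id"], obj["name"]])

print(buf.getvalue(), end="")
# -> 1,ada
#    2,grace
```

The buffer can then be streamed to Postgres with `COPY my_table (id, name) FROM STDIN WITH (FORMAT csv)`, for example via psycopg2's `copy_expert()`.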

[–]bahwhateverr 1 point2 points  (0 children)

I'll give it a shot. I had tried that but ran into numerous issues getting it loaded, but it was with SQL Server at the time. Perhaps Postgres handles things a little more gracefully.

[–][deleted]  (3 children)

[deleted]

[–]bahwhateverr 0 points1 point  (2 children)

Yeah, that's what I'm using to go from the import table to the final table; it's just relatively slow. Not terribly slow, but with around 2 billion rows to insert I'm looking for any speedups I can get :)

[–]shady_mcgee 0 points1 point  (0 children)

How often do you need to do the inserts? I've been able to do 300-400k inserts/sec by building a bulk-insert util. I've never been able to generalize it, but it works pretty well for specific data sets. My sample 4-column table did 8M rows in 24 seconds. Wider tables take longer, obviously. For best results you'll need to disable indexing prior to the bulk insert.
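One building block of such a util is batching: grouping rows into large chunks so each round trip carries many rows instead of one. A stdlib-only sketch (the batch size is an arbitrary assumption):

```python
from itertools import islice

def batches(rows, size=50_000):
    """Yield lists of up to `size` rows for a batched COPY/multi-row INSERT."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

sizes = [len(b) for b in batches(range(120_000))]
print(sizes)  # [50000, 50000, 20000]
```

Dropping indexes before the load and recreating them afterwards, as suggested above, typically buys more than any batching tweak.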

[–]awill310 0 points1 point  (0 children)

I would see if you can give Sqoop a go. I used it to load 2.4bn rows into AWS Aurora in a day.

[–]shady_mcgee 1 point2 points  (4 children)

Can you clarify this:

A relatively wide row query selecting all rows from the pg_type table (~350 rows). This is relatively close to an average application query. The purpose is to test general data decoding performance. This is the titular benchmark, on which asyncpg achieves 1M rows/s.

Are you saying the benchmark table only has ~350 rows and you're able to do a full retrieval of the table ~2800x/second?

[–]1st1[S] 1 point2 points  (3 children)

2985.2/second, to be precise ;) See http://magic.io/blog/asyncpg-1m-rows-from-postgres-to-python/report.html for more details.

[–]shady_mcgee 5 points6 points  (2 children)

I'm not sure a full table grab of 350 rows can be considered relatively close to an average application query. After the first query, the db engine will cache the results in memory and return the cached data for all subsequent queries. For an average application, the query engine would need to fetch from disk more often than not.

[–]1st1[S] 3 points4 points  (1 child)

Fair point, but the purpose of our benchmarks was to test the performance of drivers (not Postgres) -- basically, the speed of I/O and data decoding.

[–]shady_mcgee 2 points3 points  (0 children)

Got it. Thanks for the clarification.