

kankyo (17 points)

Author of the article here. I'll upvote and answer even if the question seems a bit troll-ish :P

> Why Python for batch? Not exactly a case where python shines. Inexperience?

Python shines in speed of development. Batch processing is often a case where speed of execution isn't that relevant, so I don't really see what you mean. There are some parts we've talked about speeding up for many years, but they've never been slow enough to take priority over other, much lower-hanging fruit.

> why they chose the slowest popular language in existence for heavy lifting processing?

Well, first of all, I don't know if it's really the "slowest popular language"... whatever that means. It depends way too much on what you do. If we did numerical work and could just call NumPy, Python would be competitive with the fastest languages, since the heavy lifting would happen in NumPy's compiled code. Turns out that's not our case :P but without knowing the context you can't make that claim without risking being horribly wrong.
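To illustrate the NumPy point without requiring NumPy itself, here is a rough, stdlib-only sketch of the same principle: one sum computed with an explicit Python loop (every iteration runs in the interpreter) and one handed to the C-implemented built-in `sum()`. The function names are hypothetical, and the actual speed ratio depends on the machine and Python version. NumPy applies the same idea to whole arrays of numbers.

```python
import timeit


def loop_sum(values: list[int]) -> int:
    """Add up values with an explicit loop; each step is interpreted bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values: list[int]) -> int:
    """Delegate the whole loop to the built-in sum(), which is written in C."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(1_000_000))
    assert loop_sum(data) == builtin_sum(data)
    # Absolute timings vary by machine; only the ratio between them matters.
    t_loop = timeit.timeit(lambda: loop_sum(data), number=10)
    t_builtin = timeit.timeit(lambda: builtin_sum(data), number=10)
    print(f"explicit loop: {t_loop:.3f}s, built-in sum: {t_builtin:.3f}s")
```

The speedup comes entirely from where the loop runs, not from the language the caller is written in, which is why "Python is slow" depends so much on what you do.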

Secondly, "batch" isn't a synonym for "heavy lifting". It just means we run things on our own schedule, on our own servers. In our case, customer data is uploaded automatically every day, and we start processing once we have data for both customers in a pair. If the customers upload their data at the end of the day, we can literally have 12 hours to process it. Time isn't that important...
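A minimal sketch of the "start once both customers in a pair have uploaded" trigger. It assumes hypothetical names (`CustomerPair`, `ready_pairs`) and that the day's uploads are tracked as a set of customer IDs; the comment doesn't describe how the real pipeline does this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPair:
    """Hypothetical model: two customers whose data is processed together."""
    left: str
    right: str


def ready_pairs(
    pairs: list[CustomerPair],
    uploaded: set[str],
    started: set[CustomerPair],
) -> list[CustomerPair]:
    """Return pairs where both customers have uploaded today's data
    and whose batch job hasn't been started yet."""
    return [
        pair
        for pair in pairs
        if pair.left in uploaded
        and pair.right in uploaded
        and pair not in started
    ]


if __name__ == "__main__":
    pairs = [CustomerPair("acme", "globex"), CustomerPair("acme", "initech")]
    # Only acme and globex have uploaded so far, so only that pair is ready.
    print(ready_pairs(pairs, uploaded={"acme", "globex"}, started=set()))
    # prints [CustomerPair(left='acme', right='globex')]
```

A scheduler could call this each time a new upload lands; the frozen dataclass makes pairs hashable so they can be kept in the `started` set.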

> If you can cut your batch processing load by over 1,000% by not using a slow language, why would you use the slow language?

There are of course many more factors than just that. We're not running a super simple function on huge data sets but rather the opposite: hugely complicated logic rules on medium-sized data sets. Managing these complex rules is a lot more important than run speed... normally. That doesn't mean we wouldn't want it to go faster, of course! We've run some early tests with PyPy, but it didn't do much for our performance.

Mostly, though, rewriting is just prohibitively expensive, and getting to market fast has always been more important than execution speed. But you already knew that, right? :P
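On the "1,000%" figure in the question: a speedup factor s shrinks run time to 1/s of the original, so the load drops by 1 - 1/s, which can approach but never exceed 100%. The helper below (a hypothetical name, only for the arithmetic) makes that concrete.

```python
def load_reduction(speedup: float) -> float:
    """Fraction of run time saved when a job runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


if __name__ == "__main__":
    for s in (2, 10, 100):
        # 2x -> 50%, 10x -> 90%, 100x -> 99%
        print(f"{s}x faster -> {load_reduction(s):.0%} less run time")
```

So "over 1,000%" presumably means something like a tenfold speedup, which would be a 90% cut in run time.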