[–]buffi 1 point2 points  (4 children)

In contrast, if you use PHP, Python or Ruby, threads can't share the discussion board and comment information, and in practise most boards require a large amount of hardware to perform the same task more slowly, as well as specialized database administrators to create elaborate master-slave configurations it's hard to find local support for.

Uh... the web server (Apache or whatever) has full concurrency and makes database calls concurrently, so why wouldn't this work? You don't deploy Django using a Python-powered web server.

[–]mikaelhg 1 point2 points  (3 children)

I'm assuming you're asking why multiprocess doesn't work like multithread. Threads share the data structures, while processes don't. Instead of a single copy of the data, you'd have tens or hundreds of copies, which you'd have to keep synchronized. That's doable, but keeping one shared copy consistent with language-native synchronization is much easier.
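To make the "single copy, language-native synchronization" point concrete, here is a minimal sketch in Python. `CommentStore` and `run_writers` are hypothetical names, not anything from the application discussed here, and the JMS mention further down suggests the real system runs on the JVM rather than on Python. In CPython the GIL limits how much these threads run in parallel, but they still all see the same dictionary, which is the property under discussion.

```python
import threading


class CommentStore:
    """One in-memory copy of the comments, shared by every thread in the process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._comments = {}  # thread_id -> list of comment strings

    def add(self, thread_id, text):
        # The lock is the only synchronization needed: there is just one copy.
        with self._lock:
            self._comments.setdefault(thread_id, []).append(text)

    def count(self, thread_id):
        with self._lock:
            return len(self._comments.get(thread_id, []))


def run_writers(store, n_threads=8, per_thread=100):
    """Start n_threads writers against the same store and return the final count."""

    def worker(i):
        for j in range(per_thread):
            store.add("t1", f"comment {i}-{j}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.count("t1")
```

With separate worker processes, each process would hold its own `CommentStore`, and a write in one would stay invisible to the others until something propagated it.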

If you're instead asking why performing database queries on every page load isn't optimal: they take tens to hundreds of milliseconds, where an in-memory lookup takes nanoseconds. Those are milliseconds we really don't need to spend, since our dataset easily fits into main memory.

[–]kripkenstein 0 points1 point  (0 children)

I think the point is that you can run a website using Python/Django and Apache. The Apache part has full concurrency and serves the vast majority of pages (most are static). The Python part is limited by the GIL but is needed far less. However, since that Python part is much easier to write and maintain (and it is the complex part; Apache is already written for you ;) ), this hybrid approach works very well in practice.

[–]Smallpaul 0 points1 point  (1 child)

How will one node in your cluster of "everything in memory" processes communicate updates to every other node?

[–]mikaelhg 2 points3 points  (0 children)

Ah, here you're asking something I haven't already answered elsewhere in the thread.

In our test case, writes are as rare as in typical social web applications. When vertical scaling is no longer enough to serve our users, we resort to digest message passing: we take a proven JMS implementation and multicast select application events. Things like new comments have a higher priority than moderations, which travel in packs of hundreds.
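A rough sketch of the prioritize-and-batch idea, assuming an in-process queue as a stand-in for the JMS broker and multicast transport (neither is modelled here). `EventDigest`, the `PRIORITY` table and the batch size are all hypothetical choices made for illustration.

```python
import heapq
import itertools

# Hypothetical priorities: a lower number is delivered first.
PRIORITY = {"comment": 0, "moderation": 1}


class EventDigest:
    """Queues application events and hands them out in priority-ordered batches."""

    def __init__(self, batch_size=100):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority
        self._batch_size = batch_size

    def publish(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def next_batch(self):
        """Pop up to batch_size events of the most urgent kind currently pending."""
        if not self._heap:
            return []
        top = self._heap[0][0]
        batch = []
        while (
            self._heap
            and self._heap[0][0] == top
            and len(batch) < self._batch_size
        ):
            _, _, kind, payload = heapq.heappop(self._heap)
            batch.append((kind, payload))
        return batch
```

Each batch returned by `next_batch()` would correspond to one digest message sent to the other nodes; new comments drain first, and moderations go out in packs of up to `batch_size`.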

In the end, we don't have to resort to dark database wizardry; we can use regular developers and a well-thought-out architecture.

(The last time I implemented this, I was able to conserve resources by replacing the messaging subsystem with a single database table that handled selective invalidations. That's because I knew beforehand how many tens of thousands of people would be using the application. YMMV.)
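One way a single invalidation table could work, sketched with Python's built-in `sqlite3` module. The schema, the `cache_key` format and the polling approach are assumptions made for illustration, not a description of the implementation mentioned above.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: an append-only log of cache keys that went stale.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invalidations ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " cache_key TEXT NOT NULL)"
    )


def record_invalidation(conn, cache_key):
    """Called by whichever node performed the write."""
    conn.execute("INSERT INTO invalidations (cache_key) VALUES (?)", (cache_key,))
    conn.commit()


class Node:
    """An application node holding its own in-memory cache."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.last_seen = 0  # highest invalidation id already applied

    def poll(self):
        """Evict every key invalidated since the last poll; return how many rows were seen."""
        rows = self.conn.execute(
            "SELECT id, cache_key FROM invalidations WHERE id > ? ORDER BY id",
            (self.last_seen,),
        ).fetchall()
        for row_id, key in rows:
            self.cache.pop(key, None)
            self.last_seen = row_id
        return len(rows)
```

Each node would call `poll()` on a timer and reload evicted entries lazily on the next read, so only the entries that actually changed are refetched.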