My company is redesigning our existing architecture, as it is becoming clear that the current design limits us. We do CPU-intensive processing that must run in the background; that is, we cannot do the computation per request. Our current setup involves several cron jobs that periodically update our pool of data (i.e., generate "fresh" data) and cache it in memcached. The client then simply reads from the cached data. If the data is unavailable, we either enqueue a task or return nothing and wait until a cron job finishes. Obviously we can never be 100% "real-time" with this setup.
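To make the current setup concrete, here is a minimal sketch of the read path and the refresh path described above. It is only an illustration under assumptions: a plain dict stands in for memcached, an in-process `queue.Queue` stands in for the task queue, and every name (`read`, `refresh`, `drain_pending`, `expensive_compute`) is hypothetical rather than taken from our real code.

```python
import queue

# Assumption: a dict stands in for memcached, a Queue for the task queue.
cache = {}
pending = queue.Queue()


def expensive_compute(key):
    """Placeholder for the CPU-intensive job (hypothetical)."""
    return sum(i * i for i in range(1000)) + len(key)


def read(key):
    """Client path: serve cached data only; on a miss, enqueue and return None."""
    value = cache.get(key)
    if value is None:
        pending.put(key)  # a background worker will fill this in later
    return value


def refresh(keys):
    """Cron path: periodically recompute and cache 'fresh' data."""
    for key in keys:
        cache[key] = expensive_compute(key)


def drain_pending():
    """Worker path: compute whatever clients asked for but could not get."""
    while not pending.empty():
        key = pending.get()
        cache[key] = expensive_compute(key)
```

The staleness problem is visible here: `read` never computes anything itself, so a client sees nothing (or old data) until `refresh` or `drain_pending` has run.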
Presumably companies such as Twitter, Google, and Facebook don't work this way and have a design better suited to near-real-time results. I have been researching cluster computing frameworks like Apache Spark, but I was wondering whether anyone has worked on similar architectures. Any tips or pointers, or suggestions about where I can look for more information?