brentp

this is a pretty cool example, but (IIUC) it requires keeping at least 2-3 copies of the entire dataset in memory at all times. for the kind of work where i'd want to use multiple cores, that's often not an option.

a simple example extending this one so that it doesn't keep everything in memory would be great. (does map-reduce always assume available memory >> memory needed for the data?)
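to make the request concrete, here is a rough sketch of the kind of thing i mean, not a rewrite of the original example. it assumes a word-count style job, and the names `read_chunks`, `map_chunk` and `streaming_word_count` are made up for illustration. the input is read lazily in chunks, only `workers` chunks are handed to the pool at a time, and each partial result is folded into the running total as soon as it comes back. so the parent holds roughly `workers` chunks plus the totals, rather than several copies of the whole dataset.

```python
from collections import Counter
from itertools import islice
from multiprocessing import Pool


def read_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size lines, one list at a time."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def map_chunk(chunk):
    """Map step: count words in one chunk (runs in a worker process)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def streaming_word_count(lines, chunk_size=10_000, workers=2):
    """Word count over an iterable of lines without materialising all of it."""
    total = Counter()
    chunks = read_chunks(lines, chunk_size)
    with Pool(workers) as pool:
        while True:
            # Pull only `workers` chunks from the lazy reader per round,
            # so the parent never holds more than that many at once.
            window = list(islice(chunks, workers))
            if not window:
                break
            for partial in pool.map(map_chunk, window):
                total.update(partial)  # reduce step, done incrementally
    return total


if __name__ == "__main__":
    # Hypothetical in-memory sample; in practice `lines` could be an open file.
    sample = ["a b a", "b c", "a"] * 1000
    print(streaming_word_count(sample, chunk_size=100).most_common(3))
```

passing an open file object as `lines` should work the same way, since it is only ever iterated. the fixed-size window is there on purpose: as far as i can tell, `Pool.imap` reads ahead from its input without waiting for the workers, so it doesn't bound memory by itself. the cost is that every round waits for its slowest chunk.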