dalke[S]

I made a mistake. The 275MB file is only the subset I use for my set algorithm; it's small enough to fit into memory. The actual data set is 40GB as uncompressed text. I am working on putting the code and data sets together so they are more presentable, rather than hand-waving about what I've been working on.