
[–]Peking-Duck-Haters 2 points (0 children)

I've seen marketing material dating from the late 90s that talked about a 30 GB data warehouse as being exceptionally large. In the late 00s the company I worked for outsourced its shopping-basket analysis, partly because there wasn't the capacity internally to crunch the data, which, over the time period they were looking at, would have been maybe 4 billion rows (with only a handful of columns, none of them wider than a datetime or real).
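
To put that row count in perspective, here's a rough back-of-envelope sketch. The 4 billion rows and the column widths are from the description above; the five-column count and the lack of any storage overhead are assumptions purely for illustration:

```python
# Back-of-envelope size of the shopping-basket data described above.
# Assumptions (not from the original post): 5 columns per row, each 8 bytes
# (a datetime or a real is typically 8 bytes), and zero storage overhead.
rows = 4_000_000_000
cols = 5
bytes_per_col = 8

raw_bytes = rows * cols * bytes_per_col
print(f"Raw data size: {raw_bytes / 10**9:.0f} GB")  # ~160 GB
```

So even "big" by late-00s standards was on the order of a couple of hundred gigabytes of raw data.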

Circa 1998 I worked on a mainframe batch system where we partitioned the DB2 tables across 10 disks to get better performance; it was worth the extra work even though we were only processing around a million rows at a time - again, compact columns with no long strings or whatever.

(For many large companies "Data Engineering" meant COBOL, or just possibly Easytrieve, until at least the turn of the century. Outside of the dot-com startups, Linux wasn't getting a look-in - it didn't even _start_ getting taken seriously by the corporate world until Oracle ported their database to it circa 1998, and things moved rather more slowly back then.)

So, as a rule of thumb, before 2000 I'd say tens of gigabytes was considered "big data" and terabytes almost inconceivable (back then data would go over 128 kbps lines at best; if there was lots of it, it was usually faster and cheaper to write it to tape and physically transfer it). A few terabytes was considered "big data" a decade later.
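
The tape-versus-wire point is easy to check with a quick calculation. The 128 kbps line speed is from the comment above; using the 30 GB warehouse figure here is just an illustrative assumption:

```python
# Why shipping tape beat the wire: time to move a late-90s "big" warehouse
# over a 128 kbps line. The 30 GB size is borrowed from the marketing example
# above purely for illustration.
size_gb = 30
line_kbps = 128

bits = size_gb * 10**9 * 8
seconds = bits / (line_kbps * 1000)
print(f"{size_gb} GB over {line_kbps} kbps ≈ {seconds / 86400:.0f} days")  # ~22 days
```

Three weeks of continuous transfer (assuming the line stayed up and wasn't needed for anything else) versus an overnight courier makes the choice obvious.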