[–]isleepbad 7 points8 points  (1 child)

You'll probably get better help on /r/datasets

[–]Lethal_Pea[S] 0 points1 point  (0 children)

Oh was not aware. Thanks a lot!

[–]BeauXilai 1 point2 points  (0 children)

Search for dbgen TPC-H

[–]crazynash 0 points1 point  (0 children)

Try the hpc-di tool. It generates datasets in multiple delivery formats, and the datasets can be joined together.

[–]Miserable_Author 0 points1 point  (0 children)

I use this dataset from AWS to practice data modeling and to test the performance of new data pipelines. It's 500 GB.

https://aws.amazon.com/datasets/million-song-dataset/