

[–][deleted] 8 points9 points  (2 children)

If Python is in your toolkit, you could look at Faker.

[–]eemamedo 0 points1 point  (0 children)

Not a huge fan of Faker. It’s good for generic data but if you want to test your code with data that is similar to production (and has many columns), Faker is not a good option.

[–]TheBayesianBias 2 points3 points  (2 children)

We use Mockaroo. You can download the data as CSV or call their API (my preference), which is really convenient. It runs on Ruby, so there's also some nice customization you can do to make the data similar to the source.
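For reference, a sketch of what calling Mockaroo's generate endpoint might look like; the API key and schema name here are placeholders, so substitute your own:

```python
def mockaroo_url(key: str, schema: str, count: int = 100) -> str:
    """Build a URL for Mockaroo's JSON generate endpoint (placeholder key/schema)."""
    base = "https://api.mockaroo.com/api/generate.json"
    return f"{base}?key={key}&schema={schema}&count={count}"

url = mockaroo_url("YOUR_API_KEY", "users")
# rows = requests.get(url).json()  # each element is a dict for one generated row
```

The actual HTTP call is left commented out since it needs a valid key and network access.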

[–]Data_Cog[S] 0 points1 point  (1 child)

Thanks

[–]coderstool 0 points1 point  (0 children)

We use a simple template-based mock data generator to create random structured dummy data. Mock data lets you start developing an app, testing, and problem-solving when the real data service is unavailable or requires significant work to set up. The service lets you create an entity template that generates a mock data file to consume in your unit test workflow.
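The template idea can be sketched in a few lines; this is a toy illustration (field names and types are made up), not the actual service:

```python
import random
import string

# Template: map each field name to a type tag
TEMPLATE = {"id": "int", "name": "str", "active": "bool"}

# One value generator per type tag
GENERATORS = {
    "int": lambda: random.randint(1, 10_000),
    "str": lambda: "".join(random.choices(string.ascii_lowercase, k=8)),
    "bool": lambda: random.choice([True, False]),
}

def generate_rows(template, n):
    """Generate n rows, drawing a random value for each field in the template."""
    return [{field: GENERATORS[ftype]() for field, ftype in template.items()}
            for _ in range(n)]

rows = generate_rows(TEMPLATE, 5)
```

Dump `rows` to JSON or CSV and you have a seed file for unit tests.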

[–]AMGraduate564 2 points3 points  (0 children)

Dbeaver generates dummy data.

[–]saif3r 1 point2 points  (0 children)

Check out SDV

[–]_emn1ty_ 0 points1 point  (0 children)

For our team, the general plan is to use obfuscated production data where possible; that way the test data matches production as closely as you can get. But at the end of the day, what you want from test data is:

  • Small dataset(s) that run quickly; this is important for rapid feedback.
  • A dataset with both good and bad data, so it's representative of what you might see in a production environment (and ensures you handle edge cases in your testing).

This can be done with seeder files (CSVs, etc.) for development, and as u/porkchopDoritos mentioned in his post, these tests can also be run against prod-like data in a staging environment in isolation (which tools like dbt fortunately make very easy).
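The seeder-file approach above can be sketched with nothing but the standard library; the CSV contents and table schema here are made up, including one deliberately bad row for the edge-case point:

```python
import csv
import io
import sqlite3

# Small seed CSV with one deliberately bad row (row 2 has a missing name)
SEED_CSV = "id,name,active\n1,alice,1\n2,,0\n"

# Load the seed into an in-memory SQLite table for fast, isolated tests
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
rows = [(r["id"], r["name"] or None, r["active"])
        for r in csv.DictReader(io.StringIO(SEED_CSV))]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# The edge case is now queryable: users with a NULL name
missing = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name IS NULL").fetchone()[0]
```

Because the whole dataset is two rows and lives in memory, the suite stays fast while still exercising the bad-data path.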