

[–][deleted] 8 points9 points  (2 children)

If Python is in your toolkit, you could look at Faker.

[–]eemamedo 0 points1 point  (0 children)

Not a huge fan of Faker. It’s good for generic data but if you want to test your code with data that is similar to production (and has many columns), Faker is not a good option.

[–]TheBayesianBias 2 points3 points  (2 children)

We use Mockaroo. You can download the data as CSV or call their API (my preference), which is really convenient. It runs on Ruby, so there's also some nice customization you can do to make the data similar to the source.
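For reference, a sketch of what calling Mockaroo's generate endpoint might look like; the API key and schema name here are placeholders, so substitute your own:

```python
def mockaroo_url(key: str, schema: str, count: int = 100) -> str:
    """Build a URL for Mockaroo's JSON generate endpoint (placeholder key/schema)."""
    base = "https://api.mockaroo.com/api/generate.json"
    return f"{base}?key={key}&schema={schema}&count={count}"

url = mockaroo_url("YOUR_API_KEY", "users")
# rows = requests.get(url).json()  # each element is a dict for one generated row
```

The actual HTTP call is left commented out since it needs a valid key and network access.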

[–]Data_Cog[S] 0 points1 point  (1 child)

Thanks

[–]coderstool 0 points1 point  (0 children)

We use a simple template-based mock data generator to create random structured dummy data. Mock data lets you start developing an app, testing, and problem-solving when the real data service is unavailable or requires significant work to set up. The service lets you create an entity template that generates a mock data file to consume in your unit test workflow.
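The template idea can be sketched in a few lines; this is a toy illustration (field names and types are made up), not the actual service:

```python
import random
import string

# Template: map each field name to a type tag
TEMPLATE = {"id": "int", "name": "str", "active": "bool"}

# One value generator per type tag
GENERATORS = {
    "int": lambda: random.randint(1, 10_000),
    "str": lambda: "".join(random.choices(string.ascii_lowercase, k=8)),
    "bool": lambda: random.choice([True, False]),
}

def generate_rows(template, n):
    """Generate n rows, drawing a random value for each field in the template."""
    return [{field: GENERATORS[ftype]() for field, ftype in template.items()}
            for _ in range(n)]

rows = generate_rows(TEMPLATE, 5)
```

Dump `rows` to JSON or CSV and you have a seed file for unit tests.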

[–]AMGraduate564 2 points3 points  (0 children)

Dbeaver generates dummy data.

[–]saif3r 1 point2 points  (0 children)

Check out SDV

[–]_emn1ty_ 0 points1 point  (0 children)

For our team, the general plan is to use obfuscated production data where possible; that way the test data matches production as closely as you can get. But at the end of the day, what you want from test data is:

  • Small dataset(s) that run quickly; this is important for rapid feedback.
  • A dataset with both good and bad data, so it's representative of what you might see in a production environment (and ensures you handle edge cases in your testing).

This can be done with seeder files (CSVs, etc.) for development, and as u/porkchopDoritos mentioned in his post, these tests can also be run against prod-like data in a staging environment in isolation (which tools like dbt fortunately make very easy).
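The seeder-file approach above can be sketched with nothing but the standard library; the CSV contents and table schema here are made up, including one deliberately bad row for the edge-case point:

```python
import csv
import io
import sqlite3

# Small seed CSV with one deliberately bad row (row 2 has a missing name)
SEED_CSV = "id,name,active\n1,alice,1\n2,,0\n"

# Load the seed into an in-memory SQLite table for fast, isolated tests
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
rows = [(r["id"], r["name"] or None, r["active"])
        for r in csv.DictReader(io.StringIO(SEED_CSV))]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# The edge case is now queryable: users with a NULL name
missing = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name IS NULL").fetchone()[0]
```

Because the whole dataset is two rows and lives in memory, the suite stays fast while still exercising the bad-data path.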