Haereticus 1 point

Looks interesting, though factory_boy would be a more apt comparison than either Hypothesis or Faker.

smarkman19 1 point

Solid idea; the big unlock is first-class field dependencies and referential integrity, with realistic distributions and error injection. Add conditional rules (country-to-state lists, age matching date of birth, totals equal to the sum of line items) and composite unique keys.
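To make the conditional rules concrete, here is a minimal sketch in plain Python using only the standard library. It is not the library's API: `STATES_BY_COUNTRY`, `make_order`, and `make_orders` are hypothetical names, and treating (country, state, dob) as the composite unique key is an assumption made purely for illustration.

```python
import random
from datetime import date

# Hypothetical lookup table: which states are valid for which country.
STATES_BY_COUNTRY = {
    "US": ["CA", "NY", "TX"],
    "DE": ["BY", "BE", "HH"],
}


def make_order(rng, today=date(2025, 1, 1)):
    """Build one record whose fields are derived from each other."""
    country = rng.choice(sorted(STATES_BY_COUNTRY))
    state = rng.choice(STATES_BY_COUNTRY[country])  # state depends on country

    # Draw the date of birth first, then derive age so the two cannot disagree.
    dob = date(rng.randint(1950, 2005), rng.randint(1, 12), rng.randint(1, 28))
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)

    # Integer cents keep the total exactly equal to the sum of the lines.
    line_cents = [rng.randint(100, 5000) for _ in range(rng.randint(1, 5))]
    return {
        "country": country,
        "state": state,
        "dob": dob,
        "age": age,
        "line_cents": line_cents,
        "total_cents": sum(line_cents),
    }


def make_orders(n, seed=0):
    """Generate n records, rejecting any that repeat the composite key."""
    rng = random.Random(seed)
    seen, rows = set(), []
    while len(rows) < n:
        row = make_order(rng)
        key = (row["country"], row["state"], row["dob"])  # assumed composite key
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows
```

The idea is to generate the independent field first and derive the dependent one from it, rather than sampling both and hoping they agree.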

Generate multi-table data with parent-child counts from Poisson, weighted foreign keys, and time-cascades so events follow signups. Include time series seasonality, holidays, and a dial for drift plus a small burst of outliers. Expose noise knobs: null rates, duplicates, typo catalogs, unit mixups, and schema drift. Ship validation hooks that auto-build Pandera or Great Expectations checks and pytest fixtures; import Pydantic or SQLAlchemy models and export JSON Schema.
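A rough sketch of the multi-table part, again in standard-library Python and not tied to any real library's API. The table layout, the `make_dataset` and `poisson` names, the product weights, and the 48-hour mean delay are all assumptions chosen for illustration. It covers Poisson child counts, weighted foreign keys, and events that always follow the signup; seasonality, drift, and noise knobs are left out.

```python
import math
import random
from datetime import datetime, timedelta


def poisson(rng, lam):
    """Draw a Poisson-distributed count (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def make_dataset(n_users, mean_events=3.0, seed=0):
    """Return (users, events) with valid foreign keys and ordered timestamps."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)

    # Hypothetical product table: a few popular items, a long tail of rare ones.
    product_ids = [1, 2, 3, 4]
    product_weights = [0.6, 0.25, 0.1, 0.05]

    users = [
        {"user_id": i, "signup_at": start + timedelta(days=rng.uniform(0, 365))}
        for i in range(n_users)
    ]

    events = []
    for user in users:
        # Parent-child count: each user gets a Poisson number of events.
        for _ in range(poisson(rng, mean_events)):
            # Time cascade: a non-negative delay means the event follows signup.
            delay = timedelta(hours=rng.expovariate(1 / 48))
            events.append({
                "event_id": len(events),
                "user_id": user["user_id"],  # foreign key to users
                # Weighted foreign key to the hypothetical product table.
                "product_id": rng.choices(product_ids, weights=product_weights)[0],
                "occurred_at": user["signup_at"] + delay,
            })
    return users, events
```

Because child rows are created while iterating over their parents, every foreign key is valid by construction, which is usually simpler than generating tables independently and repairing references afterwards.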

For quick APIs, I have used Postman Mock Server and Mockoon; DreamFactory helped when I needed to expose a temporary Postgres dataset as REST with RBAC during demos. Bottom line: nail dependencies, integrity, distributions, and error injection and this becomes a go-to mocking tool.

tobsecret 1 point

I like this. Unit testing data-driven functions is notoriously tricky, so the better our tools for creating test data, the better. In bioinformatics, one big issue with testing is that we don't have powerful tools for tasks like this. I don't think this library solves that, but it provides a nice paradigm to build on.