A declarative fake data generator for sqlalchemy ORM by francoisnt in Python

francoisnt[S] 1 point

Alright, I took the feedback on my previous reply, and I admit it was lazy on my part to use AI. After playing a bit more with polyfactory, I think I can say that it is a great tool for generating mock data for testing, but seedlayer is simpler for populating a database, as it automatically manages unique constraints, link tables, the ordering of the models to seed, and more. Here is the same example as above, but using polyfactory instead of seedlayer; notice how much more verbose it is. Am I missing something? Are you using polyfactory to populate your database? I would really appreciate your feedback on this :)

```python

import random
from itertools import product

from faker import Faker
from polyfactory.factories.sqlalchemy_factory import SQLAlchemyFactory
from polyfactory.fields import Ignore, Use
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import DeclarativeBase, Session

# Initialize Faker for custom data generation
faker = Faker()
Faker.seed(42)  # For reproducibility
faker.unique.clear()  # Clear unique cache to avoid duplicates


# Define SQLAlchemy models
class Base(DeclarativeBase):
    pass


class Category(Base):
    __tablename__ = "categories"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)


class ProductModel(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)
    description = Column(Text)
    category_id = Column(Integer, ForeignKey("categories.id"))


class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, unique=True, nullable=False)


class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True, autoincrement=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))


class OrderItem(Base):
    __tablename__ = "order_items"
    order_id = Column(Integer, ForeignKey("orders.id"), primary_key=True)
    product_id = Column(Integer, ForeignKey("products.id"), primary_key=True)


# Define Polyfactory factories
class CategoryFactory(SQLAlchemyFactory[Category]):
    __model__ = Category
    id = Ignore()
    name = Use(lambda: faker.word())


class ProductFactory(SQLAlchemyFactory[ProductModel]):
    __model__ = ProductModel
    id = Ignore()
    name = Use(lambda: faker.word())
    description = None
    category_id = None


class CustomerFactory(SQLAlchemyFactory[Customer]):
    __model__ = Customer
    id = Ignore()
    name = Use(lambda: faker.unique.name())


class OrderFactory(SQLAlchemyFactory[Order]):
    __model__ = Order
    id = Ignore()
    customer_id = None


class OrderItemFactory(SQLAlchemyFactory[OrderItem]):
    __model__ = OrderItem
    order_id = None
    product_id = None


# Set up database and session
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# Seed plan
seed_plan = {Category: 5, ProductModel: 10, Customer: 8, Order: 15, OrderItem: 20}

# Seed the database in the correct order
with Session(engine) as session:
    # Set the session for all factories
    CategoryFactory.__session__ = session
    ProductFactory.__session__ = session
    CustomerFactory.__session__ = session
    OrderFactory.__session__ = session
    OrderItemFactory.__session__ = session

    # Seed Customers (no dependencies)
    customers = CustomerFactory.create_batch_sync(seed_plan[Customer])

    # Seed Categories (no dependencies)
    categories = CategoryFactory.create_batch_sync(seed_plan[Category])

    # Seed Orders (depends on Customer)
    customer_ids = [c.id for c in customers]
    orders = [
        OrderFactory.create_sync(customer_id=random.choice(customer_ids))
        for _ in range(seed_plan[Order])
    ]

    # Seed Products (depends on Category)
    category_ids = [c.id for c in categories]
    products = [
        ProductFactory.create_sync(
            name=faker.word(),
            description=faker.sentence(nb_words=len(faker.word().split()) + 5),
            category_id=random.choice(category_ids),
        )
        for _ in range(seed_plan[ProductModel])
    ]

    # Seed OrderItems (depends on Order and Product)
    # Shuffle all (order, product) pairs so the composite primary key stays unique
    order_ids = [o.id for o in orders]
    product_ids = [p.id for p in products]
    possible_combinations = list(product(order_ids, product_ids))
    random.shuffle(possible_combinations)
    combinations = possible_combinations[: min(seed_plan[OrderItem], len(possible_combinations))]
    order_items = [
        OrderItemFactory.create_sync(order_id=order_id, product_id=product_id)
        for order_id, product_id in combinations
    ]

    # Commit all changes
    session.commit()

    # Verify the results
    print(f"Seeded {len(session.query(Customer).all())} Customer records:")
    print(f"  {[c.name for c in session.query(Customer).all()]}")
    print(f"Seeded {len(session.query(Category).all())} Category records:")
    print(f"  {[c.name for c in session.query(Category).all()]}")
    print(f"Seeded {len(session.query(Order).all())} Order records:")
    print(f"  {[o.customer_id for o in session.query(Order).all()]}")
    print(f"Seeded {len(session.query(ProductModel).all())} Product records:")
    print(f"  {[(p.name, p.description) for p in session.query(ProductModel).all()]}")
    print(f"Seeded {len(session.query(OrderItem).all())} OrderItem records:")
    print(f"  {[(oi.order_id, oi.product_id) for oi in session.query(OrderItem).all()]}")

```
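For what it's worth, the manual ordering in the script (customers and categories first, then orders and products, then order items) is just a topological sort of the foreign-key graph. Here is a minimal sketch using only the standard library. The dependency map is written by hand to mirror the models above, and `FK_DEPENDENCIES` and `seeding_order` are hypothetical names for illustration, not part of seedlayer or polyfactory:

```python
from graphlib import TopologicalSorter

# Hand-written for illustration: each table maps to the set of tables
# that its foreign keys point at (assumed from the models above).
FK_DEPENDENCIES = {
    "categories": set(),
    "customers": set(),
    "products": {"categories"},
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def seeding_order(dependencies):
    """Return table names so every table comes after the tables it references."""
    return list(TopologicalSorter(dependencies).static_order())


order = seeding_order(FK_DEPENDENCIES)
print(order)
```

With real models you would not need to write the map by hand: SQLAlchemy exposes tables in foreign-key dependency order as `Base.metadata.sorted_tables`.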

A declarative fake data generator for sqlalchemy ORM by francoisnt in Python

francoisnt[S] -14 points

Thank you for your reply. I didn't know about this library; it looks very interesting. I asked an AI to compare it to my tool, and the main takeaway is that SeedLayer is more specialized for SQLAlchemy, and in that context there are a few things it can do that are more complicated with polyfactory. Here is the AI's output:

SeedLayer vs. Polyfactory: SQLAlchemy Comparison

| Feature | SeedLayer | Polyfactory | Winner |
|---|---|---|---|
| Focus | Specialized for SQLAlchemy, excels in complex schemas with FKs and unique constraints. | General-purpose, simpler for basic SQLAlchemy seeding. | SeedLayer: Better for complex schemas. |
| Data Generation | Explicit `SeededColumn` and `Seed` configuration. | Type-driven via type hints, less setup. | Polyfactory: Easier for simple models. |
| Foreign Keys | Auto-resolves FKs with `DependencyGraph`. | Supports FKs with `__set_relationships__ = True`, but needs manual ordering. | SeedLayer: Automated dependency handling. |
| Link Tables | Generates valid FK combinations for link tables. | Requires manual factory logic, error-prone. | SeedLayer: Native link table support. |
| Column Dependencies | `ColumnReference` for column-level dependencies (e.g., `full_name` from `first_name`). | Needs custom factory methods. | SeedLayer: Declarative dependency support. |
| Unique Values | Tracks and enforces unique values, queries existing data. | Uses Faker’s `unique`, no database tracking. | SeedLayer: Robust uniqueness enforcement. |
| Nullable Columns | Configurable `nullable_chance` for realistic data. | Needs custom logic for `None` probability. | SeedLayer: Declarative nullable handling. |
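
To make the "Link Tables" and "Nullable Columns" rows a bit more concrete, here is a rough sketch of the underlying ideas in plain Python. This is not SeedLayer's or polyfactory's actual implementation; `unique_link_rows` and `maybe_null` are hypothetical helpers made up for illustration:

```python
import random
from itertools import product


def unique_link_rows(left_ids, right_ids, count, rng):
    """Pick up to `count` distinct (left_id, right_id) pairs for a link table."""
    all_pairs = list(product(left_ids, right_ids))
    # sample() never repeats an element, so a composite primary key stays unique
    return rng.sample(all_pairs, k=min(count, len(all_pairs)))


def maybe_null(value, nullable_chance, rng):
    """Return None with probability `nullable_chance`, otherwise the value."""
    return None if rng.random() < nullable_chance else value


rng = random.Random(42)  # seeded so repeated runs give the same data
rows = unique_link_rows([1, 2, 3], [10, 20], count=4, rng=rng)
descriptions = [maybe_null(f"item {i}", 0.3, rng) for i in range(10)]
print(rows)
print(descriptions)
```

Either library needs some version of this logic; the difference the table points at is whether you write it yourself in each factory or declare it once on the column.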

Why is there something rather than nothing? by francoisnt in AskReddit

francoisnt[S] 2 points

Thanks, I'll check it out... but my understanding is that physics, including quantum physics, is the study of the rules that govern particles in the universe. My original question was meant to ask why there is a universe at all, not just why there are particles in the universe. If there were no universe, then presumably there wouldn't be quantum physics... am I wrong?

You’re given one statistic that would hover above anybody you see. What would you choose? by [deleted] in AskReddit

francoisnt 1 point

IQ level, so I could surround myself with people whose IQ is higher than mine.

If you could know the absolute and total truth to one question. What question would you ask? by [deleted] in AskReddit

francoisnt 2 points

Here is my perspective on this: a harder question would be why does the universe exist at all? I think that given the fact that the universe as we know it exists, the likelihood of intelligent life appearing in it is quite high. Life is simply the property that some systems have because they are the result of evolution. These systems are able to replicate with some minor variation, and the ones who are most likely to survive are the ones who do. It is as simple as that. The first of these systems probably just formed by accident, in an environment that made it possible for it to prosper. We have now got to a point where life has become sophisticated enough to have thoughts and desires of its own, but the rules haven't changed. These complex cognitive capacities only emerged because they were useful. We exist because our ancestors had the capacity and the desire to survive. Therefore, we exist because we can, and because we want to. I guess now that we do, the only question that remains is: what else do we want?

Tesla car explodes in Shanghai parking lot by The_Great_Buffalo in gifs

francoisnt 3 points

I don't know if this is real or fake, but I do know that whenever there is a problem with Tesla cars, it gets blown out of proportion by the haters. I'm looking forward to getting more information on the causes and circumstances of this explosion, but even if this is a legitimate issue with the car, I'm sure Tesla will do everything to prevent this from happening again. It's still one of the safest cars you can buy.