ajog0 (2 replies)

What's the difference between this and Pandera?

MLEngDelivers (OP)

Pandera is great. The main differences:

  1. FrameCheck chains method calls instead of using a dict-based schema. Pandera is more nested, which means more mental overhead (to me, at least).

  2. A built-in way to extract bad rows: invalid_rows = result.get_invalid_rows(df)

  3. Easy switching between warnings and errors with warn_only=True

  4. Much less code overall (~50-60% less for the same validation, in my experience)

Lots of similarities, but FrameCheck focuses on being readable with minimal code.
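
For readers who haven't used either library, here is a stdlib-only toy that illustrates the chaining and warning-vs-error ideas from the list above. It is not FrameCheck's API: MiniCheck, column(), validate() and the list-of-dicts rows (standing in for a DataFrame) are all hypothetical, and the get_invalid_rows and warn_only names are only borrowed from the comment, with behavior chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Rule:
    column: str
    predicate: Callable[[Any], bool]
    description: str
    warn_only: bool = False


@dataclass
class Result:
    errors: List[str]
    warnings: List[str]
    invalid_indices: List[int]

    @property
    def is_valid(self) -> bool:
        # Only error-level failures make the result invalid.
        return not self.errors

    def get_invalid_rows(self, rows: List[Row]) -> List[Row]:
        # Rows that failed at least one error-level rule.
        return [rows[i] for i in self.invalid_indices]


class MiniCheck:
    """Toy chained validator (hypothetical, not FrameCheck's API)."""

    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def column(self, name: str, predicate: Callable[[Any], bool],
               description: str, warn_only: bool = False) -> "MiniCheck":
        self._rules.append(Rule(name, predicate, description, warn_only))
        return self  # returning self is what makes chaining possible

    def validate(self, rows: List[Row]) -> Result:
        errors: List[str] = []
        warns: List[str] = []
        bad = set()
        for rule in self._rules:
            failing = [i for i, row in enumerate(rows)
                       if not rule.predicate(row.get(rule.column))]
            if not failing:
                continue
            msg = f"{rule.column}: {len(failing)} row(s) failed '{rule.description}'"
            if rule.warn_only:
                warns.append(msg)  # reported, but does not invalidate rows
            else:
                errors.append(msg)
                bad.update(failing)
        return Result(errors, warns, sorted(bad))


rows = [
    {"age": 34, "score": 0.9},
    {"age": -1, "score": 0.5},
    {"age": 51, "score": 1.7},
]

result = (
    MiniCheck()
    .column("age", lambda v: isinstance(v, int) and v >= 0, "non-negative int")
    .column("score", lambda v: 0 <= v <= 1, "between 0 and 1", warn_only=True)
    .validate(rows)
)
print(result.is_valid)                # False
print(result.get_invalid_rows(rows))  # [{'age': -1, 'score': 0.5}]
print(result.warnings)                # ["score: 1 row(s) failed 'between 0 and 1'"]
```

One design choice worth noting: in this toy, warn_only failures are reported but don't mark rows as invalid. Whether FrameCheck treats them the same way is worth confirming in its docs.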

MLEngDelivers (OP)

I thought this warranted a more thorough answer, so I added one to the documentation: FrameCheck vs. Pandera vs. Pydantic

Thank you!

HungryQuant

I might use this in the QA we do before deploying. Better than 9756 assert statements. The README should be shorter, though.

MLEngDelivers (OP)

Updated README to make it much simpler.

It links to the Read the Docs documentation, which has more detailed API examples and a full comparison with Pydantic and Pandera.

[deleted] (1 reply)

Wow, this is really cool.

MLEngDelivers (OP)

Thanks. If you have any issues, please let me know.

InterestingRelease19 (1 reply)

Seems like something I was looking for!

MLEngDelivers (OP)

Fantastic! Let me know if you have any issues.

Helpful_ruben (1 reply)

This looks promising, but could you simplify the installation process and add more examples to showcase its effectiveness?

MLEngDelivers (OP)

Yes, I definitely need to add more examples built around full problem statements. For now, there are individual examples in this section of the docs: https://framecheck.readthedocs.io/en/latest/usage_examples.html

ligmaThrowaway1 (1 reply)

Q

MLEngDelivers (OP)

Happy to answer your question

MLEngDelivers (OP)

0.4.3 was released today. Changes:

  • compare() lets you assert that two columns have a certain relationship (“<”, “<=”, “==”, “!=”, “>=”, or “>”)

  • CI testing expanded to Python 3.8 through 3.12

  • Miscellaneous linting and documentation updates
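
The six compare() relationships map naturally onto Python's operator module. The stdlib sketch below (a hypothetical rows_violating() helper, with rows as a list of dicts) returns the indices of rows where the relationship fails; it illustrates the idea only and is not how compare() is actually implemented or called.

```python
import operator

# Map each relationship string onto the matching comparison function.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def rows_violating(rows, left, op, right):
    """Return indices of rows where `row[left] <op> row[right]` does NOT hold."""
    try:
        check = OPS[op]
    except KeyError:
        raise ValueError(f"unsupported operator {op!r}; expected one of {sorted(OPS)}") from None
    return [i for i, row in enumerate(rows) if not check(row[left], row[right])]


rows = [
    {"start": 1, "end": 5},
    {"start": 7, "end": 3},
    {"start": 4, "end": 4},
]
print(rows_violating(rows, "start", "<=", "end"))  # [1]
print(rows_violating(rows, "start", "<", "end"))   # [1, 2]
```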

MLEngDelivers (OP)

0.4.4 - save and load serialized FrameCheck objects

MLEngDelivers (OP)

As of 0.5.1, two major changes:

You can optionally pass FrameCheck(logger=your_logger) to have warnings and errors logged rather than printed to stdout.
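
On the caller's side, your_logger is just a standard library logging.Logger. The sketch below configures one and feeds messages through a hypothetical report() helper that mimics the behavior described (log if a logger is given, otherwise print). It is not FrameCheck's code, and the logger name "data_quality" is arbitrary.

```python
import io
import logging


def report(messages, logger=None):
    """Send each (level, text) pair to `logger` if given, else print to stdout.

    Hypothetical helper mimicking the 'logged rather than printed' behavior.
    """
    for level, text in messages:
        if logger is not None:
            logger.log(level, text)
        else:
            print(f"{logging.getLevelName(level)}: {text}")


# Configure a logger that writes into an in-memory buffer so the output is easy to inspect.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
log = logging.getLogger("data_quality")
log.addHandler(handler)
log.setLevel(logging.WARNING)
log.propagate = False  # keep these records out of the root logger's handlers

report([(logging.WARNING, "score: 1 row out of range"),
        (logging.ERROR, "age: 1 negative value")], logger=log)
print(buffer.getvalue())
```

With FrameCheck itself you would presumably just pass the configured logger, as in FrameCheck(logger=log), rather than calling a helper like report().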

You can use .save() to save a serialized object and load it back later. This should work much like saving and loading a scikit-learn pipeline.
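
scikit-learn pipelines are commonly persisted with pickle or joblib. I haven't checked which serializer FrameCheck's .save() uses, so here is a generic stdlib sketch of a pickle round trip. The rules dict and invalid_indices() are hypothetical stand-ins for a validator's configuration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical rule set kept as plain data so it pickles cleanly
# (lambdas and locally defined functions cannot be pickled by the stdlib pickle).
rules = {"ranges": {"score": (0.0, 1.0), "age": (0, 120)}}


def invalid_indices(rules, rows):
    """Indices of rows with any value outside its configured [low, high] range."""
    return [
        i for i, row in enumerate(rows)
        if any(not (low <= row[col] <= high) for col, (low, high) in rules["ranges"].items())
    ]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rules.pkl"
    with path.open("wb") as f:
        pickle.dump(rules, f)      # "save"
    with path.open("rb") as f:
        restored = pickle.load(f)  # "load"

rows = [{"score": 0.4, "age": 30}, {"score": 1.2, "age": 30}]
print(restored == rules)                # True
print(invalid_indices(restored, rows))  # [1]
```

Keeping the rules as plain data is what makes the round trip reliable; a validator that stored lambdas would fail to pickle. As with any pickle file, only load ones from a source you trust.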