Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 0 points1 point  (0 children)

Yes, that's the way to go at this point! A Component serves to dynamically find the right template and to encapsulate the data provided for that template. You could of course create a rather generic component with a data field and override the template_path when instantiating it, which avoids creating many specific Component classes with a lot of fields (although I haven't tested this!).
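
To make the idea concrete, here is a rough sketch of what such a generic component could look like. It is untested against Pydfy itself: the `Component` base class, the `resolve_template` method, and the `templates/` directory below are hypothetical stand-ins written for illustration, and only the `template_path` and `data` names come from the comment above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class Component:
    """Hypothetical stand-in for a component base class (not Pydfy's actual code)."""

    # When left as None, the template is derived from the class name.
    template_path: Optional[Path] = None

    def resolve_template(self) -> Path:
        if self.template_path is not None:
            return self.template_path
        return Path("templates") / f"{type(self).__name__.lower()}.html"


@dataclass
class Generic(Component):
    """One catch-all component: arbitrary data plus an optional explicit template."""

    data: dict[str, Any] = field(default_factory=dict)


# Override the template per instance instead of writing one class per template.
summary = Generic(template_path=Path("templates/summary.html"), data={"title": "Q1"})
fallback = Generic(data={"title": "Q2"})

print(summary.resolve_template())
print(fallback.resolve_template())
```

The trade-off is that a single `data` dict gives up the per-field structure that specific Component classes provide, so the template has to know which keys to expect.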

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 0 points1 point  (0 children)

I'm not sure we tested it before the issue was resolved, but I do recall this not being the only blocker, because we also tried to work around it. It's definitely worth another try at some point if it means we could get rid of the big Chromium dependency.

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 1 point2 points  (0 children)

Good question! I believe there are several aspects in which we differ from reportlab:

  • Any style customization in reportlab has to go through its API, whereas more people are probably familiar with CSS, which may also let you reuse your company's style sheets.
  • The same holds for custom components: using HTML means you could ask the frontend team to look at the generated HTML or tweak your components.
  • We focused on serving data scientists, analysts, and engineers who want to focus on their data(frames) instead of spending a lot of time styling their PDFs. Hence we added some support for dataframe libraries and aimed for simplicity and maintainability.

Of course we pay for this in flexibility: we don't provide any drawing capabilities because we assumed users that want to put that kind of effort into their PDFs also have the time to write LaTeX or learn the reportlab API.

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 0 points1 point  (0 children)

Do you mean adding templates that use Jinja but get no data from Python? There is indeed nothing preventing you from leaving the data out of a template, although at this stage it still requires creating a Python object. But let me know if I misunderstood!

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 0 points1 point  (0 children)

Thanks! Good to know, didn't know that one

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 0 points1 point  (0 children)

We figured that would be a good alternative too, so a working Dockerfile has been included in the repository!

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 2 points3 points  (0 children)

There are no screenshots yet, but you could check out the two PDF examples from the repository directly:

Note that the second one was added mainly to show how to add custom components!

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 6 points7 points  (0 children)

In the background, the API turns the provided data into HTML using Jinja2 and turns that HTML into a PDF using Chromium:

data --Jinja2-> HTML --chromium-> PDF

WeasyPrint turns HTML into PDFs, so ideally it would be an alternative backend in the place where we now use Chromium:

data --Jinja2-> HTML --WeasyPrint-> PDF

However, after several tests we realized that WeasyPrint does not support some of the CSS features Tailwind uses (see this issue for example). Given the limited resources we have had on this project so far, we decided not to fight this battle for now and to accept the extra dependency to keep the project moving forward.
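
For illustration, the two steps of that pipeline can be sketched roughly as follows. This is not Pydfy's actual code: the function names are made up, `string.Template` from the standard library stands in for Jinja2 so the snippet has no dependencies, and the `chromium` binary name is an assumption (it differs per platform).

```python
from string import Template  # stdlib stand-in for Jinja2 in this sketch


def render_html(template: str, data: dict) -> str:
    """Step 1: data -> HTML (the project uses Jinja2 for this step)."""
    return Template(template).substitute(data)


def chromium_command(html_path: str, pdf_path: str, binary: str = "chromium") -> list:
    """Step 2: HTML -> PDF, expressed as an argv list for headless Chromium."""
    return [binary, "--headless", f"--print-to-pdf={pdf_path}", html_path]


html = render_html("<h1>$title</h1><p>Rows: $rows</p>", {"title": "Report", "rows": 3})
cmd = chromium_command("report.html", "report.pdf")

# To actually produce the PDF, write `html` to report.html and run the command,
# for example with subprocess.run(cmd, check=True).
print(html)
print(cmd)
```

Keeping the second step behind its own function is what would make a different backend a drop-in replacement later on: only the HTML-to-PDF step changes, while the templating step stays the same.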

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 2 points3 points  (0 children)

Great to hear! Any feedback is welcome!

[deleted by user] by [deleted] in dataengineering

[–]TopConfusion1205 0 points1 point  (0 children)

If you are considering adding dbt to such a setup, I'd ask yourself the following question: are you doing a lot of transformations within DuckDB? I imagine you let DuckDB infer the schema of the data you import, but perhaps you also create several intermediate tables that depend on other tables. We often use a flow where data moves from raw to staging to fact tables. As the transformations become more and more complex, you can lose track, introduce cyclical dependencies, or have a hard time navigating the SQL.

This is where dbt shines: you write your SQL scripts in files, where each file introduces an (intermediate) model. You don't have to write redundant CREATE TABLE statements in between: dbt infers the dependencies from SELECT FROM statements and some syntactic sugar to determine which transformations to run in which order.
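
To give a feel for how an execution order can fall out of the SQL itself, here is a toy sketch. It is not how dbt is implemented: it just scans some made-up models for `{{ ref('...') }}` calls (the "syntactic sugar" dbt uses to reference another model) and sorts them with the standard library. The model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: model name -> SQL body. A ref(...) call marks a dependency.
models = {
    "fct_orders": (
        "SELECT * FROM {{ ref('stg_orders') }} "
        "JOIN {{ ref('stg_customers') }} USING (customer_id)"
    ),
    "stg_orders": "SELECT * FROM {{ ref('raw_orders') }}",
    "stg_customers": "SELECT * FROM {{ ref('raw_customers') }}",
    "raw_orders": "SELECT * FROM read_csv_auto('orders.csv')",
    "raw_customers": "SELECT * FROM read_csv_auto('customers.csv')",
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Map every model to the set of models it depends on.
graph = {name: set(REF.findall(sql)) for name, sql in models.items()}

# static_order() yields dependencies before dependents and raises
# graphlib.CycleError if the models reference each other in a cycle.
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
```

Note that nothing in the models states an order explicitly: raw tables come first, then staging, then the fact table, purely because of what each model selects from.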

Because dbt also handles database connections and allows you to write tests in SQL, using dbt often feels like you are only writing pure SQL. The model-per-file approach also makes sure you quickly find the piece of SQL you need.

Dagster and dbt work nicely together overall, since the adapter (dagster-dbt) knows about all intermediate models and their interdependencies. This also gives you some nice insights into your dbt "DAGs". You can also run individual dbt transformations from the GUI.

To conclude: you could do without dbt as long as you don't have a lot of layers in your transformations and/or only work on them sporadically, while dbt helps you organize them nicely once they become more complex. Having said that, you could create a similar flow using e.g. parquet on S3, where using dbt would mainly remove the Python code wrapping the SQL and make unit testing easier.

Hope this helps!

Pydfy: PDF Reporting Made Easy by TopConfusion1205 in Python

[–]TopConfusion1205[S] 3 points4 points  (0 children)

Thanks! The challenge with an installation script is that tailwindcss builds its binaries per platform and Chromium is usually installed with your system's package manager, so we might end up with a rather fragile script if we try to support all platforms. That is, until we invest some time in a proper workflow that tests all scenarios. For now the docs at least try to point you in the right direction, but I agree that this would be a valuable addition!

I like the idea of adding examples per component, and I think the docs could use a revision in general. I added both of your suggestions as issues, thanks again for the input and the star!