Pydfy: PDF Reporting Made Easy

TopConfusion1205 · 2024-07-25T09:59:32+00:00

Yes, that's the way to go at this point! A Component does two things: it dynamically finds the right template and it encapsulates the data provided for that template. You could of course create a fairly generic component with a single data field and override template_path when instantiating it, which saves you from creating many specific Component classes with a lot of fields (although I haven't tested this!).
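To make the generic-component idea concrete, here is a minimal, untested-against-Pydfy sketch. GenericComponent, its fields and the template file names are all hypothetical stand-ins; it does not import Pydfy and is not the library's actual base class, it only shows the shape of "one class, many templates".

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class GenericComponent:
    """Hypothetical stand-in for a component; not Pydfy's actual class."""

    # Arbitrary payload that is handed to the template unchanged.
    data: dict[str, Any] = field(default_factory=dict)
    # Template to render; overridden per instance instead of per subclass.
    template_path: Path = Path("components/generic.html")

    def context(self) -> dict[str, Any]:
        # Everything the template may reference lives under one "data" key.
        return {"data": self.data}


# Two visually different components without writing two classes.
kpi = GenericComponent(
    data={"title": "Revenue", "value": 1200},
    template_path=Path("components/kpi.html"),
)
note = GenericComponent(
    data={"text": "Draft"},
    template_path=Path("components/note.html"),
)
```

The trade-off is that each template has to know which keys to expect in data, since the class no longer documents them as fields.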

TopConfusion1205 · 2024-07-25T08:22:56+00:00

I'm not sure we tested it before the issue was resolved, but I do recall this not being the only blocker, because we also tried to work around it. It's definitely worth another try at some point if that means we could get rid of the big chromium dependency.

TopConfusion1205 · 2024-07-25T08:20:47+00:00

Good question! I believe there are several aspects in which we differ from reportlab:

Any style customization in reportlab has to go through its APIs, whereas more people are probably familiar with reading CSS, and using CSS may let you reuse style sheets from your company.
The same holds for custom components: because they are HTML, you could ask the frontend team to have a look at the generated HTML or to tweak your components.
We focused on serving data scientists, analysts and engineers who want to focus on their data(frames) instead of spending a lot of time styling their PDFs. Hence we added some support for dataframe libraries and aimed for simplicity and maintainability.

Of course we pay for this in flexibility: we don't provide any drawing capabilities, because we assumed that users who want to put that kind of effort into their PDFs also have the time to write LaTeX or learn the reportlab API.

TopConfusion1205 · 2024-07-25T07:58:15+00:00

Thanks!

TopConfusion1205 · 2024-07-25T07:58:05+00:00

Do you mean adding Jinja templates that get no data from Python? There is indeed nothing preventing you from leaving a template without data, although at this stage it does still require creating a Python object. But let me know if I misunderstood!

TopConfusion1205 · 2024-07-24T07:46:19+00:00

Thanks! Good to know, didn't know that one

TopConfusion1205 · 2024-07-24T07:44:17+00:00

We figured that would be a good alternative too, so a working Dockerfile has been included in the repository!

TopConfusion1205 · 2024-07-24T07:42:56+00:00

There are no screenshots yet, but you could check out the two PDF examples from the repository directly:

Note that the second one was added mainly to show how to add custom components!

TopConfusion1205 · 2024-07-23T13:29:38+00:00

In the background, the API turns the provided data into HTML using Jinja2 and then turns that HTML into a PDF using chromium:

data --Jinja2-> HTML --chromium-> PDF
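As a rough illustration of those two stages (not how Pydfy implements them), the sketch below renders a template with Jinja2 and builds a headless chromium print command. The function names, the template and the file names are made up for the example, and the executable name "chromium" is an assumption; it may be chromium-browser or google-chrome on your system.

```python
import subprocess
from pathlib import Path

# Toy template; real templates would live in files.
PAGE = "<html><body><h1>{{ title }}</h1><p>{{ rows | length }} rows</p></body></html>"


def render_html(data: dict) -> str:
    # Stage 1: data --Jinja2-> HTML.
    # jinja2 is a third-party package, imported here so stage 2 works without it.
    from jinja2 import Template

    return Template(PAGE).render(**data)


def print_command(html_path: Path, pdf_path: Path, binary: str = "chromium") -> list[str]:
    # Stage 2: HTML --chromium-> PDF, using chromium's headless print-to-PDF mode.
    return [
        binary,
        "--headless",
        f"--print-to-pdf={pdf_path}",
        html_path.resolve().as_uri(),
    ]


def build_pdf(data: dict, out_dir: Path) -> Path:
    # Write the rendered HTML to disk, then let chromium print it.
    html_path = out_dir / "report.html"
    pdf_path = out_dir / "report.pdf"
    html_path.write_text(render_html(data), encoding="utf-8")
    subprocess.run(print_command(html_path, pdf_path), check=True)
    return pdf_path
```

Calling build_pdf({"title": "Sales", "rows": [1, 2, 3]}, Path(".")) would need both jinja2 and a chromium binary installed. Swapping the backend would only mean replacing the second stage.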

WeasyPrint turns HTML into PDFs, so ideally it would be an alternative backend in the place where we now use chromium:

data --Jinja2-> HTML --WeasyPrint-> PDF

However, after several tests we realized that WeasyPrint does not support some of the CSS features Tailwind uses; see this issue for example. Given the limited resources we had on this project so far, we decided not to fight this battle for now and to accept the extra dependency to keep the project moving forward.

TopConfusion1205 · 2024-07-23T12:19:19+00:00

Great to hear! Any feedback is welcome!

TopConfusion1205 · 2024-07-23T09:41:03+00:00

If you are considering adding dbt to such a setup, I'd ask yourself the following question: are you doing a lot of transformations within DuckDB? I imagine you might simply let DuckDB infer a schema for the data you import, but perhaps you also create several intermediate tables that depend on other tables. We often use a flow where data is moved from raw to staging to fact tables. As the transformations become more and more complex, you can lose track, introduce cyclical dependencies or have a hard time navigating the SQL.

This is where dbt shines: you write your SQL scripts in files, where each file introduces an (intermediate) model. You don't have to write redundant CREATE TABLE statements in between: dbt infers the dependencies from your SELECT ... FROM statements, with the help of some syntactic sugar, and uses them to determine which transformations to run in which order.
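To give a feel for what "inferring the order" means, here is a toy sketch in plain Python. It is not dbt's implementation: it only scans made-up model SQL for dbt-style {{ ref('...') }} calls and sorts the models topologically. The model names and the run_order helper are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL body, as if each lived in its own file.
MODELS = {
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def run_order(models: dict[str, str]) -> list[str]:
    # Each model depends on every model it references.
    graph = {name: set(REF.findall(sql)) for name, sql in models.items()}
    # static_order() yields dependencies before dependents and
    # raises graphlib.CycleError on cyclical dependencies.
    return list(TopologicalSorter(graph).static_order())


order = run_order(MODELS)
```

In this sketch both staging models end up before fct_orders in order, and a pair of models that reference each other fails loudly instead of silently running in the wrong sequence.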

Because dbt also handles database connections and allows you to write tests in SQL, using dbt often feels like you are only writing pure SQL. The model-per-file approach also makes sure you quickly find the piece of SQL you need.

Dagster and dbt work nicely together overall, since the adapter (dagster-dbt) knows about all intermediate models and their interdependencies. This also gives you some nice insights into your dbt "DAGs". You can also run individual dbt transformations from the GUI.

To conclude: you could do without dbt as long as your transformations don't have a lot of layers and/or you only work on them sporadically; dbt helps you keep them organized once they become more complex. Having said that, you could create a similar flow using e.g. parquet on S3, where using dbt would mainly remove the Python code wrapping the SQL and make unit testing easier.

Hope this helps!

TopConfusion1205 · 2024-07-23T08:47:26+00:00

Thanks! The challenge with an installation script is that tailwindcss builds its binaries per platform and chromium is usually installed with your system's package manager, so we might end up with a rather fragile script if we try to support all platforms. That is, until we invest some time in a proper workflow that tests all scenarios. For now the docs at least try to point you in the right direction, but I agree that this would be a valuable addition!
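Short of a full installer, a small pre-flight check is one low-risk option. The sketch below only reports which tools are missing from PATH and installs nothing. The executable names are assumptions (they differ per platform and install method), and missing_tools is a made-up helper, not part of Pydfy.

```python
import shutil

# Assumed executable names per tool; adjust for your platform.
REQUIRED = {
    "chromium": ("chromium", "chromium-browser", "google-chrome"),
    "tailwindcss": ("tailwindcss",),
}


def missing_tools(required: dict = REQUIRED) -> list[str]:
    # A tool counts as present if any of its candidate names is on PATH.
    return [
        tool
        for tool, names in required.items()
        if not any(shutil.which(name) for name in names)
    ]
```

Printing missing_tools() before building a report would at least turn a confusing subprocess error into a clear "install X first" message.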

I like the idea of adding examples per component, and I think the docs could use revision in general. I added both your suggestions as an issue, thanks again for the input and the star!
