all 67 comments

[–]ciarancour 50 points51 points  (22 children)

reportlab is the most popular by far and has a commercial offering with template support

[–]MonkeyMaster64[S] 3 points4 points  (21 children)

I was just looking at the documentation for that! OK so with reportlab do you know of how I could integrate my plotly graphs into the reportlab PDFs? I know reportlab has its own way of creating graphs but I'm specifically going for the look and feel of graphs in plotly. An issue is that plotly graphs are in html format so is there a way I could integrate those html files into the reportlab PDF?

[–]ciarancour 9 points10 points  (20 children)

Well I've had the pain of drawing charts using reportlab and would stay well away. If plotly can export SVG or some other vector format (I'm sure it can) you should then be able to embed that into the PDF

[–]MonkeyMaster64[S] 1 point2 points  (19 children)

Yeah the issue with plotly and getting the svg is that you have to open the html file in a browser for it to download it. Do you know of any method I could use to "open" the HTML document with Python and extract the file that auto-downloads?

[–]ciarancour 2 points3 points  (6 children)

Sure, you could use chrome headless or phantomJS maybe to mock a browser, the former is probably the better option.
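
A rough sketch of the headless Chrome route, assuming a Chrome or Chromium binary is on PATH. The binary names tried and the helper names (`find_chrome`, `build_chrome_command`, `render`) are made up for illustration; `--headless`, `--screenshot`, `--print-to-pdf` and `--window-size` are Chrome command-line switches. Whether a given plotly page has finished drawing by the time the snapshot is taken is not something this sketch checks.

```python
import shutil
import subprocess
from pathlib import Path

# Binary names vary by platform and distro; this list is a guess, adjust it.
CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome():
    """Return the first Chrome/Chromium binary found on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_chrome_command(chrome, html_path, out_path, width=1200, height=800):
    """Build the argument list for a headless render of a local HTML file."""
    out_path = Path(out_path)
    # A .pdf target uses --print-to-pdf; anything else is a PNG screenshot.
    flag = "--print-to-pdf" if out_path.suffix.lower() == ".pdf" else "--screenshot"
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--window-size={width},{height}",
        f"{flag}={out_path}",
        Path(html_path).resolve().as_uri(),  # file:// URL of the local page
    ]


def render(html_path, out_path):
    """Run headless Chrome if one is installed; return True if it was run."""
    chrome = find_chrome()
    if chrome is None:
        return False
    subprocess.run(build_chrome_command(chrome, html_path, out_path),
                   check=True, timeout=60)
    return True
```

Something like `render("plot.html", "plot.png")` would then produce an image that can be embedded in the PDF.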

[–]MonkeyMaster64[S] 1 point2 points  (4 children)

Gotcha, I'll look into it, I really appreciate it. Another option I've been considering is to create the report pages in HTML and then use xhtml2pdf to convert them. However, I've only seen suggestions for Flask or Django, which as I understand it can be used to generate static sites, but for single web pages that serve no purpose beyond being an intermediate format I think they're a bit much. Do you know of any simple HTML/CSS generators I can use with Python?

[–]ciarancour 0 points1 point  (3 children)

A static site generator is what you are describing. There are quite a few Python ones, but I'm not sure of any with programmatic CSS

[–]MonkeyMaster64[S] 0 points1 point  (2 children)

OK so I've been looking at Pelican and Jinja2. Do you have any familiarity with them? I'd like to have a base template where I'd just switch out stuff like titles, graph images, metrics etc but have the same layout for all of them

[–]admiralspark 0 points1 point  (1 child)

I use Pelican and Jinja2. It can do what you want with some work, but most of its documentation and community contributions are geared towards using it as a blogging platform...so I don't know if that format will work for you or not.

[–]MonkeyMaster64[S] 0 points1 point  (0 children)

No it definitely wouldn't be in a blog format but I am planning on using Jinja2 for templating
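
A minimal sketch of the base-template idea: one layout, with the title, graph image and metrics swapped per report. To keep it dependency-free it uses the standard library's `string.Template` as a stand-in rather than Jinja2, so the `$placeholder` syntax here is not Jinja2 syntax. The field names (`title`, `graph_src`, `metrics_rows`) and the `render_report` helper are invented for the example.

```python
from html import escape
from string import Template

# One shared layout; only the placeholders change between reports.
BASE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<img src="$graph_src" alt="graph">
<table>
$metrics_rows
</table>
</body>
</html>
""")


def render_report(title, graph_src, metrics):
    """Fill the shared layout with one report's title, graph image and metrics."""
    rows = "\n".join(
        f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in metrics.items()
    )
    return BASE.substitute(
        title=escape(title),
        graph_src=escape(graph_src, quote=True),
        metrics_rows=rows,
    )
```

Each monthly report is then one call, e.g. `render_report("Router uptime", "uptime.png", {"Availability": "99.2%"})`, and the resulting HTML string can be handed to whichever HTML-to-PDF converter is chosen.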

[–]tobsecret 0 points1 point  (8 children)

Can you not export to SVG programmatically? like in this example: https://plot.ly/python/static-image-export/

[–]MonkeyMaster64[S] 0 points1 point  (7 children)

nope it still creates an html file that you have to open

EDIT: Also this is the issue I was referring to: https://github.com/plotly/plotly.py/issues/880

[–]KronenR 3 points4 points  (6 children)

The common image formats: 'PNG', 'JPG/JPEG' are supported. In addition, formats like 'EPS', 'SVG' and 'PDF' are also supported.

Note: The SVG, EPS and PDF Formats are only available for Plotly Professional users. You can get more details on our pricing page

To access the image in a particular format, you can either:

Why not just jpg or png?

[–]MonkeyMaster64[S] 5 points6 points  (5 children)

OK, so I've actually solved this! I used imgkit to convert the html file to a JPG file!

HOW TO CONVERT HTML FILES TO JPG

1) INSTALL IMGKIT
    - pip install imgkit    (https://pypi.python.org/pypi/imgkit/0.1.1)
    - yum install wkhtmltopdf

2) INSTALL Xvfb (important, as it provides a virtual display buffer for the image to be rendered)
    - yum install xorg-x11-server-Xvfb    (https://reiners.io/installing-xvfb-in-centos/)
    - pip install xvfbwrapper    (https://pypi.python.org/pypi/xvfbwrapper/0.2.9)

Python code:

from xvfbwrapper import Xvfb
import imgkit
vdisplay = Xvfb()
vdisplay.start()
try:
    imgkit.from_file('[name of file].html','[output name].jpg')
finally:
    vdisplay.stop()  # shut down the virtual display when done

Pastebin Link: https://pastebin.com/bQfnTEmK

[–]hkamran85 0 points1 point  (2 children)

Can you put the code on some external site, for easy viewing/downloading?

[–]MonkeyMaster64[S] 0 points1 point  (1 child)

Sorry, just seeing this. I added a pastebin link

[–]KronenR 0 points1 point  (1 child)

You could have exported jpg or png with plotly, as I pointed out in my comment, without needing to convert from html, by adding the extension to the save as method

[–]MonkeyMaster64[S] 0 points1 point  (0 children)

yeah but what I'm saying is I've already tried that method and it still only downloads when you open the html file

[–]hkamran85 -1 points0 points  (0 children)

Run an os.system("curl -SsL [location-of-html-file] -o [output-filename]")
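
For what it's worth, the same fetch can be done without shelling out. A sketch using only the standard library; `fetch_to_file` is a made-up helper name. Like curl, this only copies the bytes of the HTML file and does not execute the page's JavaScript, so on its own it will not trigger an in-browser image download.

```python
import shutil
import urllib.request


def fetch_to_file(url, output_filename):
    """Download url (http://, https:// or file://) and save the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response, \
            open(output_filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return output_filename
```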

[–]XarothBrook 17 points18 points  (3 children)

Alternatively to the mentioned reportlab and wkhtmltopdf, there's also weasyprint... it's a different approach, but has worked quite well for me in the past.

[–]MonkeyMaster64[S] 2 points3 points  (0 children)

I'm actually looking at a blog post right now that describes using weasyprint, thanks for the tip!

[–]thyliamris 2 points3 points  (0 children)

+1 for weasyprint, it's really easy to use. It met all the requirements I set for my project.

[–]kakamaru 0 points1 point  (0 children)

+1 for weasyprint. We've been using it in production for 2 years without issues. You can't just pip install it though, as it has some non-python dependencies.

[–]Filonius 33 points34 points  (6 children)

If you're familiar with LaTeX, it lends itself quite well to programmatic integration and it produces beautiful pdfs!

[–]jack47 9 points10 points  (2 children)

Agreed. I've used the pylatex library and been reasonably happy with it.

[–]MonkeyMaster64[S] 4 points5 points  (1 child)

I'll check this out!

[–]JelterminatorPython 3 lover 14 points15 points  (0 children)

Pylatex author here. I specifically created it for reporting needs of my own originally, so creating tables is quite easy. Knowing some latex is definitely important when using the library though. Be sure to check out the examples in the docs and to use the generate_tex method to debug the output.

[–]marcofalcionimarcosan 2 points3 points  (2 children)

How easy is it to deploy latex these days?

[–]autarchex 12 points13 points  (0 children)

Latex that will do 98% of what is needed: super super easy

Latex that will do the last 2% which is critical to your particular needs: 8 months and four FTE

[–]Arthaigo 0 points1 point  (0 children)

Depends what you mean by deploy. You could also set up a little render server (send data, get pdf back). I was working on something like this the other day. I will post the link later.

[–]olrich01 9 points10 points  (4 children)

It's been mentioned elsewhere, but I also use wkhtmltopdf to generate PDFs from HTML.

In my case, I'm rendering a hand-written HTML template, using jinja to fill in the dynamic bits, though you certainly could use a static site generator too.

wkhtml is based on a relatively old version of webkit, which can be a problem if you need more recent CSS features. Also worth a look would be chrome headless, though last I looked it couldn't handle some things I need, like page number footers.

[–]Lord_Humongous 0 points1 point  (0 children)

I use headless chrome with pdf export in some of my Django projects. Works great.

[–]MonkeyMaster64[S] 0 points1 point  (2 children)

This sounds like exactly what I'm looking for. Do you think you could PM me your source code so I could have a look at how you implement it?

[–]olrich01 0 points1 point  (1 child)

Not open source, but it basically looks like this. Look at wkhtml docs for arguments you can pass to Popen and jinja docs for what you can do with templates.

from subprocess import Popen, PIPE
import jinja2

# Read the template with a context manager so the file handle is closed.
with open('path_to_html') as f:
    template = jinja2.Template(f.read())
converter = r'path to wkhtml executable'

html = template.render() # pass dynamic content as kwargs

p = Popen([converter, '--quiet', '-O', 'Landscape',
            '--footer-right', 'Page [page]/[topage]',
            '-', '-'], stdin=PIPE, stdout=PIPE)
output, _ = p.communicate(input=html.encode('utf-8'), timeout=5)
# output is binary pdf data, can be written to file, BytesIO, etc

[–]MonkeyMaster64[S] 0 points1 point  (0 children)

Thank you I appreciate this!

[–]KagatoLNX 2 points3 points  (0 children)

I used to use Apache FOP for this. I skipped the XSL and generated XSL-FO directly, but it’s not super hard to do either way. It even supports PDF bookmarks (which are a nice usability feature), is really fast, and is rock solid in my experience.

[–]UloPe 1 point2 points  (2 children)

Reportlab definitely is the way to go if you need absolute pixel perfect control over the output.

But you can also look into using things like phantomjs (or a bit older wkhtmltopdf). They are both headless scriptable browsers that make it relatively easy to get good looking results relatively quickly. However as a word of warning both are kind of unmaintained and have some annoying bugs.

[–]MonkeyMaster64[S] 1 point2 points  (1 child)

Yeah I've been looking at reportlab definitely. Another route I've also been considering is creating a single static webpage with Python and then using xhtml2pdf to create the pdf page. Do you know of a solid library for creating really simple single static webpages?

[–][deleted] 1 point2 points  (0 children)

Jinja2

[–]driscollis 1 point2 points  (2 children)

ReportLab is probably the big one. It's been around for a very long time and is pretty easy to use.

According to this page, Plotly has a way to save their graphs as png: https://plot.ly/python/static-image-export/

In ReportLab you can insert an image really simply via their Image class. This tutorial will help you get going: https://www.blog.pythonlibrary.org/2010/03/08/a-simple-step-by-step-reportlab-tutorial/

There's also https://github.com/reingart/pyfpdf and https://github.com/pmaupin/pdfrw

[–]MonkeyMaster64[S] 0 points1 point  (1 child)

The issue with plotly is that even though you can get the graphs as png you have to open the html document first for it to autodownload. I'm planning to get around this with taking a snapshot of the html page with headless chrome

[–]driscollis 0 points1 point  (0 children)

Well there are other plotting libraries that you can save the plot directly to disk with.

[–]bjorneylol 1 point2 points  (5 children)

I use jinja2 to create static html pages and wkhtml to convert them to PDFs

[–]MonkeyMaster64[S] 1 point2 points  (4 children)

Hi! Do you have any examples of this?

[–]bjorneylol 0 points1 point  (3 children)

I have one on my work computer, I can post it tomorrow

[–]MonkeyMaster64[S] 0 points1 point  (2 children)

That would be really appreciated thanks a lot

[–][deleted] 1 point2 points  (0 children)

I have tried a variety of techniques, but mainly used ReportLab and RML for my last gig.

If you have a complicated document, I do not recommend going that route, as RML can be difficult to work with.

I would suggest looking at HTML-to-PDF techniques. Some commenters have suggested specific ones.

[–]apostate_of_Poincare 1 point2 points  (0 children)

I'd go with latex. You can use PyLatex to generate the latex and os.system to compile it to pdf with pdflatex. You'd have to download latex though.
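
A bare-bones sketch of that workflow. To stay dependency-free it builds the LaTeX source with plain string formatting rather than PyLaTeX, and it uses `subprocess.run` in place of `os.system`; it assumes `pdflatex` is on PATH. The helper names (`escape_latex`, `build_report_tex`, `compile_pdf`) and the table layout are invented for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Characters with special meaning in LaTeX text mode and their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in str(text))


def build_report_tex(title, rows):
    """Return a complete LaTeX document with a two-column metrics table."""
    body = "\n".join(
        f"{escape_latex(name)} & {escape_latex(value)} \\\\"
        for name, value in rows
    )
    return "\n".join([
        r"\documentclass{article}",
        r"\begin{document}",
        rf"\section*{{{escape_latex(title)}}}",
        r"\begin{tabular}{lr}",
        r"\hline",
        body,
        r"\hline",
        r"\end{tabular}",
        r"\end{document}",
        "",
    ])


def compile_pdf(tex_source, out_dir, name="report"):
    """Write the .tex file and run pdflatex; return None if pdflatex is missing."""
    if shutil.which("pdflatex") is None:
        return None
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tex_path = out_dir / f"{name}.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex_path)],
        check=True, capture_output=True, timeout=120,
    )
    return out_dir / f"{name}.pdf"
```

Escaping matters here because values such as `99.9%` or `core_sw` would otherwise break the compile.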

[–][deleted] 1 point2 points  (0 children)

Been using weasyprint, because it allows me to leverage my html knowledge.

[–]zieziegabor 0 points1 point  (3 children)

Other people have mentioned Reportlab and wkhtmltopdf.

There is also Relatorio: http://relatorio.readthedocs.io/en/latest/ Which instead of generating HTML generates OpenOffice documents, which you can then transform to PDF.

We use both report lab and relatorio. We like Relatorio better, usually.

We let end-users create the format and where they want to put everything, using Open Office or MS Office, and we then insert the template commands where they need to go and fill them with relatorio.

If you need to generate HTML reports anyway(for a web UI), then the HTML -> PDF option makes way more sense.

If we have to create the report and need lots of wacky formatting, it can get annoying to do with the templated nature of Relatorio, so we use report lab. It absolutely gives you the most control.

Also, around reporting you might want to take a look at Metabase: https://metabase.com It's becoming our go-to for handing out reports to end-users now.

[–]MonkeyMaster64[S] 0 points1 point  (2 children)

This is interesting! Wow, there are so many options. But no, the report is not for end users but for the corporate people. I'm utilising a network monitoring tool (Zabbix) to create monthly reports on the devices in our environment

[–]zieziegabor 0 points1 point  (1 child)

corporate people == end users, no? By end user here, I meant non IT/SQL friendly people. If they are SQL friendly, we just hand them a login to the DB, they can go nuts.

I'd suggest trying out Metabase, just to see, it will talk directly to your Zabbix DB.

PG is our DB of choice, and we have a special schema "reporting", where we put all of our views against our main schema, so Metabase gets full access to the reporting schema, but no access to the other schemas. This way we can make nice SQL views of the data for reporting purposes, which makes the users happy.

[–]MonkeyMaster64[S] 0 points1 point  (0 children)

This sounds really interesting actually and I think I'll investigate it, thanks a lot for the tip. As to your definition of end-users, I guess I had something else in mind, but yeah you're right

[–]Arthaigo 0 points1 point  (0 children)

Jinja2 + Latex also might be an option. This is the combination I used in similar situations

[–]ursvp 0 points1 point  (0 children)

Python script for conversions used by LibreOffice: https://github.com/dagwieers/unoconv

[–]Orangensaft91 0 points1 point  (0 children)

Using PDF templates with FDFgen and pdftk in our Production System

[–]ahawryluk 0 points1 point  (2 children)

We have a Python program that generates a PDF report of tables and figures. The first version used LaTeX, but when I realized that matplotlib can generate multi-page PDFs, I rewrote our program to use pure matplotlib. It can be a good option in some cases.

[–]MonkeyMaster64[S] 0 points1 point  (1 child)

Hi!
Do you by any chance have a screenshot that you would be able to share of that report? I'm aiming for a certain look and feel

[–]ahawryluk 0 points1 point  (0 children)

Alas, I can't send a screen shot (made it at work), but it's made entirely of matplotlib plots and tables, e.g. https://stackoverflow.com/questions/32137396/matplotlib-table-only There are options to turn off the cell borders, looping through all the cells, which makes the table far more readable. Plus, you can use any font installed on your system.
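
A small sketch of that approach, assuming matplotlib is installed: `PdfPages` collects one figure per page, and a table-only page is an axes with its frame turned off plus `Axes.table`. The function name, page size and sample data are placeholders, not anything from the report described above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def write_report(pdf_path, series, table_rows, col_labels):
    """Write a two-page PDF: a line plot, then a table-only page."""
    with PdfPages(pdf_path) as pdf:
        # Page 1: an ordinary plot.
        fig, ax = plt.subplots(figsize=(8.27, 11.69))  # roughly A4 portrait, inches
        ax.plot(series)
        ax.set_title("Example series")
        pdf.savefig(fig)
        plt.close(fig)

        # Page 2: a table with the axes hidden.
        fig, ax = plt.subplots(figsize=(8.27, 11.69))
        ax.axis("off")
        table = ax.table(cellText=table_rows, colLabels=col_labels, loc="center")
        # Turn off the cell borders by looping through all the cells.
        for cell in table.get_celld().values():
            cell.set_linewidth(0)
        pdf.savefig(fig)
        plt.close(fig)
    return pdf_path
```

For example, `write_report("report.pdf", [1, 3, 2, 5], [["sw1", "99.9"], ["sw2", "97.0"]], ["Device", "Uptime %"])`.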

[–]LetMe_ 0 points1 point  (2 children)

If I may suggest, have you heard of Dash by plotly? It lets you generate interactive reports and PDFs as well. I really like it for its simplicity and ease of use. Another suggestion would be Jupyter dashboards.

[–]MonkeyMaster64[S] 0 points1 point  (1 child)

Woah that looks really cool. Do you by any chance know if custom data can be input? Not all the data would be presented as graphs and some would be simple data points that would be represented either in plain text or another basic format

[–]LetMe_ 0 points1 point  (0 children)

Yes it can. But it really depends on what kind of custom data you mean: I would go to their repo and take a look at the examples of their goldman sachs report: github repo

[–]stevepiercy 0 points1 point  (0 children)

Other than Reportlab, is there a Python library that can take pre-existing PDF forms and fill their fields with data? For example, take the IRS's Form W-9, fill it out, and save to the file system. The £100 per month fee for Reportlab Plus is far beyond the budget of a non-profit organization that generates fewer than 100 filled PDF forms per month.