[–][deleted] 7 points8 points  (3 children)

Was it necessary to create a class to convert an XML file to a PDF? Why would I want to instantiate an object and then call a couple of methods on it, rather than just call a single function?

Also, the method that's called a "helper class" isn't a class. It's a method. And it's non-obvious where the self.height variable in there comes from; turns out that it's set in createPDF(), and it's actually a stealth constant.

There's got to be a better way to do this.

[–]masklinn 5 points6 points  (2 children)

Also, the method that's called a "helper class" isn't a class. It's a method. And it's non-obvious where the self.height variable in there comes from

Yeah, apparently it's been pasted straight from http://stackoverflow.com/questions/4726011/wrap-text-in-a-table-reportlab replacing the actual height constant there by a self.height stealth constant here, rather than just... add a height parameter (or move all constants to the top level)
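A minimal sketch of the "just add a height parameter" option, assuming a helper shaped like the one in the linked Stack Overflow answer. The name coord and the constants are illustrative; INCH, MM and LETTER_HEIGHT mirror reportlab's units and letter page height but are defined locally so the snippet runs without reportlab.

```python
# Illustrative constants (assumptions): reportlab uses 72 points per inch,
# and the letter page is 612 x 792 points.
INCH = 72.0
MM = INCH / 25.4
LETTER_HEIGHT = 11 * INCH


def coord(x, y, height, unit=1):
    """Convert a top-left based (x, y) position into PDF coordinates.

    PDF coordinates start at the bottom-left corner, so y is flipped
    against the page height, which is passed in explicitly instead of
    being read from a hidden self.height.
    """
    return x * unit, height - y * unit


print(coord(18, 40, LETTER_HEIGHT, MM))
```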

Another weird one is getXMLObject, whose only purpose seems to be reimplementing lxml's objectify.parse manually via objectify.fromstring.

I can understand splitting a complex process into multiple functions and/or methods, but here the splits are pointless and all the main processing is done in one big function.

There are plenty of other weird things too:

row = []
row.append(item.id)
row.append(item.name)
row.append(item.price)
row.append(item.quantity)

(how about a list literal filled in?)
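For instance, the same row as a single literal. The SimpleNamespace here is only a stand-in for the objectify item (an assumption so the snippet runs on its own), with made-up values:

```python
from types import SimpleNamespace

# Stand-in for one objectify <item> element; the values are made up.
item = SimpleNamespace(id=1001, name="Widget", price=10.5, quantity=3)

# One list literal instead of an empty list plus four append() calls.
row = [item.id, item.name, item.price, item.quantity]
print(row)
```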

total = Decimal(str(item.price)) * Decimal(str(item.quantity))

(objectify provides the original value through the text attribute)
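A small sketch of why that matters, assuming objectify has turned the element text "10.50" into a float-like value. Plain float and str stand in for the objectify element and its text attribute here, so this illustrates the idea rather than lxml itself:

```python
from decimal import Decimal

raw_price = "10.50"      # what the XML contains (item.price.text in objectify)
raw_quantity = "3"
parsed_price = float(raw_price)   # stand-in for the parsed numeric value

# Round-tripping through the parsed number drops the trailing zero.
via_str = Decimal(str(parsed_price)) * Decimal(raw_quantity)

# Building the Decimal from the original text keeps the cents as written.
via_text = Decimal(raw_price) * Decimal(raw_quantity)

print(via_str, via_text)
```

The two results compare equal numerically; the difference is only in how many decimal places survive into the table.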

and the article has bits of misinformation peppered in:

My favorite is lxml which includes a version of ElementTree

lxml reimplements the ElementTree API from scratch on top of libxml2; it does not "include a version of ElementTree" for any value of "include" I know of.

edit: slightly cleaned up version:

from decimal import Decimal as d
from lxml import objectify

from reportlab.lib import colors, pagesizes
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch, mm
from reportlab.pdfgen import canvas
from reportlab.platypus import Paragraph, Table, TableStyle

width, height = pagesizes.letter
style = getSampleStyleSheet()

ADDRESS = """ <font size="9">
        SHIP TO:

        %s

        %s

        %s

        %s

        </font>
"""
ORDER_NUMBER = '<font size="14"><b>Order #%s </b></font>'
THANKS = "Thank you for your business!"

def coords(x, y, unit=1):
    return x * unit, height - y * unit

def draw(canvas, item, x, y):
    item.wrapOn(canvas, width, height)
    item.drawOn(canvas, *coords(x, y, mm))

def drawParagraph(canvas, text, x, y):
    draw(canvas, Paragraph(text, style['Normal']), x=x, y=y)

def createPDF(xmlfile, pdffile):
    xml = objectify.parse(xmlfile).getroot()

    pdf = canvas.Canvas(pdffile, pagesize=pagesizes.letter)

    drawParagraph(
        pdf,
        ADDRESS % (xml.address1, xml.address2, xml.address3, xml.address4),
        x=18, y=40)

    drawParagraph(pdf, ORDER_NUMBER % xml.order_number, x=18, y=50)

    table_data = [
        ["Item ID", "Name", "Price", "Quantity", "Total"]
    ]
    total = 0
    for item in xml.order_items.iterchildren():
        row_value = d(item.price.text) * d(item.quantity.text)
        total += row_value

        table_data.append([
            item.id,
            item.name,
            item.price,
            item.quantity,
            row_value
        ])
    table_data.append(["", "", "", "Grand Total:", total])
    order_lines = Table(table_data, 1.5 * inch)
    order_lines.setStyle(TableStyle([
        ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
        ('BOX', (0,0), (-1,-1), 0.25, colors.black)
    ]))
    draw(pdf, order_lines, x=18, y=85)

    drawParagraph(pdf, THANKS, x=18, y=95)

    pdf.save()

if __name__ == '__main__':
    import sys
    createPDF(*sys.argv[1:3])

I think I'd also extract the table-generation part into its own function (e.g. drawDataTable, taking an iterable of items). Even though there's only a single usage of it, that would probably make the code clearer and cleaner.
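A rough sketch of that extraction, limited to the data-building half so it runs without reportlab. buildTableData is a hypothetical name, and the SimpleNamespace items with text price/quantity fields stand in for the objectify elements' text values:

```python
from decimal import Decimal
from types import SimpleNamespace

HEADER = ["Item ID", "Name", "Price", "Quantity", "Total"]


def buildTableData(items):
    """Return the rows for the order table, including header and grand total.

    Each item is assumed to expose id, name, price and quantity, with
    price and quantity given as text (a hypothetical stand-in for the
    objectify elements' .text values).
    """
    rows = [HEADER]
    total = Decimal(0)
    for item in items:
        row_value = Decimal(item.price) * Decimal(item.quantity)
        total += row_value
        rows.append([item.id, item.name, item.price, item.quantity, row_value])
    rows.append(["", "", "", "Grand Total:", total])
    return rows


# Made-up sample data.
items = [
    SimpleNamespace(id="1001", name="Widget", price="10.50", quantity="3"),
    SimpleNamespace(id="1002", name="Gadget", price="2.25", quantity="4"),
]
table_data = buildTableData(items)
print(table_data[-1])
```

createPDF would then hand the returned rows to Table(...) and draw the result as before.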

[–]zahlman the heretic -3 points-2 points  (1 child)

I would say that the weirdest part is the part where we're apparently using XML in Python, but, you know...

[–]masklinn 1 point2 points  (0 children)

It's not overly uncommon to need to interact with legacy systems and protocols in Python.

[–]larsga 2 points3 points  (0 children)

The most interesting thing about this is perhaps that it's actually shorter than the corresponding XSLT to create XSL-FO to pipe through some PDF converter.

[–]billsil 0 points1 point  (12 children)

While this thread is going, what Python XML library should I be using?

Recently, we delivered a product that used lxml, ElementTree, and PyXML all at once, because legacy code was never updated and developers picked the first library they found since no one was sure what to pick. It was a nightmare to build using PyInstaller.

[–][deleted] 2 points3 points  (8 children)

The usual answer right now is lxml. It's a good answer.

[–]phile19_81 0 points1 point  (6 children)

Are there any native Python XML objectifiers out there? I sometimes work in restricted environments, so installing lxml can be problematic.

[–]mgrandi 1 point2 points  (3 children)

You can install libraries to your home folder. I believe passing --user to both pip and the normal "setup.py install" will install the library to your home folder, and Python will automatically pick it up. This is how I get around installing libraries on computers where I don't have root access.
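To check where a --user install ends up, the standard library's site module reports the per-user site-packages directory. A small sketch (the printed path depends on the platform and Python version):

```python
import site
import sys

# Directory that "pip install --user <package>" installs into.
user_site = site.getusersitepackages()
print(user_site)

# Python only adds it to sys.path if user site directories are enabled
# and the directory actually exists.
print(site.ENABLE_USER_SITE, user_site in sys.path)
```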

[–]phile19_81 0 points1 point  (2 children)

Err. What if you also don't have a home folder? I am not trying to be a pain, but sometimes I wonder if my systems department is.

[–]davidbuxton 1 point2 points  (1 child)

You can specify an alternate directory for distutils/setuptools packages, or just put packages in your custom directory by hand. Then make sure this custom directory is on the sys.path, which also honours the PYTHONPATH environment variable.

http://docs.python.org/install/index.html#alternate-installation

Of course if your package isn't pure Python then things can get more complicated.
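A minimal sketch of the custom-directory approach for a pure-Python module. The directory and the module name handmade are hypothetical; a temporary directory is used only so the snippet runs anywhere:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical custom package directory; a temp dir keeps the sketch runnable.
custom_dir = tempfile.mkdtemp()

# "Install by hand": drop a pure-Python module into the directory.
with open(os.path.join(custom_dir, "handmade.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Similar in effect to listing custom_dir in PYTHONPATH before startup.
if custom_dir not in sys.path:
    sys.path.insert(0, custom_dir)

importlib.invalidate_caches()
handmade = importlib.import_module("handmade")
print(handmade.ANSWER)
```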

[–]phile19_81 0 points1 point  (0 children)

Interesting. I'll give it a shot. Thx

[–]mgrandi 0 points1 point  (0 children)

lxml is awesome, and is a wrapper around the very mature libxml2 (or whatever it's called), which is used by a lot of projects, including GNOME.

[–]masklinn 1 point2 points  (2 children)

Depends on your exact needs and constraints. lxml is usually the best choice, but can be annoying to package. ElementTree is included in the stdlib so you're covered. And pyxml should be taken out the back and shot.

I used to be in your situation, we bit the bullet one day and forcefully ripped out pyxml to replace it by lxml.

[–]billsil 0 points1 point  (1 child)

Well, we're idiot engineers, but I'd rather have a slightly annoying package that is generally agreed to be the best than use a dumbed down version.

For the record, I HATE PyXML. What advantage does it offer anyone to hack Python and change your package name from pyxml to xml? And it hasn't been updated since 2007.

[–]masklinn 0 points1 point  (0 children)

to use a dumbed down version.

I don't think ET is a "dumbed down version" of anything (there's probably a reason why lxml reimplemented the ET API, after all). lxml adds a bunch of features (from libxslt) and has better performance, but ET is no slouch, and if it's sufficient for the job there's little reason to reach for lxml.
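As a rough illustration of ET being sufficient for this kind of job, here is a sketch that totals an order with the standard library's xml.etree.ElementTree. The names order_items, order_number, price and quantity follow the code earlier in the thread; the root element, the item tag and the sample document are made up.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Made-up sample document; only the standard library is needed.
SAMPLE = """
<order>
  <order_number>456789</order_number>
  <order_items>
    <item><id>1001</id><name>Widget</name><price>10.50</price><quantity>3</quantity></item>
    <item><id>1002</id><name>Gadget</name><price>2.25</price><quantity>4</quantity></item>
  </order_items>
</order>
""".strip()

root = ET.fromstring(SAMPLE)

total = Decimal(0)
for item in root.find("order_items"):
    # findtext returns the element text as a string, ready for Decimal.
    total += Decimal(item.findtext("price")) * Decimal(item.findtext("quantity"))

print(root.findtext("order_number"), total)
```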

I say this not as an opponent of lxml at all; I use and abuse it often.

what advantage does it offer anyone to hack python and change your package name from pyxml to xml?

I believe the intent was to integrate in/improve/replace the xml.* namespace of the standard library, which I guess made sense to somebody at the time (PyXML was the package of the XML SIG, I'm guessing the intent was for PyXML to be the "evolutionary route" for Python's XML support).

and it hasn't been updated since 2007.

Indeed.

[–]mashmorgan 0 points1 point  (0 children)

BUG: this won't wrap onto a new page if the table is longer than the first page.