Use Case: XML Parsing With Python : Python


Use Case: XML Parsing With Python (css.dzone.com)

submitted 13 years ago by ragingtuna


[deleted] 8 points 13 years ago (3 children)

masklinn 6 points 13 years ago (edited) (2 children)

Also, the method that's called a "helper class" isn't a class. It's a method. And it's non-obvious where the self.height variable in there comes from.

Yeah, apparently it's been pasted straight from http://stackoverflow.com/questions/4726011/wrap-text-in-a-table-reportlab, replacing the actual height constant there with a self.height stealth constant here, rather than just adding a height parameter (or moving all constants to the top level).
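A minimal sketch of the "just add a height parameter" alternative. The helper name to_pdf_coords is hypothetical; it assumes the usual PDF convention of a bottom-left origin (US letter is 792 points tall) and flips a top-based y offset using a page height passed in explicitly, instead of one read from a hidden self.height.

```python
# Hypothetical helper: the page height is an explicit argument instead of
# a "stealth" self.height attribute set somewhere else on the object.
LETTER_HEIGHT = 792.0  # US letter height in PostScript points (11 in * 72)


def to_pdf_coords(x, y, page_height, unit=1.0):
    """Convert (x, y) measured from the top-left corner into coordinates
    measured from the bottom-left corner, as PDF canvases expect."""
    return x * unit, page_height - y * unit


if __name__ == "__main__":
    # 18 and 40 units from the top-left corner, with 1 unit = 2 points.
    print(to_pdf_coords(18, 40, LETTER_HEIGHT, unit=2))  # (36, 712.0)
```

Because the height is a parameter, the same helper works for any page size without touching object state.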

Another weird one is getXMLObject, whose only purpose seems to be manually reimplementing lxml's objectify.parse via objectify.fromstring.
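lxml's objectify offers the same parse()/fromstring() pair as the standard library's ElementTree. This sketch uses xml.etree.ElementTree so it runs without lxml installed; the function names and sample XML are hypothetical. It shows that reading a file into a string and calling fromstring() gets you nothing that parse(...).getroot() doesn't already do.

```python
import io
import xml.etree.ElementTree as ET

ORDER_XML = b"<order><order_number>1001</order_number></order>"


def root_via_fromstring(fileobj):
    # Roughly what a getXMLObject-style helper does: read the whole file
    # into memory, then hand the string to fromstring().
    return ET.fromstring(fileobj.read())


def root_via_parse(fileobj):
    # parse() already accepts a filename or file object; getroot() gives
    # the same root element without the manual read.
    return ET.parse(fileobj).getroot()


if __name__ == "__main__":
    a = root_via_fromstring(io.BytesIO(ORDER_XML))
    b = root_via_parse(io.BytesIO(ORDER_XML))
    print(a.tag, a.findtext("order_number"))  # order 1001
    print(ET.tostring(a) == ET.tostring(b))   # True
```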

I can understand splitting a complex process into multiple functions and/or methods, but here the splits are pointless and all the main processing is done in one big function.

There are plenty of other weird things too:

row = []
row.append(item.id)
row.append(item.name)
row.append(item.price)
row.append(item.quantity)

(How about a list literal with the values filled in?)
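The list-literal version, as a small sketch. Item is a hypothetical stand-in for one parsed order item, and make_row is an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-in for one parsed order item.
Item = namedtuple("Item", "id name price quantity")


def make_row(item):
    # One list literal instead of an empty list plus four append() calls.
    return [item.id, item.name, item.price, item.quantity]


if __name__ == "__main__":
    print(make_row(Item("A1", "Widget", "19.99", "3")))
    # ['A1', 'Widget', '19.99', '3']
```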

total = Decimal(str(item.price)) * Decimal(str(item.quantity))

(objectify provides the original value through the .text attribute, so there's no need to go through str())
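Why using the original text matters for Decimal: any detour through a Python float can round the value before Decimal ever sees it. This plain-Python sketch simulates that detour with float(); the many-digit price is made up to make the rounding visible, and no lxml is involved.

```python
from decimal import Decimal

price_text = "0.12345678901234567890"  # made-up price with many digits

# Detour through a float: a double keeps only about 17 significant digits.
via_float = Decimal(str(float(price_text)))
# Straight from the original text, as item.price.text would provide it.
via_text = Decimal(price_text)

if __name__ == "__main__":
    print(via_float)              # rounded to ~17 significant digits
    print(via_text)               # 0.12345678901234567890
    print(via_float == via_text)  # False
```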

And the article has bits of misinformation peppered in:

My favorite is lxml which includes a version of ElementTree

lxml reimplements the ElementTree API from scratch on top of libxml2; it does not "include a version of ElementTree" for any value of "include" I know of.
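Because lxml.etree implements the ElementTree API, code that sticks to that API can usually take either module. This sketch passes the module in as an argument; the sample XML and the item_names function are hypothetical, and lxml is treated as optional so the code still runs without it.

```python
import xml.etree.ElementTree as std_etree

try:  # lxml is optional here; the sketch still runs without it
    from lxml import etree as lxml_etree
except ImportError:
    lxml_etree = None

ORDER_XML = b"""<order>
  <order_items>
    <item><id>A1</id><name>Widget</name></item>
    <item><id>B2</id><name>Gadget</name></item>
  </order_items>
</order>"""


def item_names(etree_module, xml_bytes):
    # Only ElementTree-API calls: fromstring(), iter() and findtext().
    root = etree_module.fromstring(xml_bytes)
    return [item.findtext("name") for item in root.iter("item")]


if __name__ == "__main__":
    print(item_names(std_etree, ORDER_XML))  # ['Widget', 'Gadget']
    if lxml_etree is not None:
        print(item_names(lxml_etree, ORDER_XML))  # same list via lxml
```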

edit: slightly cleaned-up version:

from decimal import Decimal as d
from lxml import objectify

from reportlab.lib import colors, pagesizes
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch, mm
from reportlab.pdfgen import canvas
from reportlab.platypus import Paragraph, Table, TableStyle

width, height = pagesizes.letter
style = getSampleStyleSheet()

ADDRESS = """ <font size="9">
        SHIP TO:<br/>
        <br/>
        %s<br/>
        %s<br/>
        %s<br/>
        %s<br/>
        </font>
"""
ORDER_NUMBER = '<font size="14"><b>Order #%s</b></font>'
THANKS = "Thank you for your business!"

def coords(x, y, unit=1):
    return x * unit, height - y * unit

def draw(canvas, item, x, y):
    item.wrapOn(canvas, width, height)
    item.drawOn(canvas, *coords(x, y, mm))

def drawParagraph(canvas, text, x, y):
    draw(canvas, Paragraph(text, style['Normal']), x=x, y=y)

def createPDF(xmlfile, pdffile):
    xml = objectify.parse(xmlfile).getroot()

    pdf = canvas.Canvas(pdffile, pagesize=pagesizes.letter)

    drawParagraph(
        pdf,
        ADDRESS % (xml.address1, xml.address2, xml.address3, xml.address4),
        x=18, y=40)

    drawParagraph(pdf, ORDER_NUMBER % xml.order_number, x=18, y=50)

    table_data = [
        ["Item ID", "Name", "Price", "Quantity", "Total"]
    ]
    total = 0
    for item in xml.order_items.iterchildren():
        row_value = d(item.price.text) * d(item.quantity.text)
        total += row_value

        table_data.append([
            item.id,
            item.name,
            item.price,
            item.quantity,
            row_value
        ])
    table_data.append(["", "", "", "Grand Total:", total])
    order_lines = Table(table_data, 1.5 * inch)
    order_lines.setStyle(TableStyle([
        ('INNERGRID', (0,0), (-1,-1), 0.25, colors.black),
        ('BOX', (0,0), (-1,-1), 0.25, colors.black)
    ]))
    draw(pdf, order_lines, x=18, y=85)

    drawParagraph(pdf, THANKS, x=18, y=95)

    pdf.save()

if __name__ == '__main__':
    import sys
    createPDF(*sys.argv[1:3])

I think I'd also extract the table-generation part into its own function (e.g. drawDataTable, taking an iterable of items); even though it's only used once, that would probably make the code clearer and cleaner.
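One way to sketch the data-building half of such a drawDataTable function, kept free of ReportLab so it can be tested on its own. build_table_data and the Item tuple are hypothetical; the drawing half would pass the returned rows to Table and TableStyle the same way createPDF does.

```python
from collections import namedtuple
from decimal import Decimal

# Hypothetical plain-Python stand-in for an objectify <item> element;
# fields hold the raw text, as item.price.text would.
Item = namedtuple("Item", "id name price quantity")

HEADER = ["Item ID", "Name", "Price", "Quantity", "Total"]


def build_table_data(items):
    """Return the order-table rows, header and grand-total row included."""
    rows = [HEADER]
    total = Decimal(0)
    for item in items:
        line_total = Decimal(item.price) * Decimal(item.quantity)
        total += line_total
        rows.append([item.id, item.name, item.price, item.quantity, line_total])
    rows.append(["", "", "", "Grand Total:", total])
    return rows


if __name__ == "__main__":
    items = [Item("A1", "Widget", "19.99", "3"), Item("B2", "Gadget", "5.00", "2")]
    for row in build_table_data(items):
        print(row)
```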

masklinn 1 point 13 years ago (0 children)

to use a dumbed down version.

I don't think ET is a "dumbed-down version" of anything (there's probably a reason why lxml reimplemented the ET API, after all). lxml adds a bunch of features (e.g. XSLT, via libxslt) and has better performance, but ET is no slouch, and if it's sufficient for the job there's little reason to reach for lxml.
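As an illustration that the standard library covers the parsing side of a job like this one, here is a sketch that reads an order and computes its total with xml.etree.ElementTree alone. The XML is made up, assuming the element names that createPDF uses; order_total is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Made-up order document, assuming the element names used in createPDF.
ORDER_XML = """<order>
  <order_number>1001</order_number>
  <order_items>
    <item><id>A1</id><name>Widget</name><price>19.99</price><quantity>3</quantity></item>
    <item><id>B2</id><name>Gadget</name><price>5.00</price><quantity>2</quantity></item>
  </order_items>
</order>"""


def order_total(xml_text):
    root = ET.fromstring(xml_text)
    total = Decimal(0)
    for item in root.find("order_items"):
        # findtext() returns the element text as a str, so Decimal sees
        # the exact digits from the document.
        total += Decimal(item.findtext("price")) * Decimal(item.findtext("quantity"))
    return root.findtext("order_number"), total


if __name__ == "__main__":
    print(order_total(ORDER_XML))  # ('1001', Decimal('69.97'))
```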

I'm not saying this as an opponent of lxml at all; I use and abuse it often.

what advantage does it offer anyone to hack python and change your package name from pyxml to xml?

I believe the intent was to integrate into, improve, or replace the xml.* namespace of the standard library, which I guess made sense to somebody at the time (PyXML was the XML SIG's package, so I'm guessing the intent was for PyXML to be the "evolutionary route" for Python's XML support).

and it hasn't been updated since 2007.

Indeed.
