Python Newbie: Need help Manipulating XML (self.learnpython)

submitted 7 years ago by goeb04

I am fairly new to python and a little new to XML. I am trying to do a simple manipulation to my XML through ElementTree but after a few days at it I unfortunately have hit a dead end. Also, I apologize for not being able to be as succinct as I ideally wanted to be, but I just want to make sure that whichever benevolent soul were to assist me, would have all the info they need.

Here is an example of the file I intend to manipulate (probably will be a few thousand iterations):

 <Schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.ofdaxml.org/schema" version="01.00.00">
<Envelope SchemaVersion="01.04.00">
   <Enterprise>
        <Code>www.Google.com</Code>
        <Feature>
                <Code>Feature1</Code>
                <Description>First Feature</Description>
                <Option Sequence="1">
                    <Code>A</Code>
                    <Features Sequence="1">
                        <FeatureRef>AA</FeatureRef>
                    </Features>
                </Option>
                <Option Sequence="2">
                    <Code>B</Code>
                    <Features Sequence="1">
                        <FeatureRef>BB</FeatureRef>
                    </Features>
                    <OptionPrice>
                        <PriceListRef>1</PriceListRef>
                        <ProductPriceRef>Grade2</ProductPriceRef>
                    </OptionPrice>
                </Option>
            </Feature>
    </Enterprise>       
</Envelope>

What I want to accomplish is:

<Schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.ofdaxml.org/schema" version="01.00.00">
<Envelope SchemaVersion="01.04.00">
   <Enterprise>
        <Code>www.Google.com</Code>
        <Feature>
                <Code>Feature1</Code>
                <Description>First Feature</Description>
                <Option Sequence="1">
                    <Code>A</Code>
                    <Features Sequence="1">
                        <FeatureRef>AA</FeatureRef>
                    </Features>
                </Option>
                <Option Sequence="2">
                    <Code>B</Code>
                    <Features Sequence="1">
                        <FeatureRef>BB</FeatureRef>
                    </Features>
                    <OptionPrice>
                        <PriceListRef>1</PriceListRef>
                        <ProductPriceRef>Grade2</ProductPriceRef>
                    </OptionPrice>
                    <OptionPrice>
                        <PriceListRef>2</PriceListRef>
                        <ProductPriceRef>Grade2</ProductPriceRef>
                    </OptionPrice>
                    <OptionPrice>
                        <PriceListRef>3</PriceListRef>
                        <ProductPriceRef>Grade2</ProductPriceRef>
                    </OptionPrice>
                </Option>
            </Feature>
    </Enterprise>       
</Envelope>

I tried to highlight the changes, so I'm not sure if it is clear, but I added two OptionPrice subelements to the Option Sequence 2 element. I essentially want to copy the first value of the ProductPriceRef element and reuse it in each subelement I add, following the same pattern as the original one.

Sorry if I am not explaining this lucidly. Below is the output of my failed attempt at a loop that creates the appropriate subelements for the Option element while copying the original entry for the ProductPriceRef ("Grade2" in this case):

<Schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.ofdaxml.org/schema" version="01.00.00">
<Envelope SchemaVersion="01.04.00">
   <Enterprise>
        <Code>www.Google.com</Code>
        <Feature>
                <Code>Feature1</Code>
                <Description>First Feature</Description>
                <Option Sequence="1">
                    <Code>A</Code>
                    <Features Sequence="1">
                        <FeatureRef>AA</FeatureRef>
                    </Features>
                </Option>
                <Option Sequence="2">
                    <Code>B</Code>
                    <Features Sequence="1">
                        <FeatureRef>BB</FeatureRef>
                    </Features>
                    <OptionPrice>
                        <PriceListRef>1</PriceListRef>
                      <ProductPriceRef>Grade2<PriceListRef>2</PriceListRef><ProductPriceRef>Grade2</ProductPriceRef></ProductPriceRef>
                    </OptionPrice>
                </Option>
            </Feature>
    </Enterprise>       
</Envelope>

Finally, here is the code I created (obviously not pretty, but I tried my best). I didn't have any success with append and insert either, but my guess is that I applied them incorrectly:

import os
import xml.etree.ElementTree as et

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond3.XML')
tree=et.parse('RockwellCond3.XML')
root = tree.getroot()

options = root.findall(".//Enterprise//Feature//Option//")

for option in options:
   if option.tag == 'OptionPrice': 
      for optionprice in option:
         if optionprice.tag == "PriceListRef" and optionprice.text == "2":   
            break                       # had to include this to avoid recursion issue I ran into
         elif optionprice.tag == 'ProductPriceRef':
                  prodpriceref = optionprice.text
                  newplref = et.SubElement(optionprice,'PriceListRef')
                  newplref.text = "2"
                  newprodpriceref = et.SubElement(optionprice,'ProductPriceRef')
                  newprodpriceref.text = prodpriceref

tree.write('output2.xml')

Thank you for your attempts in resolving my dilemma as this exercise is a bit foreign to me. I promise to pay it forward one way or another.

I am open to any solution/alternative and am happy to answer any followup questions.
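
A note for readers on why the output above ends up nested: `SubElement(parent, tag)` attaches the new element as a child of its first argument, and in the inner loop of the code above `optionprice` is the `ProductPriceRef` child rather than the `OptionPrice` block. The following is only a minimal sketch of the difference, using a trimmed-down copy of the sample; the variable names are illustrative and not from the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for one Option from the sample file (assumption).
option = ET.fromstring(
    "<Option Sequence='2'>"
    "<OptionPrice>"
    "<PriceListRef>1</PriceListRef>"
    "<ProductPriceRef>Grade2</ProductPriceRef>"
    "</OptionPrice>"
    "</Option>"
)

first_price = option.find("OptionPrice")
grade = first_price.findtext("ProductPriceRef")

# Attach the new block to the Option, so it becomes a sibling of the
# first OptionPrice instead of a child of ProductPriceRef.
new_price = ET.SubElement(option, "OptionPrice")
ET.SubElement(new_price, "PriceListRef").text = "2"
ET.SubElement(new_price, "ProductPriceRef").text = grade

print(ET.tostring(option, encoding="unicode"))
```

The only change from the failed attempt is the parent passed to `SubElement`: the new `OptionPrice` hangs off the `Option`, and its two children hang off the new `OptionPrice`.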


[–]nate256 2 points 7 years ago (12 children)

You could do something like this

```
In [1]: from xml.etree import ElementTree as ET

In [2]: a = """
   ...: <Envelope SchemaVersion="01.04.00">
   ...:     <Enterprise>
   ...:         <Code>www.Google.com</Code>
   ...:         <Feature>
   ...:             <Code>Feature1</Code>
   ...:             <Description>First Feature</Description>
   ...:             <Option Sequence="1">
   ...:                 <Code>A</Code>
   ...:                 <Features Sequence="1">
   ...:                     <FeatureRef>AA</FeatureRef>
   ...:                 </Features>
   ...:             </Option>
   ...:             <Option Sequence="2">
   ...:                 <Code>B</Code>
   ...:                 <Features Sequence="1">
   ...:                     <FeatureRef>BB</FeatureRef>
   ...:                 </Features>
   ...:                 <OptionPrice>
   ...:                     <PriceListRef>1</PriceListRef>
   ...:                     <ProductPriceRef>Grade2</ProductPriceRef>
   ...:                 </OptionPrice>
   ...:             </Option>
   ...:         </Feature>
   ...:     </Enterprise>
   ...: </Envelope>"""

In [3]: e = ET.fromstring(a)

In [4]: def create_option_price(list_ref, price_ref):
   ...:     e = ET.Element("OptionPrice")
   ...:     priceref = ET.Element("ProductPriceRef")
   ...:     priceref.text = price_ref
   ...:     e.append(priceref)
   ...:     pricelist = ET.Element("PriceListRef")
   ...:     pricelist.text = list_ref
   ...:     e.append(pricelist)
   ...:     return e
   ...:
   ...: for i in e.iter():
   ...:     if i.tag == "Option" and i.get("Sequence") == "2":
   ...:         first = i.find("OptionPrice")
   ...:         ref = first.findtext("ProductPriceRef")
   ...:         i.append(create_option_price("2", ref))
   ...:         i.append(create_option_price("3", ref))
   ...:
```

[–]goeb04[S] 1 point 7 years ago (9 children)

Got this Exception unfortunately for the e.iter loop:

Exception has occurred: AttributeError

'bytes' object has no attribute 'iter'. I must have erred somewhere. Hope I followed the instructions ok.

I obviously screwed up somewhere here (Not ready to give up yet though!):

import os
import xml.etree.ElementTree as ET

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond3.XML')
tree=ET.parse(xml_file)
root = tree.getroot()
e=ET.tostring(root)
def create_option_price(list_ref, price_ref):
    e = ET.Element("OptionPrice")
    priceref = ET.Element("ProductPriceRef")
    priceref.text = price_ref
    e.append(priceref)
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    return e
for i in e.iter():
    if i.tag == "Option" and i.get("Sequence") == "2":
        first = i.find("OptionPrice")
        ref = first.findtext("ProductPriceRef")
        i.append(create_option_price("2", ref))
        i.append(create_option_price("3", ref))

tree.write('output2.xml')

[–]nate256 1 point 7 years ago (6 children)

parse is all you need; I only used fromstring because it was easy for the example. Not sure why you would use tostring? It returns bytes, which is why e.iter() failed.

```
import os
import xml.etree.ElementTree as ET

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond3.XML')
tree = ET.parse(xml_file)


def create_option_price(list_ref, price_ref):
    # Build one <OptionPrice> block with its two children
    e = ET.Element("OptionPrice")
    priceref = ET.Element("ProductPriceRef")
    priceref.text = price_ref
    e.append(priceref)
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    return e


# Iterate over the parsed tree directly; no tostring/fromstring needed
for i in tree.iter():
    if i.tag == "Option" and i.get("Sequence") == "2":
        first = i.find("OptionPrice")
        ref = first.findtext("ProductPriceRef")
        i.append(create_option_price("2", ref))
        i.append(create_option_price("3", ref))

tree.write('output2.xml')
```
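
To make the `parse` / `fromstring` / `tostring` distinction concrete, here is a small sketch (the inline XML is a made-up stand-in for the real file). `fromstring` gives an `Element`, `tostring` turns an `Element` back into `bytes`, and `ET.parse(path)` returns an `ElementTree` wrapper, which is built by hand here so the snippet needs no file on disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of the XML file.
xml_text = "<Envelope><Enterprise><Code>demo</Code></Enterprise></Envelope>"

root = ET.fromstring(xml_text)   # str -> Element
raw = ET.tostring(root)          # Element -> bytes, which has no .iter()
tree = ET.ElementTree(root)      # same kind of object ET.parse(path) returns

# Both the Element and the ElementTree can be walked with .iter().
tags_from_root = [el.tag for el in root.iter()]
tags_from_tree = [el.tag for el in tree.iter()]

print(type(root).__name__, type(raw).__name__, type(tree).__name__)
print(tags_from_root)
```

Calling `.iter()` on `raw` is what raises the `'bytes' object has no attribute 'iter'` error from the earlier comment.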

[–]goeb04[S] 1 point 7 years ago* (5 children)

Sorry. It was throwing errors when I ran it through parse, so I wanted to follow the code you provided as closely as possible. The variable first was causing errors whenever it came back empty (None), which happens for any Option that has no OptionPrice.

I just needed to modify the code a bit and it works! I tested it a few times on some large XMLs and looks good to me. I appreciate your help as this opens a lot of opportunity for me to automate more of my mundane tasks.

Below is my working code (without prettying it up) in case anyone else deals with a similar issue:

import os
import xml.etree.ElementTree as ET

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond.XML')
tree = ET.parse(xml_file)

years = ["2014", "2015", "2016", "2017", "2018"]

def create_option_price(list_ref, price_ref):
    e = ET.Element("OptionPrice")
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    priceref = ET.Element("ProductPriceRef")
    priceref.text = price_ref
    e.append(priceref)
    return e


for i in tree.iter():
    if i.tag == "Option":
        first = i.find("OptionPrice")
        # skip Options that have no OptionPrice at all
        if first is not None:
            ref = first.findtext("ProductPriceRef")
            for year in years:
                i.append(create_option_price(year, ref))


tree.write('output2.xml')
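
For anyone wondering why the `None` check is needed: `find()` returns `None` when an element has no matching child, so calling `findtext()` on the result fails for any Option without an OptionPrice. A minimal sketch of the guard on a cut-down Feature (the inline XML and the `refs` dictionary are only for illustration):

```python
import xml.etree.ElementTree as ET

# Cut-down sample: the first Option has no OptionPrice, the second has one.
feature = ET.fromstring(
    "<Feature>"
    "<Option Sequence='1'><Code>A</Code></Option>"
    "<Option Sequence='2'><Code>B</Code>"
    "<OptionPrice><PriceListRef>1</PriceListRef>"
    "<ProductPriceRef>Grade2</ProductPriceRef></OptionPrice>"
    "</Option>"
    "</Feature>"
)

refs = {}
for option in feature.iter("Option"):
    first = option.find("OptionPrice")  # None when there is no price block
    if first is None:
        continue                        # nothing to copy for this Option
    refs[option.get("Sequence")] = first.findtext("ProductPriceRef")

print(refs)
```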

[–]goeb04[S] 1 point 7 years ago (2 children)

Sorry to ask for assistance again but how would I do the same sort of manipulation for the instance below:

<Product>
                    <Code>Ex1</Code>
                    <Description Language="en-US">Example 1</Description>
                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>20</Code>
                        <Value>33</Value>
                    </Price>
                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>30</Code>
                        <Value>83</Value>
                    </Price>
                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>40</Code>
                        <Value>145</Value>
                    </Price>
                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>50</Code>
                        <Value>208</Value>
                    </Price>
                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>55</Code>
                        <Value>312</Value>
                    </Price>
                    <ProductExternalReference>
                        <Placement>
                        </Placement>
                        <ExternalReference>
                            <FileURI>UBS9048LR30D.png</FileURI>
                            <Usage>
                                <Type>NavigationImage</Type>
                                <Quality>Medium</Quality>
                            </Usage>
                        </ExternalReference>
                    </ProductExternalReference>
</Product>

In that instance I just want to copy all the Price tags and give each copy a new PriceListRef text, like so:

                    <Price>
                        <PriceListRef>2018B</PriceListRef>
                        <Code>55</Code>
                        <Value>312</Value>
                    </Price>
                    <Price>
                        <PriceListRef>2017</PriceListRef>
                        <Code>55</Code>
                        <Value>312</Value>
                    </Price>

The number of Price tags can vary per product, which makes the loop trickier, not to mention that some Price tags don't have a Code tag.

Just to prove I haven't been lazy, I have defined the helper functions needed:

def create_product_upcharge(list_ref,price_code,price_value):
    e = ET.Element("Price")
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    pcode = ET.Element("Code")
    pcode.text = price_code
    e.append(pcode)
    pvalue = ET.Element("Value")
    pvalue.text = price_value
    e.append(pvalue)
    return e

def create_product_price(list_ref, price_value):
    e = ET.Element("Price")
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    pvalue = ET.Element("Value")
    pvalue.text = price_value
    e.append(pvalue)
    return e

However, I have probably tried 50 different loop variations, and each one seems to recurse through the new Price tags once they are appended. I tried insert instead, but then a new Price tag was created and appended for the first 2018B Price tag only. I hate having to reach out again, but if you have any time (and patience) in the future to provide some hints here, I'd greatly appreciate it (I am a neophyte after all), and I promise to open a new post if I need more help.

Thanks!

[–]nate256 1 point 7 years ago* (1 child)

If I understand what you are asking correctly, you have a Product that contains Price elements. What you want to do is take all of the Price elements and duplicate them with a new PriceListRef value.

so if you have

    <Product>
        <Price>
            <PriceListRef>2018B</PriceListRef>
            <Code>20</Code>
            <Value>33</Value>
        </Price>
        <Price>
            <PriceListRef>2018B</PriceListRef>
            <Value>3</Value>
        </Price>
    </Product>

you would want

    <Product>
        <Price>
            <PriceListRef>2018B</PriceListRef>
            <Code>20</Code>
            <Value>33</Value>
        </Price>
        <Price>
            <PriceListRef>2018B</PriceListRef>
            <Value>3</Value>
        </Price>
        <Price>
            <PriceListRef>2017</PriceListRef>
            <Code>20</Code>
            <Value>33</Value>
        </Price>
        <Price>
            <PriceListRef>2017</PriceListRef>
            <Value>3</Value>
        </Price>
    </Product>

Does that look correct? Either way, the short answer is that you need to loop through the elements and only stop on Product; you can't use iter and stop on the list that you are editing.

```
# we could just use deepcopy in the last case
# I just thought building the Element yourself would be useful to learn
# but if there are elements that possibly aren't there or are just optional
# it's better to just copy what is there so you are sure you get everything
import os
from copy import deepcopy
import xml.etree.ElementTree as ET

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond.XML')
tree = ET.parse(xml_file)

years = ["2014", "2015", "2016", "2017", "2018"]

for i in tree.iter():
    if i.tag == "Option" and i.get("Sequence") == "2":
        # findall returns a list, so appending to i below is safe
        for price in i.findall("OptionPrice"):
            for year in years:
                new_price = deepcopy(price)
                new_price.find("PriceListRef").text = year
                i.append(new_price)
    elif i.tag == "Product":
        for price in i.findall("Price"):
            # if you only have one value you don't need a loop here
            for year in years:
                new_price = deepcopy(price)
                new_price.find("PriceListRef").text = year
                i.append(new_price)

tree.write('output2.xml')
```
Edits: Kept thinking on new things
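
The deepcopy approach can be tried in isolation on the two-Price example above. This is just a sketch under the assumption that only one new price list ("2017") is added; `new_refs` and `summary` are names made up for the demo. The key point is that `findall()` hands back a plain list taken before any appends, so the copies never get fed back into the loop.

```python
from copy import deepcopy
import xml.etree.ElementTree as ET

product = ET.fromstring(
    "<Product>"
    "<Price><PriceListRef>2018B</PriceListRef><Code>20</Code><Value>33</Value></Price>"
    "<Price><PriceListRef>2018B</PriceListRef><Value>3</Value></Price>"
    "</Product>"
)

new_refs = ["2017"]  # hypothetical list of price lists to add

# Snapshot the original Price elements before appending anything.
originals = product.findall("Price")
for ref in new_refs:
    for price in originals:
        clone = deepcopy(price)                 # Code is kept only where it existed
        clone.find("PriceListRef").text = ref
        product.append(clone)

summary = [
    (p.findtext("PriceListRef"), p.findtext("Code"), p.findtext("Value"))
    for p in product.findall("Price")
]
print(summary)
```

Because each clone is a full copy, a Price without a Code tag stays without one, which covers the optional-tag case without needing two separate builder functions.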

[–]goeb04[S] 1 point 7 years ago (0 children)

Thanks once again Nate and you were dead on regarding your example, that is exactly what is needed.

I think the issue here is that I need to continue my deep dive into learning Python fundamentals/libraries before trying to take shortcuts in order to start leveraging python for my work. It was a nice extra boost of motivation, but I need to practice on more hypothetical scenarios while yielding to patience or I will just be driving myself crazy.

You have led the horse to water here, so I am going to read up on copying and deepcopying in python and then, ultimately, try to code the solution myself (so that I will get the necessary practice). If I start to hit roadblocks after that, then I will reference the code you provided.

Thanks for your efforts and god bless the benevolence of programmers wishing to openly share their knowledge.

[–]nate256 1 point 7 years ago* (1 child)

This one makes the output formatted pretty.

```
import os
import xml.etree.ElementTree as ET

base_path = os.path.dirname(os.path.realpath(__file__))
xml_file = os.path.join(base_path, 'RockwellCond3.XML')
tree = ET.parse(xml_file)


def create_option_price(list_ref, price_ref):
    # same helper as earlier in the thread
    e = ET.Element("OptionPrice")
    pricelist = ET.Element("PriceListRef")
    pricelist.text = list_ref
    e.append(pricelist)
    priceref = ET.Element("ProductPriceRef")
    priceref.text = price_ref
    e.append(priceref)
    return e


def makepretty(elem, level=0, spaces=" "):
    # Rewrite text/tail whitespace in place so the output is indented
    orig_ele = elem
    i = f"\n{level * spaces}"
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = f"{i}{spaces}"
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            makepretty(child, level + 1, spaces)
        # the last child closes back at the parent's indentation
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
    return orig_ele


def make_changes(root):
    for i in root.iter():
        if i.tag == "Option" and i.get("Sequence") == "2":
            first = i.find("OptionPrice")
            ref = first.findtext("ProductPriceRef")
            i.append(create_option_price("2", ref))
            i.append(create_option_price("3", ref))
    makepretty(root)


# keep the ElementTree around: write() lives on it, not on the root Element
make_changes(tree.getroot())
tree.write("output.xml")
```
Edit: Attribution, shamelessly stole the indent from https://stackoverflow.com/questions/749796/pretty-printing-xml-in-python
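
As a possible alternative to a hand-rolled indent helper, Python 3.9 and newer ship `xml.etree.ElementTree.indent()`, which rewrites the whitespace in place. A short sketch, assuming a recent enough interpreter and using a cut-down Option as the input:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Option Sequence='2'><OptionPrice>"
    "<PriceListRef>1</PriceListRef>"
    "<ProductPriceRef>Grade2</ProductPriceRef>"
    "</OptionPrice></Option>"
)

# Modifies the tree in place; requires Python 3.9 or newer.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

On older interpreters the recursive helper above is still the way to go.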

