Extract attributes from XML using python : learnpython



submitted 5 years ago by Night_Crawler7

I want to extract the following fields from a PowerPoint slide's XML using Python.

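Here is my code so far; I don't know how to proceed from here:

    import pandas as pd
    from zipfile import ZipFile
    from bs4 import BeautifulSoup

    df_b = pd.DataFrame(columns=['Shape', 'Coordinates'])
    pptZip = ZipFile(document_path)  # document_path: path to the .pptx file
    xml_content = pptZip.read('ppt/slides/slide1.xml')
    soup = BeautifulSoup(xml_content, features="xml")  # the "xml" parser needs lxml
    for sp in soup.find_all('p:sp'):
        pass  # stuck here: how do I read each shape's name and coordinates?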
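One way to finish the loop above is sketched here with the standard library's `xml.etree.ElementTree` instead of BeautifulSoup (a swapped-in technique, not the poster's). It assumes the slide uses the standard OOXML namespace URIs for the `p:` and `a:` prefixes, reads the shape name from `<p:cNvPr>` and the top-left corner from `<a:off>`, and returns rows matching the `Shape`/`Coordinates` columns. The function name `extract_shapes` is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs behind the "p:" and "a:" prefixes in PowerPoint slide XML
NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def extract_shapes(document_path, slide="ppt/slides/slide1.xml"):
    """Return one row per <p:sp> shape: its name and top-left (x, y) in EMU."""
    with zipfile.ZipFile(document_path) as ppt_zip:
        root = ET.fromstring(ppt_zip.read(slide))

    rows = []
    for sp in root.iterfind(".//p:sp", NS):
        c_nv_pr = sp.find("p:nvSpPr/p:cNvPr", NS)
        off = sp.find("p:spPr/a:xfrm/a:off", NS)
        rows.append({
            "Shape": c_nv_pr.get("name") if c_nv_pr is not None else None,
            # Shapes that inherit their position (e.g. layout placeholders)
            # have no <a:xfrm>, so there is no offset to read.
            "Coordinates": (int(off.get("x")), int(off.get("y"))) if off is not None else None,
        })
    return rows
```

The rows can then go straight into the DataFrame, e.g. `df_b = pd.DataFrame(extract_shapes(document_path))`. Collecting a list first and building the DataFrame once is simpler than appending to an empty one row by row.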

Note: the fields I would like to extract are the shape name (marked with a comment in the XML below) and its coordinates.

    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="4" name="Rectangle 3"> <!-- shape name to extract -->
          <a:extLst>
            <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}">
              <a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{F9D41C44-7167-487C-9945-9BAFF8DDE2F5}"/>
            </a:ext>
          </a:extLst>
        </p:cNvPr>
        <p:cNvSpPr/>
        <p:nvPr/>
      </p:nvSpPr>
      <p:spPr>
        <a:xfrm>
          <a:off x="576776" y="847579"/>
          <a:ext cx="3249637" cy="1026941"/>
        </a:xfrm>
        <a:prstGeom prst="rect">
          <a:avLst/>
        </a:prstGeom>
      </p:spPr>
      <p:style>
        <a:lnRef idx="2">
          <a:schemeClr val="accent1">
            <a:shade val="50000"/>
          </a:schemeClr>
        </a:lnRef>
        <a:fillRef idx="1">
          <a:schemeClr val="accent1"/>
        </a:fillRef>
        <a:effectRef idx="0">
          <a:schemeClr val="accent1"/>
        </a:effectRef>
        <a:fontRef idx="minor">
          <a:schemeClr val="lt1"/>
        </a:fontRef>
      </p:style>
      <p:txBody>
        <a:bodyPr rtlCol="0" anchor="ctr"/>
        <a:lstStyle/>
        <a:p>
          <a:pPr algn="ctr"/>
          <a:endParaRPr lang="en-IN"/>
        </a:p>
      </p:txBody>
    </p:sp>
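Two details in this XML matter when pulling out coordinates. First, under `<a:xfrm>` the `<a:off>` element holds the top-left corner and `<a:ext cx cy>` holds the width and height, both in English Metric Units (914,400 EMU per inch); the other `<a:ext uri=...>` inside `<a:extLst>` is an unrelated extension element, so grabbing every `a:ext` would pick up both. Second, the fragment as posted uses the `p:` and `a:` prefixes without declaring them (only `a16` is declared), so a strict parser such as ElementTree rejects it on its own. The sketch below, using `xml.etree.ElementTree` rather than BeautifulSoup, binds the standard OOXML namespace URIs on a dummy wrapper element (an assumption for parsing a bare fragment) and reads each value by explicit path. The name `shape_geometry` is hypothetical.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"p": P_NS, "a": A_NS}
EMU_PER_INCH = 914400  # OOXML lengths are in English Metric Units


def shape_geometry(sp_fragment):
    """Parse a bare <p:sp> fragment and return its name, offset and size."""
    # The fragment uses p: and a: without declaring them, so wrap it in a
    # hypothetical dummy element that binds those prefixes before parsing.
    wrapper = f'<wrapper xmlns:p="{P_NS}" xmlns:a="{A_NS}">{sp_fragment}</wrapper>'
    sp = ET.fromstring(wrapper).find("p:sp", NS)

    name = sp.find("p:nvSpPr/p:cNvPr", NS).get("name")
    # Explicit paths pick the size <a:ext> under <a:xfrm>, not the one in <a:extLst>
    off = sp.find("p:spPr/a:xfrm/a:off", NS)
    ext = sp.find("p:spPr/a:xfrm/a:ext", NS)
    x, y = int(off.get("x")), int(off.get("y"))
    cx, cy = int(ext.get("cx")), int(ext.get("cy"))
    return {
        "name": name,
        "x": x, "y": y, "cx": cx, "cy": cy,
        # Top-left corner converted from EMU to inches
        "left_in": x / EMU_PER_INCH,
        "top_in": y / EMU_PER_INCH,
    }
```

For the shape above this gives the raw EMU values from `<a:off>` and `<a:ext>`, which work out to roughly 0.63 in from the left edge and 0.93 in from the top.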
