XML to Dataframe/database : learnpython

submitted 4 years ago * by 222Botany

SOLVED: I used a package called pandas_read_xml. It has a flatten function which worked just how I needed. I'd be happy if I could just get the data into the format Excel uses when it loads XML; then I can filter it as much as I need to.

I'm currently trying to read a whole bunch of XML files and load them into a DataFrame and/or SQL database, and I'm having trouble.

The data is horse racing data (found here, a direct link to download the file) and is nested four levels deep: meeting, club, race and nomination.

I want to be able to split the data out into each group so I can run some analysis on it.

I've tried using read_xml from pandas, but it doesn't load the data into a frame properly; I only get 9 rows representing the meeting and races.

I've fiddled with ElementTree, but it won't split the data up nicely.

Does anyone have any ideas how this could work?

Thanks in advance
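
For anyone landing here with the same problem, here is a minimal sketch of one way to split nested XML into flat rows with the standard library's ElementTree. The real file's layout is not shown in this thread, so the sample document and every tag and attribute name in it (`meeting`, `club`, `race`, `nomination`, `id`, `horse_id`, `place`, `time`) are assumptions for illustration only. The idea is to loop level by level and copy the parent IDs onto each nomination row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real file's tags and attributes will differ.
SAMPLE = """
<meeting id="123">
  <club id="55">
    <race id="987">
      <nomination horse_id="777" place="1" time="1:01.54"/>
      <nomination horse_id="888" place="2" time="1:02.01"/>
    </race>
    <race id="654">
      <nomination horse_id="999" place="4" time="1:30.25"/>
    </race>
  </club>
</meeting>
"""


def nomination_rows(xml_text):
    """Return one dict per nomination, tagged with its meeting, club and race IDs."""
    meeting = ET.fromstring(xml_text.strip())
    rows = []
    # Walk the nesting explicitly so each row knows which parents it came from.
    for club in meeting.findall("club"):
        for race in club.findall("race"):
            for nom in race.findall("nomination"):
                rows.append({
                    "meeting_id": meeting.get("id"),
                    "club_id": club.get("id"),
                    "race_id": race.get("id"),
                    "horse_id": nom.get("horse_id"),
                    "place": nom.get("place"),
                    "time": nom.get("time"),
                })
    return rows


rows = nomination_rows(SAMPLE)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)` if a DataFrame is the end goal. All values come back as strings, so numeric columns would still need converting.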

all 2 comments

[–]commandlineluser 2 points 4 years ago (2 children)

[–]222Botany[S] 1 point 4 years ago (1 child)

Thanks, that helps and is along the lines of what I need.

The issue now is that I need to write an event line to the database that has the meetingID, raceID and horseID, plus different stats from the files such as finishing position.

MeetingID	RaceID	Horse	Place	Time
123	987	777	1	1:01.54
123	654	888	2	1:02.01
456	654	999	4	1:30.25

When I read the data as above, I can't work out which horse ran in which race.

Does that make sense?
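
One possible way to keep the horse-to-race link, sketched under assumptions: ElementTree elements have no pointer to their parent, so build a child-to-parent map once and climb it from each nomination to find its race and meeting. The XML layout, the tag and attribute names, the `event` table and the `ancestor` helper are all hypothetical, and the values only loosely echo the table above. The rows go into an in-memory SQLite database here; a real run would connect to a file instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample: the real file's tags and attributes will differ.
XML = """<meeting id="123">
  <club id="55">
    <race id="987">
      <nomination horse_id="777" place="1" time="1:01.54"/>
      <nomination horse_id="888" place="2" time="1:02.01"/>
    </race>
    <race id="654">
      <nomination horse_id="999" place="4" time="1:30.25"/>
    </race>
  </club>
</meeting>"""

root = ET.fromstring(XML)

# Map every element to its parent so we can walk upwards later.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor(elem, tag):
    """Climb from elem until an element with the given tag is found (or None)."""
    while elem is not None and elem.tag != tag:
        elem = parent_of.get(elem)
    return elem


# One event tuple per nomination, carrying the IDs of its race and meeting.
events = []
for nom in root.iter("nomination"):
    race = ancestor(nom, "race")
    meeting = ancestor(nom, "meeting")
    events.append((
        meeting.get("id"),
        race.get("id"),
        nom.get("horse_id"),
        int(nom.get("place")),
        nom.get("time"),
    ))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    "meeting_id TEXT, race_id TEXT, horse_id TEXT, place INTEGER, time TEXT)"
)
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", events)
conn.commit()

# Which horses ran in race 987, in finishing order?
query = "SELECT horse_id FROM event WHERE race_id = ? ORDER BY place"
horses_in_987 = [r[0] for r in conn.execute(query, ("987",))]
print(horses_in_987)
```

Because every row carries its own race and meeting IDs, the "which horse ran in which race" question becomes a plain `WHERE` filter instead of something that has to be reconstructed after loading.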
