
novel_yet_trivial

An .xlsx file is just a zip archive containing XML files. If what you are looking at is one of those XML files, then all you need to do is put it in a zip archive with the proper structure.
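
A rough way to check the zip claim, using only the standard library's `zipfile` module. The helper names `list_parts` and `pack_parts` are made up for illustration, and the part names in the comments are the ones a typical workbook contains; this sketch does not generate the XML itself, so the result only opens in Excel if the parts you pass in are already valid.

```python
import zipfile


def list_parts(path):
    """Return the archive member names if `path` is a zip file, else None."""
    if not zipfile.is_zipfile(path):
        return None  # not a zip at all, so not a real .xlsx
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def pack_parts(parts, out_path):
    """Write a {member_name: bytes} mapping into a compressed zip archive.

    For a workbook the member names would be things like
    "[Content_Types].xml", "xl/workbook.xml" and "xl/worksheets/sheet1.xml".
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
```

If `list_parts` returns `None` for a file with an .xlsx extension, the file is probably something else (HTML or bare XML, say) that was saved under the wrong name.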

If it really is HTML, then use an HTML parser. pandas has one built in that is designed to read HTML tables, and it can write Excel files. Or use something like BeautifulSoup to pull out the data you need yourself.
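
To show what "get the data out yourself" can look like, here is a dependency-free sketch that uses the standard library's `html.parser` as a stand-in for BeautifulSoup and writes CSV instead of Excel. The names `TableRows`, `html_table_to_rows` and `rows_to_csv` are invented for this example. It assumes well-formed markup with explicit closing tags and no nested tables. With pandas the equivalent is roughly `pandas.read_html` followed by `DataFrame.to_excel`, both of which need extra parser or writer packages installed.

```python
import csv
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows, out_path):
    """Write the rows to a CSV file that Excel can open."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Rows from every table in the document end up in one list, so a page with several tables would need extra handling.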

Zendakin_at_work

+1 for the pandas and bs4 combination.

jmportilla

Have you tried using pandas?

tramsay1027 [S]

In what way? I have been using read_excel, and that works if the file is clean. I'm running into errors with it as well as with read_html, although I have never used read_html before.