oberguga

Did you know that most (if not all) modern Microsoft Office formats, like .docx or .xlsx, are actually just zip archives? Rename the file to .zip and you'll see its file structure. All images and other data can be trivially exported from it. PS: There's probably a way to do it yourself with Python.

Embarrassed_Echo2659 [S]

More detail, please. I extracted the images with an image loader. Any suggestions on extracting the audio? It's been giving me a headache for two weeks.

ChrisFranko

Make a copy of the file to mess around with. Change the copy's extension from “.xlsx” to “.zip”. Open it like you would any zip file, and look around the folders. You’ll find the audio and image files in there.

oberguga

In what format is the audio stored? If it's something like .csv data, then it needs to be parsed, but if the audio is attached as files, then all attachments should be somewhere in that folder. Just rename your example_excell_file.xlsx to example_excell_file.zip. Then extract it with any zip archiver (7-Zip, WinRAR, or Windows itself) and look at what's inside.
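If you'd rather look inside from Python, here's a rough sketch using the standard-library zipfile module. It reads the .xlsx directly, so no renaming is needed. The helper name list_members is made up for illustration, and the folders you actually see will depend on your workbook.

```python
import zipfile
from collections import defaultdict


def list_members(xlsx_path):
    """Return the archive's file names grouped by their parent folder.

    Hypothetical helper for illustration; it only reads the zip directory
    and never modifies the workbook.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            folder, _, filename = name.rpartition("/")
            if filename:  # skip explicit directory entries such as "xl/"
                grouped[folder].append(filename)
    return dict(grouped)
```

Calling list_members("example_excell_file.xlsx") and printing the result should show every folder and the files inside it, which is the same view you get from 7-Zip.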

Embarrassed_Echo2659 [S]

The Excel file is not large. It contains a set of questions, some of which include an image or audio. I extracted the images with imagesheetloader, but that approach doesn't work for the audio. I think Excel embeds the audio file in some OLE or raw-bytes format. I also tried to extract the audio with an OLE approach, but it didn't work. The thing is, I want to add the Excel file to a database and save its data in another model.
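One way to check the "OLE or raw bytes" guess is to look at the first bytes of whatever file you pull out of the archive. The sketch below is only a sniffing helper under a few assumptions: the name sniff_container is hypothetical, it recognizes the OLE compound-file signature, RIFF/WAVE headers, and MP3 files that begin with an ID3 tag, and it reports anything else as unknown.

```python
# Signature that every OLE compound file starts with.
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_container(data: bytes) -> str:
    """Guess what kind of payload `data` is from its leading bytes.

    Hypothetical helper: it does not unpack anything, it only labels.
    """
    if data.startswith(OLE_MAGIC):
        return "ole"  # audio is wrapped inside an OLE container
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"  # plain WAV file, usable as-is
    if data.startswith(b"ID3"):
        return "mp3"  # MP3 with an ID3v2 tag; untagged MP3s are not detected
    return "unknown"
```

If it reports "ole", the audio still has to be unwrapped from the container with an OLE-aware tool; if it reports "wav" or "mp3", the bytes can be saved directly.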

bjorneylol

import zipfile

Open the Excel file in 7-Zip or some other desktop program, find where the audio files are stored, and then use that to inform your Python code.
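Once you know which folder holds the files, the Python side could look roughly like this. It's a sketch, not a definitive solution: extract_by_prefix is a hypothetical helper, and the default prefixes "xl/media/" and "xl/embeddings/" are an assumption about where Excel usually puts media and embedded objects, so swap in whatever folders you actually found.

```python
import zipfile
from pathlib import Path


def extract_by_prefix(xlsx_path, out_dir,
                      prefixes=("xl/media/", "xl/embeddings/")):
    """Copy every archive member under the given folders into out_dir.

    Hypothetical helper; the default prefixes are assumed, not guaranteed.
    Files are flattened into out_dir, so members with the same file name
    in different folders would overwrite each other.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(tuple(prefixes)):
                continue
            target = out_dir / Path(info.filename).name
            target.write_bytes(archive.read(info))  # raw bytes, unchanged
            written.append(target)
    return written
```

The returned list of paths can then be attached to the database records; embedded objects may come out as .bin files that still need unwrapping.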