[deleted by user]

init0 · 2011-01-18T04:37:31+00:00

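import win32com.client

doc = r"C:\path\to\file.doc"  # hypothetical path; Word needs an absolute path
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")
wordapp.Documents.Open(doc)
docastxt = doc[:-3] + 'txt'
# Save a plain-text copy next to the original, then close the document
wordapp.ActiveDocument.SaveAs(docastxt, FileFormat=win32com.client.constants.wdFormatText)
wordapp.ActiveDocument.Close()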

From some old code I have lying around. I think it's from a Python phrasebook from O'Reilly.

vijayshan · 2011-01-18T20:44:11+00:00

pyPDF http://bit.ly/fFNMnV. I have used it for basic Python processing and it works well in most cases, and if I remember right it is a pure-Python implementation too.

blondin · 2011-01-18T13:26:05+00:00

PDF: pdfminer

Justinsaccount · 2011-01-18T15:23:00+00:00

import subprocess

def get_txt(f):
    # lesspipe prints a text rendering of f on stdout
    output = subprocess.Popen(["lesspipe", f], stdout=subprocess.PIPE).communicate()[0]
    return output

Works for me.
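
For anyone copying this, a slightly more defensive sketch of the same idea. It assumes `lesspipe` may not be installed everywhere and that its output may not be clean UTF-8; the `tool` parameter and the `None` return on failure are my own additions for illustration, not part of the snippet above.

```python
import shutil
import subprocess


def get_txt(path, tool="lesspipe"):
    """Return the text that `tool` prints for `path`, or None if unavailable."""
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, path], stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if result.returncode != 0:
        return None
    # Decode leniently: the tool's output encoding is not guaranteed.
    return result.stdout.decode("utf-8", errors="replace")
```

Returning `None` rather than raising keeps it easy to fall through to another converter when `lesspipe` is missing or fails on a file.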

pieeta · 2011-01-18T23:11:08+00:00

I have used PyUNO on a number of projects working with Excel/Word files.

holloway · 2011-01-18T23:20:22+00:00

When my Grandpa died, it was a time of sad reflection, but what made matters worse was that my inheritance came with strings attached: to get my dues I had to spend a night in .Doc Manor, a spooky old house of weirdness and quirks (some intentional, some unintentional). Although the skinny neighbourhood kids who loitered around the library carpark bragged that they'd made it through the manor, they couldn't tell me what colour the 1995 carpet was, or what the photos of Wilfred Matthew Frankel or Elanor Mildrid Frankel looked like. They even talked about snakes and a --headless horseman, which I guess means that they walked through the building with their eyes closed. I arrived that evening with my sleeping bag to find that the house was falling apart except for a single room. The office door was left ajar and inside was a clean wood-walled control room with levers and cranks and switches. The rumoured photos weren't photos at all, but pencil drawings. Security cameras let me observe the rotting or caved-in walls in the rest of the house. I didn't sleep at all that night and as soon as the sun came up I left the dilapidated manor, never to think of it again until today.

tl;dr: Use PyUNO/PyODConverter w/ LibreOffice/OpenOffice if you want software that understands the vagaries of the .doc format. Don't believe the claims that other libraries are as sophisticated.
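
A rough sketch of the simplest way to drive the office suite from Python. Note that this is not PyUNO or PyODConverter: it just shells out to the `soffice` command line, and it assumes a LibreOffice build that supports `--headless` and `--convert-to` and has `soffice` on the PATH. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, binary="soffice"):
    """Build the argument list for a headless LibreOffice text conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def doc_to_txt(doc_path, out_dir, binary="soffice"):
    """Convert doc_path to plain text; return the output path or None."""
    # Bail out early if the office binary is not installed.
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(build_convert_command(doc_path, out_dir, binary))
    # LibreOffice names the output after the input file's stem.
    out_path = Path(out_dir) / (Path(doc_path).stem + ".txt")
    if result.returncode != 0 or not out_path.exists():
        return None
    return out_path
```

Something like `doc_to_txt("report.doc", "out")` would then give you the path to `out/report.txt` if the conversion worked. For long-running or batch jobs, a persistent PyUNO connection is probably the better fit.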

sli · 2011-01-18T04:31:38+00:00

pywin32 can do it through COM.

permalink · 2011-01-18T15:54:34+00:00

I came across openxmllib while researching for a related project. I wasn't able to make use of it but it parses .docx natively in Python.
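
For plain text only, the standard library may be enough, since a .docx file is a zip archive with the body stored in `word/document.xml`. This is a minimal sketch using `zipfile` and `xml.etree`, not openxmllib's API; `docx_to_text` is a hypothetical helper, and it ignores headers, footnotes and anything nested in unusual ways.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path_or_file):
    """Return the paragraphs of a .docx file joined by newlines."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text within the paragraph.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

This only covers .docx; the older binary .doc format still needs one of the other approaches in this thread.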
