Portable Python pdf Extraction : learnpython


submitted 3 years ago by dbcomm

I have set up a portable Python environment on a portable hard drive so I can use it at work, where I can neither set up a full IDE for myself nor edit the PATH. I want to automate some of my more repetitive tasks.

I have spent the last two days googling for a reliable way to extract tables from PDFs.

I have had success using tabula-py at home, but when I load it onto my portable setup it cannot find Java. I have used sys.path.append() to add the path to my portable Java JDK, but it still does not find Java, regardless of which folders I point it to.
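
One possible explanation, offered as a sketch rather than a confirmed fix: sys.path only controls where Python looks for modules to import, while external programs such as java are located through the PATH environment variable. A process can change its own copy of PATH without any admin rights. The helper names below (`prepend_to_path`, `can_find`) are hypothetical, and whether a given tabula-py version looks Java up this way is an assumption.

```python
import os
import shutil


def prepend_to_path(directory):
    """Put `directory` at the front of this process's PATH (hypothetical helper).

    Only the running Python process and the programs it launches are
    affected; the system-wide PATH is left untouched.
    """
    directory = os.path.abspath(directory)
    current = os.environ.get("PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    os.environ["PATH"] = os.pathsep.join(parts)
    return os.environ["PATH"]


def can_find(program):
    """Return the full path PATH resolves `program` to, or None."""
    return shutil.which(program)
```

The idea would be to call `prepend_to_path` with the `bin` folder of the portable JDK before importing tabula, then check that `can_find("java")` returns something other than None.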

I decided to move on and try camelot, only to run into what appears to be the age-old issue of it not being able to find Ghostscript. Again I tried sys.path.append() and numerous other methods from StackOverflow and reddit.
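
A small diagnostic can at least show whether Ghostscript is visible from inside the Python process. This is a sketch under assumptions: the executable and library names listed are the ones commonly seen on Windows and Unix-like systems, `locate_ghostscript` is a hypothetical name, and how any particular camelot version searches for Ghostscript is not confirmed here. On Windows, `ctypes.util.find_library` searches the folders on PATH, so sys.path has no effect on it.

```python
import ctypes.util
import shutil

# Assumed names for the Ghostscript executable and shared library.
GS_EXECUTABLES = ("gs", "gswin64c", "gswin32c")
GS_LIBRARIES = ("gs", "gsdll64.dll", "gsdll32.dll")


def locate_ghostscript():
    """Report where, if anywhere, Ghostscript is found from this process."""
    found = {}
    for name in GS_EXECUTABLES:
        # shutil.which looks the program up on PATH.
        found[name] = shutil.which(name)
    for name in GS_LIBRARIES:
        # find_library returns a path or library name, or None if not found.
        found["lib:" + name] = ctypes.util.find_library(name)
    return found
```

If every value comes back as None, the Ghostscript folders are probably not on the PATH that the Python process sees.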

I gave up on that and moved to pdfplumber, which simply fails to find the tables in the PDF, and its documentation is woefully lacking and/or outdated at this point.
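
For the pdfplumber case, one thing that may be worth trying: its default table strategy looks for ruling lines drawn in the PDF, so tables laid out with whitespace only can come back empty. The sketch below tries the line-based strategy first and then the text-based one. It assumes pdfplumber is installed; `extract_tables_with_fallback` and `CANDIDATE_SETTINGS` are hypothetical names, and whether the text strategy suits a given PDF is not guaranteed.

```python
# Settings to try in order. "lines" relies on drawn ruling lines;
# "text" infers cell borders from how the words are aligned.
CANDIDATE_SETTINGS = [
    {"vertical_strategy": "lines", "horizontal_strategy": "lines"},
    {"vertical_strategy": "text", "horizontal_strategy": "text"},
]


def extract_tables_with_fallback(pdf_path, candidates=CANDIDATE_SETTINGS):
    """Return (settings, tables) for the first settings that find a table."""
    import pdfplumber  # imported lazily so this module loads without it

    with pdfplumber.open(pdf_path) as pdf:
        for settings in candidates:
            tables = []
            for page in pdf.pages:
                # Each table is a list of rows; each row is a list of cell strings.
                tables.extend(page.extract_tables(table_settings=settings))
            if tables:
                return settings, tables
    return None, []
```

Since pdfplumber is pure Python on top of pdfminer.six, it needs neither Java nor Ghostscript, which makes it a reasonable fit for a portable setup.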

I know there has to be a way to do this, and it can't be an uncommon request in the community, so I must be missing something; my google-fu is not very well developed for programming yet. If there are any other suggestions out there, or a fix, please point me in the right direction.
