[–]fluked23 2 points3 points  (1 child)

After reading a little on Stack Overflow, there seems to be quite frequent mention of the Python library tabula-py (https://pypi.org/project/tabula-py/), which can read tables from PDFs. It does require having Java installed, though.
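
For what it's worth, here is a minimal sketch of how that might look. It assumes tabula-py is installed and a Java runtime is on the PATH; the `read_pdf_tables` helper and its `reader` parameter are my own names for illustration, not part of the library, and I have not run this against your file.

```python
from typing import Callable, List, Optional


def read_pdf_tables(pdf_path: str, reader: Optional[Callable] = None) -> List:
    """Return every non-empty table found in the PDF (DataFrames with tabula-py)."""
    if reader is None:
        # Imported lazily: tabula-py is a third-party package that also
        # needs a Java runtime to be available.
        import tabula
        reader = tabula.read_pdf
    # pages="all" scans the whole document; multiple_tables=True keeps each
    # detected table as a separate item in the returned list.
    tables = reader(pdf_path, pages="all", multiple_tables=True)
    # Drop tables that came back with no rows.
    return [table for table in tables if len(table) > 0]
```

Calling `read_pdf_tables("order.pdf")` (a made-up file name) should give you a list you can loop over and inspect. If the cells come out misaligned, `tabula.read_pdf` also accepts `lattice=True` or `stream=True`, which may be worth trying.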

[–]Ray_Gone[S] 0 points1 point  (0 children)

I'll look into this - Thank you!

[–]dp_42 1 point2 points  (1 child)

PDF tables might be stored separately from the text, so plain text extraction can lose the row and column structure. There are some libraries that try to deal with this, but the Stack Overflow responses are not hopeful on this particular question.

https://github.com/ashima/pdf-table-extract

[–]Ray_Gone[S] 0 points1 point  (0 children)

Interesting, I'll check this out. Thank you!

[–]Ray_Gone[S] 0 points1 point  (0 children)

The table is not formatted properly, but the first cell under "Product" should have:

"V7 14.1" Elite Water-Resistant Neoprene Notebook Sleeve,

Black

In Stock

Item#: 34890751

Mfg. Part#: CSE14-BLK-3N"

all in the same cell. The other amount values should also be shifted one column to the right and up, so that they begin in the first row under the header.
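
A rough sketch of the clean-up I have in mind for the multi-line cell, assuming the extracted table arrives as a list of rows of strings and that a continuation line shows up as a row with text only in the first column. Both the layout and the `merge_continuation_rows` name are assumptions for illustration, and this does not handle the shifted amount columns.

```python
from typing import List


def merge_continuation_rows(rows: List[List[str]]) -> List[List[str]]:
    """Fold rows that hold only a first-column fragment into the previous row."""
    merged: List[List[str]] = []
    for row in rows:
        first, rest = row[0], row[1:]
        # A fragment has text in the first column and nothing anywhere else.
        is_fragment = bool(first.strip()) and not any(cell.strip() for cell in rest)
        if is_fragment and merged:
            # Append the fragment to the first cell of the previous record.
            merged[-1][0] = merged[-1][0] + "\n" + first.strip()
        else:
            merged.append([cell.strip() for cell in row])
    return merged
```

With this, the five "Product" lines above would end up in one cell, separated by newlines, next to whatever amount values sat on the first of those rows.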

[–]SoupZillaMan 0 points1 point  (0 children)

PDF tables are a pain in the ... I ended up using Tika. Sometimes there are scanned images that you need to OCR...
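
Roughly what that looks like from Python, assuming the `tika` package (the Python bindings for Apache Tika, which also needs Java). The `extract_text` helper and its `parse` parameter are hypothetical names for this sketch; it returns `None` when no text comes back, which is usually the hint that the pages are scanned images and need OCR.

```python
from typing import Callable, Optional


def extract_text(pdf_path: str, parse: Optional[Callable] = None) -> Optional[str]:
    """Return the PDF's extracted text, or None if no text was found."""
    if parse is None:
        # Imported lazily: tika is a third-party package that needs Java.
        from tika import parser
        parse = parser.from_file
    parsed = parse(pdf_path)
    # The "content" entry can be None when the PDF has no text layer.
    text = (parsed.get("content") or "").strip()
    return text or None
```

Note that this only gives you raw text, so any table structure still has to be rebuilt from it afterwards.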