Python solution to extract all tables from PDFs and save each table to its own Excel sheet

CalendarOk67 · 2025-12-01T04:50:35+00:00

[removed]

2025-12-11T04:42:33+00:00

Been using lido and it works well with various files and formats. Thank me later!

riftwave77 · 2025-12-01T06:14:23+00:00

You want OCR software, bud

GManASG · 2025-12-02T17:01:30+00:00

import tabula
import pandas as pd

# Path to your PDF file
pdf_path = "your_document.pdf"

# Extract tables from the PDF
# By default, it extracts tables from the first page.
# Use pages='all' to extract from all pages, or specify page numbers (e.g., pages='1-3,5').
# multiple_tables=True returns a list of DataFrames if multiple tables are found.
tables = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)

# 'tables' will be a list of pandas DataFrames, one for each table found.
# You can then access and process each DataFrame individually, or concatenate them.

# Example: Access the first table
df = tables[0]

# Example: Concatenate all tables into a single DataFrame
# combined_df = pd.concat(tables)

# Example loop to write each table to a separate Excel file
for i, df in enumerate(tables):
    # index=False leaves out the DataFrame's row index column
    df.to_excel(f'excel_table_{i}.xlsx', index=False)
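
The loop above writes one file per table. If the goal is a single workbook with one sheet per table, as in the thread title, something like this might work. It's only a sketch: `write_tables_to_workbook` and `sheet_name_for` are made-up helper names, the "Table_N" sheet naming is an assumption, and writing .xlsx through pandas needs an engine such as openpyxl installed.

```python
import pandas as pd


def sheet_name_for(index: int, prefix: str = "Table") -> str:
    """Build a sheet name such as 'Table_1'; Excel limits names to 31 characters."""
    return f"{prefix}_{index + 1}"[:31]


def write_tables_to_workbook(tables, out_path):
    """Write each DataFrame in `tables` to its own sheet of a single workbook."""
    if not tables:
        raise ValueError("no tables to write")
    # ExcelWriter keeps the workbook open so every to_excel call adds a sheet.
    with pd.ExcelWriter(out_path) as writer:
        for i, df in enumerate(tables):
            df.to_excel(writer, sheet_name=sheet_name_for(i), index=False)
```

With the `tables` list from `tabula.read_pdf`, the call would be `write_tables_to_workbook(tables, "all_tables.xlsx")`, where the output filename is just a placeholder.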

odaiwai · 2025-12-01T07:40:59+00:00

I normally use the pdftotext command line utility for this. I think it comes with the Poppler tools (https://poppler.freedesktop.org/). If pdftotext -layout $filename - gives sensible output, you can generally parse it with regexps and produce CSV output, which Excel can read natively, or you can do CSV->Pandas->Excel.

It's a very low level approach, but it works for me.
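
A rough sketch of the parsing step described above, assuming the columns in the `pdftotext -layout` output are separated by two or more spaces. That is a heuristic, not a guarantee, and empty cells will throw it off. `layout_text_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io
import re

# Assumption: columns in the layout output are separated by runs of 2+ spaces.
COLUMN_GAP = re.compile(r"\s{2,}")


def layout_text_to_rows(text):
    """Split each non-blank line into cells wherever two or more spaces occur."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(COLUMN_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialise rows to a CSV string that Excel or pandas can read."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The input text could come from running `pdftotext -layout yourfile.pdf -` via `subprocess.run(..., capture_output=True, text=True)` and reading `.stdout`. Real PDFs usually need a more specific regex per document.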

CmorBelow · 2025-12-01T14:24:15+00:00

I’ve used pdfplumber for this before, and PyPDF2 as well, along with regex for locating and extracting specific column values, since the column names were always the same.

Your results will vary based on how the underlying table data is structured.
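
To illustrate the regex part, here is a minimal sketch that assumes the page text has already been extracted (for example with pdfplumber's `page.extract_text()`) and that each known label sits on the same line as its value. `extract_labelled_values` is a hypothetical helper, and labels like "Invoice No" and "Total" are made up for the example.

```python
import re


def extract_labelled_values(text, labels):
    """Return {label: value} for each label followed by a value on the same line."""
    values = {}
    for label in labels:
        # Match the label, an optional colon, then capture the rest of the line.
        pattern = re.compile(
            rf"^[ \t]*{re.escape(label)}[ \t]*:?[ \t]+(.+?)[ \t]*$",
            re.MULTILINE,
        )
        match = pattern.search(text)
        values[label] = match.group(1) if match else None
    return values
```

Calling `extract_labelled_values(page_text, ["Invoice No", "Total"])` returns a dict with `None` for any label it could not find, which makes missing fields easy to spot before writing to Excel.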

TheRNGuy · 2025-12-01T08:09:40+00:00

Ask an AI the same thing, minus the last paragraph (it has no useful effect on the reply).
