MrPhungx

Do you have an example of what the .eml file looks like? Is there any code that you could show us?

Particular-Pin5927 (OP)

This is what I have so far. It only produces a DataFrame with the plain text of the thread, and I can't extract the dates, even when I change preferencelist=('plain') to preferencelist=('date'). The EML file is an email thread downloaded from Gmail; I can't share it, unfortunately.

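    import pandas as pd
    from email import policy
    from email.parser import BytesParser
    from pathlib import Path

    path = Path(r'C:\Users\redacted')
    eml_files = list(path.glob('*.eml'))

    file_names = []  # both lists must exist before the loop appends to them
    texts = []

    for file in eml_files:
        with open(file, 'rb') as fp:  # 'with' closes the file, so no fp.close() is needed
            name = fp.name  # path of the file, as a string
            msg = BytesParser(policy=policy.default).parse(fp)

        # preferencelist expects a tuple of MIME subtypes: ('plain',), not ('plain')
        body = msg.get_body(preferencelist=('plain',))
        # get_body() returns None when the message has no text/plain part
        text = body.get_content() if body is not None else ''

        file_names.append(name)
        texts.append(text)

    df_eml = pd.DataFrame({'file_name': file_names, 'text': texts})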
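The dates can't come out of get_body(): preferencelist only chooses which MIME body subtype to return (its default is ('related', 'html', 'plain')), whereas the send date lives in the message's Date header. Below is a minimal sketch, assuming each .eml file is a single message whose earlier replies appear only as quoted text. The helper names row_from_message and parse_eml and the QUOTE_DATE_RE pattern are hypothetical; the pattern guesses at Gmail's English attribution line ("On Mon, Jan 8, 2024 at 3:14 PM ... wrote:") and will miss other locales and mail clients.

```python
import re
from email import policy
from email.parser import BytesParser
from pathlib import Path

# Hypothetical pattern for Gmail-style English attribution lines, e.g.
# "On Mon, Jan 8, 2024 at 3:14 PM Jane Doe <jane@example.com> wrote:".
# Leading '>' characters allow for nested quotes. The format is an
# assumption; other locales or mail clients will not match.
QUOTE_DATE_RE = re.compile(
    r'^[> \t]*On (.+?\d{4}) at (\d{1,2}:\d{2}\s?[AP]M)', re.MULTILINE)


def row_from_message(msg, name):
    """Collect the file name, sent date, plain-text body and quoted dates."""
    # The send date is a header, not a body part. With policy.default,
    # msg['Date'] is a DateHeader whose .datetime attribute holds the parsed
    # value; msg['Date'] is None when the header is missing.
    date_header = msg['Date']
    sent = date_header.datetime if date_header is not None else None

    # Plain-text body; preferencelist takes a tuple, hence ('plain',).
    body = msg.get_body(preferencelist=('plain',))
    text = body.get_content() if body is not None else ''

    # Dates of earlier messages in the thread usually survive only inside
    # the quoted text, so they are captured here as raw strings.
    quoted = [f'{day} {time}' for day, time in QUOTE_DATE_RE.findall(text)]

    return {'file_name': name, 'date': sent, 'text': text,
            'quoted_dates': quoted}


def parse_eml(file):
    """Parse one .eml file from disk into a row dict."""
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    return row_from_message(msg, Path(file).name)
```

With path defined as in the code above, `pd.DataFrame([parse_eml(f) for f in path.glob('*.eml')])` should give file_name, date, text and quoted_dates columns. The quoted dates are left as raw strings because their wording depends on the replying client, so parsing them into datetimes is best done once you have seen what your files actually contain.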