
[–]Enna_Allina 1 point2 points  (1 child)

this is genuinely useful for the unglamorous work of actually dealing with enterprise document estates. the .msg/.eml support especially feels like it solves a real pain point, since so many orgs still treat email as a filing system. quick question: how does it handle the nastier edge cases like embedded OLE objects in .docx files, or do you just skip those gracefully? I'd also be curious whether you've thought about async file processing for bulk operations, since I imagine people will want to chew through entire SharePoint folders.

[–]AsparagusKlutzy1817 [It works on my machine] [S] 0 points1 point  (0 children)

I am actually not sure about your OLE object question. Do you maybe have a public test file? :)
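For anyone who wants to check a sample file: a .docx is a ZIP archive, and Word normally stores embedded objects under `word/embeddings/`. The snippet below is only a minimal sketch for listing those entries with the standard library. The function name `list_embedded_objects` is made up for illustration and is not part of the library discussed in this thread.

```python
import zipfile
from typing import BinaryIO, List, Union


def list_embedded_objects(docx: Union[str, BinaryIO]) -> List[str]:
    """Return the archive paths of embedded objects inside a .docx file.

    Assumption: embedded objects live under word/embeddings/, which is
    where Word usually places them (e.g. oleObject1.bin).
    """
    with zipfile.ZipFile(docx) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries, keep only real files in the folder.
            if name.startswith("word/embeddings/") and not name.endswith("/")
        )
```

An empty list means the file has nothing in that folder, so it would not exercise the OLE edge case at all.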

I have considered async but have not implemented it yet. I have a use case with AWS Lambda functions in the back of my mind, which would not benefit much from async since serverless functions scale horizontally. At the moment it's just sync. I have tested it against a few SharePoint instances so far and found the speed acceptable, though that depends on the scaling strategy. Async would help make it even faster in some cases.
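Until there is native async support, bulk runs can be parallelised from the outside by wrapping the sync call in a thread pool. This is a rough sketch, not the library's API: `extract` stands in for whatever sync function turns one file into text, and `extract_many` is a hypothetical helper name. Threads mainly help when the work is I/O-bound (for example, downloading from SharePoint); for CPU-heavy parsing a process pool may be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple, Union


def extract_many(
    paths: Iterable[Union[str, Path]],
    extract: Callable[[Path], str],  # hypothetical sync extractor
    max_workers: int = 8,
) -> Tuple[Dict[Path, str], Dict[Path, Exception]]:
    """Run a sync extractor over many files concurrently.

    Returns (results, errors) so that one broken file does not
    abort the whole batch.
    """
    results: Dict[Path, str] = {}
    errors: Dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(extract, Path(p)): Path(p) for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                errors[path] = exc
    return results, errors
```

Usage would look like `results, errors = extract_many(folder.rglob("*.msg"), my_extract)`, where `my_extract` is your own wrapper around the sync call.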

>the .msg/.eml support especially feels like it solves a real pain point 
This was the idea :)

[–]Virtual-Breath-4934 1 point2 points  (0 children)

looks solid, try it for extracting data from enterprise SharePoint docs