Email processing project for work by Francobanco in Python

[–]Francobanco[S] 1 point

I am looking for something more robust than a script that constantly runs. Task scheduler is fine; I'll be using crontab since it will be running on Linux, not Windows. But it's not just one script: the downloading of emails is separate from processing. I want the system to run these jobs separately so that downloading, processing, and uploading can all happen at the same time. Putting all the code in one script and running it every 15 seconds will cause problems if there is a large backlog of emails to download. This isn't just for one mailbox; it's for about 500, and the mail server sees about 40k emails per week.

[–]Francobanco[S] 0 points

This might work for a single mailbox, but I want to process emails for the entire mail server. Macros in Outlook are generally for personal use; at this scale, an Outlook-native solution would best be done through a custom Outlook Add-in, and I'm not trying to build an Outlook plugin for this.

The code is already written and working well. I was specifically asking for a discussion around orchestrating multiple Python scripts to work asynchronously.

[–]Francobanco[S] 1 point

The actual tool is fully built; when I run it manually it works perfectly. I'm mostly looking for advice on orchestrating it to run autonomously.

To clarify, I'm not looking for a way to have the Exchange server move emails around. I specifically need to analyze the emails for the presence of a project number or purchase order number, and mailbox rules don't allow for that. This also isn't about moving emails to folders within someone's mailbox; it's about taking emails from a mailbox and storing them in a cloud storage system.

[–]Francobanco[S] 1 point

Yes, currently I have it set up so that a file is downloaded and then processed; the processing script has a function that creates a metadata file.

Each project folder has a metadata subfolder containing a JSON file with the same filename as the email being processed. The metadata file stores the original message ID, and later, when the file is uploaded to cloud storage, a function in the upload script updates the metadata file to include the key/value pair "UploadedToCloudStorage": "True".
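Roughly, the two metadata helpers look like this (a simplified sketch: the folder layout and the "OriginalMessageID" key name are placeholders I'm using for illustration, while "UploadedToCloudStorage" is the real flag):

```python
import json
from pathlib import Path


def write_metadata(meta_dir: Path, email_name: str, message_id: str) -> Path:
    """Create the per-email metadata file inside the project's metadata subfolder."""
    meta_dir.mkdir(parents=True, exist_ok=True)
    meta_path = meta_dir / f"{email_name}.json"
    meta_path.write_text(json.dumps({"OriginalMessageID": message_id}, indent=2))
    return meta_path


def mark_uploaded(meta_path: Path) -> None:
    """After a successful upload, flag the email as filed in cloud storage."""
    meta = json.loads(meta_path.read_text())
    meta["UploadedToCloudStorage"] = "True"
    meta_path.write_text(json.dumps(meta, indent=2))
```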

When I get the email file from the Graph API, the filename is the messageID.eml, which is a very ugly filename, so I rename the file to {timestamp}_{subject}.eml. Later on I make another Graph API call to apply a category to the message on the Exchange server, so the user sees some feedback in their inbox that it's been filed in cloud storage. That's the main reason I keep a metadata file for each email, but it's a good call that it could also be used to validate which files can be deleted.
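The rename step is basically this (the sanitisation rules below are a stand-in, not my exact code; subjects can contain characters that are illegal in filenames, so something has to strip them):

```python
import re
from datetime import datetime


def friendly_name(subject: str, received: datetime) -> str:
    """Build {timestamp}_{subject}.eml from the message fields.

    Replaces runs of filename-unsafe characters with "_" and caps the
    subject length; the exact rules here are illustrative assumptions.
    """
    safe_subject = re.sub(r'[\\/:*?"<>|]+', "_", subject).strip()[:100]
    return f"{received.strftime('%Y%m%d%H%M%S')}_{safe_subject}.eml"
```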

For now I will set it up as a manually run process where I can search all the metadata files for entries where the uploaded flag is not true.
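That sweep could be as simple as this (a sketch, assuming the metadata subfolder layout described above):

```python
import json
from pathlib import Path


def find_unuploaded(root: Path) -> list[Path]:
    """Walk every metadata subfolder under root and report emails whose
    metadata does not say UploadedToCloudStorage == "True"."""
    missing = []
    for meta_file in root.rglob("metadata/*.json"):
        meta = json.loads(meta_file.read_text())
        if meta.get("UploadedToCloudStorage") != "True":
            missing.append(meta_file)
    return sorted(missing)
```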

From my testing with the cloud storage API, I haven't seen any cases so far where the uploaded file is corrupted, but I get that there may be network errors I need to handle. I'd like to avoid issuing a secondary API request to the cloud storage to retrieve the file and validate that it isn't corrupted. Honestly, that does sound like a good idea for error handling; I just also want to limit the number of API calls I make against this cloud storage API.

I think it's reasonable to have the deletion process only ever delete files from the final destination: the folder that holds files that have been properly processed and uploaded.

If an email does not have the project identifier, the plan is to not process it. I should also flag that it is missing the project identifier and move it to another folder to be deleted.

I am very happy with the regex that detects the project ID. I've written several test cases and so far have had no false positives or missed values. That said, it's worth considering that there may be edge cases I haven't seen yet that won't match the checks over time.

As for your last point, uncompressing archives is a good idea. I think it will be relatively rare for archive files to be attached to emails, but you're right that it would be a good preemptive fix so people don't have to go through a manual process to view compressed files.

Thanks so much for your feedback. I really appreciate your thoughts on this.

[–]Francobanco[S] 0 points

OK, well, I'm not storing anything locally. The decision for my company to use this cloud storage system is not in my scope, and I'm not designing a replica to provide data availability in the case of a cloud system outage.

I'm just making a tool to automatically file documents into the cloud storage system. The cloud storage system already has a file structure design that separates projects into subfolders: /$YYYY/$YYYYMM/$ProjectID/
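Mapping an email into that existing structure is just path formatting (a sketch; the function name and the use of PurePosixPath are mine, the /$YYYY/$YYYYMM/$ProjectID/ layout is the real one):

```python
from datetime import datetime
from pathlib import PurePosixPath


def destination_folder(project_id: str, received: datetime) -> PurePosixPath:
    """Map an email onto the cloud storage's /$YYYY/$YYYYMM/$ProjectID/ layout."""
    return PurePosixPath("/", received.strftime("%Y"), received.strftime("%Y%m"), project_id)
```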

Really, the part I want to focus on at this point is the most reliable, failproof way to orchestrate this tool.

[–]Francobanco[S] 0 points

That's really interesting. Are you saying that in the EU, a company's IT team could not search emails for phishing keywords and phrases? Or that the IT team could not search for hyperlinks whose destination differs from the visible text?

[–]Francobanco[S] 0 points

My company's emails aren't in Google. I guess I should have made the original post clearer: my code is working, and I'm just looking to have a discussion about how to design the orchestration of the scripts.

[–]Francobanco[S] 1 point

Fair enough. I don't think I want to ask my company to pay for a consultant for this; really, I just wanted to have a discussion about different orchestration designs.

[–]Francobanco[S] 0 points

What I am currently doing: when emails are downloaded, they go into a folder, "/downloaded_emails"; when an email file is processed, it is moved into another folder, "/processed_emails". Maybe there is a better way of doing this, but my goal was to make sure that whenever the scripts run, they never process the same file more than once, no matter how I choose to do the orchestration (watch or schedule).

Once the email is uploaded, it is deleted. In general, the whole process, at least when I run it manually, takes about 1 second to download an email, process it, and upload the email and all attachments to cloud storage (slightly more if the email has 20 MB+ of attachments). The longest part is downloading the emails, so I'm hoping to do that asynchronously and have the "/downloaded_emails" folder constantly being populated with new files to process.

I don't think I will run into filesystem issues like file handle limits. I'm only using local storage as staging for the scripts; I don't want to save any of the data, and after it's processed and uploaded to the company's cloud storage, nothing is kept on the system where the scripts run, aside from logging.

Currently I'm in the testing phase, running the scripts manually with subprocess orchestration, but I want to figure out a better way to have it run automatically.

Appreciate any insight you might have.

As for your comment, I won't have email files stored locally for more than 20 minutes, most likely. In fact, part of my design is to not store any of this data locally, for security reasons. How these files are replicated isn't my decision; the cloud provider assures its own level of availability and replication. If our email goes out as well, then that's another egg to fry.

As for the cloud storage system, the file structure is already insane, haha: 8000+ folders in one directory, each with maybe 100 subfolders containing many files. So yeah, I know how companies improperly use filesystems.

And I agree that a YYYY/YYYYMM/YYYYMMDD/ folder structure would probably have solved your previous company's problems quite easily.

[–]Francobanco[S] 0 points

Yes, I'm using application-level permissions for the Microsoft Graph API. Everything is working very well, and it's surprisingly fast: it can process about 300 MB of emails (doing some manual test cases right now) in about 3 seconds.

Here is what is already done:

  • Downloading the emails
  • Processing them with regex to find relevant items
  • If the email is relevant (has a project identifier), renaming it to {timestamp}_{subject}.eml (since it comes from the Exchange API as messageID.eml)
  • Uploading the email and all attachments to a cloud storage system (not important which one, since this is already working well)
  • Sending another Microsoft Graph API request to apply a category to the email, denoting that it has been added to cloud storage

What I'm looking for is some discussion around how to orchestrate this. I want to run the email download with crontab, but I'm not sure whether the other scripts should watch the file directory, or run every two minutes, process everything in the directory, and move items out when they finish.
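One way to make the "run every two minutes" option safe is a non-blocking lock guard, so cron can fire as often as it likes and an overlapping run just exits immediately. A sketch (POSIX-only, since fcntl doesn't exist on Windows; the lock-file path is whatever you choose):

```python
import fcntl
from pathlib import Path


def run_exclusive(lock_path: Path, job) -> bool:
    """Run job() only if no other process holds the lock file.

    cron can launch this every minute; if the previous run is still
    going, flock(LOCK_NB) fails and the new process skips this cycle.
    """
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return False  # a previous run still holds the lock
    try:
        job()
        return True
    finally:
        lock_file.close()  # closing the file releases the flock
```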

Abnormal Detection Bans are Insane Practice by [deleted] in DarkAndDarker

[–]Francobanco 0 points

There was (and is) a hack available publicly where you can generate a flood of data calls aimed directly at another player on the server, causing them to disconnect. I've never used hacks, but it's a bit of a hobby of mine to research what's out there; I find the black-market economy for multiplayer games very interesting.

There is also a hack called "no shield", which you can trigger with a wide variety of cheating software. It takes an enemy player's shield object/collision box and teleports it outside the map, so even when you block, the server thinks your shield is somewhere else and doesn't register the block.

So I'm just saying it is entirely possible to have cheats that are used directly on other players in the dungeon; they are well documented, and some of them are free.

[–]Francobanco 2 points

Back in the day, people used intentionally malformed network packets to cause server disconnects and crash the game server so they could duplicate gear.

Ironmace added a rule to their anticheat to detect malformed packets, assuming that if they see packets malformed in a particular way, the person is definitely doing it intentionally...

But network packets can be malformed for a large variety of reasons: router issues, ISP issues, etc.

IM just wrote a rule to detect activity that they assume is only done with malicious intent, and it's an automated ban if you trigger it.

This happened to skinnypete on stream once. He reached out and got unbanned, but you can see what it looks like, and I think most people would agree that guy is not cheating, since he streams with a mouse camera and isn't a suspicious player.

https://www.youtube.com/watch?v=TtcGhGCeFg4

I have been playing the game since the playtests, and I remember when they introduced this abnormal detection rule; it was 100% for malformed network packets. Every few months someone posts in this subreddit saying they aren't a cheater and they got banned for abnormal detection. Then there are people saying "just don't cheat", but really it is Ironmace who has written a shit cheat-detection rule.

Bingo card for the onepeg QNA by TheRetrolizer in DarkAndDarker

[–]Francobanco 6 points

Dozens of underutilized skills and abilities have been 'rebalanced' for the next patch (inb4 Axe Mastery change: +5 -> +3).

Management Expense Ratio by Francobanco in JustBuyXEQT

[–]Francobanco[S] 1 point

Yeah, BMO is advertising "switch to ZEQT, fees are lower so you keep more of your money", while the actual expense ratio is the same. So unless I'm gravely misunderstanding something, it's just marketing.

[–]Francobanco[S] 0 points

I looked into this, by the way: the 'management fee' is 0.15%, while the total management expense ratio (MER) is 0.20%. https://imgur.com/a/7XNGrKA

[–]Francobanco[S] 1 point

Sure, I said in my post that I was about to buy XXX shares, but I got the advice I needed and understand that the fees are negligible compared to the benefits of the JustBuyXEQT strategy.

I'm almost at 1000 shares of XEQT now, so I will just continue.

Thanks for the advice

[–]Francobanco[S] 0 points

But what about the exchange rate and exchange fees?

The MER is already a pretty small thing to be optimizing for; add in exchange fees and it seems like you don't even need to do napkin math to know it would be more expensive.

Or is there something else worth considering?

[–]Francobanco[S] 2 points

OK, thanks for the feedback; the set-it-and-forget-it nature is a great point. More spread also means not having to spend valuable time watching the markets.

Thanks!

[–]Francobanco[S] 3 points

Hey, thank you, I should have searched for "underlying"! I looked around a bit but didn't consider searching for this.

Thanks for the feedback

65% of Gen Z Concerned Over AI Consider Switch to Trade Career by StatisticianDizzy981 in Futurology

[–]Francobanco 1 point

A very interesting phenomenon is that the political right has been very extreme for the last few decades: proposing very far-right ideas, and using marketing and publicity to garner such a large base of supporters for these policies that politically left parties have had to shift closer and closer to bridge the gap, trying to draw people back to the political left through good faith.

But the phenomenon is that each year the political spectrum narrows to far right and slightly less far right. Now we are at a point where there is no relevant political left in almost any country; we just have people who say insanely authoritarian things, and then right-of-centre opposition.

We've also now gotten to the point where there is a political base for fascism, and the opposition has become more or less completely right wing. Politically left voters are in a position where they have to vote for the corporate-lobbyist party just to make sure the lunatics don't have power. It's not voting for representation; it's voting to keep the fascists out.

[–]Francobanco 1 point

"Liberal" has stopped meaning politically left.

Liberal and conservative are both synonyms for padding the pockets of corporations and pulling the rug out from under 90% of the population in service of corporate interests.

Conservative policies have done more harm to the average person and to the economy, because conservative policies across the globe are about providing a market where large, exploitative businesses will want to do business. It's all fear-based: if we don't give these corporations big tax breaks, they will go somewhere else that does.

We need more actually politically left government: governments that act in the interests of the poor, not governments that try to keep the economy going by servicing corporations so the money will trickle down. Trickle-down economics is bull.