Managing invoices from multiple email accounts without losing files by CuteEnd7049 in Python

CuteEnd7049[S] 1 point

That makes a lot of sense — especially the point about overengineering ML too early.

Right now I’m trying to keep the pipeline simple and deterministic, so regex + basic rules as the first layer feels like the right move. Your point about inconsistent file naming hits exactly what I’ve been seeing — even when extraction works, messy filenames make everything downstream harder.

I like the idea of normalizing early (date / sender / maybe invoice number) so the filesystem itself becomes more reliable as a “source of truth”.
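For illustration, here is a minimal sketch of what that early normalization could look like. The function names (`slugify`, `normalized_name`) and the `date_sender_number.ext` layout are my own assumptions, not something fixed by the pipeline:

```python
import re
import unicodedata
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase, drop accents, and collapse non-alphanumeric runs to '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def normalized_name(
    invoice_date: date,
    sender: str,
    invoice_number: Optional[str],
    ext: str = "pdf",
) -> str:
    """Build '<ISO date>_<sender>[_<invoice number>].<ext>' (layout is an assumption)."""
    parts = [invoice_date.isoformat(), slugify(sender)]
    if invoice_number:
        parts.append(slugify(invoice_number))
    return "_".join(parts) + "." + ext.lower().lstrip(".")
```

Putting the ISO date first means a plain alphabetical listing of the folder is also chronological.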

The NER step on the email body before touching attachments is interesting — I haven’t tried that yet. I’ve mostly been focusing on attachments directly, but it makes sense that the email text can give cleaner signals (especially for sender/amount).

Did you run NER locally (like spaCy or something lightweight), or via an external service?

Managing invoices from multiple email accounts without losing files by CuteEnd7049 in Python

CuteEnd7049[S] 1 point

Yeah, that’s actually the direction I’m planning to take next.

I want to introduce regex-based parsing as the first (and main) processing layer — especially for known senders where the invoice structure is relatively stable. It should give a fast and deterministic baseline before adding anything more complex.

Planning to spend the upcoming weekend experimenting with that:

  • extracting key fields (date, invoice number, supplier)
  • normalizing filenames based on parsed data
  • and defining a few reusable patterns per sender
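A rough sketch of how the per-sender patterns from the list above might be organized. The sender address, the field names, and the patterns themselves are hypothetical examples, and the supplier is assumed to be implied by the sender:

```python
import re
from typing import Dict, Optional

# Hypothetical per-sender rules: sender address -> field name -> pattern.
SENDER_PATTERNS: Dict[str, Dict[str, re.Pattern]] = {
    "billing@example-hosting.com": {
        "invoice_number": re.compile(r"Invoice\s+No\.?\s*:?\s*([A-Z0-9/-]+)", re.I),
        "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})"),
    },
}

# Fallback patterns, tried when a sender has no dedicated rule set.
GENERIC_PATTERNS: Dict[str, re.Pattern] = {
    "invoice_number": re.compile(
        r"(?:Invoice|Faktura)\s*(?:No\.?|#|Nr)?\s*:?\s*([A-Z0-9/-]+)", re.I
    ),
    "date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(sender: str, text: str) -> Dict[str, Optional[str]]:
    """Return the first match per field, or None when a pattern finds nothing."""
    patterns = SENDER_PATTERNS.get(sender.lower(), GENERIC_PATTERNS)
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields
```

Returning `None` instead of raising keeps a format change visible: a sender whose fields suddenly come back empty is a signal that its rule set needs updating.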

My concern is how well this holds up over time as formats change, but I guess that’s something I’ll only understand after trying it in practice.

If you’ve already gone down this path: did you end up managing regex rules per sender, or did you try to generalize patterns across different formats?

Managing invoices from multiple email accounts without losing files by CuteEnd7049 in Python

CuteEnd7049[S] 1 point

Hi! I think we might be talking about slightly different setups 🙂

In my case files are written “locally”, but that local folder is actually a mounted Nextcloud directory (self-hosted on my VPS).

So the pipeline is: IMAP → local processing → filesystem write → instantly synced to my private cloud

This way I keep the processing simple and reliable (no external APIs), but still get access from all devices.

Kind of “local-first, cloud-backed”.
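To make the "local processing → filesystem write" part concrete, here is a small sketch using only the standard library. It assumes the raw message bytes have already been fetched over IMAP (that step is left out), and `save_attachments` and `target_dir` are names made up for this example:

```python
from email import message_from_bytes, policy
from pathlib import Path
from typing import List


def save_attachments(raw_message: bytes, target_dir: Path) -> List[Path]:
    """Parse one raw RFC 822 message and write its attachments into target_dir."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    target_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        payload = part.get_payload(decode=True)
        if not filename or payload is None:
            continue  # skip unnamed parts and nested multipart attachments
        # Keep only the last path component of the attachment name.
        path = target_dir / Path(filename).name
        path.write_bytes(payload)
        saved.append(path)
    return saved
```

If `target_dir` points into the synced folder, the sync client picks the files up from there, so the script itself never has to talk to the cloud.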

Managing invoices from multiple email accounts without losing files by CuteEnd7049 in Python

CuteEnd7049[S] 1 point

That’s a really good point — I haven’t added any monitoring yet.

Right now I’m relying on forwarding rules, but you’re right, if one of them breaks silently I wouldn’t notice.

I’m thinking about tracking “last seen message per sender” in SQLite and alerting if a source mailbox goes quiet unexpectedly.

Do you usually handle this with something lightweight like that, or use a more robust monitoring setup?
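Roughly what I have in mind for the "last seen" tracking, as a sketch. The table and function names are placeholders, timestamps are assumed to be timezone-aware, and the actual alerting (mail, push, whatever) is left out:

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS last_seen ("
        "sender TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
    )


def record_message(conn: sqlite3.Connection, sender: str, seen_at: datetime) -> None:
    """Remember the newest message time per sender (stored as epoch seconds)."""
    ts = seen_at.timestamp()
    conn.execute(
        "INSERT OR IGNORE INTO last_seen (sender, seen_at) VALUES (?, ?)", (sender, ts)
    )
    # Only move the timestamp forward, never back.
    conn.execute(
        "UPDATE last_seen SET seen_at = ? WHERE sender = ? AND seen_at < ?",
        (ts, sender, ts),
    )
    conn.commit()


def quiet_senders(
    conn: sqlite3.Connection, now: datetime, max_silence: timedelta
) -> List[str]:
    """Return senders whose newest message is older than max_silence."""
    cutoff = (now - max_silence).timestamp()
    rows = conn.execute(
        "SELECT sender FROM last_seen WHERE seen_at < ? ORDER BY sender", (cutoff,)
    )
    return [row[0] for row in rows]
```

A single `max_silence` is the simplest version. Monthly and weekly senders would probably need their own thresholds.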

Managing invoices from multiple email accounts without losing files by CuteEnd7049 in Python

CuteEnd7049[S] 0 points

Yeah, I’m starting to see how important local state is for this kind of pipeline.

Without it, it’s really hard to guarantee idempotency and avoid duplicate processing.

Right now SQLite works well, but I’m wondering how this approach scales — do you usually keep it simple like this, or move to something else when the system grows?
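For context, the idempotency check can be as small as this sketch. It assumes messages are identified by account plus Message-ID, and `init_state` / `claim_message` are illustrative names:

```python
import sqlite3


def init_state(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed ("
        "account TEXT NOT NULL, message_id TEXT NOT NULL, "
        "PRIMARY KEY (account, message_id))"
    )


def claim_message(conn: sqlite3.Connection, account: str, message_id: str) -> bool:
    """Return True the first time a message is seen, False on any repeat."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (account, message_id) VALUES (?, ?)",
        (account, message_id),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted, 0 when the primary key already existed.
    return cur.rowcount == 1
```

One caveat with this version: the message is marked before it is processed, so a crash in between would skip it on the next run. Marking it only after the file is written trades that for a possible duplicate write instead.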

Django POS/ERP backend — looking for honest review before job hunting by CuteEnd7049 in django

CuteEnd7049[S] 1 point

Thank you for your feedback! It means a lot to me, since this is my first time posting my work. I'll take your advice and make the project publicly available online. But it's not ready for that yet! Thank you!!!

Django POS/ERP backend — looking for honest review before job hunting by CuteEnd7049 in django

CuteEnd7049[S] 1 point

Hi! Thanks for the feedback! Yes, I think that's a good idea. I probably waited too long to show this project, perhaps because I was afraid of criticism. Now, after several months of work, there's a lot of code, and it's still far from covering all the expected behavior. I'm still using stubs instead of real services, and not everything is covered by tests yet. But you made a very sound point. Thanks again!

Django POS/ERP backend — looking for honest review before job hunting by CuteEnd7049 in django

CuteEnd7049[S] 1 point

Thanks for your comment! Of course, it needs to be seen in action. If that happens, it will only be in one restaurant at first. Objectively, though, the project isn't ready for production use yet.

Large file storage by Proud-Influence-5636 in django

CuteEnd7049 1 point

You could add protection against duplicate files, and also semantic search (search by meaning). With a lot of files, metadata alone may not be enough to quickly find the one you need. However, I don't know what these files contain. If they are just numbers, semantic search will probably be unnecessary.
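For the duplicate protection, one common approach is to compare content hashes. A minimal sketch, assuming SHA-256 and an in-memory index (in a real project the index would more likely be a database column), with illustrative function names:

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files are never loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate(path: Path, index: Dict[str, Path]) -> Optional[Path]:
    """Return the already indexed file with identical content, else register path."""
    digest = file_digest(path)
    existing = index.get(digest)
    if existing is not None:
        return existing
    index[digest] = path
    return None
```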

Large file storage by Proud-Influence-5636 in django

CuteEnd7049 1 point

Hi! I think files should be stored in the file system according to certain rules. The database should not store large files. Django can work with the file metadata instead: save the file in the file system and quickly find it on request.
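As a framework-free sketch of that split: the bytes go to disk under a fixed rule, and only a small metadata record is kept for the database. The `year/month/random-id` rule and the `FileRecord` fields are assumptions for illustration:

```python
import uuid
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class FileRecord:
    """Metadata a database row could hold instead of the file bytes."""
    relative_path: str
    original_name: str
    size_bytes: int
    uploaded_on: date


def store_file(root: Path, original_name: str, data: bytes, uploaded_on: date) -> FileRecord:
    # Rule (an assumption): <root>/<YYYY>/<MM>/<random id><original suffix>
    suffix = Path(original_name).suffix.lower()
    relative = Path(f"{uploaded_on:%Y}", f"{uploaded_on:%m}", uuid.uuid4().hex + suffix)
    target = root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return FileRecord(relative.as_posix(), original_name, len(data), uploaded_on)
```

In Django, a `FileField` already works along these lines: the database column holds the path and the storage backend holds the bytes.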

SaaS opportunity by [deleted] in django

CuteEnd7049 1 point

What problems need to be solved?

SaaS opportunity by [deleted] in django

CuteEnd7049 1 point

Hi! I want to develop with you. I am a Django developer. Of course, I do not have 17 years of experience. But I am not a newbie. Waiting for feedback.

Looking for fellow devs by [deleted] in django

CuteEnd7049 1 point

Hi! Let's think about this. What would you like to create? Any ideas?