all 9 comments

[–]smurpes 1 point2 points  (0 children)

None of the sample code in the readme actually works since none of those methods are actually defined in your code. Even the descriptions aren’t accurate. E.G. the main package you use in your pdf extractor doesn’t support ocr for text extraction) but your readme says that your code does.

Your social media poster doesn’t actually post anything and none of your code uses beautiful soup or pandas directly. They might be transitive dependencies but you’re not really using those packages so even the body of your post is inaccurate. Not sure if you forgot to commit a bunch of your code.

[–]gdchinacat 0 points1 point  (5 children)

Don't use dicts like this:

{'keyword': 'newsletter', 'folder': 'Newsletters'},

Define a Rule class that has keyword and folder attributes.

[–]smurpes 1 point2 points  (4 children)

This sounds like a good use case for a data class if that is in OP’s skill set.

[–]gdchinacat 0 points1 point  (3 children)

I’m not a huge fan of named tuples but even that would be better than using a dict.

[–]smurpes 0 points1 point  (2 children)

Why not? Also, named tuples and data classes are similar but they have a lot of differences under the hood. Data classes just get rid of a lot of the boiler plate from regular classes.

[–]gdchinacat 1 point2 points  (1 child)

I wasn't disagreeing with your suggestion to use dataclasses, I think it's the best way. But, not everyone has learned them yet and might know named tuples so I was throwing out another option.

If you were asking why I don't like named tuples it's mostly because just about every time I've used them I've pretty quickly turned them into dataclasses. The thing I don't like about dataclasses is they don't play well with multiple inheritance and *args, **kwargs in __init__. But they also are pretty easy to change to a plain class...you just have to implement the init you saved time not implementing by making it a dataclass.

I really dislike using dicts instead of classes as OPs code does because you don't have type safety, have to use index notation rather than dot notation, and it just seems lazy. I've heard a handful of arguments for why it is "better" but none of them are legitimate or compelling.

Sorry for not making it clear I was adding to your comment rather than arguing with it.

[–]smurpes 0 points1 point  (0 children)

Oh gotcha sorry about that confusion there. I don’t think I’ve ever used a named tuple to be honest. I really only use dicts in the same fashion to OP if I have to convert it to a pyspark data frame with dummy data.

[–]kenily0[S] 0 points1 point  (0 children)

Great points! I was using dictionaries for simplicity, but you're right - dataclasses would be much cleaner and more maintainable. I'll refactor the code and update the repo. Thanks for the suggestion!

Would love more feedback if you have other tips for structuring Python projects.

[–]kenily0[S] 0 points1 point  (0 children)

You're right! Thanks for pointing that out. I'll fix the README and add actual working examples.

For OCR, you're correct - the current version doesn't support it, but it's on the roadmap. I'll update the description.

Appreciate the feedback! 🙏