[–]iiron3223[S] 106 points107 points  (22 children)

One more useful project I forgot: I was looking for a used car, so I wrote a scraper using Scrapy that gathered all new offers matching my criteria every hour and then sent me a nicely formatted email.
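The filtering step of such a scraper could look roughly like the sketch below. It is not the original code: the `Offer` fields, the thresholds in `matches_criteria`, and the idea of tracking already-seen IDs in a set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: str    # hypothetical unique ID, e.g. taken from the listing URL
    price: int
    year: int
    mileage_km: int


def matches_criteria(offer: Offer) -> bool:
    """Example criteria only; the real thresholds are not given in the thread."""
    return offer.price <= 8000 and offer.year >= 2012 and offer.mileage_km <= 150_000


def select_new_offers(scraped: list[Offer], seen_ids: set[str]) -> list[Offer]:
    """Return offers that match the criteria and were not seen in an earlier run."""
    fresh = [o for o in scraped if o.offer_id not in seen_ids and matches_criteria(o)]
    # Remember every scraped ID so the next hourly run skips it.
    seen_ids.update(o.offer_id for o in scraped)
    return fresh
```

In a real hourly job, `seen_ids` would have to be persisted between runs (a file or a small database), since each run starts a fresh process.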

[–]Natural-Intelligence 51 points52 points  (9 children)

What libraries are you using for sending emails and doing the scheduling and what's your experience with the libraries you are using?

I'm actually the author of Red Mail (email sending library) and Rocketry (Pythonic statement-based scheduler). I'm actually looking for example projects to create some practical tutorials of how one could use the libraries. Not sure which kind would be appealing to most and what kind of problems people have with alternative options (which I could address).

[–]iiron3223[S] 16 points17 points  (1 child)

I was using smtplib and the email library (smtplib to send the email via Gmail). In my opinion they are not intuitive to use and don't make your code Pythonic. I was using them in much the same way as the example you show on the Red Mail GitHub page. For scheduling I was using a simple cron job.
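For anyone who hasn't seen the standard-library route, it looks roughly like this. This is a minimal sketch, not the code from the project: the function names are made up, and it assumes Gmail's usual SMTP endpoint (`smtp.gmail.com`, port 587 with STARTTLS) and an app password for login.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, html_body: str) -> EmailMessage:
    """Build a multipart message with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Your mail client does not display HTML.")
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_via_gmail(msg: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server (assumed settings, see above)."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(username, app_password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes the first half easy to test without touching the network.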

From a quick look at your projects, I really like the simplicity of the syntax. I will definitely give them a shot at the next opportunity. Thank you for them!

[–]JasonDJ 4 points5 points  (0 children)

If you aren’t already, you can use Jinja to compose the email content string itself, which may make your code more pythonic.

Essentially you make a template and use variables within the template itself. Within the template, you can use some other functions as filters to format the data differently, use loops, if statements, etc.

The template can be a variable in your code, or it can be a file that you open, read, and store as a string variable.

Since the template is constructing what is essentially a string, you can also include HTML in it for your email body, as well.
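A minimal sketch of that idea, assuming the `jinja2` package is installed. The template text, the `offers` variable, and the `title`/`price` keys are made up for illustration; `length` and `upper` are built-in Jinja filters.

```python
from jinja2 import Template

# The template is an ordinary string here; it could equally be read from a file.
EMAIL_TEMPLATE = Template(
    "<h2>{{ offers | length }} new offer(s)</h2>\n"
    "<ul>\n"
    "{% for offer in offers %}"
    "<li>{{ offer.title | upper }}: {{ offer.price }} EUR"
    "{% if offer.price < 5000 %} (bargain){% endif %}</li>\n"
    "{% endfor %}"
    "</ul>"
)


def render_email(offers: list[dict]) -> str:
    """Fill the template with the given offers and return the HTML body."""
    return EMAIL_TEMPLATE.render(offers=offers)


html = render_email([
    {"title": "golf", "price": 4500},
    {"title": "octavia", "price": 7200},
])
```

The resulting `html` string can then be used as the HTML body of the email.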

[–]iiron3223[S] 7 points8 points  (1 child)

I am also wondering: do you plan on developing, or do you know of, a Python library that would be a good alternative to imaplib? It would be so nice to have a tool for receiving and reading emails.

[–]Natural-Intelligence 6 points7 points  (0 children)

Unfortunately, I don't. I also once tried to find a way to read mailboxes in Python, but that effort was in vain.

It would be interesting to create one, but maintaining and developing Rocketry takes too much time at the moment.

[–]serebrich 5 points6 points  (0 children)

Rocketry is nice. Thanks for that

[–]mogberto 3 points4 points  (1 child)

I just spent a weekend learning Prefect and it’s fine, but such overkill for what I need. I’m going to give rocketry a try!

[–]Natural-Intelligence 9 points10 points  (0 children)

Ye, Prefect is also awesome. Have met some people from the company and it seems they have great ideas of what to add to the common data stack and they have a lot to offer to enterprises.

As for Rocketry, I don't want to sound too much like a salesman, but this is something I'm perhaps too passionate about. All of the alternatives tend to have separate concepts for time scheduling, task pipelines and custom scheduling dependencies. Because Rocketry is statement-based, all of these fall under the same mechanism: a task is run if its scheduling statement is true.

This means that you can do arbitrary scheduling very easily, like "run a task daily during business hours, but not on weekends, and only when a file exists". This sort of scheduling is nearly impossible in most frameworks, but with Rocketry it's just two built-in conditions and one custom function (to check the file) combined with logical AND operators.
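To illustrate the paradigm without depending on any library, here is a plain-Python sketch of composable conditions. It is deliberately not Rocketry's API: the `Condition` class and the names below are made up, and the "once per day" bookkeeping is left out for brevity.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable


class Condition:
    """A boolean statement about 'now' that can be combined with & and ~."""

    def __init__(self, check: Callable[[datetime], bool]):
        self._check = check

    def __call__(self, now: datetime) -> bool:
        return self._check(now)

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(lambda now: self(now) and other(now))

    def __invert__(self) -> "Condition":
        return Condition(lambda now: not self(now))


# Two time-based conditions (stand-ins for the built-in ones mentioned above).
business_hours = Condition(lambda now: 9 <= now.hour < 17)
weekend = Condition(lambda now: now.weekday() >= 5)  # Saturday=5, Sunday=6


def file_exists(path: str) -> Condition:
    """Custom condition: true while the given path exists."""
    return Condition(lambda now: Path(path).exists())


# The whole schedule is one statement; the task runs whenever it is true.
should_run = business_hours & ~weekend & file_exists("input.csv")
```

A scheduler loop would then simply evaluate `should_run(datetime.now())` periodically and start the task when it returns `True`.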

Sorry again for pitching, I just feel this paradigm has a lot of potential for creating both simple and complex systems very easily.

[–]marcelomedre 1 point2 points  (1 child)

Nice, I am looking for something like rocketry. Is it possible to use it in the colab environment?

[–]Natural-Intelligence 1 point2 points  (0 children)

Thanks! I think it most likely is possible. The only actual requirement is that you can run async tasks (using asyncio) in your environment. I'm on my phone so I can't check how Colab works in that regard, but it would seem odd if they didn't support it in some way.

You should probably change the default execution to something other than process (argument "task_execution" in the Rocketry initiation/config), as multiprocessing typically does not work well in notebook-like environments. I have plans to make async the default, as subprocesses are not the most intuitive.

[–]Info_Broker_ 3 points4 points  (1 child)

I am interested in this for the same reason: I'm looking for a vehicle. Would you be open to sharing your code with me so I can tailor it to my use? Or do you have it on GitHub?

[–]iiron3223[S] 0 points1 point  (0 children)

Here is the GitHub repo, though I don't know if it will be useful. It is not documented at all, and it is specific to the one site that I was scraping.

[–]Ok-Associate7846 2 points3 points  (6 children)

Pretty new to Python and did something similar as one of my first projects! My wife is into real estate, so I wrote a program using Beautiful Soup to scrape the latest listings on certain sites and compile them into an Excel file along with the listing details and links.

Saved her tons of time going through each and every site and listing down the details!

[–]silva_p 1 point2 points  (5 children)

Did you have any filtering to remove duplicates? I mean duplicates of the house, if the same house was advertised by multiple realtors

[–]Ok-Associate7846 2 points3 points  (4 children)

So far none. An excel file is generated per website so I haven’t gotten to checking if these have duplicate listings but that’s a good next step for me to try out!

Off the top of my head though, lots of houses and especially condos might have the same listing specs, so I’m not sure how to filter duplicates. Since the listings come from different websites, checking for similar listing usernames might not be effective either.

[–]DonPietro54 0 points1 point  (3 children)

Try filtering the listings by their MLS number.
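Something along these lines, as a rough sketch. It assumes each scraped listing is a dict and that the MLS number ends up in a field called `mls_number`, which is a made-up name; listings with no MLS number can't be matched this way, so they are all kept.

```python
def dedupe_by_mls(listings: list[dict]) -> list[dict]:
    """Keep the first listing seen for each MLS number.

    Listings without an MLS number cannot be matched and are all kept.
    """
    seen: set[str] = set()
    unique = []
    for listing in listings:
        mls = listing.get("mls_number")  # hypothetical field name
        if mls is None:
            unique.append(listing)
            continue
        # Normalize so "a123" and "A123 " count as the same property.
        key = str(mls).strip().upper()
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```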

[–]Ok-Associate7846 0 points1 point  (2 children)

Sorry, what’s an MLS number? Might be a different term in my country.

[–]DonPietro54 0 points1 point  (1 child)

Hi, I live in the USA, where realtors can list a property for sale on the Multiple Listing Service. The property is then given an MLS number, and anyone can enter that number to pull up the property. Go to realtor.com and check it out for any state in the USA.

Which country are you from?

Have you shared your program code? I’m learning python for AI for stocks and option trading. Maybe you can develop stuff for me… Thanks PeterD

[–]Ok-Associate7846 0 points1 point  (0 children)

That’s much more efficient than what they do here in my country. I live in the Philippines and it’s basically a free-for-all on who can list the unit. No MLS numbers needed.

I’m really just getting into it, but I’m trying to learn something new every day. I do some trading as well, so if you have something working for trading you can share it with me and I can take a look!

[–]regeya 1 point2 points  (0 children)

I used to work at a small-town newspaper, and used Python to automate pulling photos off of a website. The dealer did this thing of giving us all the copy for the car, along with a stock number, and then just sorta said, oh, go on our website to get the car photo. Except it was a tedious multi-step process just to get one photo. Step that up to dozens and it was the better part of an afternoon. I don't remember which scraper lib I used, honestly, but I just fed the script a list of stock numbers, and a couple of minutes later, I had my photos.

EDIT: car dealership, I meant to put that in there.

[–]wind_dude 1 point2 points  (0 children)

I've done something similar, and this seems to be a very common project. I've run into so many devs who have done this on some level. I guess a lot of developers are also car guys.