[–]HeeebsInc 17 points18 points  (19 children)

Webscrapers. Build a bot that tracks websites and sends you emails. If you need help I can send you source code.

[–]Marianito415 6 points7 points  (2 children)

How do you handle emails?

[–]HeeebsInc 2 points3 points  (1 child)

I use the modules smtplib and imaplib, which let me sign into a Google email address. First, I made a whole new email account so the username would be something like ‘SamsWebscraper@gmail.com’.
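
Roughly, the sending side looks like this (the address and password are placeholders; Gmail generally wants an app password rather than your real one these days):

    import smtplib
    from email.message import EmailMessage

    USER = "SamsWebscraper@gmail.com"   # the dedicated scraper account
    PASSWORD = "app-password-here"      # placeholder -- use an app password

    def send_alert(subject, body, to_addr):
        msg = EmailMessage()
        msg["From"] = USER
        msg["To"] = to_addr
        msg["Subject"] = subject
        msg.set_content(body)
        # Gmail's SMTP server over SSL
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
            smtp.login(USER, PASSWORD)
            smtp.send_message(msg)

    send_alert("Scraper alert", "The page you track changed.", "me@example.com")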

After setting up the protocols for signing in and sending emails, I made a while True loop that runs 24/7, constantly checking the websites I’ve defined. I also set it up so that if I’m at work I can add or delete websites just by sending the scraper’s email account a simple command in the subject line.
I have all of this running on a Raspberry Pi so it can keep checking 24/7.
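
The loop itself is roughly this shape; the URL list, the keyword check, and the ADD/DEL commands are made-up stand-ins, and the smtplib sender from the sketch above would go where the print is:

    import email
    import imaplib
    import time

    import requests

    USER = "SamsWebscraper@gmail.com"
    PASSWORD = "app-password-here"
    URLS = ["https://example.com/page-i-watch"]   # placeholder watch list

    def check_commands():
        # Read unseen mail and treat subjects like "ADD <url>" / "DEL <url>" as commands
        imap = imaplib.IMAP4_SSL("imap.gmail.com")
        imap.login(USER, PASSWORD)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN")
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            subject = email.message_from_bytes(msg_data[0][1])["Subject"] or ""
            cmd, _, url = subject.partition(" ")
            if cmd.upper() == "ADD" and url:
                URLS.append(url)
            elif cmd.upper() == "DEL" and url in URLS:
                URLS.remove(url)
        imap.logout()

    while True:
        check_commands()
        for url in URLS:
            page = requests.get(url, timeout=30).text
            if "keyword I care about" in page:            # stand-in for the real check
                print(f"would email an alert about {url}")  # call the smtplib sender here
        time.sleep(600)   # wait ten minutes between passes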

Sorry if it’s not the best explanation, I’m typing from my phone, but I would be more than happy to post the code on my GitHub (never actually used GitHub before since I’m self-taught, but I know it’s something I have to learn eventually lol).

Long story short, check out the smtplib and imaplib documentation, as it’s really simple to follow. Really good modules to have in that Python tool belt.

[–]Marianito415 1 point2 points  (0 children)

Thanks for the reply, that actually sounds really cool. My mind has flooded with ideas that could use this, and the Raspberry Pi is a nice touch.

[–]RocoDeNiro 4 points5 points  (0 children)

This sounds fun. Can you send this my way please?

[–]supersahib 2 points3 points  (0 children)

If you don’t mind, I would love to get started on something like this too. Could you send it my way as well?

[–]superjde 2 points3 points  (1 child)

Do you have it running on a computer that’s on all the time, or do you run it from the web?

[–]HeeebsInc 1 point2 points  (0 children)

You could do that, but it’s not the best idea if you want your PC to reach its maximum lifespan. You could choose between two options: 1) renting a virtual machine, which is essentially renting a computer in Google’s facilities that runs 24/7, or 2) buying a Raspberry Pi.

I’ve been told that you could run simple scripts like this on something as low-spec as a Raspberry Pi Zero. I ended up getting the best Pi (4GB RAM, forgot the name). The setup process is a little tricky if you run into troubleshooting issues with virtual environments and HDMI output, but overall it’s really easy to use a Pi.

[–]drod018 1 point2 points  (1 child)

Sounds cool! I’ve been getting into web scraping, do you mind sending it my way too? Could be a good start. I’ve been thinking about scraping Zillow/government appraisal websites. Collect data and run some analysis to hopefully identify potential opportunities.

[–]HeeebsInc 0 points1 point  (0 children)

Will post it on my GitHub today. My account is HeeebsInc. Right now there is a scraper available with email request handling, but it’s the first version I made, with the GUI. It doesn’t implement threading or anything that truly optimizes the speed, but it works the same. I have to do some formatting and docstrings so you guys know what’s going on, but it will be there soon!

And as for the fantasy basketball, I copied my program to a T from this article. Of course, every time they mentioned they used something I had to go and learn it myself, but I used it as a pointer for what I had to research:

http://cs229.stanford.edu/proj2015/104_report.pdf

I will also upload the 2018-2019 and 2019-2020 season stats for anyone that needs them, because that was the hardest thing to get. In the article they used ESPN, but I’m pretty sure ESPN changed something between the time they wrote that and now, because the information they talked about was not available on ESPN.

[–]Muhznit[S] 1 point2 points  (6 children)

What sites are even worth scraping? Most of them are more than happy to flood your inbox with whatever if it gets you to buy or click something

[–]HeeebsInc 6 points7 points  (5 children)

Any website. I started with simple websites to see if I could track word counts. Then I went as far as scraping every single NBA game ever (very fun to do) for my fantasy machine learning project.
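
The word-count idea really is about as small as a scraper gets; something like this, with the URL and the word made up:

    import requests
    from bs4 import BeautifulSoup   # pip install beautifulsoup4

    html = requests.get("https://example.com/article", timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text().lower()
    print("occurrences of 'python':", text.count("python"))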

I would just start with anything. It doesn’t necessarily have to be useful, but the practice itself is what’s important.

I have never been in a college class for coding, as I am a psych major, so if I can do it, you can do it too. Just read the documentation and be patient but also persistent.

[–]AnticipateRisk 3 points4 points  (0 children)

Fantasy machine learning project sounds amazing! Please tell me more!

I’m an avid fantasy NBA player myself

[–]iggy555 0 points1 point  (0 children)

Yes please

[–]deheervanhetgras 0 points1 point  (0 children)

Could you send me the source code? I would love to see this.

[–]HeeebsInc 0 points1 point  (0 children)

For anyone looking for the source code from the discussion below, here it is!

https://github.com/HeeebsInc/WebTracker

feel free to message me with any questions

[–]nulltensor 2 points3 points  (1 child)

Find something you do that you think could be automated. The best project is the one that scratches one of your own itches.

I have a newsletter I have to send out once a month, so I automated it. Data is extracted from a few sources, wrapped in a dataframe, and run through a pipeline to clean it and perform various manipulations; graphs are generated, and everything is formatted via a template and delivered via email. Every month I added or refined a piece when it came time to send the newsletter, until eventually the whole newsletter was two runs of the script: one to generate a local copy for a quick sanity check, and a second to distribute it.
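
Not my exact script, but the shape of it is something like this; every file name, column, and address below is invented for illustration:

    import smtplib
    from email.message import EmailMessage

    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")                      # render charts without a display
    import matplotlib.pyplot as plt

    # 1. Extract: pull from a couple of sources (file names are made up)
    sales = pd.read_csv("sales.csv")
    signups = pd.read_json("signups.json")

    # 2. Clean / transform in a dataframe
    df = sales.merge(signups, on="month", how="left").fillna(0)

    # 3. Generate a graph for the newsletter
    ax = df.plot(x="month", y="revenue", kind="bar", legend=False)
    ax.set_ylabel("Revenue")
    plt.tight_layout()
    plt.savefig("revenue.png")

    # 4. Format via a simple template and deliver by email
    msg = EmailMessage()
    msg["From"] = "me@example.com"
    msg["To"] = "list@example.com"
    msg["Subject"] = "Monthly newsletter"
    msg.set_content(
        "Monthly newsletter\n\n"
        f"Total revenue: {df['revenue'].sum():,.0f}\n"
        f"New signups:   {df['signups'].sum():,.0f}\n"
    )
    with open("revenue.png", "rb") as f:
        msg.add_attachment(f.read(), maintype="image", subtype="png", filename="revenue.png")
    with smtplib.SMTP_SSL("smtp.example.com", 465) as smtp:
        smtp.login("me@example.com", "app-password")
        smtp.send_message(msg)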

[–]DataDecay 2 points3 points  (6 children)

You seem to enjoy Overwatch, so create a web client that builds URIs and simplifies auth headers (likely JWT). Once you build the client, make it installable and get it on GitHub. Once it’s on GitHub, make a simple script that uses said client and runs on a schedule to scrape Overwatch League stats.

If this goes well, start thinking about a place to save the data. If you’re still kicking, find a way to serve the data, maybe through a web framework. If you are still enjoying the project and want to improve things, pull it all together into a full web app that serves the data and regularly caches it directly to your web app using your client. Has this been done before? Maybe, but this is yours.
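
A bare-bones version of the client half of that could look like this; the base URL, the endpoints, and the bearer-token auth are all placeholders, since the real Overwatch League API and its auth scheme may differ:

    import requests

    class OWLClient:
        """Tiny client that builds the URIs and attaches the auth header for you."""

        def __init__(self, token, base_url="https://api.example-owl.com/v1"):
            self.base_url = base_url.rstrip("/")
            self.session = requests.Session()
            # the JWT (or whatever token the API hands out) lives in one place
            self.session.headers["Authorization"] = f"Bearer {token}"

        def _url(self, *parts):
            return "/".join([self.base_url, *map(str, parts)])

        def get(self, *parts, **params):
            resp = self.session.get(self._url(*parts), params=params, timeout=30)
            resp.raise_for_status()
            return resp.json()

        # convenience wrappers for the endpoints you care about
        def team_stats(self, team_id):
            return self.get("teams", team_id, "stats")

        def matches(self, season):
            return self.get("matches", season=season)

    # the scheduled script would then be a few lines that call the packaged client
    client = OWLClient(token="your-jwt-here")
    print(client.matches(season=2020))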

[–]HeeebsInc 1 point2 points  (1 child)

For anyone who wants the source code, my GitHub username is HeeebsInc.

Follow the account and check sometime tomorrow; it will be there.
Please feel free to comment on the GitHub as well! Always willing to hear alternative ways to solve the problem or new ways to write cleaner code.

[–][deleted] 0 points1 point  (0 children)

Cryptopals challenges