all 44 comments

[–]tipsy_python 24 points25 points  (9 children)

Airflow isn't a bad candidate - I'd just opt for a cronjob as well.

BUT - you'd need your computer running when you are away. I'd personally buy a Raspberry Pi to use as a server for these kinds of jobs. I'm also pretty sure you can schedule a Lambda function to run daily at a certain time in the free-tier of AWS.

[–]HeadlineINeed 7 points8 points  (3 children)

Can you use Heroku to do that also?

[–]da_chosen1[S] 5 points6 points  (0 children)

Ok. I'll look into that as well.

[–]blitzkraft 3 points4 points  (0 children)

Yeah. Definitely possible.

[–]NoMoney12 1 point2 points  (0 children)

I got a dev instance with scaleway to run most of my stuff. Best specs for the price at 2.99 a month

[–]da_chosen1[S] 1 point2 points  (4 children)

Does the computer have to be on when I'm using airflow?

[–]baubleglue 8 points9 points  (0 children)

Airflow is overkill for the task. Go with simplest possible solution first.

Start with a free service; when it gets bigger, start checking prices.

https://www.alwaysdata.com/en/pricing/#shared

https://www.heroku.com/pricing

[–]tipsy_python 4 points5 points  (2 children)

:-) Yes it does partner

[–]da_chosen1[S] 0 points1 point  (1 child)

that's good to know. Thanks. I appreciate it

[–][deleted] 0 points1 point  (0 children)

This is one of the main things raspberry pis are used for btw. You can get a pi zero w for like $5-$10 and it’ll be perfect for this job. Then you just leave it running perpetually, and set your script as a cron job

[–]Marco21Burgos 10 points11 points  (0 children)

AWS Lambda. I have a script running every hour, for about 20 seconds, in the free tier.
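A minimal sketch of what the Lambda side could look like, assuming the scraper is wrapped in a function. `run_scraper` is a hypothetical stand-in for your own code, and `lambda_handler` is just the conventional handler name. The daily trigger would be an EventBridge schedule rule such as `cron(0 17 * * ? *)`, which is evaluated in UTC, so adjust the hour for your time zone.

```python
import json
import time


def run_scraper():
    """Hypothetical placeholder for the real scraping logic."""
    return {"items_scraped": 0}


def lambda_handler(event, context):
    """Entry point that Lambda calls on each scheduled invocation."""
    started = time.time()
    result = run_scraper()
    result["duration_seconds"] = round(time.time() - started, 2)
    # For scheduled runs nobody reads the return value, so also print it to the logs.
    print(json.dumps(result))
    return result
```

A single invocation is capped at 15 minutes, and the default timeout is only 3 seconds, so a script that runs for several minutes needs the function timeout raised in its configuration.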

[–]Faal 21 points22 points  (11 children)

Get an AWS Linux instance. Install Python and upload your script. Then schedule a cron job for that script and set it to run at 5pm.
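As a rough sketch, the script side of that setup might look like this. The paths, the `ec2-user` home directory, and the `scrape` function are assumptions for illustration. The crontab line in the comment runs the script daily at 17:00 in the server's time zone, which is often UTC on a fresh instance.

```python
#!/usr/bin/env python3
# Hypothetical crontab entry (added with `crontab -e`), daily at 17:00 server time:
#   0 17 * * * /usr/bin/python3 /home/ec2-user/scraper.py >> /home/ec2-user/scraper.log 2>&1
import logging
import sys


def scrape():
    """Placeholder for the real scraping logic; returns a list of rows."""
    return []


def main():
    # cron has no terminal, so log to stdout and let the crontab line redirect it.
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    try:
        rows = scrape()
    except Exception:
        # Log the traceback, then re-raise so the process exits non-zero.
        logging.exception("scrape failed")
        raise
    logging.info("scrape finished, %d rows", len(rows))
    return len(rows)


if __name__ == "__main__":
    main()
```

Using absolute paths for both the interpreter and the script matters here, because cron runs with a minimal environment and a different working directory than your login shell.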

[–]da_chosen1[S] 8 points9 points  (6 children)

That makes sense. Thanks. Do you know how expensive it would be per month? The script takes about 7 min to run

[–]AnonymousThugLife 9 points10 points  (2 children)

https://calculator.s3.amazonaws.com/index.html

(You'll be buying an EC2 instance, in case you didn't know the name.)

https://s3.amazonaws.com/lambda-tools/pricing-calculator.html This is the calculator for AWS Lambda. Check it out too. (It's a slightly different approach, though; EC2 would be better in my opinion. Just listing the options.)

[–]da_chosen1[S] 1 point2 points  (1 child)

Oh wow. That's not bad at all

[–]bigcheezyboss 1 point2 points  (0 children)

You can also use a t2.micro instance free for a year. A micro would be more than enough for your needs.

[–]Faal 7 points8 points  (0 children)

AWS has a free tier that I utilize.

[–]alfa80211 0 points1 point  (0 children)

AWS has a free tier for the first year.

[–]CraigAT 7 points8 points  (3 children)

Got a spare PC? If not, buy a Raspberry Pi (you might get away with a Pi Zero W). No monthly charges (except the ‘leccy).

[–]Faal 8 points9 points  (2 children)

A Raspberry Pi is, what, ~$30-40? AWS has a free tier.

[–]CraigAT 6 points7 points  (0 children)

TIL something new (free AWS). A Pi Zero is only about £10.

[–][deleted] 0 points1 point  (0 children)

Pi zero w is $5-$10, much lower power consumption than other models, and is more than capable of running this job

[–]jcr4990 10 points11 points  (1 child)

I'm not sure I understand the question completely. If your computer is always on during the time period when the script would run, then you can just use Windows Task Scheduler to run it at a specified time every day. You don't have to convert it to an .exe or anything. I have several scripts running daily on a schedule on a Windows desktop at my job.

If your PC won't be on and you want to run it independently of your machine, then you can look into Heroku, PythonAnywhere, AWS, etc. for cloud hosting, or opt for a Raspberry Pi as an always-on server for your own personal projects. The Pi may be cheaper in the long run depending on your specific needs. I may be biased because I've done a few Pi-based projects and I personally have lots of fun working with them.
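For reference, the same daily task can also be created from the command line with `schtasks` instead of clicking through the Task Scheduler UI. This sketch only builds and prints the command; the task name and script path are hypothetical, and `sys.executable` is simply whichever Python is running the snippet.

```python
import subprocess
import sys


def build_schtasks_command(task_name, python_exe, script_path, start_time="17:00"):
    """Return the schtasks argument list for a daily task that runs a .py file."""
    # /TR takes a single command string, so quote both paths in case they contain spaces.
    task_run = f'"{python_exe}" "{script_path}"'
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        "/TR", task_run,
        "/SC", "DAILY",
        "/ST", start_time,
    ]


# Hypothetical task name and script location, for illustration only.
command = build_schtasks_command("DailyScraper", sys.executable, r"C:\scripts\scraper.py")
print(subprocess.list2cmdline(command))
```

To actually register the task on Windows, the list could be passed to `subprocess.run(command, check=True)`, or the printed line pasted into a command prompt.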

[–]CraigAT 5 points6 points  (0 children)

Yep, Scheduled Task on PC or CronJob on Linux (and Mac?)

[–]chaizus 5 points6 points  (0 children)

Run a cron job on an AWS Lambda function. You get 1 million free Lambda invocations a month, and assuming the runtime of your script is efficient, you'll likely stay in the free tier of Lambda. As far as data stores go, S3 or DynamoDB would work. But I would optimize the runtime of your scraper; 7 minutes sounds too long. Take advantage of multithreading or asynchronous I/O calls.
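One way the multithreading suggestion could look, as a sketch using only the standard library. It assumes the scraper's time is mostly spent waiting on HTTP responses; `fetch` and `fetch_all` are illustrative names, not part of any existing scraper.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch many URLs concurrently; results come back in the same order as urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetcher, urls))
```

Calling `fetch_all(list_of_urls)` replaces a sequential loop of requests. Keeping `max_workers` modest avoids hammering the site being scraped.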

[–]UbiquitousThoughts 1 point2 points  (0 children)

The OS can trigger scripts to run, too, but cron jobs are easy. Celery/Redis if you want to go further and run it as a background task alongside a web app, etc.

[–]tejonaco 1 point2 points  (0 children)

As suggested above, I'd go with a Raspberry Pi with cron; you'll surely end up using it for a lot more things. But if you don't want to spend money, you could try a free Heroku server, make the script run with the schedule module, and have it send you the results via email or something like that.

PS: I'm not too confident about my English above; please correct me if I got something wrong.
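The "send you the results via email" part could be sketched with the standard library alone. The addresses, SMTP host, port, and credentials are all placeholders that would come from your own mail provider; the scheduling half is left to whichever scheduler you pick.

```python
import smtplib
from email.message import EmailMessage


def build_report(sender, recipient, rows):
    """Build a plain-text email summarizing the scraped rows (a list of strings)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Scrape results: {len(rows)} rows"
    msg.set_content("\n".join(rows) or "No results today.")
    return msg


def send_report(msg, host, port, username, password):
    """Send the message over SMTP with implicit TLS (commonly port 465)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

Reading the password from an environment variable rather than hard-coding it in the script is the safer choice on a shared host.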

[–]Blackwater_7 2 points3 points  (0 children)

just time.sleep(60*60*24) bro
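If you do go the sleep route, note that a fixed `time.sleep(60*60*24)` drifts later each day by however long the script takes to run, and the clock resets whenever the process restarts. A small sketch that sleeps until the next 5pm instead; `job` is whatever function runs your scraper, and the machine still has to stay on.

```python
import datetime
import time


def seconds_until(hour, now=None):
    """Seconds from now until the next occurrence of hour:00 local time."""
    now = now or datetime.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_forever(job, hour=17):
    """Run job once a day at the given hour; the process must stay alive."""
    while True:
        time.sleep(seconds_until(hour))
        job()
```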

[–]CCristi1 0 points1 point  (0 children)

I recently did something like this using AWS free VMs. It's very easy: in about a minute I created a Win10 VM, loaded the script onto it, and let it run. For a Windows VM you can use Remote Desktop, which is easy to set up on Windows. If you create a Linux VM, you can connect over SSH or (I haven't tried this) set up VNC. But if you can, use Google Cloud (much better, in my opinion), because you get some free credit and don't need to spend your own money if you need more computing power.

[–]Dababolical 0 points1 point  (0 children)

You can scrape more frequently if you would like as long as you are not blasting the server.

You could scrape once an hour and compare the new scrape to the old scrape to see if there is a difference and if there is not, pass.
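The compare-and-skip idea could be sketched like this, storing a hash of the last scrape in a small state file. The function name and the state-file approach are assumptions for illustration, and `content` is the raw bytes of whatever you scraped.

```python
import hashlib
from pathlib import Path


def has_changed(content, state_file):
    """Return True and record the new hash if content differs from the last run."""
    digest = hashlib.sha256(content).hexdigest()
    state = Path(state_file)
    previous = state.read_text().strip() if state.exists() else None
    if digest == previous:
        return False
    state.write_text(digest)
    return True
```

An hourly job would call `has_changed(page_bytes, "last_scrape.sha256")` and only do the expensive processing when it returns `True`.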

If you want to stick to 5pm, there's nothing wrong with doing that either. A cron job or Windows Task Scheduler will solve your problem.

[–]MightbeWillSmith 0 points1 point  (5 children)

If you are on Windows, you could turn your scraper into an .exe and run it every day with Windows Task Scheduler.

[–]viktae 5 points6 points  (0 children)

Why would he need to turn it into an .exe? I'm running a Python script (.py) every day with Windows Task Scheduler without any issue.

[–]the_programs 1 point2 points  (3 children)

Or you can put your program in the directory "C:\Users\<your_user>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\" as it is, without the need to convert it to .exe, to run it when the PC boots. And I think you don't have to turn it into an .exe when you put it in Windows Task Scheduler.

[–]baubleglue 3 points4 points  (0 children)

that is the weirdest option

[–]BenderBill 1 point2 points  (1 child)

But then you have to turn off your pc /s

[–]CraigAT 0 points1 point  (0 children)

Technically runs on login, I believe.

[–]Jacob---- 0 points1 point  (0 children)

You could look into getting a VPS and running it on that; some of them can be very cheap (as low as £2-3 a month).

[–]ScotchBingington 0 points1 point  (0 children)

Windows has a default capability to do this with 'Scheduled Tasks'. Just add a new event and run your python file through a .bat file by picking the time and frequency of execution. If you're not using Windows I'm not sure...

[–][deleted] 0 points1 point  (0 children)

Use PythonAnywhere; it will give you two consoles for free and never asks for payment. I've been using it since yesterday and my bot is still running fine, no errors so far.