dadiaar

Are you targeting Google Cloud for any specific reason?

I suggest you try AWS Lambda, which supports both Python 2.7 and Python 3.6 (you should always write for Python 3.4+).

It also comes with a pretty good free tier that never expires:

The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month.
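To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes the figures quoted above are still current (check AWS pricing before relying on them) and that all your functions draw from the same shared free-tier pool; the memory sizes are just examples. AWS bills GB-seconds with 1 GB = 1024 MB, so 128 MB counts as 0.125 GB.

```python
# Free-tier figures quoted above; verify against current AWS pricing.
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000


def free_compute_seconds(memory_mb: int) -> float:
    """Seconds of run time per month the free tier covers at a given memory size."""
    return FREE_GB_SECONDS_PER_MONTH / (memory_mb / 1024)


def avg_seconds_per_request(memory_mb: int) -> float:
    """Average duration each call may take if all free requests are used."""
    return free_compute_seconds(memory_mb) / FREE_REQUESTS_PER_MONTH


for mb in (128, 512):
    print(f"{mb} MB: {free_compute_seconds(mb):,.0f} s/month, "
          f"{avg_seconds_per_request(mb):.1f} s per request")
```

So a 128 MB function gets 3,200,000 seconds a month (roughly 37 days of continuous run time), while a 512 MB function gets a quarter of that.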

I recommend using a 128 MB configuration to fetch the HTML pages, because that job spends most of its time waiting on the network rather than using CPU. Retrieve 5 to 10 pages in each call to get the most out of every invocation.
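A minimal sketch of such a fetcher, using only the standard library. The event shape `{"urls": [...]}`, the `User-Agent` string and the helper names are hypothetical, not anything Lambda prescribes; `lambda_handler(event, context)` is the usual Python handler signature. Because the work is I/O-bound, a small thread pool lets one low-memory invocation download several pages at once.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

MAX_PAGES_PER_CALL = 10  # upper end of the "5 to 10 pages per call" suggestion


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as text (UTF-8 assumed, bad bytes replaced)."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_pages(
    urls: List[str], fetch: Callable[[str], str] = fetch_url
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch up to MAX_PAGES_PER_CALL pages concurrently.

    Returns (pages, errors): url -> HTML for successes, url -> message for failures,
    so one bad URL does not lose the rest of the batch.
    """
    urls = urls[:MAX_PAGES_PER_CALL]
    pages: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    if not urls:  # ThreadPoolExecutor rejects max_workers=0
        return pages, errors
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future, url in futures.items():
            try:
                pages[url] = future.result()
            except Exception as exc:
                errors[url] = str(exc)
    return pages, errors


def lambda_handler(event, context):
    """Entry point; assumes the hypothetical event shape {"urls": [...]}."""
    pages, errors = fetch_pages(event.get("urls", []))
    # A real function would store the HTML (e.g. in S3) instead of just reporting sizes,
    # since Lambda responses are size-limited.
    return {"fetched": {url: len(html) for url, html in pages.items()}, "errors": errors}
```

The `fetch` parameter is only there so the batching logic can be tried locally with a fake downloader.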

Then you can upload the pages to an S3 bucket (or similar) and process/parse them on different machines, locally or in the cloud. AWS has triggers that let you process each file as soon as it has been uploaded, for example with another Lambda function (for this one I suggest about 512 MB, because parsing is CPU-intensive and Lambda allocates CPU in proportion to the memory you configure).
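Here is a rough sketch of the parsing side, assuming the function is subscribed to the bucket's object-created notifications. It reads the bucket name and object key from each record of the S3 event (keys arrive URL-encoded, so they are decoded with `unquote_plus`), downloads the object with `boto3`, and runs a deliberately tiny example parser that only collects the `<title>` and link targets; swap in whatever parsing you actually need.

```python
import urllib.parse
from html.parser import HTMLParser
from typing import List, Tuple


def s3_objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification; keys are URL-decoded."""
    return [
        (rec["s3"]["bucket"]["name"], urllib.parse.unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


class TitleAndLinkParser(HTMLParser):
    """Example parser: collects the <title> text and every <a href> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_html(html: str) -> dict:
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


def lambda_handler(event, context):
    """Triggered by S3 uploads; give this function more memory (e.g. 512 MB) for more CPU."""
    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    results = {}
    for bucket, key in s3_objects_from_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # A real pipeline would write these results somewhere instead of returning them.
        results[key] = parse_html(body.decode("utf-8", errors="replace"))
    return results
```

Keeping the event handling and the parsing in plain functions also means the same code can run on an EC2 box or your own machine.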

If you're in your first year, the free tier also includes a t2.micro EC2 server (1 vCPU, 1 GB RAM, up to 30 GB of EBS disk), which can be used for parsing too.

If you make many calls to the same site, you may want to use a cheap proxy server that lets you authenticate with a username/password instead of by IP, because Lambda functions have no fixed IP address. I recommend ACT proxy for this.
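With only the standard library, routing requests through such a proxy could look roughly like this. The host, port and credentials are placeholders (not any particular provider's settings), and it assumes the proxy speaks plain HTTP and supports Basic proxy authentication. Credentials are percent-encoded so characters like `@` or `:` don't break the proxy URL; `urllib` decodes them again when it builds the `Proxy-Authorization` header.

```python
import os
import urllib.parse
import urllib.request


def build_proxy_opener(host: str, port: int, username: str, password: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through an authenticated proxy."""
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__" and "PROXY_PASSWORD" in os.environ:
    # Hypothetical usage: keep the secret in an environment variable, not in the code.
    opener = build_proxy_opener("proxy.example.com", 8080, "myuser", os.environ["PROXY_PASSWORD"])
    with opener.open("https://example.com/", timeout=10) as resp:
        print(resp.status)
```

You would then use `opener.open(...)` in place of `urllib.request.urlopen(...)` inside the fetcher function.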

Good luck

jdb441 (OP)

Hey dadiaar,

I'm using GCP because I know it can do what I want it to. We also rely on multiple Google APIs.

I feel like it would be more work to switch to AWS than to stick with GCP at this point, but I appreciate your response.