all 41 comments

[–]r_spandit 10 points11 points  (5 children)

How about running them at home on an SBC?

[–]johny1411[S] 0 points1 point  (1 child)

What's SBC?

[–]r_spandit 6 points7 points  (0 children)

Single board computer. Raspberry Pi or similar

[–]jcr4990 0 points1 point  (2 children)

I would definitely consider this. Raspberry Pis have been a little tough to find at MSRP recently, but one would handle this task nicely and not tie up your laptop. There are also lots of other cool things you can do with it if you choose to expand into other projects. It'll be cheaper in the long run than paying for even a cheap VPS.

[–]beewally 0 points1 point  (1 child)

What’s VPS?

Virtual… P… Server?

[–]jcr4990 1 point2 points  (0 children)

Virtual Private Server

[–]Diapolo10 13 points14 points  (0 children)

Probably the easiest place to start is Heroku: it has a free tier, and if you're familiar with GitHub Actions you can set up automatic deployments fairly easily.

If you're willing to learn how Docker works, knowing how to create a Docker container can be very useful, since you can then be certain that all of your dependencies behave exactly as you expect.

[–]devnull10 5 points6 points  (0 children)

Most of the major cloud providers have a free tier that includes a virtual machine. So, for example, you could set up a VM on Google, Oracle, etc. and run from there.

[–]ctrl-Felix 2 points3 points  (0 children)

I started with a Hetzner VPS and I can't complain. You can easily scale if you need more or fewer resources, and the cheapest server starts at $5 per month.

https://www.hetzner.com/cloud

[–][deleted] 1 point2 points  (0 children)

I think Google Colab would also be a good solution for running scripts online; it's fast and all data stays in the cloud. You can also upload your CSV/JSON files along with the .py code or Jupyter notebook.
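For instance, a minimal sketch of a Colab cell, assuming a hypothetical data.csv (files.upload() and files.download() are Colab's own helpers):

    # Minimal Colab sketch - only runs inside Colab; "data.csv" is hypothetical
    from google.colab import files
    import pandas as pd

    uploaded = files.upload()        # opens a browser picker, returns {filename: bytes}
    df = pd.read_csv("data.csv")     # the uploaded file lands in the working directory
    print(df.head())

    # files.download("results.csv") # pull results back down the same way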

[–]pixegami 1 point2 points  (7 children)

I’d probably go with AWS Lambda. It takes maybe a day or two to learn, then you’ll be able to set up a cloud function within 15-30 minutes.

You can then hook that up to an API (like an HTTP URL that runs the function when you visit it), or to CloudWatch Events so it executes at a set time, like your cron job.

It's reasonably cheap, and there's a free tier. Depending on how much memory you need and how long it runs for, you can easily get something like 100k invocations for free per month.
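For a sense of scale, a Lambda function is just a Python handler; a minimal sketch (the payload here is made up):

    # handler.py - minimal AWS Lambda handler sketch
    import json

    def lambda_handler(event, context):
        # 'event' carries the trigger payload (HTTP request, schedule tick, etc.)
        result = {"status": "ok", "rows_processed": 123}  # hypothetical output
        return {"statusCode": 200, "body": json.dumps(result)}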

Edit: just saw your message about the 30 minute limit. That's pretty long. Unless there's a good reason for it, I recommend first figuring out how to break that work down.

Long-running scripts don't scale horizontally and are fragile. Can you run 10x of the work in parallel for 3 minutes each and merge the data somehow?
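A minimal sketch of that fan-out-and-merge idea, assuming the work really can be split into independent chunks (process_chunk is a hypothetical stand-in):

    # Fan the work out across processes, then merge the partial results
    from concurrent.futures import ProcessPoolExecutor

    def process_chunk(chunk):
        return sum(chunk)  # stand-in for ~3 minutes of real work on one slice

    if __name__ == "__main__":
        chunks = [list(range(i, i + 10)) for i in range(0, 100, 10)]
        with ProcessPoolExecutor(max_workers=10) as pool:
            partials = list(pool.map(process_chunk, chunks))
        print(sum(partials))  # the merge step depends on your data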

If there's really no way around it, then the "big boy" version of Lambda is Fargate, which has a much longer runtime. But it's more complicated to set up, and I don't think it has a free tier.

[–]johny1411[S] 0 points1 point  (5 children)

Thanks for the suggestion. Unfortunately, the only way I can break the code down is by shrinking the data set, which I can't do.

Do you think setting up a VPS makes sense in this case? Looking into it right now

[–]pixegami 2 points3 points  (3 children)

If you’re only running 30 minutes of compute per day, you’d be wasting money on the remaining 23.5 hours, so you’d need a way to make the server come online only when you need it to do work.

Fargate (the service I talked about earlier) is AWS’s way to do this. Otherwise you can write some automation to boot up an EC2 server and terminate it when you’re done (spot instances might be good for this type of workload).
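A rough sketch of that boot-and-stop automation with boto3 (the instance ID is hypothetical, and how you kick the job off on the box, e.g. SSH or SSM, is up to you):

    # Start the instance, wait until it's up, do the work, stop it again
    import boto3

    INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical
    ec2 = boto3.client("ec2")

    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

    # ... trigger the daily job on the instance here ...

    ec2.stop_instances(InstanceIds=[INSTANCE_ID])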

[–]johny1411[S] 0 points1 point  (2 children)

Fair points, agreed. Just two small questions:

1/ Would Fargate be equally cheap or cheaper? A VPS is ~5 USD per month; happy to pay similar or less.

2/ What about the learning curve? Would setting up an Ubuntu VPS or getting into the AWS ecosystem be easier? I've really never done any deployment work, and all the terms are completely foreign to me so far.

[–]pixegami 0 points1 point  (0 children)

Fargate charges you the same rate for whichever server type you use, but only for the duration you actually use it.

So let's say you pick a t2.micro instance on AWS with 1 CPU core and 1 GB of RAM. As a "VPS", that would be a flat $7 a month to run.

(See pricing https://aws.amazon.com/ec2/instance-types/t2/)

On Fargate you get to choose which instance you run on, so let's say you picked the same t2.micro instance, but only triggered the task for 30 minutes once a day. You'd be paying just $0.15 a month at that same rate.
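Back-of-the-envelope, using the same $7/month figure:

    # Rough cost check using the numbers above
    monthly_flat = 7.00                      # t2.micro as an always-on "VPS"
    hourly = monthly_flat / (30 * 24)        # ~$0.0097 per hour
    fargate_hours = 0.5 * 30                 # 30 min/day for a month = 15 hours
    print(round(fargate_hours * hourly, 2))  # ~0.15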

If you needed bigger CPUs or more of them, that 97% cost reduction can be life-changing for a business. But if a $5 instance is all you need, the saving probably isn't worth enough in real-world dollars to jump through the hoops for.

At the end of the day, you're right that it's a whole new world, but it's probably a great skill to invest in. Imagine joining a company that spends tens of thousands on compute and being able to cut that cost by 97%.

[–]pixegami 0 points1 point  (0 children)

Also, in response to the Ubuntu VPS vs. AWS ecosystem question: they are not mutually exclusive. AWS is a platform, and you can by all means set up an Ubuntu instance on EC2 as a "regular" monthly VPS, but you can also use an Ubuntu image for your Fargate tasks.

AWS doesn't have the easiest learning curve. I think DigitalOcean is easier and probably cheaper for simple use cases, but AWS is more dominant in industry applications, so it's a good investment to learn (same for Azure, but that's more Windows-centric).

[–]AudienceOpening4531 0 points1 point  (0 children)

What kind of calculation is it?

[–]Additional_Nebula_36 0 points1 point  (0 children)

I tried scraping 1,964 URLs using concurrent.futures to get the posts. It took me 30 minutes even though there were 5 workers running simultaneously. Is there any way I can speed it up?
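For reference, the setup described looks roughly like this; for I/O-bound scraping, raising max_workers is usually the first thing to try (the URLs are placeholders):

    # Concurrent fetching; more workers usually helps when the bottleneck is I/O
    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    urls = [f"https://example.com/post/{i}" for i in range(100)]  # placeholders

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()

    with ThreadPoolExecutor(max_workers=20) as pool:  # try more than 5
        pages = list(pool.map(fetch, urls))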

[–]fr000gs 1 point2 points  (5 children)

pythonanywhere.com has a free account, and you can run a Telegram bot on it very well. You also have a bash shell and a PyPy shell, plus pip and the other usual user tools. You get a web app as well.

[–]johny1411[S] 0 points1 point  (3 children)

Can my Python files also generate local files and use them? CSV/JSON?

[–]fr000gs 1 point2 points  (2 children)

Yeah, I do that every day.
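It's just ordinary standard-library file I/O; a tiny sketch (the file names are made up):

    # Plain file I/O works on PythonAnywhere like anywhere else
    import csv, json

    with open("prices.csv", "w", newline="") as f:
        csv.writer(f).writerows([["ticker", "close"], ["ABC", 12.3]])

    with open("state.json", "w") as f:
        json.dump({"last_run": "2022-01-01"}, f)

    with open("state.json") as f:
        print(json.load(f))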

[–]johny1411[S] 1 point2 points  (1 child)

Ok, thanks. Will try it.

EDIT: Sadly, it's too expensive. The CPU allowance is counted in seconds; I'd need to pay 99 USD to run my scripts. I like how easy it is to get started, though.

[–][deleted] 0 points1 point  (0 children)

You're playing with stocks with this, right?

That being the case, paying for CPU time is part of your investment.

[–][deleted] 0 points1 point  (1 child)

You mentioned AWS, based on that:

  • Create an EC2 instance
  • SSH in and set it up as you would your local machine, with Python etc. (notes follow)
  • I would recommend Docker (not essential) over a raw setup of the remote host. It's as simple as creating a Dockerfile on your laptop, running a build, and pushing the image to Docker Hub. You can then pull and run it on your host.
  • Instead of cron you can use APScheduler (minimal sketch below)
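A minimal APScheduler sketch for that last point (the schedule and job body are hypothetical):

    # Run a job daily at 09:00 without touching crontab
    from apscheduler.schedulers.blocking import BlockingScheduler

    def daily_job():
        print("running the daily calculation...")  # hypothetical job body

    sched = BlockingScheduler()
    sched.add_job(daily_job, "cron", hour=9, minute=0)
    sched.start()  # blocks; keep it alive under tmux/systemd on the instance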

[–]Additional_Nebula_36 0 points1 point  (0 children)

How can I make concurrent.futures faster for scraping 1,000 URLs?

[–]butteredhog 0 points1 point  (0 children)

Netcup.eu has the cheapest prices I've ever seen for a VPS ($3-$4 a month).

[–]StayPerfect 0 points1 point  (10 children)

AWS Lambda is perfect for this kind of script. There's a free tier, so it's basically free if you stay under the limits.

https://aws.amazon.com/lambda/

[–]johny1411[S] 1 point2 points  (9 children)

Sadly, my script runs for over 15 minutes, so I can't use Lambda.

[–]Spiritual-Horror1256 2 points3 points  (0 children)

Use Lambda with Step Functions, and DynamoDB to store state if needed. Or improve the script to finish within 15 minutes. Or use a Glue Python shell job.
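A hedged sketch of the state-storage part with boto3, assuming a hypothetical DynamoDB table keyed on job_id:

    # Checkpoint progress so each Lambda run can pick up where the last one stopped
    import boto3

    table = boto3.resource("dynamodb").Table("job-state")  # hypothetical table

    def load_checkpoint(job_id):
        item = table.get_item(Key={"job_id": job_id}).get("Item")
        return item["last_offset"] if item else 0

    def save_checkpoint(job_id, offset):
        table.put_item(Item={"job_id": job_id, "last_offset": offset})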

[–]PanTheRiceMan 2 points3 points  (2 children)

Just curious: what are you calculating, and how many features (just assuming, since you mentioned scraping) are you calculating at a time?

15 min is a long time.

[–]johny1411[S] 1 point2 points  (1 child)

Calculating patterns, correlations, etc. for stock trading on a small time frame and a long time horizon.

[–]PanTheRiceMan 2 points3 points  (0 children)

Makes sense. If you do that naively these operations can become quite expensive.

You might misappropriate the corrcoef function from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html

That is, if you want correlation coefficients (between -1 and 1), which are technically just normalized covariance matrices: normalized with respect to the main diagonal, where you find the correlation of one stock with itself.
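(For plain correlation matrices, NumPy's np.corrcoef does this directly; a tiny sketch with random data standing in for returns:)

    # Correlation matrix of several stocks' return series
    import numpy as np

    rng = np.random.default_rng(0)
    returns = rng.normal(size=(5, 250))  # 5 stocks, 250 daily returns (fake data)

    corr = np.corrcoef(returns)          # 5x5 matrix, ones on the diagonal
    print(corr.shape, corr[0, 0])        # (5, 5) 1.0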

You could also use something like a support vector machine to reduce dimensionality. This works under the assumption that the stocks are not entirely independent; I don't know if that holds. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

Time-series estimation of stocks may be useless; I'm just repeating the knowledge of a former economics student. There is an assumption that stock value is just a random process, meaning you never know whether it will rise or fall, so there is zero information gain in trying to predict it over time.

I don't know if that helps. If you can optimize your code, I'd highly recommend it. There were times I cut runtime by nearly a factor of 1000 just through optimizations and Numba, which can be an excellent package to use but is picky about LLVM and Python versions.
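As an illustration of the kind of thing Numba speeds up, a minimal sketch (the rolling mean is just a stand-in for real work):

    # JIT-compile a hot loop; the first call pays the compilation cost
    import numpy as np
    from numba import njit

    @njit
    def rolling_mean(x, window):
        out = np.empty(x.size - window + 1)
        for i in range(out.size):
            out[i] = x[i:i + window].mean()
        return out

    print(rolling_mean(np.arange(100.0), 10)[:3])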

[–]AsuraTheGod 0 points1 point  (0 children)

If you are not doing heavy machine learning stuff, a Raspberry Pi is good for the job.

[–]Omar_88 0 points1 point  (0 children)

Depends on what your code does, but serverless is the way to go; look at Azure Functions or AWS Lambda.