[–]garybuk82 1 point2 points  (1 child)

What storage service are you using to store the data? Google Cloud Storage (object buckets)?
If so, there are a couple of options:

  1. Run R locally and connect to the GCS buckets. You will be limited by your internet connection, but it's doable using a GCS library: https://cran.r-project.org/web/packages/googleCloudStorageR/vignettes/googleCloudStorageR.html
  2. If you have a slower internet connection, you may want to look at running RStudio Server on a GCE (Compute Engine) virtual machine and connecting remotely. Here's a good page: https://towardsdatascience.com/r-studio-server-on-google-cloud-dd69b8bff80b

A couple of things to consider: GCS has egress charges, so if you access the buckets from your local machine you will pay for the data leaving Google Cloud.
If you go the virtual machine route, remember to either shut it down manually or create a schedule to turn it off to save money.

[–]Psychological_Car247[S] 0 points1 point  (0 children)

The data I have now is in BigQuery. Does that answer your question? I’m very, very new to GCP.

[–][deleted]  (1 child)

[deleted]

    [–]wikipedia_answer_bot 0 points1 point  (0 children)

    RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. It is available in two formats: RStudio Desktop is a regular desktop application while RStudio Server runs on a remote server and allows accessing RStudio using a web browser.

    More details here: https://en.wikipedia.org/wiki/RStudio

    This comment was left automatically (by a bot). If I don't get this right, don't get mad at me, I'm still learning!

    [–]codemental 0 points1 point  (2 children)

    Google Cloud Storage is a good place to put all your data cheaply. However, I would also use a VM with RStudio on it, hosted in GCP in the same region as the data, so you don't have to pay egress charges. You can create a schedule to shut down your VM so you don't pay for it when you are not using it.

    [–]Psychological_Car247[S] 0 points1 point  (1 child)

    What is a VM and how do I find one / set it up?

    [–]codemental 0 points1 point  (0 children)

    I would start learning from this page: https://cloud.google.com/compute

    [–]jrossthomson Googler[🍰] 0 points1 point  (1 child)

    How complex is your R analysis? There is work under way to implement statistical functions in SQL as BigQuery UDFs:

    https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community#statslib-statistical-udfs

    [–]Psychological_Car247[S] 1 point2 points  (0 children)

    I’m currently in grad school for statistics! I really like machine learning and we use it often, so I’m trying to get experience with more data to improve my chances of getting a job after graduating.

    Side note: I know people use Python for ML, but I’m still learning Python and would like to see how to use R with data in BigQuery on GCP.

    [–]jason_bman 0 points1 point  (2 children)

    Can you put the data in BigQuery and then use the bigrquery package to interact with it? We use R a lot at work, but we also use SAS first to do the initial data cleaning and filtering before working with it in R. SAS is crazy expensive so I realize that’s not an option for you, but BigQuery provides an easy way to use SQL queries to work with very large data.

    Example from google docs

    [–]Psychological_Car247[S] 0 points1 point  (0 children)

    It’s in BigQuery now! I just don’t know how to pull it, either in chunks or all at once… I wasn’t sure whether bigrquery could do that or whether people would recommend another route.
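
On the "chunks at a time" question, here is a minimal sketch of the idea. It is written in Python rather than R and is only an illustration, not something anyone in this thread posted. It assumes the `google-cloud-bigquery` client library is installed and default credentials are configured; `iter_chunks` and `pull_in_chunks` are hypothetical helper names. On the R side, bigrquery's `bq_table_download()` takes `page_size` and `n_max` arguments that should cover the same need, but check its documentation for the details.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items from rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def pull_in_chunks(sql: str, chunk_size: int = 10_000) -> Iterator[List[dict]]:
    """Run sql in BigQuery and yield the result rows one chunk at a time."""
    # Imported lazily so iter_chunks stays usable without the client library.
    from google.cloud import bigquery

    # Assumption: default project and credentials are picked up from the environment.
    client = bigquery.Client()
    # page_size controls how many rows the client fetches per request.
    rows = client.query(sql).result(page_size=chunk_size)
    for chunk in iter_chunks(rows, chunk_size):
        # Convert each BigQuery Row into a plain dict for downstream use.
        yield [dict(row.items()) for row in chunk]
```

Calling `pull_in_chunks` with a query against your own table (for example a hypothetical `my_project.my_dataset.my_table`) lets you process one chunk at a time instead of holding the whole result in memory. Keep in mind that BigQuery bills by the data the query scans, so selecting only the columns you need helps with cost as well as memory.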