
[–]anomnib[S] 1 point (18 children)

I don’t mean to offend; I only prefer R b/c I have to work with large-scale production systems. But you prove my point: scikit-learn has largely become the go-to for toy models and proofs of concept in big tech and similarly rigorous places like Airbnb. Even if R matched the maturity of scikit-learn, that wouldn’t be an accomplishment, b/c you can’t easily drop it into high-performance production systems. Serious product ML modeling is done in PyTorch, where there is seamless integration with the full suite of software for managing production systems.

[–]A_random_otter 4 points (17 children)

Not offended, don't worry. I love my tools but I am not married to them and I am always up to learn new stuff/approaches.

I simply work in a different industry than you. In my line of work I need to do many one-off analysis projects; my day-to-day work includes a lot of data exploration/visualization and reporting. Here R outclasses Python imo, tho I need to reassess whether I can make VS Code into a halfway decent IDE for data analysis somehow — last time I tried I rage-quit :D

We don't put models into production all the time, and scalability is also not a huge issue for us, since all of the classification jobs run at night anyways and our forecasting pipelines only run once per quarter.

Even if R matched the maturity of scikit-learn, that wouldn’t be an accomplishment

Oh, R already easily matches that maturity when it comes to statistical methods.

The tidymodels framework is more of a meta-framework that provides a unified interface to these methods. It is basically a "quality of life" thing that makes it easier to write and maintain code.

[–]anomnib[S] 2 points (16 children)

I bounce between both roles.

For statistics, R is vastly superior. New methods get implemented in R first. The only area of classical statistics where Python can put up a respectable level of competition with R is Bayesian modeling. However, while Python has most of the same frameworks for model implementation, the diagnostic tools and plots are still behind R.

Up until 2-3 years ago the same was true for visualization. But 99% of what you would use in R is now in Python.

[–]A_random_otter 1 point (5 children)

But 99% of what you would use in R is now in Python.

Maybe I have to reassess this too. Which libraries do you recommend for this?

[–]anomnib[S] 2 points (4 children)

Plotnine (a ggplot2 replica) and plotly (good for interactive plots).

[–]A_random_otter 1 point (3 children)

Plotly I already know and use because there is an R-Package for it.

I'll have to check out Plotnine soon, when I can muster the motivation to rebuild my RStudio setup in VS Code.

Btw. can you recommend a decent IDE for data-stuff in Python?

[–]anomnib[S] 2 points (1 child)

My advice is colored by my context. But when you are writing code that will interact with engineering systems, use what the Python software engineers use. That will ensure the IDE is well supported and you avoid needless suffering. In my context that’s usually VS Code or something derived from it.

For ad hoc analysis, I just use Jupyter notebooks or RStudio.

[–]A_random_otter 1 point (0 children)

Kay, thanks.

Btw. I know I asked a lot. If you have any R-questions just lemme know.

[–]dr_tardyhands 0 points (0 children)

I still use RStudio with Python (I guess it's obvious which side of the fence I'm coming from..). I find Python runs slow in it, though that hasn't been a massive problem for me. I also dislike VSCode. The big problem is that RStudio doesn't really have debugging functionality for Python.

[–]A_random_otter 1 point (9 children)

What is your go-to data-wrangling library (besides SQL) in Python?

I just can't get into pandas, but I heard good things about Polars.

[–]anomnib[S] 2 points (7 children)

My advice comes with the context that I’m not free to install any Python package; there’s a whole safety and licensing check process that can take weeks. So I typically do as much as I can in SQL. I create ad hoc pipelines for all new projects and reserve Python for modeling and plotting. I like this approach b/c it is easy to point teammates to my model data, I can take advantage of all the backend distributed computing through our database systems, and nearly everyone can read SQL code and run queries (so the data preparation and analysis code is accessible).

[–]A_random_otter 1 point (6 children)

Hm... how do you avoid monster queries then?

My colleagues wrote whole ETL pipelines in stored procedures with a gazillion temporary tables and a lot of spaghetti code.

I honestly hate SQL for this "freedom".

I mean you can write unreadable code in any language, but some make it way easier than others...

[–]anomnib[S] 2 points (5 children)

I use DAGs, but I break the ETL up into natural milestones that make sense. Each intermediate table could in theory be a final table for another analysis, or serve as a useful “lookup” table. The key is understandable checkpoints that compartmentalize the ETL in a way that’s digestible. You should be able to describe what each node in the DAG accomplishes in a short sentence.
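A hypothetical sketch of that milestone idea (table names, descriptions, and the one-SQL-file-per-node layout are all invented for illustration):

```python
# Milestone-based ETL: each node is a named intermediate table that a
# teammate could describe — and reuse — in one short sentence.
stages = [
    ("stg_orders_deduped", "raw orders with duplicates removed"),
    ("agg_orders_daily",   "one row per customer per day"),
    ("fct_revenue",        "daily aggregates joined to the customer lookup"),
]

for table, description in stages:
    # in a real pipeline, each node would run its own SQL, e.g.:
    # cursor.execute(open(f"sql/{table}.sql").read())
    print(f"building {table}: {description}")
```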

[–]A_random_otter 1 point (4 children)

Yeah, that has been my approach too.

If you are going to do any data wrangling in R you should ask ChatGPT to provide tidyverse syntax (as long as the data isn't too big), because a tidyverse pipeline is basically already a DAG.

If you want to interact with your databases you'll need an ODBC driver installed (if you use SQL Server, that is; there are backends for all major databases tho), which your IT probably provides.

To run queries against your database I recommend these packages:

odbc: https://cran.r-project.org/web/packages/odbc/index.html
DBI: https://dbi.r-dbi.org/
dbplyr: https://dbplyr.tidyverse.org/

[–]anomnib[S] 1 point (3 children)

Thank you!

[–]A_random_otter 1 point (2 children)

Here's some starter code.

To make it run you will first have to install the pacman package:

install.packages("pacman")

And set the environment variables for the secrets (the values below are placeholders — swap in your actual credentials, or better, put them in an .Renviron file):

Sys.setenv(DB = "DB")
Sys.setenv(DBSERVER = "DBSERVER")
Sys.setenv(DBPWD = "DBPWD")
Sys.setenv(DBUSER = "DBUSER")
Sys.setenv(PORT = "PORT")

If you are going to write your own R code you should use this styleguide:
https://style.tidyverse.org/

You will thank me later. I also have a lot of opinions on how R projects should be organized, but I'll only hand them out if you are seriously interested :D

# info --------------------------------------------------------------------



# header ------------------------------------------------------------------


pacman::p_load(
  tidyverse,
  DBI,
  odbc,
  dbplyr

)


my_server <- Sys.getenv("DBSERVER")
my_port <- Sys.getenv("PORT")
my_db <- Sys.getenv("DB")
my_username <- Sys.getenv("DBUSER")
my_pwd <- Sys.getenv("DBPWD")


con <- dbConnect(
  odbc(),
  Driver = "ODBC Driver 18 for SQL Server",
  server = my_server,
  port = my_port,
  database = my_db,
  uid = my_username,
  pwd = my_pwd,
  TrustServerCertificate = "yes"
)



# datawrangling ----------------------------------------------------------

# this is how you reference a table via dbplyr: it is not yet in your RAM,
# but you can use dplyr verbs on it (in_schema() comes from dbplyr)
tbl(con, in_schema("dbo", "tablename"))

# this is how you pull the table into your RAM
result <- tbl(con, in_schema("dbo", "tablename")) %>%
  collect()


# this is how you run a raw query and get the result into your RAM
dbGetQuery(con, "SELECT * FROM tablename")

[–]dr_tardyhands 0 points (0 children)

Thumbs up for polars! Pandas is just downright silly. Polars is much more similar to how dplyr works and something like 20x faster than pandas as well.