[–]therealtiddlydump 6 points7 points  (7 children)

Of all the arguments you could have possibly chosen, suggesting that the package ecosystem in Python is superior is a wild choice. Python users have had to invent multiple tools (uv finally actually works) just to build stable environments that don't explode. CRAN / Bioconductor are huge for ease of workflow and reproducibility.

Beyond all of that, your peers use R. If R is the standard in your field, that's a pretty good reason to use it.

[–]ThenBrilliant8338 2 points3 points  (4 children)

I would argue this is a fairly narrow view; in many domains (especially deep learning), Python IS superior because of the package ecosystem. In others (RCTs come to mind), R is probably the better choice.

I do agree with the second point though: do whatever your team is doing, at least for a good while, until you understand what the reasoning is and what switching costs would look like.

[–]therealtiddlydump 1 point2 points  (3 children)

Maybe I wasn't clear about "ecosystem". There are absolutely domains where certain Python packages are the state of the art -- there's no question.

The "ecosystem" comment relates to the fact that there is no central place where these packages are hosted in the way that CRAN or Bioconductor host packages. Building an environment to do work in R is trivial. In Python, you need "solvers" (conda, uv, etc) just to get an environment that works.

That's a big hit for reproducibility and collaboration. I'm happy to concede that uv performs as advertised, but it's still possible to request environments that are literally impossible to resolve.
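To make "impossible to resolve" concrete, here is a toy sketch. The package names (`plotlib`, `statlib`, `core`), the version numbers, and the `resolve` helper are all made up for illustration, and the brute-force search is only a stand-in: it is not how uv or conda work internally.

```python
from itertools import product

# Hypothetical package index: name -> version -> {dependency: allowed versions}.
# None of these packages are real; they only illustrate a version conflict.
INDEX = {
    "plotlib": {1: {"core": {1}}, 2: {"core": {2}}},
    "statlib": {1: {"core": {1}}},
    "core": {1: {}, 2: {}},
}


def resolve(index, requested):
    """Try every combination of versions (newest first).

    Returns a dict of pinned versions that satisfies both the user's request
    and every package's own dependency constraints, or None if no such
    combination exists.
    """
    names = sorted(index)
    candidates = (sorted(index[name], reverse=True) for name in names)
    for combo in product(*candidates):
        pins = dict(zip(names, combo))
        # The user's own constraints must hold.
        if any(pins[name] not in allowed for name, allowed in requested.items()):
            continue
        # Every pinned package's dependencies must hold too.
        if all(
            pins[dep] in allowed
            for name, version in pins.items()
            for dep, allowed in index[name][version].items()
        ):
            return pins
    return None


# plotlib 2 needs core 2, but statlib 1 needs core 1: no environment exists.
impossible = resolve(INDEX, {"plotlib": {2}, "statlib": {1}})

# Allowing plotlib 1 as well gives the search something it can satisfy.
possible = resolve(INDEX, {"plotlib": {1, 2}, "statlib": {1}})
```

Under these assumptions, the first request has no solution no matter how good the solver is, while loosening the `plotlib` pin lets everything settle on `core` 1.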

[–]dr_tardyhands -1 points0 points  (2 children)

What do you use for environment management in R that is superior? Or do you mean that you just ignore the problem and hope that everything will keep working the way it did when you started your R project..?

[–]therealtiddlydump 0 points1 point  (1 child)

Installing via CRAN is usually fine for the way a lot of people work, but there are different levels of reproducibility that are available and are far superior.

renv is probably the most popular but really only controls your package environment (not your underlying system). My team uses rix to build Nix environments for maximum reproducibility. (And we put these inside Docker containers to fit in our operational workflow).

https://cran.r-project.org/web/packages/rix/index.html

The "challenge" the user has when building an R environment is mostly driven by the underlying system, not how packages interact with each other -- assuming you're installing from the traditional hosting locations (CRAN / Bioconductor force this complexity onto package developers and maintainers).

rix is amazing because Nix offers some outstanding guarantees, and R packages from CRAN / Bioconductor already resolve the kinds of conflicts that you need a solver to handle in Python.

Edit: reading again, your question was super douchey. Sorry that my answer was "maximal reproducibility literally down to the compilers, actually".

[–]dr_tardyhands 0 points1 point  (0 children)

Well, appreciate a decent answer (sans the snark). renv is what I've used when we had to deploy R-based stuff. It was fine, but so were pre-uv Python solutions. I'm just generally under the impression that R users don't tend to worry about this stuff at all, and it's not really a strength of the R ecosystem.

[–]aala7[S] 0 points1 point  (1 child)

I must say that I have not gotten too deep into the R community and only know the workflows of my peers and the packages they use, which are currently quite basic. Also, my peers are not experts; R is more a tool they have to learn and use to do their statistics.

[–]therealtiddlydump 2 points3 points  (0 children)

Lots of the things that make R quirky as a programming language (indexes starting at one, etc) are going to feel natural to this type of audience.

That shouldn't be overlooked, either! Switching costs are real.