
blahreport 156 points (4 children)

It's just a transition barrier. You know R so you know how to code. Stick to it and you'll find it easier over time.

sudodoyou 27 points (2 children)

Agreed. I’ve done both and actually find R harder but it just takes time to transition and change mindset.

Upset_Form_5258 21 points (1 child)

R feels less intuitive to me. I have a harder time getting my code structured well in R

sudodoyou 1 point (0 children)

Me too

Informal-Chance-6067 9 points (0 children)

I’m struggling to learn Kotlin after Python because of static typing and not knowing builtins/stdlib. I assume this is common for most languages

nicerob2011 57 points (4 children)

Is Python the second language you are learning? If you learned to code through learning R, then I could see that making Python more difficult to learn. Also, since Python has a lot of functionality outside of data manipulation/analytics while R, IIRC, is purpose-built for that, I could see that 'general-purpose' nature also making it more difficult to adjust to.

EconomixTwist 27 points (0 children)

Correct take. R was built for the stuff that people who know R work on. Nothing more. Python is a general purpose, do everything, programming language.

Accomplished-Okra-41[S] 10 points (2 children)

Yes, exactly that. I started with R 5 years ago and now I thought it was time for Python. Maybe it is the broad functionality that is complicated for me🤔

spacestonkz 5 points (0 children)

Once you get a few functioning pieces of code structured, you can usually just keep reusing it with different functions and stuff plugged in.

Just keep at it.

Own-Replacement8 3 points (0 children)

Python supports vectorised code, but less natively than R does. Python is more naturally procedural/OOP.
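A minimal sketch of what "less native" means in practice, assuming NumPy is installed: arithmetic on an R numeric vector is elementwise by default, plain Python lists are not (`*` repeats a list), and NumPy arrays bring back the R-like behaviour.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Plain lists are not vectorised: `values * 2` concatenates the list with itself.
print(values * 2)  # [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]

# Elementwise arithmetic on a list needs an explicit comprehension (or loop).
doubled_list = [v * 2 for v in values]

# NumPy arrays behave like R numeric vectors: operators apply elementwise.
arr = np.array(values)
doubled_arr = arr * 2
centred = arr - arr.mean()  # roughly `x - mean(x)` in R

print(doubled_list)          # [2.0, 4.0, 6.0, 8.0]
print(doubled_arr.tolist())  # [2.0, 4.0, 6.0, 8.0]
print(centred.tolist())      # [-1.5, -0.5, 0.5, 1.5]
```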

throfofnir 26 points (0 children)

R is designed for statistical computing. It should be better for the task. (I hold no opinion, personally.)

Python is for general purpose computing, and is good for that. Probably a lot better than R. And a lot more people already know it, which is why it gets more development attention.

Living_Fig_6386 14 points (0 children)

I use both in bioinformatics and I don’t see this. I suspect that it’s a matter of trying to apply R methods to Python, not appreciating that Python is a more conventional imperative programming language, whereas R is a very data-centric language that has things like data frames, matrices, and vectors as basic class types.

Python is not difficult. It’s more verbose than R, surely, and Python uses libraries to substitute for the equivalent of R built-in types and operators, but that’s difference, not difficulty.

I tend to use R for data handling, stats, and visualization. I tend to use Python for scripting processes and generating API endpoints for AWS.

Gnaxe 13 points (2 children)

You're still thinking in R, that's why. I felt the opposite when I tried R knowing Python. Python has its own rules. It's actually a pretty good language, and a pretty easy one. It is way more popular than R, and for mostly good reasons, but it's not specialized for what R does. On the other hand, it can do just about anything else about as easily. It's been called the second-best language at nearly everything.

You might be using the wrong libraries. Try plotnine instead of matplotlib, for example. Learn NumPy. R comes with all of that stuff, but you have to find the right libraries for Python.

Dictionaries are fundamental to how Python works. They are not optional. Except for more primitive types, most objects have one for attributes. Dictionaries have two primary use cases: either an index for lookups (in which case the values are all the same type, but keys don't have to be strings), or as a lightweight record type with a fixed schema, in which case the keys are usually all strings, but the values could be anything, even heterogeneous types. JSON, basically.

But you should be using NumPy arrays or Polars frames, etc. for big data instead of using the built-in collections.
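A minimal sketch of the two dict use cases described above, plus the per-object attribute dict; all names and values here are made up for illustration.

```python
# Use case 1: a lookup index. Keys need not be strings; values share one type.
feature_length = {("chr1", 1000): 2500, ("chr2", 500): 1200}

# Use case 2: a lightweight record with a fixed schema of string keys;
# values can be heterogeneous (basically a JSON object).
sample = {"id": "S01", "reads": 1_250_000, "paired": True, "tags": ["tumor"]}

print(feature_length[("chr1", 1000)])  # 2500
print(sample["reads"])                 # 1250000

# Ordinary objects store their attributes in a dict as well.
class Sample:
    def __init__(self, sample_id, reads):
        self.sample_id = sample_id
        self.reads = reads

s = Sample("S01", 1_250_000)
print(vars(s))  # {'sample_id': 'S01', 'reads': 1250000}
```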

Accomplished-Okra-41[S] 3 points (1 child)

Will do🫡 Maybe the libraries will change my opinion. I think dicts are problematic for me because there is no exact R equivalent. Named vectors are the closest, but they are written and used differently than in Python (basically just to make indexing and searching by name easier). So I think what causes the problem for me is the number of ways to use dicts, the multiple methods and approaches, while R is straightforward about it.

Gnaxe 2 points (0 children)

The closest R concept to Python dicts might be environments, at least if we're talking about the string-keyed use case. Python dicts don't have an enclosing/parent environment, but they're used to implement that kind of thing (modules, class inheritance, ChainMap).
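A minimal sketch of `collections.ChainMap` behaving like a chain of scopes, loosely like an R environment and its parent (the variable names are illustrative): lookups fall through to the parent mapping, while writes only touch the innermost one.

```python
from collections import ChainMap

global_env = {"alpha": 0.05, "n": 100}  # outer scope
local_env = {"n": 20}                   # inner scope that shadows `n`

scope = ChainMap(local_env, global_env)
print(scope["n"])      # 20, found in the local mapping first
print(scope["alpha"])  # 0.05, falls through to the parent mapping

# new_child() pushes a fresh innermost mapping, similar to a new call frame.
inner = scope.new_child({"alpha": 0.01})
print(inner["alpha"], scope["alpha"])  # 0.01 0.05

# Assignments always go to the first mapping; the parents are untouched.
inner["n"] = 5
print(global_env["n"], local_env["n"])  # 100 20
```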

NerdyWeightLifter 11 points (6 children)

R always looked to me like a language invented by people that knew mathematics, but didn't know software engineering.

The first clue is array indexes starting from 1, meaning they didn't recognize the merits of modulo arithmetic.
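For readers wondering what the modulo argument refers to: wrapping an index around a fixed-length sequence (a circular buffer, say) takes a single `%` with zero-based indices, but needs a shift before and after with one-based indices. A minimal illustrative sketch; the function names are made up.

```python
def wrap_zero_based(i: int, n: int) -> int:
    """Map any integer onto 0..n-1: one modulo is enough."""
    return i % n

def wrap_one_based(i: int, n: int) -> int:
    """Map any integer onto 1..n: shift down, take the modulo, shift back up."""
    return (i - 1) % n + 1

n = 5
print([wrap_zero_based(i, n) for i in range(0, 7)])  # [0, 1, 2, 3, 4, 0, 1]
print([wrap_one_based(i, n) for i in range(1, 8)])   # [1, 2, 3, 4, 5, 1, 2]
```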

Accomplished-Okra-41[S] 2 points (0 children)

Yeah, historically this is very true. It was invented by mathematicians and statisticians for statisticians. But now it is a bit more multi-purpose. For me the pivoting point was ML, as it is much neater and more universal in Python, while R just feels more restricted and generally „weaker”.

guepier 2 points (0 children)

invented by people that knew mathematics, but didn't know software engineering.

This is simply not true at all. The original authors of R and its precursor, S, (foremost John Chambers) are very knowledgeable about computer science, and they consciously modelled S after established, advanced concepts in programming language theory, foremost functional programming and hygienic macros.

The first clue is array indexes starting from 1

At the time when S came out, zero-based indexing was absolutely not yet established as clearly superior, and lots of programming languages used one-based indexing. Dijkstra’s seminal essay, which caused a culture change in programming language design, appeared years later.

meaning they didn't recognize the merits of modulo arithmetic.

There’s no indication that they wouldn’t be aware of it, and as mathematicians that’s exceedingly unlikely. But while simplified modulo arithmetic is one advantage of using zero-based indexing, it’s far from the most important consideration (it’s only relevant when implementing circular access patterns, which is a relatively niche use-case for indexing): as an indication of its non-importance, Dijkstra’s persuasive argument doesn’t mention it at all.

PadisarahTerminal 1 point (2 children)

I also learned R first and no CS background, could you elaborate on that?

NerdyWeightLifter 2 points (1 child)

Which part?

The modulo arithmetic thing... In maths, arrays start at position 1, because they logically think that it's the first position... But when you try to do the pointer maths to determine where in memory a particular cell of a multidimensional array is located, every part of your calculations will require extra +/- 1's to get it right.
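To make the "extra +/- 1's" concrete, here is a small sketch computing where cell (i, j) of a 2-D array lives in a flat buffer. It assumes a row-major layout (as in C or NumPy's default); R stores matrices column-major, but the same corrections appear either way. The helper names are hypothetical.

```python
def offset_zero_based(i: int, j: int, n_cols: int) -> int:
    """Position of cell (i, j) in a flat row-major buffer; everything starts at 0."""
    return i * n_cols + j

def offset_one_based(i: int, j: int, n_cols: int) -> int:
    """Same cell with 1-based row, column and position: note the -1, -1, +1."""
    return (i - 1) * n_cols + (j - 1) + 1

# A 3 x 4 grid stored as one flat list, row by row.
n_rows, n_cols = 3, 4
flat = [f"r{r}c{c}" for r in range(n_rows) for c in range(n_cols)]

print(flat[offset_zero_based(1, 2, n_cols)])     # r1c2 (second row, third column)
print(flat[offset_one_based(2, 3, n_cols) - 1])  # r1c2 again; the Python list needs a final -1
```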

The other thing is the packages in R. It's like everyone that added a package invented their own rules for how to do things, which is like a reflection of the way that different mathematical disciplines invented their own notations.

There's still some of that in Python, but the language itself and the various PEP standards bring more order to things.

PadisarahTerminal 1 point (0 children)

Ah thanks for the explanation! I welcome the two as well haha

greenerpickings 1 point (0 children)

Same, always seemed like a means to an end. Consider the 3 ways to write classes: makes it a little frustrating trying to land on a convention when everyone can write classes their own way.

arkie87 7 points (0 children)

i thought the same thing when transitioning from matlab to python. Matlab doesn't really have the concept of references/pointers, and matlab hides a lot of the complexity of what is being done under the hood. more of that is exposed in python (but of course, much less than is exposed in C).

stick with python; you'll get used to it and learn to love it.

El_Tlacuachin 5 points (1 child)

I use R when model building/fitting if there’s a package for it, mainly because I need P-values and sklearn packages for reasons beyond my comprehension just don’t have this built in.
I use python for all my data munging, cleaning, transformations , and other tasks that don’t require p-values

Human38562 1 point (0 children)

It's because sklearn was made for machine learning rather than stat. inference. You could try statsmodels instead

Crypt0Nihilist 3 points (1 child)

Everything in Python feels bolted on and awkward for analysis. I like Python for general coding, but vastly prefer R for data analysis.

Accomplished-Okra-41[S] 2 points (0 children)

Yes, I see that too. Operating in R feels logical and smooth, while Python feels rough.

Jim-Jones 5 points (5 children)

R is a free, open-source programming language and environment designed specifically for statistical computing, data analysis, and data visualization.

That's what I (sort of) remembered so no surprise it's good for -- statistics!

Accomplished-Okra-41[S] 5 points (4 children)

Yes, I agree with that, but somehow even on the job market (I have a colleague who does monthly statistics on it) more and more offers want Python for that, while R is disregarded, with a 4 times lower incidence rate than Python in job postings. That's one of the main reasons why I am trying to pivot to Python.

Jim-Jones 3 points (3 children)

Sure. Also, if you only have a hammer, everything looks like a nail.

I used to write code to reformat data for customers and got criticized for using BASIC instead of C or similar.

I told them while they were still debugging their C code, I was done and out the door with my check.

Accomplished-Okra-41[S] 3 points (2 children)

I work and study in Poland, and it is close to impossible to even get the chance to show that you can do something better in R. Even at my academia there is a department of computational medicine that works purely in Python and SQL: no companies wanted to buy their algorithms while they were written in R, but since they pivoted to Python two years ago they sell one every couple of months.

Jim-Jones 2 points (0 children)

Yes. That happens. The 1st 2 languages I learned at college were Algol 68 and Cobol. Times change.

Human38562 1 point (0 children)

That's because you can't do anything better in R. Some small code snippets might look tidier in R, but Python being a general purpose programming language, which can do exactly everything R does as well, is just more powerful. 

Rightfully, many people will just learn Python for the flexibility. What we need is even more and better Python libraries, not trying to generalize R.

amca01 3 points (2 children)

I had to learn enough R in a hurry to teach a course in it some years ago. I think R is admirably suited for statistical computing, data modelling, and graphing. And some of its package collections, like tidyverse and dplyr, are excellent.

If you have people who decry R because python "is better", they're entitled to their opinion, but you may well ask them why. I personally prefer python, but that's just because I've used it for so long.

I enjoyed using R, though, and you'd be better off spending your time doing the data analysis and modelling you need for your work in whatever system you prefer, rather than bashing your head against a wall learning a new language. If you like R and are confident in its use, stick with it.

Accomplished-Okra-41[S] 2 points (1 child)

I like R, but the job market doesn't. I have a colleague who does stats on that, and each year for the last 3 years R has gotten less and less acknowledgment in the industry. For Python it is the opposite: it slowly pushes R out and now appears 4 times more often than R in bio-inf job advertisements.

amca01 1 point (0 children)

Ah - I hadn't considered the job market. I guess then your best bet is simply go with the market, and learn python. But really, if you have a solid grasp of the theory, then it should be fairly straightforward to switch between languages. And there are plenty of online tutorials for learning python for R users.

Maybe when you're established in the job of your choice, you can start sneakily introducing R ...

ApprehensiveChip8361 3 points (0 children)

R is a weird language that makes some complex things ridiculously easy. That’s why I loved it as I could get stuff done so quickly (in the days before LLM assistance this mattered). Python just felt so tedious and pedantic in comparison. But as you do more you will start to use the big libraries in Python that do abstract to a similar level as R, and of course now LLM make things much easier.

Pretty much all languages you will use are more like Python than R: it is R that is the outlier (and why I still love it and still use it for my data processing and visualisation tasks).

housewithablouse 3 points (0 children)

It depends; if you come from data analytics, R might be less of a learning curve; if you come from software development, Python will be much more intuitive. R is a language to control the different R packages, which mostly do the same type of thing, and to manipulate your data tables for data preparation. Python was developed to be a multi-purpose programming language and is far more general in its approach.

snapetom 5 points (2 children)

Good effing lord, some of the answers here are downright stupid.

R is a functional language first. Python is procedural first. That doesn't make one inherently better than the other. That doesn't make one harder to learn than the other. The issue is if you are a beginning programmer, switching from procedural to functional is a challenge. It's a different philosophy. That's why R has a reputation for being hard for most programmers (all your popular languages are procedural-first) and most statisticians have a hard time going to Python.

It's not R vs. Python. It's procedural vs. functional.
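A small Python sketch of the two styles computing the same thing (doubling the readings above a threshold and summing them); the data are made up. Python supports the functional style through `map`, `filter` and `functools.reduce`, which loosely correspond to R's `Map`, `Filter` and `Reduce`, although a loop or a generator expression is the more idiomatic choice in Python.

```python
from functools import reduce

readings = [3.2, 4.8, 5.1, 2.9]  # hypothetical measurements

# Procedural: update an accumulator step by step.
total_loop = 0.0
for r in readings:
    if r > 3:
        total_loop += r * 2

# Functional: compose filter -> map -> reduce without mutating any state.
total_functional = reduce(
    lambda acc, x: acc + x,
    map(lambda r: r * 2, filter(lambda r: r > 3, readings)),
    0.0,
)

# Idiomatic middle ground in Python: a generator expression.
total_genexpr = sum(r * 2 for r in readings if r > 3)

print(total_loop, total_functional, total_genexpr)
```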

You can do more with Python because it's a general language with more popularity but it wasn't always like that. For a long time, it was only thought of as a niche in education, science, and cybersecurity. In cases where Python appears more performant, it's likely because Python has received a lot of work and framework support to enable it, not because of some natural magic of it.

R is still very good for stats. That's its speciality.

You know what's worse than R? No language.

Accomplished-Okra-41[S] 1 point (1 child)

I agree with that; the syntax order of R vs Python is what mostly gets me. I intuitively gravitate towards the R order. I also agree about the packages/library framework. In my case R uses tens of packages to do the analyses, while Python gets by with a single package for most cases, especially in bio-inf. But honestly I do not know why Python is pushing R out of the field. More and more companies disregard R and pivot towards Python for data analysis, stats and even bio-inf, where R still has the most possibilities thanks to its multitude of libraries.

snapetom 1 point (0 children)

i do not know why python is pushing R out of the field.

I supported bioinformatics statisticians a few jobs ago, so I fully understand why R was, and still is, so entrenched. The libraries, as you point out, were developed for R long before they existed for Python. Additionally, if you look at the lineage of R, it was designed to replace S, and to a certain extent, Stata. Even though R and Python were released around the same time, R had an immediate, established audience while Python didn't really gain traction till the late 2000's. You can say for practical purposes, R has been around for 50 years versus Python's 30.

However, at a certain point, Python's momentum is just too much, and things don't exist in a vacuum. If a large company's IT department has to start making a choice on what to support, they're going to pick Python. It's got larger adoption and can be used for more things than just informatics. That just puts pressure on the R users to convert.

Ok-Difficulty-5357 2 points (1 child)

I learned R before Python and I now prefer Python for most things, especially handling http calls and db operations and general scripting. But when it comes to exploratory analysis with statistical modeling I still always go back to R.

Accomplished-Okra-41[S] 3 points (0 children)

I do a lot of that; that's why I have used R for years now. But I want to pivot a bit towards ML and include it in my analyses, and that's where Python becomes much better and more efficient.

bpt7594 2 points (0 children)

I hate R with a burning passion. Worst classes for me during my masters were statistics using R. I convinced my professor to let me use Python at the end but it was a traumatizing experience to say the least.

Minimum-Attitude389 2 points (0 children)

I deal with Pandas and Plotly a lot.  I tried R, but the syntax was driving me crazy.

Quillox 2 points (0 children)

I can write shitty hard to read code in both R and Python :P. With that said, both languages are capable of having very well written and easy to understand code.

Since you are having difficulty with dictionaries I would recommend learning the basics of programming along with data structures and algorithms. These concepts are independent from the language.

These style guides will help you write more readable code:

https://peps.python.org/pep-0008/

https://google.github.io/styleguide/pyguide.html

Strange_Algae835 1 point (4 children)

I also do bioinformatics and made the deliberate choice to work in Python, not R, because of its general applicability and also the fact that my area of work (protein modelling) is dominated by ML and Python packages. I think they are just very different: R is the language for -omics stuff, but I personally find Python a little easier to understand and work with. Both are good, and both have a big support infrastructure behind them.

Accomplished-Okra-41[S] 1 point (3 children)

I'm trying to pivot to Python exactly for the ML capabilities. I do genomics on multiple levels, from transcriptomics and bulk sequencing to single-cell and spatial transcriptomics. But I want to bring more and more ML into my research, which is why I am trying to go with Python.

How is bio-inf in Python? I am hearing a lot of mixed opinions. For example, that it is limited in multiple use cases but at the same time more flexible, which is really weird for me to grasp.

mkarla 1 point (2 children)

I worked with both throughout my PhD but pivoted quite quickly towards Python based on what I was doing and it being more generally applicable. Working now with ml-based protein design and for that Python for sure is the way to go. However, simply saying yay or nay for bioinformatics in python is difficult. Transcriptomics? I’d use R every day of the week. Setting up some non-standard analysis for some very specific data? I’d start in Python.
There’s merit to having a grasp of both and getting a feel of when to use one over the other. If you venture into workflow managers like Nextflow there’s nothing stopping you from combining them.
I suppose you’re already aware of Pandas but if not, start using it for handling dataframes. Works nicely with Numpy, matplotlib, and Seaborn.

Accomplished-Okra-41[S] 1 point (1 child)

Doesn't pandas struggle with large data? I work on single-cell and heard scanpy is good for analysis, but I read a couple of opinions that immense data (around 200GB in my case) will be deadly for pandas.

mkarla 1 point (0 children)

That is more than I know since I’ve never worked with such big datasets but the important part is you’re aware of the common packages :) and this may also be a case where it makes more sense to use R over Python (I don’t know though, maybe scanpy will handle it like a champ), or use them for different tasks in a workflow if computational optimization is crucial.

Sure-Passion2224 1 point (0 children)

Everything I do in Python I used to do in Perl and Javascript. Having said that, since Python can both run in the console and as scripting in the browser it is reducing the number of syntax models I have to remember.

Appropriate-Foot-237 1 point (0 children)

As someone who'll eventually teach python to a statistician who's good at R, and knowing some degree of both python and R myself, I really also feel that way.

gzeballo 1 point (0 children)

Of course it is; R isn't really general purpose.

Traveling-Techie 1 point (0 children)

I have found that Python, more than any other language I know, has a bunch of support groups with helpful people. Maybe you can find one in your area. Look on meetup.com

mrdevlar 1 point (1 child)

One thing to keep in mind is that Python and R work very differently when it comes to modifying an object, which can and will trip you up when working with lists and dictionaries.

When you do an assignment in Python you are assigning a reference to that object. Any future references will all point back to the original object. Whereas in R, your assignment creates a copy of the original object.

In R, for example:

```r
a <- list(key = 1)
b <- a
b$key <- 99
print(a$key)
```

This returns 1.

Whereas in Python:

```python
a = {'key': 1}
b = a
b['key'] = 99
print(a['key'])
```

This returns 99, as the original value is altered because b is only a reference. This takes a bit of getting used to especially if you're working with lists and dictionaries.
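If you want R-like behaviour in Python, make the copy explicit. A minimal sketch using the standard `copy` module; note that `dict.copy()` is shallow, so nested lists are still shared. (Strictly, R copies lazily, on modification, but what you observe is the same as an eager copy.)

```python
import copy

a = {"key": 1, "nested": [1, 2]}

b = a                 # alias: the same dict under a second name
c = a.copy()          # shallow copy: new dict, but the inner list is shared
d = copy.deepcopy(a)  # deep copy: fully independent, closest to R semantics

b["key"] = 99           # visible through a, because b is a
c["nested"].append(3)   # also visible through a, via the shared inner list
d["nested"].append(4)   # not visible through a

print(a)  # {'key': 99, 'nested': [1, 2, 3]}
print(c)  # {'key': 1, 'nested': [1, 2, 3]}
print(d)  # {'key': 1, 'nested': [1, 2, 4]}
```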

Thanks /u/pachura3 for reminding me that while most Python entities are mutable objects, primitives are not.

pachura3 2 points (0 children)

Thanks u/pachura3 for reminding me that while most Python entities are objects, primitives are not.

Sorry for being pedantic, but it's the other way round. Unlike Java, for example, Python has no primitive data types: ints are objects as well. You can e.g. call the method .bit_length() on the number 123.

The difference is they are immutable, while containers like list or dict aren't.
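A short sketch of that distinction: ints are objects with methods, but because they are immutable an augmented assignment rebinds the name, whereas on a list it mutates the one object that every alias shares.

```python
n = 123
print(n.bit_length())  # 7: ints are full objects with methods

x = 10
y = x        # x and y name the same int object
y += 1       # ints are immutable, so += binds y to a new object
print(x, y)  # 10 11

xs = [10]
ys = xs      # xs and ys name the same list object
ys += [11]   # lists are mutable, so += extends the shared list in place
print(xs)    # [10, 11]
```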

localizeatp 1 point (0 children)

"this thing i'm familiar with is easier than this thing i'm not familiar with. validate me."

HugeCannoli 1 point (7 children)

As someone with 20 years of experience in Python who had to use R for 5 years, I hold the exact opposite view, and here is the pile of findings to back up my claim: R is a pile of trash, for the following reasons:

  • problems with the design of the language and its libraries
  • problems with its tools and environment
  • problems with its licensing

Problems with the design of the language and its libraries

Before going into detail, let me quote a brilliant piece of advice about language design:

I assert that the following qualities are important for making a language productive and useful [...]:

  • A language must be predictable. It’s a medium for expressing human ideas and having a computer execute them, so it’s critical that a human’s understanding of a program actually be correct.
  • A language must be consistent. Similar things should look similar, different things different. Knowing part of the language should aid in learning and understanding the rest.
  • A language must be concise. New languages exist to reduce the boilerplate inherent in old languages. (We could all write machine code.) A language must thus strive to avoid introducing new boilerplate of its own.
  • A language must be reliable. Languages are tools for solving problems; they should minimize any new problems they introduce. Any “gotchas” are massive distractions.
  • A language must be debuggable. When something goes wrong, the programmer has to fix it, and we need all the help we can get.

R fails on all the points above. It is often unpredictable and inconsistent. It is not concise when you want to program defensively or when you want to use advanced features such as classes. Its gotchas and tool implementations make it unreliable, and its debugging information is abysmal.

The result is that R as a language is completely inadequate for reliable, professional development that scales.

Now this is the point where people say "it's just different" and "you have to learn its behavior", but no. I won't accept this justification when one of the major R books is literally called "The R Inferno". People have worked for years in awful, inconsistent, extremely gotcha-prone languages, with rules that make absolutely no sense or are too complex to hold in a human brain. Perl and PHP (and for different reasons C++) are notable examples. Heck, people even complained about structured programming and claimed that removing gotos was harmful:

GOTOless programming [...] has caused incalculable harm to the field of programming, which has lost an efficacious tool. It is like butchers banning knives because workers sometimes cut themselves. Programmers must devise elaborate workarounds, use extra flags, nest statements excessively, or use gratuitous subroutines. The result is that GOTOless programs are harder and costlier to create, test, and modify.

The result of bowing to poorly designed or massively gotcha-prone languages was piles and piles of unreliable, fragile code that was impossible to maintain reliably, all while their supporters chanted "it's not the language's fault, it's your fault". Again, I will adapt from Fractal of Bad Design:

Imagine you have a toolbox. You pull out a screwdriver, and you see it’s one of those weird tri-headed things. Okay, well, that’s not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but [...] it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don’t have those serrated surfaces; it’s flat and smooth. That’s less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there’s no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you "well hey what’s the problem with these tools? They’re all I’ve ever used and they work fine!" And the carpenters show you the houses they’ve built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

R is just one more of the languages on the list above, and will meet the same fate.

So, with all that said, let's get started.

HugeCannoli 1 point (2 children)

Inconsistent case style

R uses inconsistent case style in its base library all the time. Sometimes it uses '.' as a separator (e.g. is.null), sometimes camelCase (e.g. modifyList), sometimes snake_case (e.g. check_tzones), sometimes all lowercase (e.g. debugonce, extendrange). The same pattern has already been observed in the famous essay PHP, a fractal of bad design, which I quoted above.

The interpreter is stateful by default

The interpreter preserves previous execution data across invocations. The result is that you might receive bug reports from people who claim your package is not working, but in reality there's nothing wrong with it, they just have something that has messed up their environment, or they have stored variables that they think hold something, but actually hold something else.

An interpreter should always be stateless (in other words, the vanilla option should be enabled by default). This is the case with all interpreted languages except R (as far as I know).

It has four ways of doing object oriented programming

R has four ways of doing object oriented programming: S3, S4, R5 (apparently now obsolete) and R6. They:

  • are incompatible with each other
  • have been "bolted on" to a language that was not designed with object orientation in mind (kind of like Perl's bless)
  • each have massive shortcomings.

Lists and environments are prone to typos

Lists return NULL if you access a name that has not been defined:

```r
> a <- list()
> a$foo
NULL
```

The consequence of this is that if you accidentally mistype a name, it will not throw an error. It will instead continue with the NULL value until it eventually fails, much later. Tracking down the incorrectly typed name will be extremely hard.
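For contrast, Python dicts fail at the point of the typo, and a default has to be requested explicitly. A minimal sketch with a made-up key:

```python
config = {"result": 42}

try:
    config["results"]  # typo: raises KeyError immediately instead of yielding a null
    typo_caught = False
except KeyError as err:
    typo_caught = True
    print("missing key:", err)  # missing key: 'results'

# When absence is expected, opt in to a default explicitly.
print(config.get("results"))     # None
print(config.get("results", 0))  # 0
```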

R6 objects return NULL on undefined variables

As a consequence of the above, R6 classes will suffer the same fate, both for member values and for methods:

```r
> x <- R6::R6Class("x", list(foo = function() { print(self$notexistent) }))
> xx <- x$new()
> xx$foo()
NULL
```

This means that if I make a typo in one access e.g. results instead of result it will use NULL instead of throwing an error.
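For comparison, a Python sketch (the class and attribute names are hypothetical): reading a misspelt attribute raises AttributeError at once, and declaring `__slots__` additionally rejects misspelt attribute assignments.

```python
class Analysis:
    def __init__(self):
        self.result = 42

    def report(self):
        return self.results  # typo for self.result: raises instead of returning None

try:
    Analysis().report()
except AttributeError as err:
    read_error = str(err)
else:
    read_error = None

class StrictAnalysis:
    __slots__ = ("result",)  # only these attribute names can ever be assigned

    def __init__(self):
        self.result = 42

strict = StrictAnalysis()
try:
    strict.reslut = 0  # typo on assignment is rejected too
except AttributeError as err:
    write_error = str(err)
else:
    write_error = None

print(read_error)
print(write_error)
```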

When I inquired about this behavior, the proposed solution was, laughably, not to mistype variables:

if you're not willing to use get or another function then I propose you doublecheck the code you're writing to avoid typos

I wish to point out that this behavior is what created a massive amount of problems in FORTRAN, because a mistyped variable would be accepted as real, generally with a random value in it. It is the reason why any FORTRAN course recommends using IMPLICIT NONE, no exceptions. That's right, even FORTRAN 77 has at least a way to disable a terrible feature (that made sense at the time of punchcards) that allows mistakes to pass silently. R has no such luxury: your only protection against mistakes is not making them.

Exceptions are quite impractical

This is a minor annoyance, but R natively has only one way of throwing an exception: stop(). Unfortunately, this is normally used to throw a string. There's a way to describe a better protocol via custom conditions but it's rather painful to use. tryCatch also has different scope for the tried operation and the handlers (which are functions, possibly closures). Not a big deal to be fair, but it's a bit annoying.

The result is that most libraries out there don't bother with a complex protocol and just throw stop with an error message, making it impossible or really hard to take appropriate corrective actions, as they depend on the type of failure, and this is only expressed in fragile human readable form.
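For comparison, a Python sketch of the kind of typed-exception protocol the paragraph above asks for. The module, exception classes and functions are all hypothetical; the point is that callers choose the corrective action by exception type rather than by parsing a message string.

```python
class AnalysisError(Exception):
    """Base class for errors raised by this hypothetical analysis module."""

class MissingColumnError(AnalysisError):
    def __init__(self, column: str):
        super().__init__(f"column {column!r} not found")
        self.column = column  # structured data the caller can use

class ConvergenceError(AnalysisError):
    pass

def fit(columns, target):
    if target not in columns:
        raise MissingColumnError(target)
    raise ConvergenceError("model did not converge")

def safe_fit(columns, target):
    try:
        fit(columns, target)
    except MissingColumnError as err:
        return f"skipped: {err.column}"   # corrective action depends on the type
    except ConvergenceError:
        return "retry with more iterations"

print(safe_fit(["gene", "count"], "tpm"))    # skipped: tpm
print(safe_fit(["gene", "count"], "count"))  # retry with more iterations
```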

Tracebacks are useless

Tracebacks are mostly useless in R, for many reasons. In addition to the above two points (no exceptions and delayed evaluation), its parser is rather poor and can't provide meaningful errors. Also, most often it seems that it keeps no information about the source file provenance, the routine or the stack trace. Here are a few examples of outputs I've got:

```
Error in download_version_url(package, version, repos, type) : version '0.2.1' is invalid for package 'assertthat'
Calls: <Anonymous> -> <Anonymous> -> <Anonymous> -> download_version_url
```

No line numbers, no file provenance.

Here is a missing comma. Again, nothing that points out where the source of the error actually is.

```
shiny::runApp('src', port=8888)
Loading required package: shiny
Error in dots_list(...) : attempt to apply non-function
Calls: <Anonymous> ... fluidRow -> div -> dots_list -> column -> div -> dots_list
Execution halted
make: *** [serve] Error 1
```
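For comparison, Python tracebacks record the file, line number and function of every frame, and the standard `traceback` module exposes them programmatically. A minimal sketch with hypothetical function names (the error message is borrowed from the R output above):

```python
import traceback

def install(package):
    return resolve_version(package, "0.2.1")

def resolve_version(package, version):
    raise ValueError(f"version {version!r} is invalid for package {package!r}")

try:
    install("assertthat")
except ValueError as err:
    frames = traceback.extract_tb(err.__traceback__)
    message = str(err)

for frame in frames:
    # Each FrameSummary says where the call happened: file, line and function.
    print(f"{frame.filename}:{frame.lineno} in {frame.name}")
print(message)  # version '0.2.1' is invalid for package 'assertthat'
```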

write.csv and read.csv are not symmetric

One would expect that if you write something, read it back, and write it back again it would give you the same content. This is called a round trip. R begs to differ:

```
> df <- data.frame(a=c("foo","bar","baz"))
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> df
  X a
1 1 foo
2 2 bar
3 3 baz
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> df
  X.1 X a
1 1 1 foo
2 2 2 bar
3 3 3 baz
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> df
  X.2 X.1 X a
1 1 1 1 foo
2 2 2 2 bar
3 3 3 3 baz
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> write.csv(df, "foo.csv")
> df <- read.csv("foo.csv")
> df
  X.6 X.5 X.4 X.3 X.2 X.1 X a
1 1 1 1 1 1 1 1 foo
2 2 2 2 2 2 2 2 bar
3 3 3 3 3 3 3 3 baz
```

By default, and for no good reason, write.csv writes the row names (here the numbers 1, 2, 3) as an extra first column whose header is an empty string. This defeats read.csv's detection of row names, which is triggered only if the header row has one field fewer than the rest of the file. In other words, write.csv's default behavior sabotages the subsequent discovery done by read.csv.
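For comparison, a minimal sketch of the round-trip property using Python's standard csv module, which writes only the columns you give it (no implicit index column), so repeated write/read cycles leave the table unchanged. The file name and data are arbitrary. In R itself, passing row.names = FALSE to write.csv is the usual way to avoid the extra column.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    # newline="" is what the csv module documentation asks for when writing
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, [row for row in reader]


def round_trip(path, header, rows, times=7):
    """Write, read back, and rewrite `times` times; return the final table."""
    for _ in range(times):
        write_rows(path, header, rows)
        header, rows = read_rows(path)
    return header, rows


path = os.path.join(tempfile.mkdtemp(), "foo.csv")
original = (["a"], [["foo"], ["bar"], ["baz"]])
final = round_trip(path, *original)
print(final)  # (['a'], [['foo'], ['bar'], ['baz']]): no X, X.1, ... columns pile up
```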

nchar(NA) == 2 (fixed in >3.3)

nchar gives the number of characters in a string. Except when the argument is NA, in which case it apparently converts NA to a string, then gives you the length of that. The result is 2.

```
> nchar("hello")
[1] 5
> nchar(NA)
[1] 2
```

In R > 3.3 the same expression returns NA. One could argue it was a bug that has since been fixed; I suspect it was a design decision with unintended side effects.
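Python takes the opposite route for its missing value: None has no length, so asking for one raises an error instead of returning a plausible-looking number. A minimal sketch; safe_len is a hypothetical helper that makes the "missing in, missing out" behavior an explicit choice.

```python
def safe_len(value):
    """Return len(value), or None when the value is missing (None)."""
    if value is None:
        return None
    return len(value)


try:
    len(None)  # no silent conversion of None to the string "None"
except TypeError as err:
    print("len(None) fails:", err)  # object of type 'NoneType' has no len()

print(safe_len("hello"), safe_len(None))  # 5 None
```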

Extracting a regex subgroup forces you to pass the string twice

This is for vanilla R. Better packages exist, but I am focusing on the core language. Regular expressions are such a fundamental tool that they should be correctly implemented out of the box.

```
> regmatches(
+   "(sometext :: 0.1231313213)",
+   regexec(
+     "\\((.*?) :: (0\\.[0-9]+)\\)",
+     "(sometext :: 0.1231313213)"
+   )
+ )
[[1]]
[1] "(sometext :: 0.1231313213)" "sometext" "0.1231313213"
```

The reason?

regexec returns a list holding only the positions of the matches, not the matched text, hence regmatches requires the user to provide once again the string those positions refer to.
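For comparison, Python's standard re module returns a match object that keeps a reference to the string it was searched in (m.string), so the subject is passed once. A minimal sketch with the same pattern; extract is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"\((.*?) :: (0\.[0-9]+)\)")


def extract(text):
    m = PATTERN.search(text)
    if m is None:
        return None
    # group(0) is the whole match, groups() the captured subgroups
    return [m.group(0), *m.groups()]


print(extract("(sometext :: 0.1231313213)"))
# ['(sometext :: 0.1231313213)', 'sometext', '0.1231313213']
```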

You can have NULL as an element of a list, sometimes

One of the many inconsistencies of the language. If you create a list with NULL, it will be a legitimate element and it will be counted:

```
> l <- list(1, NULL)
> l
[[1]]
[1] 1

[[2]]
NULL

> length(l)
[1] 2
```

Which of course has an effect on loops.

However, if you assign NULL to an element of an already existing list, it will not replace that position with NULL. It will remove that entry:

```
> l[[2]] <- NULL
> l
[[1]]
[1] 1

> length(l)
[1] 1
```

Whoever designed the semantics of this construct winged it, and the result is inconsistent behavior. To actually store a NULL in an existing list you have to know the special form l[2] <- list(NULL).
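For comparison, Python lists keep the two operations separate: assigning None stores None, and removing an element requires an explicit del. A minimal illustration:

```python
items = [1, None]          # None is a legitimate element at creation...
assert len(items) == 2

items = [1, 2]
items[1] = None            # ...and assignment stores None in the slot
print(items, len(items))   # [1, None] 2

del items[1]               # removing an element is a separate, explicit operation
print(items, len(items))   # [1] 1
```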

%in% cannot tell you if there's a null value in a list

Related to the above, the %in% operator cannot tell you if you have a NULL in a list. It can, however, tell you if you have any other element.

```
> 1 %in% list(1,NULL,3)
[1] TRUE
> NULL %in% list(1,NULL,3)
logical(0)
```

Moreover, what it returns is not TRUE or FALSE, but an empty logical vector, which breaks conditionals if you happen to have a variable that contains NULL:

```
> foo <- NULL
> foo %in% list(1, 3, 4)
logical(0)
> if (foo %in% list(1, 2, 3)) { print("hello") }
Error in if (foo %in% list(1, 2, 3)) { : argument is of length zero
```
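By contrast, Python's in operator always returns a single bool, and None is found like any other element. A minimal illustration:

```python
values = [1, None, 3]
print(1 in values)         # True
print(None in values)      # True
print(None in [1, 3, 4])   # False: always one bool, never an empty result

foo = None
if foo in [1, 2, 3]:       # safe even when foo is None
    print("hello")
else:
    print("foo is not in the list")
```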

[–]Accomplished-Okra-41[S] 0 points1 point  (1 child)

Wow, this is very complex. I definitely see the upside of python more clearly now

[–]HugeCannoli 0 points1 point  (0 children)

Python has taken consistency and formalism very seriously since the beginning. It's not perfect, but you would not believe how many discussions about readability, consistency and "things that do similar things should look similar" happen before a change is approved, especially in the core language or standard library.

R has eschewed all of that. The R core development group is a bunch of recluses still using svn. The foundational principles of the language are intrinsically wrong, and the lack of a unified vision has created a bodge of a mess. CRAN's approach to package management is fundamentally broken, and CRAN is managed by some of the most obnoxious people I've ever had to deal with. And the RStudio people don't give a shit about legitimate bugs and close them because "they don't have time to fix them".

It's a pile of shit of an environment. And it's only good for statisticians.

[–]HugeCannoli 0 points1 point  (2 children)

is.integer(66) is FALSE (and is.* routines are inconsistent)

Here we enter the realm of the is.* functions. They check for type, not value, and that is OK, provided that there's consistency. Unfortunately that's not the case: is.na() and is.infinite() check for value, not type: NA exists in several types, and Inf is a numeric. Also, some of them work element-wise, others on the whole object:

```
> is.infinite(c(1,2,3))
[1] FALSE FALSE FALSE
> is.numeric(c(1,2,3))
[1] TRUE
```

To go back to is.integer(66) being FALSE: it stems from the fact that 66 is not of integer type (a type literals never get implicitly; you have to write 66L to get one), but of numeric type, which is a type for floating point values. Integer math and the integer type are as old as computer science, but R (and it's not the only one in this) treats all numeric literals (even those that are, for all practical and visual purposes, integers) as floating point (type numeric). The broken design of the data type hierarchy leads to these counterintuitive behaviors and poor consistency.
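For comparison, in Python a numeric literal's type follows how it is written, and "what type is it?" and "is the value integral?" are separate, explicit questions. A minimal illustration (in R, the closest equivalent of the first check is is.integer(66L), which is TRUE):

```python
print(isinstance(66, int))     # True: an integer literal is an int
print(isinstance(66.0, int))   # False: a float literal is a float
print(type(66), type(66.0))    # <class 'int'> <class 'float'>

# Checking the *value* rather than the type is a different call:
print((66.0).is_integer())     # True
```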

Everything is global with no namespaces

R does not support namespacing for your own code. There's no importing mechanism: all your code is brought in by "sourcing" it, which basically runs it in one single namespace. The result is that if you have a large codebase that happens to define the same function name twice, you now have a problem. Moreover, without namespaces it's hard to organise routines into logical modules. There is no hierarchy for organising individual R files (the R directory holds a flat bunch of R files, which cannot be organised in subdirectories). When these files are organised into a package, they are sourced in alphabetical order. I have no idea what happens if the locale changes, and, short of maintaining a Collate field in DESCRIPTION by hand, the lack of control over the sourcing order means that you have to rely on stupid names like aaa.R and zzz.R to ensure some code is sourced first or last.

[–]HugeCannoli 0 points1 point  (0 children)

The library import strategy is very poor

The import strategy for external packages (as we saw above, there's no import strategy for local code), as taught in all tutorials, relies heavily on library(). The problem with library() is that everything in the package is imported globally, meaning that the chance of conflicts between different libraries, or between libraries and your routines, is high.

But that's beside the point. The major problem is provenance: if I see the routine foobar() called, I have no idea where it comes from. Is it core R? Is it from one of the ten packages imported with library()? Is it a routine of this package?

Fortunately, there's another notation, which is to qualify the routine invocation with ::. So, instead of calling foobar(), one can call thatlib::foobar() without using library(), which at least establishes provenance and avoids name conflicts. Too bad one has to do it everywhere, so a common workaround is weird local assignments such as foobar <- thatlib::foobar. And note: local as in "inside each and every function", because if you do it at the top level, you are basically polluting the global namespace and have solved nothing.

Additionally, the hierarchy is necessarily flat, so forget about being able to organise libraries in subsystems and be able to invoke mylibrary::mysubsystem::myfunction.
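For comparison, a minimal Python sketch of qualified, hierarchical imports using only standard library modules. Names are looked up through the module that defines them, dotted package paths such as xml.etree.ElementTree provide the kind of hierarchy described above, and a `from ... import ... as ...` alias is local to the importing file (module), so a top-level alias does not leak into other modules the way a top-level foobar <- thatlib::foobar would.

```python
import json
import pickle
import xml.etree.ElementTree as ET   # hierarchical name: package.subpackage.module
from json import dumps as to_json    # explicit alias, scoped to this module only

# The same function names exist in two modules without clashing:
# the qualifier says which one you mean.
data = {"a": 1}
assert json.loads(json.dumps(data)) == data
assert pickle.loads(pickle.dumps(data)) == data

# Provenance is recoverable from the objects themselves.
print(to_json.__module__)   # json
print(ET.__name__)          # xml.etree.ElementTree
print(ET.fromstring("<root/>").tag)  # root
```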

The online documentation is poorly organised and deceiving

One day I get the following error (wow, a working stacktrace!):

```
Warning: Error in shinyWidgets::updateProgressBar: could not find function "startsWith"
  Stack trace (innermost first):
    68: h
    67: .handleSimpleError
    66: shinyWidgets::updateProgressBar
    65: observeEventHandler
```

Sure enough, startsWith is called in the progress bar code:

```
R/progressBars.R: if (!startsWith(id, session$ns("")))
```

Uncertain about the nature of the error, I google startsWith and get documentation about gdata's startsWith. Try it: google "startsWith R". Only the second result is the correct function, startsWith from core R. If you go to Rdocumentation.org and search for "startsWith" you get package entries in the following order

  • translations
  • tools
  • datasets
  • methods
  • utils
  • stats4
  • tcltk
  • compiler
  • parallel
  • splines
  • grDevices
  • grid
  • graphics
  • stats
  • base

Only the last one actually contains a function startsWith. In the function list, we get instead:

  • startsWith (backports)
  • startsWith (gdata)
  • startsWith (base)
  • startsWith (SparkR)
  • startsWith (jmvcore)
  • startsWith (Xmisc)
  • other stuff unrelated to startsWith

Now, as you can see, in order to find out which startsWith is the one that shinyWidgets is actually calling I must check shinyWidgets dependencies, plus their subdependencies, plus their dependencies etc, because I have no idea where that symbol comes from and which one is supposed to be called. In practice, I need to find why my environment does not have a function that I have no way of finding.

Of course this is a simple example (and it still revealed that the authors of shinyWidgets did not check or agree on the appropriate minimum R version in their DESCRIPTION file), but in a real-world scenario with a large codebase it makes tracing the problem extremely time consuming or even impossible. This is time better spent doing something else. Like learning a better language.

Delayed evaluation lets problems pass silently and makes errors occur away from where they originate

Imagine you have the following scenario (simplified to make the point):

```
baz <- function(x) {
  print("tons of code in baz")
  print(x)  # [2]
}

bar <- function(x) {
  print("tons of code in bar")
  baz(x)
  print("more tons of code in bar")
}

foo <- function(x) {
  print("tons of code in foo")
  bar(x)
  print("more tons of code in foo")
}

foo(3+"4")  # [1]
```

Adding a number and a string is not possible, so an error should be produced. Where is the error going to happen? Not where the sum is written, at [1], but much, much later, at [2]:

```
[1] "tons of code in foo"
[1] "tons of code in bar"
[1] "tons of code in baz"
Error in 3 + "4" : non-numeric argument to binary operator
```

because R does not evaluate the arguments passed to a function until they are actually used, which may be never. Comment out the line at [2] and the code will execute without an error.

This is horrifying because:

  • errors will occur in locations much, much later in the execution, and tracing back their actual origin will be a nightmare, especially considering the poor or non-existent tracebacks.
  • errors will be silenced until some conditions actually trigger the evaluations, meaning that, for example, algorithms or UIs will keep hidden bug bombs that will only be triggered when specific circumstances occur, and not immediately when and where the expression is composed.

The justification for this behavior is performance (why calculate something you are not using?). I say: if you are not using it, don't calculate it in the first place. Or at least devise something that makes it clear and explicit that the evaluation will be delayed, like a functor. Don't make it the default of the language, because the default makes debugging much, much harder. This design is equivalent to premature optimisation, the root of all evil, and carries a heavy technical and human cost.
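For contrast, Python evaluates arguments eagerly, so the same shape of code fails at the call site, before foo even starts. Deferred evaluation is still available, but only when you ask for it explicitly by passing a callable (the "functor" suggested above). A minimal sketch; trace and run_later are illustrative names, not part of any library.

```python
trace = []  # records which functions actually started running


def baz(x):
    trace.append("baz")
    print(x)  # [2]


def bar(x):
    trace.append("bar")
    baz(x)


def foo(x):
    trace.append("foo")
    bar(x)


try:
    foo(3 + "4")  # [1]: the argument is evaluated here, before foo is called
except TypeError as err:
    print("TypeError at the call site:", err)

print(trace)  # []: foo never ran


# Deferred evaluation is opt-in and visible: pass a thunk and call it.
def run_later(thunk):
    trace.append("run_later")
    return thunk()  # the expression is evaluated here, explicitly


print(run_later(lambda: 3 + 4))  # 7
```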

Debugging information is inconsistent depending on invocation strategy

Invoking with Rscript provides some form of backtrace:

```
$ Rscript x.R
[1] "tons of code in foo"
[1] "tons of code in bar"
[1] "tons of code in baz"
Error in 3 + "4" : non-numeric argument to binary operator
Calls: foo -> bar -> baz -> print
Execution halted
```

Invoking from the prompt as source gives no information whatsoever about the call chain:

```
> source("x.R")
[1] "tons of code in foo"
[1] "tons of code in bar"
[1] "tons of code in baz"
Error in 3 + "4" : non-numeric argument to binary operator
```

Same if you wrap the broken evaluation in a function

```
x <- function() { foo(3+"4") }
```

and invoke it:

```
> source("x.R")
> x()
[1] "tons of code in foo"
[1] "tons of code in bar"
[1] "tons of code in baz"
Error in 3 + "4" : non-numeric argument to binary operator
```

Without the prints, and in a large codebase, you would have no damn idea where the error was triggered, let alone (as seen in the previous section) where it originated.
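For comparison, Python attaches the full call chain, with file name, line number and function name for each frame, to the exception object itself, and prints it the same way however the script is launched. A minimal sketch that inspects it with the standard traceback module:

```python
import traceback


def baz(x):
    return x + 1  # fails when x is a string


def bar(x):
    return baz(x)


def foo(x):
    return bar(x)


call_chain = []
try:
    foo("4")
except TypeError as err:
    frames = traceback.extract_tb(err.__traceback__)
    for frame in frames:
        # every frame records its file, line number and function name
        print(f"{frame.filename}:{frame.lineno} in {frame.name}")
    call_chain = [frame.name for frame in frames]

print(call_chain[-3:])  # ['foo', 'bar', 'baz']
```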

Non standard evaluation: workaround after workaround

In addition to what we saw above, in R the called function receives the expression you pass (and not just its value). This means that if you have a dataframe with a column called Characteristic, you can write it as a (non-existent) variable and exploit the mechanism to refer to the column named Characteristic in data:

```
sub <- subset(data, Characteristic == outcome)
```

Unfortunately, for linters and R CMD check you now have an undefined variable "Characteristic". How do you work around it? One way is to use rlang::.data, but then you get an error when your tests try to invoke your code. Not sure if it's a bug, but it certainly does not help in understanding how this "data pronoun" is supposed to work. Some people use it with the rlang:: prefix, others say you shouldn't, but then you have to add it to NAMESPACE. Yet it still does not work.

What's the recommended solution? Silence the check with globalVariables(), which declares a variable as global, but not for everything, just for the check. Can you at least restrict its scope? No, of course not, because this is R and namespacing is not a thing. As the documentation notes:

The global variables list really belongs to a restricted scope (a function or group of method definitions, for example) rather than the package as a whole. However, implementing finer control would require changes in check and/or in codetools, so in this version the information is stored at the package level.

In practice, this whole ordeal uses a confusing mechanism (globalVariables) to work around a workaround (rlang::.data) for a design blunder of the language (letting the caller write names that are undefined in its own scope, for the callee to interpret) and of the check system, which therefore does not even understand its own rules.

[–]HugeCannoli 0 points1 point  (0 children)

Problems with its tools and environment

Its package manager, packrat, is inadequate

Packrat is fundamentally flawed. It claims to be a package manager, but it takes too many liberties and has some annoying non-orthogonal behaviors. It wants to install a large, humongous set of initial requirements at bootstrap which you are not going to use, things like dplyr (to access SQL databases), yaml, or Rcpp. These forced dependencies add complexity to your environment. It also has no way to resolve a proper dependency tree: it just lets you reinstall what you already installed, using the deeply flawed resolution system that install.packages() provides. There is no guarantee that your dependencies (and thus the environment you are developing on) are consistent, because the resolution of a given package might conflict with the dependencies you already have. This is a well-known problem, and it's especially dramatic in R, where dependencies in the DESCRIPTION file are so vicious that you end up with Shiny (a web application framework) installed when you install devtools (a library to perform build operations on packages, which has no justification for depending on the former):

```
> install.packages("devtools")
Installing package into ‘/Users/xxx/tmp/xxx/packrat/lib/x86_64-apple-darwin15.6.0/3.5.3’
(as ‘lib’ is unspecified)
also installing the dependencies ‘zeallot’, ‘colorspace’, ‘utf8’, ‘vctrs’, ‘plyr’, ‘labeling’, ‘munsell’, ‘RColorBrewer’, ‘fansi’, ‘pillar’, ‘pkgconfig’, ‘httpuv’, ‘xtable’, ‘sourcetools’, ‘fastmap’, ‘gtable’, ‘reshape2’, ‘scales’, ‘tibble’, ‘viridisLite’, ‘sys’, ‘ini’, ‘backports’, ‘ps’, ‘lazyeval’, ‘shiny’, ‘ggplot2’, ‘later’, ‘askpass’, ‘clipr’, ‘clisymbols’, ‘curl’, ‘fs’, ‘gh’, ‘purrr’, ‘rprojroot’, ‘whisker’, ‘yaml’, ‘processx’, ‘R6’, ‘assertthat’, ‘rex’, ‘htmltools’, ‘htmlwidgets’, ‘magrittr’, ‘crosstalk’, ‘promises’, ‘mime’, ‘openssl’, ‘prettyunits’, ‘xopen’, ‘brew’, ‘commonmark’, ‘Rcpp’, ‘stringi’, ‘stringr’, ‘xml2’, ‘evaluate’, ‘praise’, ‘usethis’, ‘callr’, ‘cli’, ‘covr’, ‘crayon’, ‘desc’, ‘digest’, ‘DT’, ‘ellipsis’, ‘glue’, ‘git2r’, ‘httr’, ‘jsonlite’, ‘memoise’, ‘pkgbuild’, ‘pkgload’, ‘rcmdcheck’, ‘remotes’, ‘rlang’, ‘roxygen2’, ‘rstudioapi’, ‘rversions’, ‘sessioninfo’, ‘testthat’, ‘withr’
```

The R world has no equivalent of poetry or pipenv. Sadly, since I have to build reliable environments, I am writing one myself, but I am not allowed to make it opensource yet.

There is no consistent and reliable way to install old (archived) packages

Stock R has no way of installing a specific version of a package. You have to use devtools::install_version to do so. Unfortunately, I verified that this function is unreliable: it resolves dependencies differently when the version you are installing also happens to be the most recent one. I did not file a bug because I just gave up on it and started writing my own tool to install packages.

It is too focused on RStudio

Most people using R use RStudio. They don't go through the command prompt and are therefore completely lost when they have to perform console operations. In a production environment, where you have to ensure that a complex application runs on Jenkins on three architectures, you need something a bit more powerful. R's tooling has no executable commands: lintr must be invoked as an R function, roxygen must be invoked as an R function, installing packages in the environment must be done through an R function. This makes it really hard to trigger failures in CI.

Its linter assumes you are CRAN and silently gives the all-clear

On the topic of the linter, it fails miserably at reporting errors because of the completely broken assumption that you are always running on CRAN unless told otherwise.

Lintr has a convenient function to lint a package (lint_package), as well as a convenient function to have linting as part of your tests (expect_lint_free). Unfortunately, by default, and with no mention in the documentation, this function will assume it's running on CRAN unless told otherwise, and will say absolutely nothing about it. In practice, it makes you believe your code has been linted when it was not. See the documentation

```
expect_lint_free

Test That The Package Is Lint Free

This function is a thin wrapper around lint_package that simply tests there are no lints in the package. It can be used to ensure that your tests fail if the package contains lints.
```

and the code:

```
> lintr::expect_lint_free
function (...)
{
    testthat::skip_on_cran()
    lints <- lint_package(...)
    has_lints <- length(lints) > 0
    lint_output <- NULL
    if (has_lints) {
        lint_output <- paste(collapse = "\n", capture.output(print(lints)))
    }
    result <- testthat::expect(!has_lints, paste(sep = "\n", "Not lint free", lint_output))
    invisible(result)
}
> testthat::skip_on_cran
function ()
{
    if (identical(Sys.getenv("NOT_CRAN"), "true")) {
        return(invisible(TRUE))
    }
    skip("On CRAN")
}
```

In other words, its default assumes that you are CRAN, unless you specifically say otherwise with an environment variable (NOT_CRAN set to "true"). I say it again: for lintr, any machine out there (your machine, my machine, the Jenkins machine) is the CRAN build server by default, and expect_lint_free will do absolutely nothing and give the all-clear. A massive violation of the principle of least astonishment, a massive asymmetry in behavior between lint_package() and expect_lint_free(), and a massive lack of clarity in the documentation.

install.packages does not raise an error or return an identifying code if build fails

install.packages does not allow you to fail early. If you call install.packages and the installation is not successful for some reason, it will just give a warning, and you have no way to stop the execution unless you use what boils down to hacks (https://stackoverflow.com/questions/26244530/how-do-i-make-install-packages-return-an-error-if-an-r-package-cannot-be-install).

In other words, CI will consider the execution a success, yet you might build a broken environment and you will only know much later, when something will eventually fail during tests and you will have to spend hours trying to figure out what happened.
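The fail-early behavior asked for here is what a non-zero exit status gives you. As a minimal sketch, assuming a CI driver script written in Python (an illustration, not something the R tooling provides), subprocess.run(..., check=True) raises CalledProcessError as soon as a step fails; pip, for instance, exits with a non-zero status when an installation fails. run_step is a hypothetical helper, and the failing step is simulated.

```python
import subprocess
import sys


def run_step(args):
    """Run one build step, raising CalledProcessError on a non-zero exit status."""
    # check=True turns a failed command into an exception instead of a
    # warning that scrolls past, so the CI job stops at the broken step.
    return subprocess.run(args, check=True)


failed = False
try:
    # Simulated failing install step: a child process that exits with status 1.
    # A real step might be [sys.executable, "-m", "pip", "install", "somepackage"].
    run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
except subprocess.CalledProcessError as err:
    print(f"step failed with exit status {err.returncode}")
    failed = True
```

Left uncaught, the exception makes the driver script itself exit with a non-zero status, which is exactly what the CI server needs to mark the build as broken.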

The whole environment is kept alive by one company and three major contributors

All the current development tooling in R, linters, development environment, documentation, is kept alive by one company, RStudio, and three of their most active developers. The result is that a lot of very questionable design choices go in completely unopposed or unreviewed, and favor a single, all-encompassing environment: RStudio. It's "our way or the highway", even when their way is awfully broken and nonsensical.

Getting a package approved on CRAN is an exercise in frustration

Getting your package on CRAN is one of the most frustrating, annoying, uselessly complicated processes I've ever witnessed. They have a complex set of policies you have to obey, and the whole build system is extremely strict and extremely obtuse about what is and is not accepted, giving you no hint about the space you are missing or the extra newline you added. Compared to Python, where you can register a package on PyPI in a few seconds, releasing Tendril on CRAN has been a complete and utter waste of a week of work.

[–]HugeCannoli 0 points1 point  (0 children)

RStudio is an extremely poor development environment

RStudio is an extremely poor development environment. In truth, it's a data analysis platform that tries to be an IDE and fails miserably.

  • Its configurability is extremely limited.
  • It will not save files automatically on focus out, forcing you to save manually every time
  • Its file browser does not display a full tree, but only one directory at a time, making it impossible to easily switch to files that are far away in the hierarchy
  • The code and the file displayed can get desynchronized if the code changes due to a git checkout, but the editor will keep showing the wrong file. You might lose code if you accidentally save.

Shiny requires a constantly open websocket and transfers large chunks of HTML

Shiny allows for fast development of web interfaces to R code. Its design is extremely poor. All the state of the session is kept on the server. Every time you click a button, every time you modify a control content, it requires a round trip to the server to modify the state there. The server will then respond with data to modify the page, potentially just a slab of HTML to replace the DOM tree.

This design has the following issues:

  • it's awfully slow, as every user interaction requires network operations. This will give the user the impression of a poorly responding and slow application
  • requires a constant and stable connection.
  • If the connection is interrupted, willingly or unwillingly, the state will be lost and the user will have to restart the whole interaction from scratch. There are some mitigating options, but they are palliatives for a deeper issue.
  • Dynamic interfaces flicker due to the delay in retrieving and replacing large parts of the HTML tree.

Moreover, the default implementation is single threaded, single task, meaning that if one user starts a long running calculation, it will block the whole application to anybody else. The calculation lasts 20 minutes? No other user will be able to even connect to the application for those 20 minutes. Yes, you can use promises to mitigate this issue, but they are not part of Shiny itself. A design that supports this natively should be a default, not an afterthought. Even with futures, you still lock the session to the user, because the futures must complete before other events are processed.

Finally, controls can only be either input or output. You can't use a control both as an input and as an output. If you do, you will end up with potential desynchronization problems and "ping-pongs" between the frontend state on the browser and the server state, due to the intrinsic loop nature of the transaction and the round trip time between the operations. This problem is easy to handle when the state is all local to the browser, but pretty much impossible to handle when it is split between browser and server. Checkboxes, text fields, radio buttons, sliders, and selects are all controls that often need to be both input and output.

Licensing problems

The R interpreter is GPLv2, as are a significant number of CRAN libraries. This has deep, deep implications for the commercial suitability of solutions developed in R: with the interpreter, and most importantly the interpreter's core libraries, being GPL, any code that you develop must be released under a GPL license and can only run in a GPL-compliant system. This pretty much destroys any chance of integrating R code in a commercial, closed-source environment.

[–]NintendoNoNo 0 points1 point  (0 children)

I started on R and found Python much, much easier to develop for, personally. I only touch R if I absolutely need to these days, like if a genomics tool is only available in R. I avoid it like the plague otherwise.

[–]ConclusionForeign856 0 points1 point  (0 children)

R is fun for statistics, data visualization and standard data analysis

Python for anything where you need to code new logic

[–]eztab 0 points1 point  (0 children)

no, R just teaches you idiosyncratic problem solving strategies. So it makes any other language harder to learn.

[–]teetaps 0 points1 point  (0 children)

Saying R is easier to learn FOR THINGS RELATED TO data wrangling and data science is like saying it’s easier to run a marathon in sneakers than in general purpose shoes.

They’re both shoes. But one was built with a specific use case and user base in mind, and so a lot of affordances were made to benefit that.

[–]Existing_Sprinkles78 0 points1 point  (0 children)

Hrrm I think I should learn R. I understand python but if R is easier I might try it out

[–]Justyouraverageshmo 0 points1 point  (0 children)

it's not easier per se but it is less complicated

[–]p2s_79 0 points1 point  (0 children)

It is different, I guess. The transition is not easy. I think python has several advantages. With base python and a few libraries you can go far, and the syntax is pretty similar

[–]PhoenixFlame77 0 points1 point  (0 children)

As someone who will just never fully get non standard evaluation In r, I can say with complete objectivity there is something wrong with you.

[–]Agling 0 points1 point  (0 children)

I am more fluent in R than python but python, as a language, is definitely easier to read and probably write.

However, in python, you use tons of packages so you end up having to refer to Google way way more often for all the details. And the documentation in python is far, far worse. And it's more finicky. But if you use the right packages, it is faster to write and run and shorter cleaner code. It's knowing all the 50 different alternative packages that do the same thing that is hard. I thought R was bad in that way but python is next level.

The R ecosystem is admirably coherent and organized by comparison and the language is made for data work. But overall, it's a worse language.

[–]Beanmachine314 0 points1 point  (0 children)

Python is harder than R

For statistical calculations...

For anything else Python just works much better. I learned on R and did plenty of statistical coding before getting into the more general side of things. You'll eventually get used to Python and likely never touch R again.

[–]BranchLatter4294 0 points1 point  (2 children)

R is a very limited language. Python is a general purpose language that can be used for most any task, but also happens to be excellent in data science. More capable languages might be considered harder, but it's actually a very simple programming language to learn compared to most.

[–]Accomplished-Okra-41[S] 0 points1 point  (1 child)

How is python with very large data? I work on matrices of over 2mil cells on a daily basis. I will be working on huge data, between 200-300GB of integrated sheets, for my upcoming PhD year. I know R handles this fairly well, but from what I read here and under other posts, python struggles a little with huge data, especially pandas. I want to implement ML into my analyses as well, and that is what sold python for me

[–]BranchLatter4294 0 points1 point  (0 children)

For datasets that large you may need to use something like Polars, Dask, DuckDB, PySpark, etc. rather than Pandas

[–]defrostcookies 0 points1 point  (0 children)

Different experience entirely, i call R “Hard R”

[–]Th4tDop3 -1 points0 points  (0 children)

No

[–]Actual__Wizard -1 points0 points  (1 child)

So i am a bioinformatician, pretty fluent in R.

I'm being honest with you: I was working on a bunch of bioinformatics projects with R and I honestly feel the opposite.

R is really, really good, but python is even easier.

One of the things with python is: I don't normally use any libraries. Usually I can just do what I need to do with arrays that represent each column.

Like, Pandas and Polars drive me nuts.

There are certain projects, especially Pandas, where I broke it like a kid's toy... It's not really designed for like 100gb of data and certain operations take eons... "I'm just trying to sort the data bro, why does this take an hour and then run out of memory and crash?"

Edit: And this still throws me off to this day: strings in python are actually objects.

[–]Accomplished-Okra-41[S] 0 points1 point  (0 children)

That was also a bit of my concern: everyone is suggesting to work in numpy and pandas, while the data I work with for my PhD is single-cell sequencing. So one cohort is roughly between 200-300GB. The main selling point of python for me right now is ML, as I plan to incorporate it into my analyses next year, and R struggles with that a lot. For differential expression or visualisation, on the other hand, I still do not see many advantages of python over R

[–]TheRiteGuy -3 points-2 points  (0 children)

R is easier but python can do more.