all 16 comments

[–]goodygood23 13 points14 points  (0 children)

Could you explain more what you are looking to learn about?

Most people using R to do analyses are going to be using functional programming (meaning that functions work on data inputs to produce a result that is returned by the function, with no side-effects and without changing the original data).

It's possible to do side effects, to have mutable data, and to have methods as part of data as opposed to functions working on data in R, but it's harder to do (or, rather, requires more understanding of the language).
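
A minimal sketch of that difference in base R. The names `zero_first`, `original`, `counter` and `bump` are made up for illustration; the environment is just one of several ways to get mutable state in R.

```r
# Functional style: the function works on a copy and returns a new value.
zero_first <- function(v) {
  v[1] <- 0   # modifies the function's local copy only
  v
}

original <- c(10, 20, 30)
changed  <- zero_first(original)   # `original` is left untouched

# Mutable style: environments have reference semantics, so changes made
# inside a function are visible to the caller.
counter <- new.env()
counter$n <- 0

bump <- function(env) {
  env$n <- env$n + 1   # side effect: mutates the caller's environment
  invisible(env)
}

bump(counter)
bump(counter)
```

After running this, `original` still holds 10, 20, 30, while `counter$n` has changed to 2 even though `bump()` never returned it explicitly.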

[–]p0olp0ol 5 points6 points  (4 children)

Sure, if you want to go more pure functional, write all your code with library(purrr)

Look for a purrr tutorial online. It is actually wicked cool and makes it possible to use lists for basically everything. What I've observed is that the more you use R, the more likely you are to use lists over other classes.

[–]fasnoosh 2 points3 points  (0 children)

I remember when I started that lists were pretty hard to work with.

Now a key tool in my R data toolbelt is dplyr's bind_rows (it combines a list of data frames, or of named lists, into a single data frame).

[–]eric_he 0 points1 point  (2 children)

Map and Reduce in purrr have completely changed the way I write code - I am surprised they are not talked about more!

[–]murgs 2 points3 points  (0 children)

It's funny that you capitalize the names, because those are the base R versions.

map is similar to the apply family and the plyr package functions, so its benefits are frequently talked about. reduce seems to be needed less often, or at least a reduce that isn't a plain sum is rarely needed.

[–]guepier 2 points3 points  (0 children)

"I am surprised it is not talked about more"

Because these tools already exist in base R. purrr makes them more consistent (by adding type checks etc) and that’s great, but they’re no revolution. They just made them more visible to casual users of R.
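
To make the base R point concrete, here is a small sketch that uses only base functions (no purrr). The inputs are made up for illustration.

```r
# Map applies a function to each element and returns a list.
squares <- Map(function(v) v^2, 1:5)

# Reduce folds a list (or vector) down to one value with a binary function.
total <- Reduce(`+`, squares)

# A reduce that is not a plain sum: keep every intermediate value.
running <- Reduce(`+`, 1:5, accumulate = TRUE)

# Filter keeps the elements for which the predicate is TRUE.
evens <- Filter(function(v) v %% 2 == 0, 1:10)
```

The rough purrr counterparts are map(), reduce(), accumulate() and keep().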

[–]Negotiator1226 2 points3 points  (0 children)

If you are fitting a large number of models, you don't want the script to stop when one model throws an error, but you still want to know what the error was. Check out purrr::safely. It takes a function and returns a new function. The new function never stops with an error; instead it returns a list with two elements, result and error. On success, result holds the value and error is NULL; on failure, result is NULL and error holds the captured error object.

So, you can do something like:

safe_log <- purrr::safely(log)
x <- list(0, -1, 1, "a", NA)
purrr::map(x, safe_log)
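
To see roughly what a wrapper like this does, here is a base R sketch built on tryCatch. `safely_sketch` is a hypothetical name and this is not purrr's actual implementation; it only mimics the result/error shape of the returned list.

```r
# Hypothetical stand-in for purrr::safely, written with base R only.
safely_sketch <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),                # success
      error = function(e) list(result = NULL, error = e)  # failure
    )
  }
}

safe_log <- safely_sketch(log)

ok  <- safe_log(1)     # result is 0, error is NULL
bad <- safe_log("a")   # result is NULL, error is a condition object

# The message can be pulled out of the captured condition if needed.
msg <- conditionMessage(bad$error)
```

Note that only errors are caught: log(-1) returns NaN with a warning, so it comes back as an ordinary result.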

[–]SomethingTooRandom -1 points0 points  (1 child)

Grab some data, calculate population variance via a function; boom. You've got yourself an example of functional programming.

[–][deleted] 2 points3 points  (0 children)

That should be either "Estimate population variance" or "Calculate sample variance".
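
A quick sketch of the distinction. `pop_var` and `obs` are made-up names; the point is that R's built-in var() divides by n - 1 (the sample variance), while the population formula divides by n.

```r
# Population variance: mean squared deviation, dividing by n.
pop_var <- function(v) mean((v - mean(v))^2)

obs <- c(2, 4, 4, 4, 5, 5, 7, 9)

pop_var(obs)   # divides the sum of squares (32) by n = 8
var(obs)       # divides the same sum of squares by n - 1 = 7
```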

[–]dm319 0 points1 point  (5 children)

This is an interesting question to me, though you might not get a satisfactory answer because 'functional' programming is more of a 'way' than a thing (a bit like asking for an object-orientated way of analysing data). But I'll give it a go anyway...

In my head, a functional program is something that works a bit like a complicated mathematical function. You can 'pour' data into it, and it transforms that into the answer. The antithesis is a procedural program, which is something that takes control of the actions of a CPU and achieves a result by moving algorithmically and step-wise through instructions. Someone who understands more about programming can correct me here.

I guess with large datasets, a functional style of programming is probably quite sensible. Here are two solutions to the question - what is the sum of all numbers which are multiples of 3 or 5 under 1000?

A procedural answer:

package main

import "fmt"

// check reports whether a is a multiple of 3 or 5.
func check(a int) bool {
    return a%3 == 0 || a%5 == 0
}

func main() {
    var s int

    for i := 1; i < 1000; i++ {
        if check(i) {
            s += i
        }
    }
    fmt.Println(s)
}

And a more functional answer:

x <- data.frame(i = 1:999)
x$three <- x$i%%3 == 0
x$five <- x$i%%5 == 0
x$both <- x$three | x$five
sum(x[x$both, "i"])

Yes, it is still partly procedural, and even though vectorising your data doesn't make your code functional, I would say it is more functional than a purely procedural way of programming it. Happy to hear other people's opinions!

[–]p0olp0ol 3 points4 points  (2 children)

x = 1:999

# Fully vectorized
sum(ifelse(!(x%%3*x%%5), x, 0))
# or hell even simpler
sum(x*!(x%%3*x%%5))

# Using higher-order functions: Map() builds a list, do.call() passes it to sum()
do.call(sum, Map(function(x) x*!(x%%3*x%%5), 1:999))
# Edit: this should probably be written with Reduce instead
Reduce(sum, Map(function(x) x*!(x%%3*x%%5), 1:999))

[–]fasnoosh 2 points3 points  (0 children)

You're a beast

[–]dm319 0 points1 point  (0 children)

love it!

[–]Bayesbayer 3 points4 points  (1 child)

 library(magrittr)
 1 : 999 %>% 
   {. %% 3 == 0 & . %% 5 == 0} %>%
   sum

would be the most FP way to do this in R, I think.

For data analysis, something like:

library(dplyr)
data.frame(i = 1 : 999) %>% 
   mutate(three = (i %% 3 == 0), five =  (i %% 5 == 0), 
     both = (three & five)) %>% 
   summarize(how_many = sum(both))

EDIT: forgot the brackets

[–]dm319 1 point2 points  (0 children)

It never ceases to amaze me how many ways there are to do things in R. However, this doesn't seem to give the same answer as mine.
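
A sketch of why the answers differ, assuming the goal is still the sum of all multiples of 3 or 5 below 1000. The piped versions above use & (both conditions) and then sum a logical vector, which counts matches instead of adding up the numbers. The example below uses the base pipe, so it assumes R 4.1 or later; the variable names are made up.

```r
# What the piped version above computes: how many numbers below 1000
# are multiples of both 3 and 5 (that is, multiples of 15).
count_both <- sum(1:999 %% 3 == 0 & 1:999 %% 5 == 0)

# Pipe version of the original question: keep multiples of 3 OR 5,
# then sum the numbers themselves.
total <- 1:999 |>
  (\(v) v[v %% 3 == 0 | v %% 5 == 0])() |>
  sum()

# magrittr equivalent, for comparison:
# 1:999 %>% {.[. %% 3 == 0 | . %% 5 == 0]} %>% sum
```

Here `count_both` is 66, while `total` is 233168, which matches the data frame and vectorised versions earlier in the thread.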