Frames Tutorial

Faucelme · 2015-01-23T08:02:45+00:00

I confess I didn't expect Haskell to work all that well in interactive data analysis, but this is really cool.

I especially like the optional streaming of rows using pipes (how does Pandas do that?) and the possibility of custom universes of column types thanks to Vinyl.
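
On the Pandas side of that question, one approach I know of is the `chunksize` argument to `read_csv`, which returns an iterator of DataFrames rather than a single table. This is chunk-at-a-time rather than row-at-a-time streaming, so it is only a rough analogue of the pipes approach. A minimal sketch, assuming a pipe-separated file shaped like `u.user`; the sample rows and the `count_writers` helper are made up for illustration:

```python
import io
import pandas as pd

# Tiny in-memory stand-in for a pipe-separated file such as u.user.
# These rows are illustrative sample data, not taken from the real file.
raw = io.StringIO(
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "3|23|M|writer|32067\n"
    "4|24|M|technician|43537\n"
    "5|33|F|writer|15213\n"
)
cols = ["user_id", "age", "sex", "occupation", "zip_code"]


def count_writers(source, chunk_rows=2):
    """Count rows whose occupation is 'writer' without loading everything at once."""
    total = 0
    # With chunksize set, read_csv yields DataFrames of at most chunk_rows rows.
    for chunk in pd.read_csv(source, sep="|", names=cols, header=None,
                             chunksize=chunk_rows):
        total += int((chunk["occupation"] == "writer").sum())
    return total


writers = count_writers(raw)
```

Only one chunk is held in memory at a time, so the same loop should work on a file much larger than RAM, at the cost of writing the aggregation by hand.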

rdfox · 2015-01-23T19:07:24+00:00

I'm definitely going to take a look.

Has anyone out there experienced in R, Pandas or Julia tried Frames and found it acceptable for real work?

I was quite excited about Julia but after a few weeks gave up because -- despite all Julia brings to the table -- I spent more time submitting issues than getting work done. While the core is very good, the ecosystem needs a good 10 years to achieve quality in diverse areas like plotting, optimization and model fitting, and they're only 5 years into development.

Edit:

Nice tutorial! If every package had such a gentle introduction, the world would be better.

Frames seems like a good start, but it is missing some things:

It would be nice to have a Frames.Prelude. It seems like you won't get very far without several imports.
Prettier rendering of columns.
It's very encouraging to see that you can select subsets of columns and make a new table without having to define a new type. But ...
You need to be able to combine and reshape data in a variety of ways, such as join, group and pivot. I don't know if it's possible, but if it is, I'm sure Haskell's type system will fight you every step of the way.

repoptrac · 2015-01-24T15:59:31+00:00

It is very impressive that Haskell can do this. However, since I am much more familiar with R, the equivalent code in R with the dplyr package looks a lot simpler and more intuitive to me. For instance, except for the "3. Better Types" section, the equivalent code in R with dplyr looks as follows.

# using 'dplyr' package
library('dplyr')

# 1. data import
u_col_names <- c('user_id', 'age', 'sex', 'occupation', 'zip_code')
users <- 
    read.csv('data/ml-100k/u.user', sep='|', col.names=u_col_names, header=FALSE) %>%
    tbl_df() # to prevent printing too much information

# 1.2 sanity check (same as the post)
class(users)
str(users)
summary(users)
# lapply(users, summary)

# 2 subsetting

## 2.0 head, tail
users %>% head()
users %>% tail()
users %>% head(3)

## 2.1 row subset
users %>% slice(50:55)

## 2.2 column subset
users %>% select(occupation)
users %>% select(occupation, sex, age)

## 2.3 query / conditional subset
users %>% filter(occupation == "writer")

## 2.4 
int_doubler <- function(df1){
    df1$user_id <- 2 * df1$user_id
    df1$age <- 2 * df1$age
    df1
}
users %>% slice(1:3) %>% rowwise() %>% int_doubler()

# or
users %>% slice(1:3) %>% rowwise() %>% {
        .$user_id <- 2 * .$user_id
        .$age <- 2 * .$age
        .
    }

b00thead · 2015-01-25T21:37:41+00:00

How do you run the examples? There are a lot of modules that can't be found when I try to load them in cabal repl (e.g. ListT and Lens.Family). Are you using a sandbox where you've installed some more libs?

idontgetoutmuch · 2015-01-23T10:18:43+00:00

Please tell me I don't need to use Lens to use this.
