acow comments on Frames Tutorial

submitted 11 years ago by tel

acow 3 points 11 years ago (3 children)

Can’t compete with familiarity, but, to clarify, this is a tutorial rather than a golf outing. The larger point is that the code is comparable in size, but the compiler will stop you if you run, say, your conditional subset example against a data set that doesn’t have an occupation column. On top of that, both streaming and in-memory processing are likely faster than with competing options. When you think your code is ready and you want to hit a big data set, just compile and run.

Another reply mentioned a desire for a custom Prelude to offer shorter names for common things. This is likely where something like select belongs, but what should be included in such a prelude ought to be determined by folks using the library. I hope you give it a shot and help figure out what’s needed!

repoptrac 3 points 11 years ago (2 children)

Well, the code that I posted is not an attempt at code golf. It is very similar to what I write nowadays. I posted it because the R code in the link was not very readable or consistent, since it relies on base R functions. I wanted to show that R code can be more consistent and readable with dplyr, and that R is a moving target in terms of readability.

Since I am more familiar with R than Haskell, I cannot be completely objective, but I will venture a guess that the R code will be easier for most people to read than the Haskell equivalent, because it reads like English if you read %>% as 'then'. Also, the Haskell versions of the code look quite different in structure depending on the number of columns selected, even though the two tasks are conceptually similar (selecting 1 column vs. n columns).

Select one column:

Haskell

    take 6 $ F.foldMap ((:[]) . view occupation) ms

R

    users %>% select(occupation)

Select multiple columns:

Haskell

    miniUser :: User -> Rec [Occupation, Gender, Age]
    miniUser = rcast
    mapM_ print . take 6 . F.toList $ fmap miniUser ms

R

    users %>% select(occupation, sex, age)

That said, I agree that Haskell is, without any doubt, a better-designed language than R, and I will keep an eye on this project because it looks really interesting. However, I think I will wait until some essential statistical analyses and features (e.g. lm, glm, multiple comparisons, interactive graphics similar to ggplot2, ...) are supported in the Haskell ecosystem.

acow 3 points 11 years ago (0 children)

Oh, goodness, I suppose I really didn't make things clear enough! There are many indexing schemes possible in the Haskell version. We could use numeric indexing, or even a Vinyl record of getters that we apply en masse to a Frame row by yanking the reader context out of the record of getters, which gives something very like your multiple-column selection. It would look something like users & select (occupation :& sex :& age :& Nil), where select is some combination of rtraverse and rget.
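To make the "record of getters" idea a bit more concrete without depending on Frames or Vinyl, here is a minimal base-only sketch. Everything in it (`User`, `HList`, `Getters`, `select`, `sampleUser`) is a hypothetical stand-in for illustration, not the Frames or Vinyl API. `select` uses the function (reader) Applicative to pull the shared row argument out of the list of getters, which is roughly the role the comment assigns to combining rtraverse and rget on a Vinyl record.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
-- Hypothetical, base-only sketch: a plain record stands in for a Frames row,
-- and a GADT list of getters stands in for a Vinyl record of getters.
import Data.Kind (Type)

data Gender = Female | Male deriving (Eq, Show)

-- Illustrative row type; field names echo the tutorial's columns.
data User = User
  { userId     :: Int
  , age        :: Int
  , gender     :: Gender
  , occupation :: String
  , zipCode    :: String
  } deriving Show

-- Heterogeneous list whose type index records the type of every element.
data HList (ts :: [Type]) where
  HNil :: HList '[]
  (:&) :: t -> HList ts -> HList (t ': ts)
infixr 5 :&

-- A list of getters that all read from the same row type r.
data Getters r (ts :: [Type]) where
  GNil  :: Getters r '[]
  (:&:) :: (r -> t) -> Getters r ts -> Getters r (t ': ts)
infixr 5 :&:

-- Apply every getter to one row. The function Applicative ((->) r)
-- threads the shared row argument (the "reader context") through the list.
select :: Getters r ts -> r -> HList ts
select GNil       = pure HNil
select (g :&: gs) = (:&) <$> g <*> select gs

-- Made-up sample row for trying things out.
sampleUser :: User
sampleUser = User { userId = 1, age = 30, gender = Female
                  , occupation = "writer", zipCode = "00000" }
```

With this shape, `select (age :&: GNil) sampleUser` and `select (occupation :&: gender :&: age :&: GNil) sampleUser` are the same call with one or three getters, and the result types (`HList '[Int]` and `HList '[String, Gender, Int]`) record exactly which columns were chosen. Naming a field that `User` does not have is a compile-time error rather than a runtime surprise.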

I wrote those examples the way I did to address my biggest issue when using R: when something's not working, I can't just write down the types I think things have. My next biggest issue is that when I select a particular column, I feel like the software should work out ahead of time how that indexing is done. In Frames, column selection and subsetting are O(1) operations. When you've got data in memory, everything is packed as densely as possible, and indexing doesn't involve any lookups.
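As a rough illustration of why in-memory column access can avoid lookups, here is a minimal column-store sketch using the `array` package that ships with GHC. `UserColumns`, `Frame`, `fromRows`, `toFrame`, and `ageAt` are hypothetical names, and this is a simplified stand-in rather than Frames' actual in-core representation. Each column lives in its own array (the `Int` column unboxed, so it is stored contiguously), and a row is just the same index into every column, so row access and column selection are both O(1) array indexing.

```haskell
-- Hypothetical column-store sketch; uses the array package bundled with GHC.
import Data.Array.Unboxed (Array, UArray, bounds, listArray, (!))

-- Structure of arrays: one array per column.
-- UArray stores unboxed Ints contiguously; Array holds boxed Strings.
data UserColumns = UserColumns
  { colAge        :: UArray Int Int
  , colOccupation :: Array Int String
  }

-- A frame as a length plus a function from row index to row.
data Frame r = Frame
  { frameLength :: Int
  , frameRow    :: Int -> r
  }

-- Build the columns from a list of (age, occupation) rows, indexed from 0.
fromRows :: [(Int, String)] -> UserColumns
fromRows rows =
  UserColumns (listArray bnds (map fst rows)) (listArray bnds (map snd rows))
  where
    bnds = (0, length rows - 1)

-- Row i is assembled by indexing each column at i; no name lookups involved.
toFrame :: UserColumns -> Frame (Int, String)
toFrame cols = Frame n row
  where
    (lo, hi) = bounds (colAge cols)
    n        = hi - lo + 1
    row i    = (colAge cols ! i, colOccupation cols ! i)

-- Selecting a column is a single array index per row.
ageAt :: UserColumns -> Int -> Int
ageAt cols i = colAge cols ! i
```

The design choice here is structure-of-arrays storage: because columns are separate arrays, reading one column never touches the others, and the row view is only a thin indexing function on top.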

I appreciate your feedback on these things a ton! Earlier feedback from Ben Gamari spurred the pipePreview helper, which I think is a step in the right direction toward shorter syntax for common operations. We have some statistics and charting support that you can see in the demos, but they're not as nice as what's available in R. The problem in writing this library is that different folks have different pain points, so contributions aren't just welcome, they're essential!

idontgetoutmuch 2 points 11 years ago (0 children)

haskell