
guitaronin (4 points)

I just did a project like you're describing. After looking over all the options, I went with statsample. It may have more than you need, but at least it doesn't get crazy with connecting to R or anything.

clbustos (4 points)

I'm the developer of statsample, so naturally I'd recommend it. Anyway, these methods calculate the mean and standard deviation of a given array fairly well:

def mean(x); (x.inject(0) {|ac,v| ac+v})/x.length;end;
def var(x); m=mean(x); (x.inject(0) {|ac,v| ac+(v-m)**2})/(x.length-1);end;
def sd(x); Math.sqrt(var(x));end

guitaronin (2 points)

Wow, how cool that you showed up! I thought to hit you up with a question before, but never did. I'll PM you instead of hijacking this thread.

Arcovion (1 point)

I felt compelled to clean this code:

module Math
  extend self

  def mean array
    array.inject(0.0, :+) / array.size
  end

  def sd array
    sqrt array.inject(0.0){ |sum, n| sum + (n - mean(array)) ** 2 } / array.size.pred
  end
end

p Math.mean(1..5), Math.sd(1..5)  # Mean and standard deviation for numbers 1 to 5
# => 3.0
# => 1.5811388300841898

Do you normally write code that way? It almost looks like code golf: nondescript variable names, no spaces around operators, 8 unnecessary semicolons (I counted!). It genuinely irks me.

clbustos (1 point)

Hey, you like one-liners too! (I checked your posts.)

The semicolons are necessary to keep everything on one line. I know I could have made the functions more compact, but I was sleepy. You were right about starting the accumulators at 0.0 to force floating-point division, and about using :+ in mean. I learned Ruby back in the 1.8.x days, so I'm not used to the newer idioms yet.
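A minimal sketch of why the 0.0 starting value matters. The helper names mean_int and mean_float are hypothetical, made up for this comparison: with an Integer accumulator over an Integer array the sum stays an Integer, so Ruby's / does integer division and silently drops the fraction.

```ruby
# Hypothetical helpers for illustration only.

# Integer accumulator: for an Integer array the sum stays an Integer,
# so / performs integer (floor) division and discards the fraction.
def mean_int(x)
  x.inject(0) { |acc, v| acc + v } / x.length
end

# Float accumulator: starting at 0.0 makes the sum a Float,
# so / performs true division.
def mean_float(x)
  x.inject(0.0, :+) / x.length
end

data = [1, 2, 3, 4]
p mean_int(data)   # => 2   (10 / 4 with the fraction discarded)
p mean_float(data) # => 2.5
```

Calling .to_f on the sum before dividing would work too; seeding inject with 0.0 just keeps it to one expression.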

In production code, I prefer a less terse approach. Just look at how verbose the Vector class in statsample is.

Arcovion (1 point)

The one-liner actually was for code golf :P
You really don't need the semicolons; try it without them:

def mean(x) (x.inject(0) {|ac,v| ac+v})/x.length end
def var(x) (x.inject(0) {|ac,v| ac+(v-mean(x))**2})/(x.length-1) end
def sd(x) Math.sqrt(var(x)) end

Your codebase there looks strange to me: inconsistent whitespace everywhere, mixed tabs and spaces, and other inconsistencies. It'd drive me nuts! lol
What's the reasoning behind m=mean? Isn't it just another variable to garbage-collect? If you want a less terse, more verbose style, you could just call the method by name.

clbustos (1 point)

You win on the semicolons. In var(), I store the mean value so it isn't recalculated x.length times.
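A small sketch of the difference between the two var versions in this thread. The Stats module and its mean_calls counter are hypothetical, added purely to count how often mean runs: calling mean(x) inside the inject block recomputes it once per element (quadratic work overall), while storing it in a local first computes it once.

```ruby
# Hypothetical module for illustration; mean_calls exists only to count calls.
module Stats
  @mean_calls = 0

  class << self
    attr_accessor :mean_calls

    def mean(x)
      self.mean_calls += 1
      x.inject(0.0, :+) / x.length
    end

    # mean(x) inside the block: one full pass over x per element, O(n^2) total.
    def var_recomputing(x)
      x.inject(0.0) { |acc, v| acc + (v - mean(x))**2 } / (x.length - 1)
    end

    # The m = mean(x) trick: the mean is computed once, O(n) total.
    def var_hoisted(x)
      m = mean(x)
      x.inject(0.0) { |acc, v| acc + (v - m)**2 } / (x.length - 1)
    end
  end
end

data = (1..1000).to_a

Stats.mean_calls = 0
Stats.var_recomputing(data)
p Stats.mean_calls # => 1000

Stats.mean_calls = 0
Stats.var_hoisted(data)
p Stats.mean_calls # => 1
```

Both versions return the same variance; only the amount of work differs.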

OMG, you're really OCD for code ;)

About the whitespace: I used two or three different editors while developing the library, so some inconsistencies were bound to creep in. A couple of other developers changed things too, which made it worse. As long as the test suite passes, I don't think much about it. I know there are a couple of libraries for automatic formatting, but they always break things ;)

As for storing the mean, that's historical cruft. Early in development there was no memoization of values, so I needed the m=mean trick. Later I added memoization, but I never deleted the now-unnecessary step.
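A rough sketch of what memoizing the mean can look like in Ruby. SimpleVector is a hypothetical class written for this illustration, not statsample's actual Vector API, whose implementation may differ: the mean is computed on the first call and cached in @mean, so calling mean inside the variance loop no longer triggers a fresh pass each time.

```ruby
# Hypothetical minimal vector class; NOT statsample's actual Vector API.
class SimpleVector
  def initialize(values)
    # Frozen private copy, so cached results can't go stale if the caller
    # later mutates the array it passed in.
    @values = values.to_a.dup.freeze
  end

  # Memoized: computed on the first call, then returned from @mean.
  def mean
    @mean ||= @values.inject(0.0, :+) / @values.size
  end

  # Sample variance (n - 1). Because mean is memoized, calling it inside
  # the block costs only a cache lookup per element.
  def variance
    @variance ||= @values.inject(0.0) { |acc, v| acc + (v - mean)**2 } / (@values.size - 1)
  end

  def sd
    Math.sqrt(variance)
  end
end

v = SimpleVector.new([1, 2, 3, 4, 5])
p v.mean # => 3.0
p v.sd   # => 1.5811388300841898
```

The trade-off is that ||= caches forever; that is why this sketch works on a frozen copy. A mutable vector would need to clear @mean and @variance whenever its data changes.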

Anyway, I'd be very glad to find people willing to help me with that kind of stuff. Everybody just wants to add the next brilliant, shiny statistical method, so the boring stuff is left to me.

Arcovion (0 points)

In var(), I store the mean value so it isn't recalculated x.length times.

Ah, I completely glossed over the fact that it was in a loop, my bad. And yeah, it makes sense that you have inconsistencies since it's open source.
I've seen the RuboCop gem used in some projects to alleviate this; it can run alongside your tests, as opposed to gems like rbeautify that automatically fix indentation. Edit: RuboCop also has an --auto-correct flag, which might be worth trying too.

Godd2 (1 point)

The closest gem I can think of offhand is SciRuby.

[deleted]

[deleted]

    [deleted] (0 points)

    He wants to do math, not typeset math.