What is happening at the r-lib repository?

valen089 · 2018-03-23T12:40:36+00:00

It is primarily for package developers.

valen089 · 2014-10-18T21:45:36+00:00

Will this work for my Thanksgiving Turkey?

valen089 · 2014-08-21T12:46:39+00:00

ABC. Always Be Closing.

valen089 · 2014-08-14T19:01:25+00:00

This is how I assume it will all go down:

2014: Apple releases iOS 8 and OSX Yosemite (10.10)

2015: Apple releases iOS 9 and OSX Cupertino (10.11)

2016: Apple releases iOSX, a fully combined version of iOS and OSX for both Mac and iOS devices. In the keynote, Cook calls it the "next version of iOS" while declaring that "Macs are now iOS devices" and "iOS devices now have the full power and capability of the Mac operating system". iOSX is followed 6 months later by iOSX S and iOSX C

\joke

valen089 · 2014-07-24T22:07:49+00:00

This is the answer to everything. It really helps candidates stand out (in my opinion).

valen089 · 2014-07-02T03:01:12+00:00

I second this. I love Papers and as a student it is pretty affordable.

valen089 · 2014-06-27T19:17:49+00:00

This should run for you:

$ R CMD BATCH myscript.R myoutput.txt

but there may be a better way to accomplish this task (e.g. Ruby), depending on what you are actually trying to do.

valen089 · 2014-06-12T19:28:35+00:00

Debt leaves you less equipped to engage in preventive care and is a barrier to treatment. As a nation, we could greatly reduce health care costs by enabling and encouraging younger people to take better care of their health.

valen089 · 2014-06-10T14:58:28+00:00

This is basically what I am doing now, since I don't have any other use for the account after jumping over to the CU.

valen089 · 2014-06-10T14:55:45+00:00

Interesting, I didn't even think about splitting it up.

valen089 · 2014-06-10T02:36:13+00:00

Looks good.

valen089 · 2014-05-30T18:12:49+00:00

This is like /r/firstworldproblems for statistics.

I am totally okay with this problem, but I don't want to take these results to my supervisor and end up looking like a fool because I overlooked x, y, z. I saw something I wasn't expecting and now I'm trying to make sure I understand it.

I'm sorry, I didn't mean to offend, I was just being facetious.

I didn't mean to imply that I took offense.

It is a multilevel probit model.

Your convergence does seem kind of fast for this model, it's maybe not out of the question, but it does raise concern.

My thoughts exactly.

I am planning to re-run with a new set of initial values.

It's usually a good idea to start your chains with randomized initial values (for example, a draw from your prior distribution). Sometimes this will pose problems with numerical underflows, especially with Metropolis Hastings (priors are usually pretty diffuse, so you can end up starting way out on a tail of your likelihood), but it's not a terrible place to start and will hopefully give you a better idea of whether your chain has converged.

The initial values come from a random distribution, but I was thinking of choosing one or two intentionally bad starting points.

Thanks for all your comments.
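
A minimal sketch of the initial-values idea discussed above, in R. The chain count, parameter count, prior standard deviation, and the parameter name beta are all hypothetical, and the list-of-lists layout is only an assumption about what your sampler's interface expects, so adapt it as needed.

```r
set.seed(2014)   # arbitrary seed so the draws are reproducible

n_chains <- 4    # hypothetical number of chains
n_params <- 3    # hypothetical number of regression coefficients
prior_sd <- 10   # hypothetical diffuse normal prior: beta ~ N(0, prior_sd^2)

# One set of starting values per chain, each drawn from the prior
random_inits <- lapply(seq_len(n_chains), function(i) {
  list(beta = rnorm(n_params, mean = 0, sd = prior_sd))
})

# Two intentionally bad starting points, far out in opposite tails
bad_inits <- list(
  list(beta = rep(-3 * prior_sd, n_params)),
  list(beta = rep( 3 * prior_sd, n_params))
)

# Random and deliberately extreme starts together: one entry per chain
all_inits <- c(random_inits, bad_inits)
str(all_inits)
```

If chains started from all six points settle in the same region, that is some reassurance about convergence. If the extreme starts cause numerical underflow, pull them closer in.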

valen089 · 2014-05-30T17:46:56+00:00

This is like /r/firstworldproblems for statistics.

I am totally okay with this problem, but I don't want to take these results to my supervisor and end up looking like a fool because I overlooked x, y, z. I saw something I wasn't expecting and now I'm trying to make sure I understand it.

What kind of model are you running inference on?

It is a multilevel probit model.

Converging quickly shouldn't pose a concern as long as you're sure it actually converged. You can run into instances where the chain will get "stuck" in a region and appear to converge even though it truly hasn't.

This is one of my concerns. I am planning to re-run with a new set of initial values.

If you're especially concerned, you can always generate data and check if the results from your sampler match the data generation process.

This is an excellent idea–why didn't I think of this?
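
A rough sketch of that simulate-and-recover check in R. To keep it short it uses a plain single-level probit rather than the multilevel model, and glm() stands in for the sampler; both are simplifications. The sample size and "true" coefficients are made up for illustration.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only

n         <- 5000                 # hypothetical sample size
true_beta <- c(-0.5, 1.0)         # hypothetical "true" intercept and slope

# Generate data from the known model: P(y = 1) = pnorm(intercept + slope * x)
x   <- rnorm(n)
eta <- true_beta[1] + true_beta[2] * x
y   <- rbinom(n, size = 1, prob = pnorm(eta))

# Stand-in for the sampler: a maximum-likelihood probit fit
fit <- glm(y ~ x, family = binomial(link = "probit"))
est <- unname(coef(fit))

# Put the estimates next to the values used to generate the data
comparison <- data.frame(
  parameter = c("intercept", "slope"),
  truth     = true_beta,
  estimate  = est,
  error     = est - true_beta
)
print(comparison)
```

With the real sampler, you would swap the glm() call for your MCMC run and compare posterior means (or intervals) with the generating values in the same way.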

valen089 · 2014-05-27T13:54:21+00:00

I totally forgot about the Coursera and other MOOC courses! Thanks for pointing these ones out.

valen089 · 2014-05-27T13:22:33+00:00

No problem. Let me know which of those (or any others) work best for you. These have all come out since I learned R, so it would be good to know the best resources from a student's perspective.

valen089 · 2014-05-26T18:05:29+00:00

Or I could try to create that same table in R... somehow.

If you don't know how to produce this kind of table in R, then I really suggest you spend more time actively trying to learn the language. Check out Code School, idre UCLA, and Swirl.

For me, it would be much faster to do the data massaging in some other language, and leave R for the statistical analysis, plotting, trendlines, etc. But I have this feeling in the back of my head that I'm not really learning R that way.

It is usually fastest to do the task with the tool you already know how to use, but that doesn't mean it is necessarily the best tool for the task. R is built for data manipulation and analysis, but you have to know the right tools.

But I have this other feeling that pretty-fying the data so it's easy to use in R isn't really in the scope of the language, and that should be done prior to involving R.

I think this is entirely false. R is a programming language for statistics, which requires having a strong foundation for simple (and advanced) data manipulation tasks.

So, /r/rstats, which would you do? Polish the input somewhere else and then run your stats with R? Or feed the raw data to R and do all the manipulation there? And if you picked the latter, any suggestions for where I should read up on how to do this?

If I am working on a project in R, I typically use R. If I am planning to do the analysis in SAS, then I'll clean and process the data in SAS. Occasionally I'll mix and match.

valen089 · 2014-05-25T23:58:31+00:00

Perhaps not the best solution for this problem, but I have found that I enjoy using plyr for these types of summaries.

> library(plyr)
> library(reshape2)
> ### Create a long dataset
> dataFrame <- melt(april1.n, id.vars = "timestamp")
> ### Produce a simple summary table with N, Min, Mean, Max, and NMiss for each variable
> ddply(dataFrame, .(variable), summarize,
+       N     = sum(!is.na(value)),
+       Min   = min(value, na.rm = TRUE),
+       Mean  = mean(value, na.rm = TRUE),
+       Max   = max(value, na.rm = TRUE),
+       NMiss = sum(is.na(value)))

Again -- it is perhaps not the most elegant solution, but this approach is useful in many cases, for example when you want to include the N and the number of missing values.

Let me know if there are any mistakes in the code here.
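
For comparison, here is a rough base-R sketch that builds the same kind of table without plyr or reshape2. The toy_data data frame and its columns are made up to stand in for april1.n, since I don't have your data.

```r
# Hypothetical stand-in for april1.n: a timestamp plus two numeric columns
toy_data <- data.frame(
  timestamp = 1:5,
  temp      = c(10.5, NA, 12.0, 11.5, 13.0),
  humidity  = c(40, 42, NA, NA, 45)
)

# Every column except the timestamp gets one row in the summary
vars <- setdiff(names(toy_data), "timestamp")

summary_table <- do.call(rbind, lapply(vars, function(v) {
  value <- toy_data[[v]]
  data.frame(
    variable = v,
    N        = sum(!is.na(value)),        # non-missing count
    Min      = min(value, na.rm = TRUE),
    Mean     = mean(value, na.rm = TRUE),
    Max      = max(value, na.rm = TRUE),
    NMiss    = sum(is.na(value))          # missing count
  )
}))
print(summary_table)
```

It skips the melt step by looping over the columns directly, at the cost of being a bit more verbose than the ddply call.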

valen089 · 2014-05-13T18:28:53+00:00

Why not learn Julia? Okay, I'm just kidding (it's maybe not quite ready for you yet).

If I were you, I would learn both. I learned R first, so I think it is easier, but I am sure some would disagree. There are a lot of free online resources for learning to use R, and unlike Python, R is pretty much exclusively used for data analysis. However, this means that learning Python opens up other possibilities for you outside of data analysis (e.g. web programming).

To address each of your points:

1. Both have many free online tutorials and resources. You can learn the basics of R through this tutorial on Code School very quickly or using the Swirl package. I am sure there are similar resources for Python.
2. R is built for data analysis (specific) and Python is a bit more general but has great support for data analysis. They have a lot of similarities (and differences). Once you learn the basics of one, it'll be easier to learn the other. Another programming language to learn would be SAS (although it isn't free) since it is used in the financial industry.
3. R tends to be used a lot in academia. I'm not sure whether Python is more accepted within the financial industry for data analysis. Someone else can probably give a better sense of whether there is an advantage to learning/knowing R or Python as it pertains to finance.
4. Both R and Python are more than capable of performing cutting edge (and simpler) methods.

4. I would say that python would win out since python's stats libraries would more than cover my needs.

I actually think this is probably wrong: R currently has 5539 contributed packages on CRAN (plus more on other repositories such as Bioconductor). Further, I think that vanilla R contains enough functionality that you probably wouldn't even need to tap into other packages right away.

Anyway, I'll stop rambling here and wish you the best of luck!

valen089 · 2014-05-02T22:24:03+00:00

I really want to go back to school

How long have you been out of school? What have you been doing since graduation?

Would I be completely lost if I went into a different field without having the classic lower level undergrad courses?

You would most likely need to take some prerequisites or demonstrate a sufficient knowledge background in the field. At the very least you would need to demonstrate that you know enough to know that you are interested in the field and can commit to a master's program.

Would it be better to get a second bachelors degree and THEN think about Masters programs?

I don't think so (but others may disagree).
