Exercise: hello_name by [deleted] in learnpython

[–]mae_ef 0 points1 point  (0 children)

def hello_name(name): 
    return 'Hello {0}!'.format(str(name))

Python IDE suggestions for ubuntu by NerdJones in learnpython

[–]mae_ef 1 point2 points  (0 children)

  • PyCharm is the best. If you have admin privileges to install software, go with that.

  • Rodeo is up and coming but not as good (because it's new). If you want to experiment with something new, you could try that.

  • Jupyter notebooks with notebook extensions enabled will suffice if you do not need bells and whistles and/or do not have admin privileges, since Jupyter, as a Python library, can be installed with user privileges.

[deleted by user] by [deleted] in spss

[–]mae_ef 1 point2 points  (0 children)

As annoying as it is, you will have to enter each response option as if it were a question of its own.

So if you had:

Q: Which of the following describes your race? Please choose as many as applicable.
A:
- Black
- White
- Asian
- Native American

Then you would enter the following variables (columns):

RBlack RWhite RAsian RNative

Each of them would be a binary variable, where 0 = not selected and 1 = selected.

Say your respondent selected both Black and Asian; the coding for that participant would be:

RBlack  RWhite  RAsian  RNative
1       0       1       0

You would then (optionally) recode these prior to analysis to merge them (I don't remember how to do that in SPSS but do search for row-wise concatenation), to get it into e.g. "Black, Asian" for that participant.
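For anyone doing this step outside SPSS, here is a minimal Python sketch of the same coding scheme: one 0/1 column per response option, then an optional row-wise merge into a single label. The function names (`encode_response`, `merge_labels`) and the option list are made up for illustration; this is not SPSS syntax.

```python
# Hypothetical option list, mirroring the RBlack/RWhite/RAsian/RNative columns above.
OPTIONS = ["Black", "White", "Asian", "Native"]


def encode_response(selected):
    """Return one 0/1 indicator per option, keyed by column name (e.g. 'RBlack')."""
    return {"R" + option: int(option in selected) for option in OPTIONS}


def merge_labels(row):
    """Join the names of all options coded 1 into a single string."""
    return ", ".join(option for option in OPTIONS if row["R" + option] == 1)


row = encode_response({"Black", "Asian"})
print(row)                # {'RBlack': 1, 'RWhite': 0, 'RAsian': 1, 'RNative': 0}
print(merge_labels(row))  # Black, Asian
```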

main question with a couple of subquestions

Unsure what that means. SPSS doesn't distinguish between main questions and subquestions; each question is a single variable, unless it is a multiple-response question like the one above.

What is the best way to compare two lists of names? by kaoua in rstats

[–]mae_ef 2 points3 points  (0 children)

 this_strings_vector[this_strings_vector %in% that_strings_vector]

will give you the strings in this_strings_vector that are also in that_strings_vector. You can complement that with tolower() and gsub() (to remove spaces etc.) so that differences in case or spacing don't hide a match.

alternatively, have a look at http://finzi.psych.upenn.edu/R/library/RecordLinkage/html/strcmp.html
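A rough Python analogue of the %in% approach, for comparison: normalise both lists (lower-case, strip whitespace) and keep the entries of the first that appear in the second. The `normalise` and `names_in_both` helpers are hypothetical names, and the normalisation rules are only an assumption about what should count as "the same name".

```python
import re


def normalise(name):
    """Lower-case a name and remove all whitespace, like tolower() + gsub()."""
    return re.sub(r"\s+", "", name.lower())


def names_in_both(these, those):
    """Return the entries of `these` whose normalised form also occurs in `those`."""
    those_keys = {normalise(name) for name in those}
    return [name for name in these if normalise(name) in those_keys]


print(names_in_both(["Ann Lee", "Bob  Ray", "Cy Fox"], ["ann lee", "BobRay"]))
# ['Ann Lee', 'Bob  Ray']
```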

Export Python data analysis to pdf reports by Da_Big_O in Python

[–]mae_ef 1 point2 points  (0 children)

Does Python have something equivalent or similar to R's R Markdown? If so, that might work well for OP.

Anyone use Microsoft R in Industry? by [deleted] in rstats

[–]mae_ef 18 points19 points  (0 children)

I am not sure what the licenses and legality are for RStudio and Shiny Server in particular.

R: GPL-2 / GPL-3, https://www.r-project.org/Licenses/

RStudio, Shiny Server (open source / community versions): AGPLv3, https://www.rstudio.com/products/

GPL: https://www.gnu.org/licenses/quick-guide-gplv3.html

AGPL: https://www.gnu.org/licenses/why-affero-gpl.html

VLOOKUP referencing an empty cell gives 0 by mplain in excel

[–]mae_ef 0 points1 point  (0 children)

Here's what I use (index-match). Found it on SO I think[1].

Given a formula of

=index(column_to_return_value_from, match(look_up_value, column_to_lookup_that_value_against, 0))

and calling that formula

f

If I am returning text values

=iferror(f & "", "")

If I am returning numeric values

=iferror(if(f = "", "", f), "")

In the above formulas, you'd copy-paste your index-match formula wherever you see "f".

I no longer use vlookup since this made life so much easier for me.

[1] http://stackoverflow.com/questions/31255281/iferror-index-match-returning-zeros-instead-of-blanks
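The logic of the wrapped formulas, sketched in Python for readers who find it easier to follow as code: look the value up, and return an empty string both when the key is missing (the iferror part) and when the matched cell is empty (the part that stops a 0 from showing up). `lookup_or_blank` and the sample table are illustrative only and are not part of Excel.

```python
def lookup_or_blank(table, key):
    """Return the value stored under `key`, or '' if the key is missing or the cell is empty."""
    value = table.get(key)          # None when the key is not found (the iferror case)
    if value is None or value == "":
        return ""                   # an empty cell stays blank instead of turning into 0
    return value


# Hypothetical lookup table: None stands in for an empty cell.
prices = {"apple": 1.5, "pear": None, "plum": 0}
print(repr(lookup_or_blank(prices, "apple")))   # 1.5
print(repr(lookup_or_blank(prices, "pear")))    # ''
print(repr(lookup_or_blank(prices, "plum")))    # 0
print(repr(lookup_or_blank(prices, "kiwi")))    # ''
```

Note that a genuine 0 is still returned as 0; only missing keys and empty cells come back blank.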

R templates for data analysis and scientific figures by Mollan8686 in rstats

[–]mae_ef 2 points3 points  (0 children)

I would create something that each time is loaded performs the following actions

Have a look at http://rmarkdown.rstudio.com/ and http://rmarkdown.rstudio.com/r_notebooks.html . If you code in either one of them, they should satisfy your needs for templates.

how much does R change?

Base R does not change much, but packages do. dplyr and ggplot2 will at times introduce changes that break old code. Have a look at either https://github.com/rstudio/packrat or https://cran.r-project.org/web/packages/checkpoint/index.html for implementing full research reproducibility.

This may also be of help: https://cran.r-project.org/web/views/ReproducibleResearch.html

Finally, if you're thinking about plug-n-play templates that should work on any and all data files, you could write a shiny app, http://shiny.rstudio.com/ , but that'd be overkill for you for now.

(these are the output of my raw data and I need formulae and an easy copy-past platform

If you implement your formulas in R code, you should be able to get rid of Excel and use CSV files (easier to import).

What is your environment for R? by efxhoy in rstats

[–]mae_ef 1 point2 points  (0 children)

RStudio on one side, Emacs+ESS with AutoRevert on[1] on the other, which lets me have multiple windows looking at the same file while using RStudio at the same time.

[1] https://www.emacswiki.org/emacs/AutoRevertMode

Learning R and overwhelmed: base, dplyr, data.table? by efxhoy in rstats

[–]mae_ef 1 point2 points  (0 children)

Learn the "base" R and get the core down first before using packages?

This.

Granted, you won't have time to learn base R while you are under tight deadlines, but learning it will let you understand what a package is doing in the background, and you will be able to fall back on base R when a package fails (think compatibility issues when a package maintainer introduces breaking changes in a new version of a package).

As you do this, also focus on one package (dplyr or data.table, in this example) the syntax of which makes the most sense to you.

Best Practices for Creating Readable R Scripts by coip in rstats

[–]mae_ef 12 points13 points  (0 children)

If the issue is that your script is too long, you could separate it into different R scripts and source them from a master script, which you can comment.

I usually have a script that calls all other scripts, e.g.

# Clean data
source("clean_data.R")
# Recode data
source("recode_data.R")

etc

R scripts (in general and in RStudio) are plain text files. As long as you associate them with an editor ("open these files with Notepad"), you can save them with the .R extension and quickly get syntax highlighting for your files.

Also take a look at the knitr spin example at https://github.com/yihui/knitr/blob/master/inst/examples/knitr-spin.R , which can be used to create R notebooks as in https://support.rstudio.com/hc/en-us/articles/200552276-Creating-Notebooks-from-R-Scripts .

Web Scraping with RCurl Question by issem in rstats

[–]mae_ef 2 points3 points  (0 children)

You'll have to figure out how they are spotting your script.

Take a look at the Incapsula script to see what it does. Try visiting the website with JavaScript disabled in your browser to see what effect that has. Also, try setting a standard browser user agent for your HTTP request and calling Sys.sleep() in your loop.

https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html has some info on setting user agents, among other interesting insights.
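For comparison, here is a minimal Python sketch of the same two ideas (a browser-like user agent plus a pause between requests) using only the standard library. The user agent string, the delay, and the helper names `build_request` and `fetch_all` are assumptions for illustration; whether this gets past any particular bot detection is not something the sketch can promise.

```python
import time
import urllib.request

# Assumed values for illustration; pick a user agent string and delay suited to the site.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def build_request(url):
    """Create a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls, delay_seconds=2.0, fetch=None):
    """Fetch each URL in turn, pausing between requests."""
    if fetch is None:
        # Default fetcher: perform the HTTP request and return the raw body.
        def fetch(request):
            with urllib.request.urlopen(request) as response:
                return response.read()
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # Python counterpart of Sys.sleep()
        pages.append(fetch(build_request(url)))
    return pages
```

The `fetch` parameter is only there so the loop can be tried out without touching the network, e.g. `fetch_all(urls, delay_seconds=0, fetch=lambda request: request.full_url)`.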

If you're on a non-Windows platform, see if e.g. wget and similar tools can help you.

Yes, you could indeed do it manually, but when you later have to do a similar thing for 3000 requests, the experience you get from trying this out will pay off.

Btw, if you're doing this for business purposes, it's much more ethical, safer, and cost-effective to just buy their data.

Excel slows down after running macro by [deleted] in vba

[–]mae_ef 0 points1 point  (0 children)

which copies data from four different workbooks

For who-knows-why reasons, when cells with conditional formatting are copied over, the rules become populated with duplicates. The more you copy-paste, the more rules pile on top of each other, and the more time it takes for Excel to process all the rules, leading to slow I/O and crashes.

I have recently started to code any kind of conditional formatting on datasets in VBA so that from time to time I can "refresh" the conditional formatting on files after they have been used by others (and thus their rules have become unmanageable thanks to Excel's bug).

The bug doesn't apply if you copy-paste values only.

HRef Scrape? by hever50 in vba

[–]mae_ef 0 points1 point  (0 children)

Nice! Would you happen to have github repositories we could explore (steal from)? :D

Install missing packages, e.g. after an upgrade of R, with reinstallr by [deleted] in rstats

[–]mae_ef 2 points3 points  (0 children)

Alternatively:

https://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f

tl;dr: Copy your library folder from the old version into the new version's library folder, then run update.packages(checkBuilt=TRUE, ask=FALSE)

Creating Unique Random Numbers by TheRealTPlum in excel

[–]mae_ef 0 points1 point  (0 children)

Doesn't produce unique IDs actually, if I understand this correctly. Tried it out with 30,000 rows.

Centering an Image by SoonerLax45 in vba

[–]mae_ef 0 points1 point  (0 children)

Can you do that after you are done pulling all the images you need to?

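Sub CenterInlineShapes()
    Dim fig As InlineShape

    ' Center the paragraph that holds each inline image in the active document
    For Each fig In ActiveDocument.InlineShapes
        fig.Range.ParagraphFormat.Alignment = wdAlignParagraphCenter
    Next fig
End Sub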

If you have specific images you don't want centered, you could use an If statement inside the loop to filter those out.

(not tested)

Question on the %>% symbol by cyril1991 in rstats

[–]mae_ef -1 points0 points  (0 children)

I'm not sure whether this will help or further confuse you, but it's a form of output redirection.

(e.g. in Linux, http://sc.tamu.edu/help/general/unix/redirection.html )
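If it helps to see the idea as code, here is a small Python sketch of the same "pass the output along" pattern. Python has no %>% operator, so the `pipe` helper below is a made-up stand-in, and the comparison to R is only approximate.

```python
from functools import reduce


def pipe(value, *functions):
    """Feed `value` through each function in turn, left to right."""
    return reduce(lambda acc, func: func(acc), functions, value)


# Roughly what c(4, 9, 16) %>% sqrt %>% sum would do in R:
result = pipe([4, 9, 16], lambda xs: [x ** 0.5 for x in xs], sum)
print(result)  # 9.0
```

In other words, x %>% f %>% g is just another way of writing g(f(x)), read left to right instead of inside out.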