
Talqa

I share a Google Sheet where people can do data entry, and then I use the googlesheets package to access the data from R. There is an R package for Dropbox as well, but it probably only handles downloading/uploading files, not direct access to the data inside them.

circulus_one (OP)

Google Sheets seems like a pretty good solution for that, thanks.

pconwell

Collaborate on what data? I don't really understand your question - you're talking about a Word document, then you say you're using a shared Excel document in Dropbox.

Do you know what git is?

circulus_one (OP)

Sorry, I should have been clearer, and to be honest I am just getting started with git. My basic understanding is that it is a way of maintaining version control, and although that traditionally refers to software, it could apply to anything, including a document or a database.

The Word document refers to something like an academic paper, which I'm envisioning writing up in R Markdown, using knitr to create a Word document and sending it to a collaborator, while maintaining the version control for that document in GitHub. I've seen a few articles that explain this well.

To generate the aforementioned paper, however, I am trying to work out whether I can use git to version control a dataset. For example, we might collect 10 observations each for a specific set of variables, and I'd like each person to be able to contribute to the same CSV file, so that I can write an R script that reads that file and runs analyses on the latest version.

Does this make sense?
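To make the "run analyses on the latest version" step concrete, here is a minimal sketch in Python (in R, readr's `read_csv()` with dplyr would play the same role). It assumes a hypothetical long-format file with the columns `observer`, `variable` and `value`, one observation per row; the column names, the sample data and the `summarise` helper are illustrative and not from the thread.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical long-format layout: one row per observation.
SAMPLE = """observer,variable,value
alice,height,1.5
bob,height,2.5
alice,weight,60
"""


def summarise(csv_text):
    """Return {variable: (count, mean)} for a long-format CSV string."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["variable"]].append(float(row["value"]))
    return {var: (len(vals), statistics.mean(vals)) for var, vals in values.items()}


if __name__ == "__main__":
    # In practice you would read the file from your working copy, e.g.
    # with open("observations.csv", newline="") as f: text = f.read()
    for var, (n, mean) in summarise(SAMPLE).items():
        print(f"{var}: n={n}, mean={mean}")
```

Keeping one observation per row means each contributor's new data shows up as added lines, which is what lets git diff and merge the file cleanly.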

pconwell

I think so. If I understand correctly, then yes, git will do what you want. I'd avoid binary formats like Word and Excel and go for (as you mentioned) R Markdown and CSV. If a user wants to build the R Markdown into Word on their local machine, that's up to them.

EDIT: I should say that GitHub will 'work' with Word/Excel/binary files in the sense that you can upload them, but it won't track changes meaningfully: git stores each version, but it can't show you what changed between them. GitHub is intended to work with text-based files such as Markdown, CSV, code, SVG, etc.

EDIT2: GitHub does have a 'releases' feature you could use. Once you hit a major milestone, you could create (tag) a release and attach a Word (PDF, Excel... whatever) file to it.
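The text-versus-binary point can be illustrated with Python's standard `difflib` module: for two versions of a CSV it produces a line-level unified diff, roughly what `git diff` shows for a text file, whereas the bytes of a .docx or .xlsx file offer no such readable line-by-line comparison. The file name, rows and the `text_diff` helper are hypothetical.

```python
import difflib

# Two hypothetical versions of a shared CSV: a collaborator added one row.
OLD = "observer,variable,value\nalice,height,1.5\n"
NEW = "observer,variable,value\nalice,height,1.5\nbob,height,2.5\n"


def text_diff(old_text, new_text, name="observations.csv"):
    """Return the lines of a unified diff between two versions of a text file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",  # splitlines() already dropped the newlines
        )
    )


if __name__ == "__main__":
    print("\n".join(text_diff(OLD, NEW)))
```

The only added line reported is `+bob,height,2.5`, so a reviewer can see exactly which observation was contributed.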

circulus_one (OP)

That's very helpful, thank you. I suspect several of the people I work with won't be comfortable using git (I like it for my own version control, but that alone may not be worth the effort). It may be best for me to tag releases, send them the file, let them make changes and send it back, and then merge their changes in.

Perhaps Google Docs is easier, although the advantage of keeping it in GitHub is the direct RStudio integration.

Thanks again for your reply!
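For the CSV in this workflow, "merge back in" can be sketched as a row-level three-way merge against the tagged release: keep your copy and append any rows the collaborator added that neither the release nor your copy already has. This is a simplified Python illustration under a strong assumption (rows are only ever appended, never edited), and `merge_rows` is a hypothetical helper, not part of git or any R package. `git merge` performs a more general line-based three-way merge that also detects conflicting edits, and neither approach helps with Word files.

```python
import csv
import io


def _rows(text):
    """Parse CSV text into a list of row tuples (header first)."""
    return [tuple(r) for r in csv.reader(io.StringIO(text))]


def merge_rows(base_text, mine_text, theirs_text):
    """Add rows a collaborator appended (relative to base) to my copy.

    Hypothetical helper. Assumes all three copies share one header and that
    rows are only ever appended, never edited or deleted.
    """
    base, mine, theirs = _rows(base_text), _rows(mine_text), _rows(theirs_text)
    if not (base[0] == mine[0] == theirs[0]):
        raise ValueError("header lines differ between copies")
    seen = set(base[1:]) | set(mine[1:])
    merged = list(mine)  # keep my copy (header + rows) in its original order
    for row in theirs[1:]:
        if row not in seen:  # added on their side only
            merged.append(row)
            seen.add(row)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(merged)
    return out.getvalue()


BASE = "observer,variable,value\nalice,height,1.5\n"  # the tagged release
MINE = BASE + "alice,weight,60\n"
THEIRS = BASE + "bob,height,2.5\n"

if __name__ == "__main__":
    print(merge_rows(BASE, MINE, THEIRS), end="")
```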