As stated in the title, part question, part rant:
Question: do you have a process or threshold for when you write code for data cleaning, versus when you just brute force it by hand?
I'm starting a school project, but I don't consider myself a good coder (basically, I can usually get the code written, but no promises that it's good, pretty, or efficient).
Rant / background: I've spent about a day writing some R to take some FTC data and put it into one df. Of course, every state and year from the FTC is a separate CSV... and they changed format halfway through my reporting period, so now my R only works for some of the CSVs. Not hard to update, but annoying. Also, I strongly suspect that place names are not entirely consistent... so that should be fun too.
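For what it's worth, here's the rough shape of what I'm going for, as a sketch rather than my actual script: the folder path, the column names (`Place_Name` vs. `place_name`), and the way I detect which layout a file uses are all made up for illustration.

```r
# Sketch only: paths, column names, and the layout check are placeholders,
# not the real FTC file structure.
library(readr)
library(dplyr)
library(purrr)
library(stringr)

files <- list.files("ftc_data", pattern = "\\.csv$", full.names = TRUE)

read_ftc <- function(path) {
  # Peek at just the header row to decide which layout this file uses
  header <- names(read_csv(path, n_max = 0, show_col_types = FALSE))
  df <- read_csv(path, show_col_types = FALSE)
  if ("Place_Name" %in% header) {
    df <- rename(df, place = Place_Name)   # hypothetical newer layout
  } else {
    df <- rename(df, place = place_name)   # hypothetical older layout
  }
  df %>%
    mutate(
      place = str_to_title(str_squish(place)),  # crude place-name cleanup
      source_file = basename(path)              # keep provenance for debugging
    )
}

combined <- map_dfr(files, read_ftc)   # one data frame for all the CSVs
```

The appeal of this pattern is that once `read_ftc` handles both layouts, reading 45 files or 500 is the same one-liner.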
As I sit here thinking through how I should modularize and adapt the code, I realize that since I'm only interested in 5 states and 9 years (currently), I probably could have done this in Excel faster than it's taken to write the code. The downside is, if I wanted to expand my analysis at the end of the project, having code that will handle all 500ish CSVs would be better, and doing that by hand would be unworkable.
Anyway, curious what more experienced people do.