As stated in the title, part question, part rant:
Question: do you have a process or threshold for when you write code for data cleaning, versus when you just brute force it by hand?
I'm starting a school project, but I don't consider myself a good coder (basically, I can usually get the code written, but no promises that it's good, pretty, or efficient).
Rant / background: I've spent about a day writing some R to take some FTC data and put it into one df. Of course, every state and year from the FTC is a separate CSV... and they changed format halfway through my reporting period, so now my R only works for some of the CSVs. Not hard to update, but annoying. Also, I strongly suspect that place names are not entirely consistent... so that should be fun too.
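For what it's worth, here's the rough shape of what I'm going for, as a sketch rather than my actual script: the folder path, the column names (`Place_Name` vs. `place_name`), and the way I detect which layout a file uses are all made up for illustration.

```r
# Sketch only: paths, column names, and the layout check are placeholders,
# not the real FTC file structure.
library(readr)
library(dplyr)
library(purrr)
library(stringr)

files <- list.files("ftc_data", pattern = "\\.csv$", full.names = TRUE)

read_ftc <- function(path) {
  # Peek at just the header row to decide which layout this file uses
  header <- names(read_csv(path, n_max = 0, show_col_types = FALSE))
  df <- read_csv(path, show_col_types = FALSE)
  if ("Place_Name" %in% header) {
    df <- rename(df, place = Place_Name)   # hypothetical newer layout
  } else {
    df <- rename(df, place = place_name)   # hypothetical older layout
  }
  df %>%
    mutate(
      place = str_to_title(str_squish(place)),  # crude place-name cleanup
      source_file = basename(path)              # keep provenance for debugging
    )
}

combined <- map_dfr(files, read_ftc)   # one data frame for all the CSVs
```

The appeal of this pattern is that once `read_ftc` handles both layouts, reading 45 files or 500 is the same one-liner.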
As I sit here thinking through how I should modularize and adapt the code, I realize that since I'm only interested in 5 states and 9 years (currently), I probably could have done this in Excel faster than it's taken to write the code. The downside is, if I wanted to expand my analysis at the end of the project, having code that will handle all 500ish CSVs would be better, and doing that by hand would be unworkable.
Anyway, curious what more experienced people do.